Episode #005: Security-focused Code Review for Software Developers with Sid Odgers

February 21, 2024

Cosive's Software Development Lead, Sid Odgers, is a cybersecurity expert who spends his days building secure software. In this podcast you'll learn how Sid approaches code review to make sure that all code shipped to production is secure as well as high-quality. The checklist and tips shared here are language agnostic and will be of use to any software developer who wants to get better at security.


Tash: I'm Tash Postolovski and I'm helping out with marketing at Cosive.

Sid: My name's Sid and I'm the Software Development Lead at Cosive.

Tash: In this episode of the Cosive podcast, we'll be talking about security focused code review. What does that mean to you Sid?

Sid: Well, that's the security part of your code review checklist, I guess. I mean, when you're reviewing code, you've got a list of things that you're looking for that you don't want to see, and I suppose you've got a list of things you do want to see as well. But typically I find it's best to think of these things as a deny list, like you're looking for anti-patterns, I suppose, that you want to correct. And the security part of a code review is obviously looking at the new code with a security mindset. Certainly from a manual perspective, you're going to be looking at whether authentication and authorisation are handled properly. Is untrusted user input sanitised correctly? What mechanisms are in place to guarantee that it's not blindly trusted? Along those lines. You know, are you doing anything to protect against, say, SQL injections? Are you using parametrised queries where you should be? Are you blindly logging too much information, or are you throwing objects in error scenarios where they might get serialised and leak information that you weren't expecting them to leak? There's a whole bunch of different things you can look for. But yeah, the security side of a code review is just that: the security part, as opposed to looking at it from a performance perspective or a maintainability perspective. If you're doing a competent code review, you're going to look at any new code from all of those perspectives.
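The parametrised-query point Sid raises can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the table, column, and input values are hypothetical, and real frameworks expose the same bind-parameter idea through their own APIs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

attacker_input = "x' OR '1'='1"

# Vulnerable: string interpolation lets the input rewrite the query itself,
# so the WHERE clause becomes a tautology and matches every row.
vulnerable = f"SELECT * FROM users WHERE name = '{attacker_input}'"
assert len(conn.execute(vulnerable).fetchall()) == 1  # leaked alice's row

# Safe: the driver binds the value as data, never as SQL, so the
# injection string is just an (unmatched) literal user name.
safe = "SELECT * FROM users WHERE name = ?"
assert conn.execute(safe, (attacker_input,)).fetchall() == []
```

The same payload that dumps the table through the interpolated query matches nothing when bound as a parameter, which is exactly the property a reviewer is checking for.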

Tash: If an organisation doesn't have a security focus in its code review, what typically happens instead? Or where does security come into play?

Sid: Well, I don't want to generalise too much here, but in a lot of cases there's one ad hoc process or another in place. Often the security process is a bolt-on to the side of the software development processes that have traditionally existed, which are there to support a more general code quality model, where developers are typically concerned with the engineering of the software and that it works along all of the various happy paths, but in the past perhaps haven't been quite as concerned about more pure security issues. If it works, it works, sort of thing. And then code produced by that model might, from time to time, get thrown over the fence to an internal security team or what have you. Again, it depends on the model the organisation in question is following, but in a lot of cases that security review can trail behind the leading edge by some months. In other cases you're still talking about having a code review be a quality gate for release, so that's probably okay. But it depends on the processes in place at the organisation in question, so it's very much a case of how long is a piece of string.

Tash: What are some of the things that you would suggest software developers look at in code review? Is there a checklist that you have in your mind that you typically run through when looking at code?

Sid: Well, yeah. I mean, from the top, I typically look at it from a few different perspectives. The big one at the top is: is this code reasonable and maintainable? If it's not, then we can forget about looking at anything more in depth, because the whole thing is going to have to get reworked. If it passes the reasonableness test, then you start drilling down into things like security, and performance where that's an issue. Although performance in a lot of cases is not the issue people seem to think it is; there are really only a few places where you actually even care about performance. And then you look at security as part of that more in-depth second pass over the code. And yeah, like I was saying before, there's a list of low-hanging fruit there: things like SQL injections, bad error handling. User input validation is obviously a big one, because in the absence of user input, all software is going to do pretty much exactly the same thing under all circumstances. So without user input, you can't really trigger any vulnerabilities. And by user, just to be clear, I mean anything that's an externality to the system, not necessarily an actual person sitting at their PC. It might be an interaction with a remote system, it might be an actual flesh-and-blood user, it might be a scheduled task that, hypothetically speaking, reads from a file on disk. These are all user input, and they all need to be treated as fundamentally untrusted. So we need to make sure that we're applying the appropriate level of skepticism to anything that comes in from a user. Those are just some examples, anyway: the obvious low-hanging-fruit bits and pieces.
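One way to put the "everything external is user input" idea into practice is to validate at the boundary, before data from a file, feed, or remote system reaches any business logic. The sketch below is purely illustrative: the record format, field names, and validation rules are hypothetical, and the point is the allowlist-at-the-edge shape rather than any particular rule.

```python
def parse_record(line: str) -> dict:
    """Validate one line of an external feed before trusting it.

    Hypothetical expected format: "<user_id>,<age>", e.g. "u123,42".
    Anything that doesn't match the allowlisted shape is rejected.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        raise ValueError("malformed record")
    user_id, age_str = parts
    # Allowlist the identifier's shape rather than denylisting bad characters.
    if not (user_id.isalnum() and 1 <= len(user_id) <= 32):
        raise ValueError("bad user id")
    if not age_str.isdigit() or not (0 <= int(age_str) <= 150):
        raise ValueError("bad age")
    return {"user_id": user_id, "age": int(age_str)}

assert parse_record("u123,42") == {"user_id": "u123", "age": 42}
```

A scheduled task reading this feed would call `parse_record` on every line and treat a `ValueError` as an untrusted-input rejection, never as a crash to be papered over.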

Tash: You mentioned earlier, not logging too much. Can you expand on that a little bit?

Sid: So in a lot of cases, when something that's worth logging happens, it's because we're reacting to an exceptional condition. We've hit an error, or for some other reason we've generated and thrown an exception. Typically, whatever catches that exception is then just going to return some generic error to the user of the system, which might include the exception message or might not. And then in a lot of cases, developers will just blindly log these exceptions, not necessarily realising what that's going to look like when it's serialised into a log file. There could be all kinds of things there that you don't want. One example, and it's not terribly common, I wouldn't think, is that you could potentially have secret data stored as part of a user object which is stored in an exception. And if that serialises all the way down, you're going to wind up with secret material in your log file. That's one example, anyway. You can wind up logging things that you didn't necessarily want logged, because obviously the exception has no idea that it's got secret data in it, and the serialiser has no idea that it's not meant to be serialising the entire object. And so everyone, with the best of intentions, winds up writing things to log files that they shouldn't. So yeah, the moral of that story is: don't blindly log exceptions.
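That secret-in-the-exception scenario can be made concrete in a few lines. Everything here is a contrived illustration: the class, the field names, and the fake key are all hypothetical, but the mechanism (an object's repr flowing into an exception message and from there into a log line) is exactly the one Sid describes.

```python
class User:
    def __init__(self, name: str, api_key: str):
        self.name = name
        self.api_key = api_key  # secret material

    def __repr__(self):
        # A naive repr serialises everything, secret included.
        return f"User(name={self.name!r}, api_key={self.api_key!r})"

def risky(user: User):
    # The secret rides along inside the exception message.
    raise RuntimeError(f"lookup failed for {user!r}")

user = User("alice", "sk-hypothetical-secret")
try:
    risky(user)
except RuntimeError as exc:
    blind = str(exc)  # what a blind log.error(exc) would write to disk
    assert "sk-hypothetical-secret" in blind
    # Safer: log a stable identifier, never the whole object.
    safe = f"lookup failed for user {user.name}"
    assert "sk-hypothetical-secret" not in safe
```

Neither the exception nor the serialiser knows the field is sensitive, so the fix is on the logging side: log identifiers, not objects.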

Tash: You mentioned also user input, and that user input can really be a lot of different things. But I think a lot of software developers typically think of it as the user typing data into the screen. What are some things to look out for there? I feel like SQL injection is sort of the one that everybody knows about, but are there other things to be mindful of?

Sid: Well, SQL injection is sort of the exemplar of that entire class of issues where you've got some input that you have no way of trusting, and you're passing it into a system which requires some level of trust in order to do the correct thing. Another good example would be cross-site scripting type attacks, where as an adversary you can collude with the system to get some document persisted to a database somewhere and then subsequently rendered into a page viewed by another visitor to the site or what have you, which will then be able to, for example, run JavaScript on that user's system. So that's another good example of not validating input properly. In that case, typically you'd want to do some kind of HTML entity encoding on the input stream. A third good example is taking user input and blindly feeding it into part of a regular expression, because you can write regular expression fragments that are very time- and space-consuming on a CPU. So you could potentially allow your users to perpetrate a denial of service against your server in that situation. So there's three.
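Both of those last two examples have one-line mitigations in most standard libraries. The sketch below uses Python's stdlib: `html.escape` for the entity encoding Sid mentions, and `re.escape` so that user text spliced into a pattern can only ever match literally, never execute as a (potentially catastrophic) regex fragment. The payload strings are hypothetical examples.

```python
import html
import re

# Cross-site scripting: entity-encode untrusted text before rendering it
# into HTML, so the browser displays the payload instead of executing it.
payload = "<script>alert(1)</script>"
assert html.escape(payload) == "&lt;script&gt;alert(1)&lt;/script&gt;"

# Regex injection: never splice raw input into a pattern. Escaping it
# first neutralises metacharacters, including fragments written to
# trigger catastrophic backtracking.
user_fragment = "(a+)+$"  # a classic catastrophic-backtracking shape
pattern = re.compile(re.escape(user_fragment))
assert pattern.search("literal (a+)+$ text") is not None  # matches literally
assert pattern.search("aaaa") is None  # never behaves as a live regex
```

The review question in both cases is the same: is untrusted text crossing into a context (HTML, a regex engine, SQL) that would interpret it, and is it neutralised at that boundary?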

Tash: What about authentication and endpoints, code relating to those? What should people look out for?

Sid: Well, if you're looking at authentication, the actual act of processing somebody's credentials and turning that into a yes/no determination on whether somebody has a valid identity on the system, I think if you're implementing that yourself, you're doing it wrong. Nobody should be writing their own authentication code from first principles unless they have very specific needs and battle-tested code that's in the public domain or open source doesn't already exist, because for pretty much every framework out there, somebody's already done the hard work and all of the low-hanging bugs have already been found. So unless you have some very specific needs, you should not be rolling your own authentication mechanisms. As far as authorisation for endpoints and things like that goes, typically I would suggest relying on a declarative approach rather than an imperative approach. So you'd have your endpoints decorated or annotated or wrapped or whatever your language supports, with some pragma or other that says this endpoint requires the user to be a member of this group or what have you. And again, the framework you're using should enforce that at runtime. The beauty of doing it that way is that you can then get automated tooling to yell at you about any endpoints that you haven't decorated to say, hey, this requires this user type or this group membership. And if you haven't done that, then you can default to denying access to everybody: closed by default rather than open by default.
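The declarative, closed-by-default idea can be sketched as a toy decorator plus a dispatcher. This is an illustration only, not something to deploy: real frameworks already provide this kind of annotation-driven enforcement, which is exactly Sid's point about not rolling your own. All names here are hypothetical.

```python
ENDPOINT_GROUPS = {}  # endpoint name -> required group (the declaration)

def requires_group(group: str):
    """Declare the group an endpoint needs; the dispatcher enforces it."""
    def wrap(fn):
        ENDPOINT_GROUPS[fn.__name__] = group
        return fn
    return wrap

def dispatch(endpoint, user_groups: set, *args):
    # Closed by default: an endpoint with no declaration is denied outright,
    # so a forgotten annotation fails safe instead of failing open.
    required = ENDPOINT_GROUPS.get(endpoint.__name__)
    if required is None or required not in user_groups:
        raise PermissionError(f"access denied to {endpoint.__name__}")
    return endpoint(*args)

@requires_group("admin")
def delete_user(user_id: str) -> str:
    return f"deleted {user_id}"

def forgotten_endpoint() -> str:  # never decorated: automatically denied
    return "oops"

assert dispatch(delete_user, {"admin"}, "u1") == "deleted u1"
```

Because the requirement lives in a declaration rather than scattered `if` checks, a linter or audit script can simply diff the endpoint list against `ENDPOINT_GROUPS` and flag anything undeclared.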

Sid: And again, most of the authorisation code you need is going to be out there in some form for whatever framework you're using, and there's a 99% chance you're not going to need to rewrite it in any way, shape or form. And if that's the case, you shouldn't. People wind up getting a little bit "not invented here" about some of the most unexciting parts of their code base, and that's one of the things that leads to security trouble down the line. I mean, it's nowhere near as awesome and sexy to use some off-the-rack authentication and authorisation system as it is to roll your own and see if you can do it more elegantly, or any number of other reasons why people might do that. But it's almost always going to give you an inferior result, because the code that's already out there has been looked at by orders of magnitude more people than are ever going to eyeball yours, and chances are all of the obvious bugs have already been found for you. And that applies as well to anything involving low-level cryptography primitives. You should never be implementing, for instance, AES on your own unless you have very, very specific requirements and needs to do that. Otherwise, just use a library, because there are a lot of things you can get wrong, and you're not going to get everything right.

Tash: Can the OWASP top ten serve as a checklist when doing security focused code review, or does it need to be more than that?

Sid: Well I think it's a good place to start if you're writing a web application. I mean, if you're writing anything that isn't a web app in line with what the OWASP guidelines are all about, then you can certainly take inspiration from it because there's going to be a lot of applicability more broadly. But in either case, I think, yeah, it's a great place to start, but I think you want to go further.

Tash: What role do automated tools like Snyk play? There are a lot out there now, and every programming language has plug-ins and packages that do some form of automated security review. Do you think they help? Do you think they're enough on their own?

Sid: They definitely help, and they're definitely not enough on their own. Static analysis tooling can reason about a lot of common mistakes that can be proven to have been made statically, so at compile time, with no understanding of how the code is going to run. That's good, because if you can prove something's a problem in that context, it's always going to be a problem; it's not input-dependent or context-dependent. But by the same token, that's all you're going to catch. You're not going to shake out potentially scary dynamic behaviours using static analysis. But that's fine; that's not what it's for. It's one layer. So I'd suggest a multifaceted approach. We have some static analysis tooling that we use as part of a quality gate for pull requests, and the code passing those checks is a necessary but not sufficient condition for the PR to be considered of acceptable quality. That's how I'd recommend using these tools: treat their output as one data point, work around the occasional false positive (most of them have pretty good ways of letting you mark a potential finding as benign), and then do your more in-depth reviews and code quality audits and all of that stuff in addition. It's one extra layer of sanity checking, I suppose. They're good to have, but you just can't rely on them as being the whole thing.

Tash: To wrap up Sid, what are some signs that code might be particularly sensitive or be worth an extra level of security scrutiny when you're looking at a pull request?

Sid: Well, okay, there's a couple of things in that. One is operating in a more sensitive domain: if you're in the part of the code that handles authorisation, or the part that, hypothetically speaking, builds SQL queries or renders HTML out of the database or something like that, then you're probably going to want to pay a little bit more attention. Another one is if you're using a language with safety or security escape hatches. If you're writing Rust and you're in an unsafe block, then obviously you want to be looking at that 100 times more closely than regular safe code. Same thing in TypeScript: any time you're holding on to the any type directly, or you're looking at code with a bunch of ignore comments in it, that's code you want to be looking at in far more depth than the rest of the code base. So yeah, any time somebody is holding one of those escape hatches, you are definitionally losing some of the protection the language gives you, so you need to be a little bit more watchful under those conditions.

Tash: Thanks Sid. Well, this has been a really interesting and helpful discussion, so thanks for chatting with me.

Sid: All good. Not a problem.

Tash: Producing secure code is one of our areas of expertise at Cosive. We can help review your existing code or work with you to introduce more of a security focus into your current code review process. You can find out more at cosive.com.