February 21, 2024

The interview in this post is taken from Episode #004 of the Cosive Podcast.

Tash: I’m Tash, and I’m helping out with marketing at Cosive.

Chris: My name is Chris, and I’m the CTO at Cosive.

Tash: Today we’re going to be talking about ChatGPT and its applications for CTI. Chris, I know this is something you’ve been really geeking out about lately. Can you give us a really brief summary of what ChatGPT is?

Chris: I’ll try to keep it brief. There’s so much to talk about. They call it a “large language model”, but I think most of us would probably call it an AI chatbot. The amazing thing about it is that it’s just very general purpose. I’ve got friends in various professions, in psychology, in UX, in all sorts of fields, who are really excited about this. Certainly in my experiments, I’ve found that even though it is very generic, there are a lot of applications that miraculously work for niche disciplines like Cyber Threat Intelligence (CTI).

Tash: So you were playing around with ChatGPT - what got you thinking about its applications for CTI in particular?

Chris: CTI has this continual problem. We have this holy grail that we want to do machine-to-machine representation of threat intelligence. And it might be things like “Such and such threat actor ran a campaign, they used these certain techniques”, which, in the last few years, we’ve taken to describing with the MITRE ATT&CK taxonomy, using identifiers for the sort of tricks an attacker was using. All of these things have identifiers.

So it’s a lovely theory that we can structure threat intel packages in this very highly structured format, which might be STIX, it might be a MISP package. The problem is taking the concepts, which typically come out of the observations of analysts but could come out of an automated system, and turning that human wisdom into a machine-readable format. That’s always been a big problem.

One of the classic examples is when a cyber security vendor or threat intelligence organisation puts out a threat intelligence report, and classically these have been done as PDFs. The PDF might run for 5 pages, 10 pages, 40 pages… And it can go into a lot of depth about the technical indicators we see, like IP addresses, hashes, and domain names, and all those things you want to go look for in your logs. But the bit that these automated systems to date have always struggled with is really trying to understand it as a whole. So… what’s the name of this threat actor? That sounds easy in theory. A human can pick it, because in the report it says “This threat actor they’re now calling XYZ”, but for machines to date, it’s been really hard to extract that information.

A really exciting application of things like ChatGPT has been getting it to help parse these reports and turn all that unstructured text into something with structure, to put it into a STIX package, or to put it into a MISP package, or mark up all of those ATT&CK IDs, let’s say.
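To give a feel for what that looks like in practice, here’s a minimal sketch of that kind of prompt-driven extraction, assuming access to OpenAI’s Chat Completions HTTP endpoint. The report excerpt, prompt wording, and model choice are illustrative only, and the output is a list of candidate technique IDs for an analyst to verify rather than a finished answer.

```python
# Rough sketch only: prompting a chat model to suggest ATT&CK technique IDs
# from unstructured report text. The excerpt and prompt are illustrative,
# and the suggestions still need human review.
import os
import requests

REPORT_EXCERPT = """
The actor gained initial access via a spearphishing attachment, then used
PowerShell to download a second-stage payload and created scheduled tasks
for persistence.
"""

prompt = (
    "List the MITRE ATT&CK technique IDs (e.g. T1566.001) described in the "
    "following report excerpt, one per line, with a short justification:\n"
    + REPORT_EXCERPT
)

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "temperature": 0,  # keep the output as deterministic as possible
        "messages": [{"role": "user", "content": prompt}],
    },
    timeout=60,
)
resp.raise_for_status()

# Candidate technique IDs come back as plain text for an analyst to verify.
print(resp.json()["choices"][0]["message"]["content"])
```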

Tash: And that was a recent experiment that you did and wrote a blog post about, which was trying to extract ATT&CK IDs using ChatGPT. What was that like?

Chris: Yeah, so, I was pretty blown away. This was an early experiment I did, and I want to say this was about 3 - 4 days into playing around with ChatGPT. I did the same thing as everybody else. I got it to write me a story sounding like a grizzled 1930s detective, all that fun, and sometimes, quite frankly, terrifying stuff it can put out.

But then I started thinking about a problem I’d been mulling over before this chatbot came along: I want to be able to take this PDF report we were talking about and work out, what are all the ATT&CK techniques in it? MITRE ATT&CK is getting a lot of traction in a lot of organisations. That’s how they want to do everything. Basically, for what we’re seeing in our defences and what we’re seeing as threats, can we use ATT&CK to check whether our logs and alerts cover the scenarios we think we’re most at risk of?

So that’s where it started. And I was really blown away with how successful my initial experiments were. I wrote all of this down in the blog post, and it didn’t take me very long to realise that this was absolutely viable.

What amazed me even more was that ChatGPT is not designed, by any measure, to understand what ATT&CK is. It doesn’t know what ATT&CK is. It knows a whole bunch of information it’s slurped into a machine learning model, but for the purposes of an analyst, it seems to speak ATT&CK and understand the concepts of it well enough that I can use it to help me make some judgments, at least initial judgments that I as a human analyst am going to review, and also just to do all the scutwork of marking up this report. Things like putting the ATT&CK technique tag into the right place. If I decide I want to do it in front of a passage of text describing the technique rather than at the end, I can just say, “Hey bot, can you put it at the front, not the end?” And it’ll say “Sure, here it is again, reworked just the way you asked.” All that tedious stuff is work that, as an analyst, I’d have to sit down and do myself, copying and pasting, or at the very least writing a script just to do that data manipulation. It’s really, really good at it.

Tash: Are there any other experiments that you’ve been running in the CTI space with ChatGPT?

Chris: The next one… and this is preliminary, it’s taking me a while because it’s a bit hit or miss… the area I really want to get a workable solution for next is STIX document generation.

We were talking within Cosive, as a bit of a gimmicky idea, about doing a hoodie, and what we could put on the hoodie to denote what we do at Cosive. One idea was, well, we’re very big on things like STIX as a standard, so could we map Cosive as an identity the same way you might a threat actor? And then we could link in all the services Cosive provides, whether it’s software or infrastructure, and present it as a STIX mindmap. It would neatly explain what we do, and also signal to the right people how we’re doing it.

The thought here was, could I just describe the map I wanted? So literally my prompt was something along the lines of: “There is an organisation which I want you to represent as an identity entity in STIX. The organisation offers two infrastructure tools, CloudMISP and Smokeproxy. It also offers two tools, one’s called Phishfeeder, one’s called Antifraud.” You can see where I’m going with this. I’m basically explaining in English how these things link together.

It basically does it, which again is really impressive, given it doesn’t know what STIX is. It just spat out this JSON that looked very much like STIX.
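For comparison, this is roughly what that kind of structure looks like when built by hand with the open-source stix2 Python library, which generates spec-compliant IDs and bundles for you. The names come from the conversation above; the “related-to” relationship type is just a generic way of linking the identity to its tools for illustration.

```python
# A hand-built comparison point: the same kind of graph assembled with the
# open-source `stix2` library, which produces valid STIX 2.1 IDs and bundles.
# "related-to" is a generic relationship type, used here purely to illustrate
# the identity -> tool links described in the conversation.
from stix2 import Identity, Tool, Relationship, Bundle

cosive = Identity(name="Cosive", identity_class="organization")
tools = [Tool(name=n) for n in ("CloudMISP", "Smokeproxy", "Phishfeeder", "Antifraud")]
relationships = [Relationship(cosive, "related-to", t) for t in tools]

bundle = Bundle(cosive, *tools, *relationships)
print(bundle.serialize(pretty=True))  # valid STIX 2.1 JSON, bundle and UUIDs included
```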

Now… the devil is in the details, though, because you see all these little glitches. The first time it did this for me, it didn’t wrap it in a STIX bundle, which is the thing that keeps all of this together. A lot of STIX validators won’t process it without a bundle. But you can literally say “Take that JSON and put it in a STIX bundle”, and it spits out new JSON that looks like it’s wrapped in a valid STIX bundle. But then you have to look closer again. You find out that all those UUIDs are not actually valid to the specification of how to build a version 4 UUID. I even had a case where it was spitting out version 1 UUIDs. I knew they were version 1, because the 13th hex digit of one of these values will be a 1 or a 4 depending on which version of UUID it’s using. It swore black and blue that it was a version 4 UUID. It even explained that it was, because the 13th digit was a 4, but I was looking straight at it, at the 1.
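That version digit is easy to check mechanically. Here’s a small sketch using Python’s standard uuid module, just to illustrate the point about the 13th hex digit; the values involved are whatever the bot hands back.

```python
# In the canonical 8-4-4-4-12 form, the 13th hex digit of a UUID encodes its
# version, which is how you can catch a "version 4" UUID that is really v1.
import uuid

def uuid_version(value: str) -> int:
    """Return the version encoded in a UUID string (1, 4, ...)."""
    return uuid.UUID(value).version

v4 = str(uuid.uuid4())
print(v4, "->", uuid_version(v4))   # always 4 for uuid4()
print(v4.replace("-", "")[12])      # the 13th hex digit itself: '4'
```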

Eventually I said, well: “Here are 10 UUID version 4 values. This is what I want you to do from now on.” To its credit, from that point on, it would use correct values. So you can coach it, but you have to be on your toes. Because it will totally, confidently give you answers that are utterly wrong.

We’re in a very early stage of this technology being public at the moment. But that’s the big takeaway for me so far, that you have to double and triple check what it says, no matter how confidently it says it.

Tash: Right. It sounds like it is quite often confidently incorrect, which is kind of the most dangerous way to be incorrect. Also, I mean, the drawbacks of that can be somewhat limited if you, supervising the tool, are yourself an expert in the area. But it still does require that expert supervision to make sure it’s not occasionally spitting out nonsense that looks believable.

Chris: Generally, my thinking so far is that you want to use it for certain types of operations where you’re not building bridges, and you’re not trying to do anything too precise. You want to check the output, but yeah, you definitely have to have that skeptical mindset. Which is not a bad thing to train on! You’ve basically got a junior analyst who’s trying to gaslight you at every turn. It really makes you question your assumptions, it makes you question every fact, so it’s kind of a good exercise on a certain level.

We have to be mindful that it’s very early days. It’s literally been 11 days since I started playing with this. So very early days for this space, and no doubt there’ll be a huge influx of investment and interest in competitors to it as well, which are all going to have their own subtleties.

But naturally, the thinking has gone to: “Well, what jobs is this going to automate?” As of today, I don’t see that it can automate… just judging what we have today, it can’t automate what an analyst does, because it can’t do it reliably yet. Who knows how that changes. But right now, I see it as a super productivity tool. You can tell it to do things like “Write me a report about a vulnerability called XYZ. It’s critical severity. The vendor name is blah. Go.” And it’ll spit out what is a pretty good template to start with. And maybe it would have taken you half an hour, or an hour, to craft the basic template if you’ve never done it before. And here you go, it’s got something for you to refine on.

It’s got all of this promise. It’s sure to change and grow over time, so I think any threat intel shop is going to be starting to think about how they’re going to use this, and how much they’re going to allow it to go fully automated. Like I said, I’m skeptical about that at this point. How much do they allow it to start feeding in as an automated input to human analyst processes?

Tash: Do you think that AI, like ChatGPT or other tools that might emerge, could change the role of the CTI analyst pretty fundamentally within the next 10 years?

Chris: Yeah, I think that’s inevitable. And not just for threat intel of course, but I think all knowledge workers and beyond.

But I think for what it means for analysts in the future, it’s really hard to predict how this is going to look even in 2 years. The way I’m thinking about it right now is that, based on what we know so far about the chatbot, we humans still process information in a particular way, and we still know what’s happening in our organisations. Something like ChatGPT has no idea. So I think for an analyst, maybe the direction to take your career is going to be under the assumption that this is going to be a standard thing. Everyone’s going to be able to describe a threat and spit out the STIX 2.1 very easily, or mark it up with MITRE ATT&CK tags, but the bot can’t apply all these learnings to your organisation, as far as we know.

Maybe there’s going to be a private tenancy chatbot down the line, that you could train on a whole bunch of your own corporate data. But that would all be theoretical at the moment. I can absolutely see that there would be demand for it, if you could guarantee where that data is going, so that it understands your organisation.

I still contend that analysts are going to understand the organisation, the politics, the budgetary constraints, all the day-to-day things that we humans pick up listening to conversations, reading emails, being in the organisation for five years, and just knowing how it works and how people think. We’ll use what a chatbot like this can generate, helping it form outputs of knowledge, but it will be us, the humans, applying that knowledge and making the strategic recommendations.

I can’t yet see that we’re going to get replaced. I might eat my words in five years, but that’s what I see as the most likely outcome right now.

Tash: Well, thanks Chris. This has been a really interesting discussion. It’s a fascinating area. Super new, and undoubtedly we’ll need to catch up again in a few months and see how things are shaking out then. Thanks again for chatting on this.

Chris: No worries. Thank you.