OpenAI released a public beta of their ChatGPT bot late last week. To introduce what we’ll focus on in this post, I’ll let ChatGPT do the honours. As it will be throughout this post, the bold text is my prompt, and the following text is ChatGPT’s response.
That’s already amazing, and we haven’t even gotten started yet. The first thing you’ll notice is that it seems to know that ATT&CK is a framework, and it explains why ChatGPT is well suited to a task involving natural language analysis.
Let’s give it a try and see how well it does. Bear in mind:
For a sample, let’s start with US-CERT Alert AA22-320A since it’s helpfully marked up with human-assigned ATT&CK IDs already, so we can compare them with what ChatGPT comes up with.
Let’s take this snippet from the original report:
In February 2022, the threat actors exploited Log4Shell [T1190] for initial access [TA0001] to the organization’s unpatched VMware Horizon server. As part of their initial exploitation, CISA observed a connection to known malicious IP address 182.54.217[.]2 lasting 17.6 seconds.
The actors’ exploit payload ran the following PowerShell command [T1059.001] that added an exclusion rule to Windows Defender [T1562.001]
I’ll strip out all the marked ATT&CK IDs and feed this as a prompt to ChatGPT.
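Stripping those markers is easy to do by hand for a snippet this size, but for longer reports a small script helps. Here’s a minimal sketch (my own helper, not part of any ATT&CK tooling) that removes bracketed tactic, technique, and sub-technique IDs like [TA0001], [T1190], and [T1059.001]:

```python
import re

# Bracketed ATT&CK markers: tactics (TA####), techniques (T####),
# and sub-techniques (T####.###), with any leading whitespace.
ATTACK_ID_PATTERN = re.compile(r"\s*\[(?:TA\d{4}|T\d{4}(?:\.\d{3})?)\]")

def strip_attack_ids(text: str) -> str:
    """Remove inline ATT&CK ID markers from a report snippet."""
    return ATTACK_ID_PATTERN.sub("", text)

snippet = ("In February 2022, the threat actors exploited Log4Shell [T1190] "
           "for initial access [TA0001] to the organization's unpatched "
           "VMware Horizon server.")
print(strip_attack_ids(snippet))
```

The resulting plain text is what goes into the prompt.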
Hmm, it says it can’t handle CTI reports. Note that it knew what CTI is, though! Let’s try this another way, without mentioning CTI reports specifically:
That certainly validates it can begin to perform this type of analysis!
Next, it’s getting a little tricky to see what it’s mapped so far in the text - maybe it can help with that?
That’s pretty good so far! Note I didn’t have to quote the initial report this time - it understood we’re still talking about that same text extract.
Hmm, it didn’t tag the use of PowerShell though, so:
Nice. I’d prefer that ID to sit right after the mention of PowerShell though, so:
Very nice. I’m having trouble keeping up with the changes at this rate, maybe it can help?
Now let’s get it more report-ready by adding handy links:
Note I didn’t tell it how to make those links - it just knew (!). In other experiments, it will also handle links to sub-techniques just fine.
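If you wanted to generate those links yourself rather than relying on the model, the attack.mitre.org URL scheme is simple: tactics live under /tactics/, techniques under /techniques/, and sub-techniques turn the dot into a path segment (T1059.001 becomes /techniques/T1059/001/). A small sketch of my own, `attack_url` and `markdown_link` being hypothetical helper names:

```python
def attack_url(attack_id: str) -> str:
    """Build an attack.mitre.org URL for a tactic, technique, or sub-technique ID."""
    if attack_id.startswith("TA"):
        return f"https://attack.mitre.org/tactics/{attack_id}/"
    # Sub-technique IDs like T1059.001 map to /techniques/T1059/001/
    return f"https://attack.mitre.org/techniques/{attack_id.replace('.', '/')}/"

def markdown_link(attack_id: str) -> str:
    """Render an ATT&CK ID as an inline Markdown link."""
    return f"[{attack_id}]({attack_url(attack_id)})"

print(markdown_link("T1190"))
# [T1190](https://attack.mitre.org/techniques/T1190/)
```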
You could also present this as Markdown, which it tried to perform syntax highlighting on. Take a look at this:
And so “that’s not right, fix it” just… worked. Wow, okay.
How GOOD is this extraction and analysis, though? It picked T1190 for the Log4Shell activity, just like the human analysts - amazing.
However, in contrast to the human analysts at CISA, who classified the Windows Defender exclusion rule as T1562.001: Impair Defenses: Disable or Modify Tools, it’s picked T1089. What exactly does T1089 describe? Is it a good fit?
Given it’s tampering with Windows Defender, this seems like a solid choice, even though the human analysts of the report didn’t use it.
Note that in one of my previous experiments, it somehow got confused between the name and ID of a technique. You definitely want to be independently verifying its suggestions and claims at this stage!
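One way to catch that kind of mix-up is to cross-check each suggested (ID, name) pair against a known-good mapping before accepting it. A minimal sketch, using a tiny hand-made excerpt of the catalogue - in practice you’d load the full ATT&CK dataset rather than hard-code entries like this:

```python
# Tiny illustrative excerpt of the ATT&CK catalogue; load the full
# dataset (e.g. MITRE's published STIX bundles) for real use.
TECHNIQUE_NAMES = {
    "T1190": "Exploit Public-Facing Application",
    "T1059.001": "PowerShell",
    "T1562.001": "Disable or Modify Tools",
}

def verify_claim(attack_id: str, claimed_name: str) -> bool:
    """Check a model-suggested ID/name pair against the known mapping."""
    return TECHNIQUE_NAMES.get(attack_id, "").lower() == claimed_name.lower()

print(verify_claim("T1190", "Exploit Public-Facing Application"))  # True
print(verify_claim("T1562.001", "Spearphishing Attachment"))       # False
```

A check like this only confirms the pairing is internally consistent - whether the technique actually fits the reported activity still needs an analyst.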
Can we lead it to suggest the same technique ID that CISA used?
Hmm, not quite. It’s quite incredible how well it’s processing our prompts and correctly implementing the intent behind them, but sometimes we might have to try another tack. For now, let’s see what it makes of the merits of each ID:
Great explanation - both are solid fits really. Let’s say we wanted to switch to the CISA-assigned ID in this case:
Something you’ll have noticed all along, but especially here, is that it understands context brilliantly. When I said “make that replacement”, it worked out exactly what I meant.
Early on in my experiments, ChatGPT would suggest ATT&CK IDs that are “typically” seen with an attack like the one described, but weren’t explicitly mentioned in the activity described in the report.
One example was a CTI report snippet explaining that credentials were obtained, then applying T1193: Spearphishing Attachment since that is (in its words) “often” how credentials are obtained. After asking to explain how it arrived at that ID and pointing out that there are other techniques for obtaining credentials that may well have been employed here, I asked ChatGPT to drop the ID as drawing too long a bow. It cheerily complied and dropped the technique ID from its active list.
I found that saying things like “extract the ATT&CK IDs from this text which can be DIRECTLY inferred” has helped with that problem greatly. That in itself is mind-blowing.
This whole experience very much reminded me of discussing ideas with analyst colleagues on the most appropriate way to classify or describe something. You certainly wouldn’t trust ChatGPT to just make technique ID assignments and publish them to the world automatically - you’d have another analyst in the loop. But that’s the same review you’d want when it’s all humans in the loop, too. ATT&CK IDs have a lot of room for interpretation, after all.
A few other random limitations I’ve noticed:
Let’s leave it there, but I think this illustrates how much potential is here.
It seems only appropriate that ChatGPT does the honours….
Hmm, that’s a little too formal…
What this means for threat analysts (and every knowledge worker job!) is a massive topic for another time, but it’s fascinating to experiment with what it can accomplish versus similar ATT&CK-specific tools around today.