Running Your SOC Playbooks as Code: Getting Started

February 21, 2024

This article is an edited transcript of Cosive Managing Director Kayne Naughton’s keynote presentation at AusCERT2019, a large cyber security conference in Australia. You can watch the recorded presentation in full here. (Cover photo by Manuel Nägeli.)

You know when you run into someone you haven’t seen for a while, and you’re like: “How’s that car you’re rebuilding?” And then for the next two hours they excitedly tell you about it? That’s pretty much what I’m like with SOAR (Security Orchestration, Automation, and Response) at the moment.

I’ve been living and breathing SOAR for the last two or three months. It’s a really interesting area, and probably the only thing in security that I think everyone should do. If your company has at least one security person and you’re not looking at automation and orchestration, you’re probably letting yourselves down.

But first, who am I?

We’re Cosive. We’re a consultancy largely based out of Melbourne, but also New Zealand and Tasmania. We started out doing mainly threat intelligence, but we’ve since moved more into doing consultancy around SOCs. We do a lot of bespoke software development, and we’ve been doing some work in the traditional intelligence space recently, working with AUSTRAC and ACIC. We do a lot of development for our customers where they have security tools that don’t quite play nicely together. And that’s really what SOAR comes down to, in many ways.

(When talking about a general field, I think it’s important to mention any potential biases or conflicts of interest. We started working with Phantom really early on, before they were acquired by Splunk. We also support, integrate and extend Open Source tooling, like StackStorm. Though we started having those commercial relationships because we thought those things were cool, rather than the other way around.)

In terms of what we’ve done in the orchestration space, I’ve written integrations for Swimlane, Phantom, StackStorm, and WALKOFF. In terms of traditional orchestration, I’ve written client stuff in Puppet, Ansible, NiFi, Chef, CFEngine, and Luigi. I’ve orchestrated through most of the orchestration engines now, and I’ve got a pretty reasonable view on how they all fit together. With a lot of this stuff, you won’t find the really sharp edges until you’ve been using it for a while, so I think I’ve got a bit of a feel for it.

What most organisations are doing

Unless you’ve already reached the next stage, this is most likely what your organisation looks like now:

  1. A machine goes ding. There are a lot of people who will sell you new machines with nice blinky lights. They’ll cost you somewhere in the range of $100,000 each. You plug them into a rack and they generate alerts, assuming you’ve plugged them in correctly. That’s cool and all, but I’d almost guarantee that when you buy one new blinky light box, you don’t get one new staff member to help you support said blinky light box. As your bosses, and even potentially yourselves, buy more and more things, you get more, and more, and more alerts. And you’re expected to try to find efficiencies in order to deal with them. And that is (unfortunately for most organisations) not all that easy. You get to a point where there’s only so much blood you can get out of a stone.
  2. Human looks. A human will look at what the alert is all about.
  3. Human logs into 3 different systems to gather context about what the alert means. So… a user had malware. Who’s the user? What groups are they in? What permissions do they have? What’s the malware? What did the malware end up doing? How did the malware get there?
  4. A human responds. If you are fortunate (and don’t take it as a given this is the case), you are working out of an incident response system, or a ticketing system. A lot of people use Request Tracker, which is as old as the hills, or Jira, or maybe ServiceNow. But there are also a lot of teams that are working out of inboxes. So it comes down to whether you or your colleague opened the email first, and whether it got moved to the right folder before someone double-handled it. And you’re probably not getting a lot of statistics about what you did.

This is what most people look like now. I’ve noticed that a lot of organisations I’ve talked to think that it’s just them. They think it’s their little secret they’re ashamed of; that they work this way, and that everyone else must be doing all this awesome stuff, because they see these presentations about amazing new technologies. But almost everyone is working this way. Even the major banks have been working like this for a long time. Very few organisations are as regimented as they think they should be, or would like to be.

What ‘Nirvana’ looks like

Nirvana, from my perspective, looks like:

  1. A machine goes ding.
  2. Machine looks.
  3. The machine logs into the 3 different systems to gather context. It then throws the context back to a human.
  4. Human approves a response. The human looks at all this context, uses their own judgment (because we still don’t know how to give judgment to computers, and there will always be exceptions to the rule), and gives the go-ahead. For example: “Yep, go and do the thing you’re recommending.”
  5. Machine responds. You’ll also have an audit log, where you can see exactly what happened, and when it happened.
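
The five steps above can be sketched as a playbook in code. This is a minimal, hypothetical Python sketch, not any particular SOAR product’s API: `lookup_user`, `lookup_malware` and `lookup_host` are made-up stand-ins for the three systems an analyst would otherwise log into, `approve` is whatever callback presents the context to a human, and the response step is simulated rather than calling a real integration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Alert:
    user: str
    file_hash: str
    host: str


@dataclass
class AuditLog:
    entries: list[tuple[str, str]] = field(default_factory=list)

    def record(self, event: str) -> None:
        # Timestamp every step so you can see exactly what happened, and when.
        self.entries.append((datetime.now(timezone.utc).isoformat(), event))


# Hypothetical stand-ins for the three systems an analyst would log into.
def lookup_user(user: str) -> dict:
    return {"user": user, "groups": ["finance"]}


def lookup_malware(file_hash: str) -> dict:
    return {"hash": file_hash, "family": "unknown"}


def lookup_host(host: str) -> dict:
    return {"host": host, "owner": "IT"}


def run_playbook(alert: Alert, approve: Callable[[dict], bool], log: AuditLog) -> str:
    log.record(f"alert received for {alert.host}")
    # Machine looks: gather context from every system in one pass.
    context = {
        "user": lookup_user(alert.user),
        "malware": lookup_malware(alert.file_hash),
        "host": lookup_host(alert.host),
    }
    log.record("context gathered")
    # Human approves: the analyst sees all the context and makes the judgment call.
    if not approve(context):
        log.record("response rejected by analyst")
        return "rejected"
    # Machine responds: simulated here; a real playbook would call an EDR integration.
    log.record(f"isolated host {alert.host}")
    return "isolated"
```

Passing `approve=lambda ctx: False` exercises the rejection path. Either way, `log.entries` ends up holding a timestamped record of each step, which is the audit log described in step 5.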

Reasons to strive for Nirvana

Here are some reasons why you should orchestrate:

  • Not every tool talks to every other tool. It makes more sense to have a specialised middle person that knows how to talk to everything than to have every tool you use know how to talk to Jira to lodge a ticket. That sucks: if Jira changes its API slightly, everything has to change. You’re better off having one “octopus” in the middle that touches everything.
  • Consistency of process. If you’ve automated or orchestrated a process, it’s done the same way every time, and you know exactly how long everything took. Different analysts can’t apply their own unique take. Maybe Bob uses one particular site to look up WHOIS data, whereas Glenda uses a different site. And maybe they get different responses. Maybe one’s better than the other, but each has their personal preference. When you start orchestrating and automating, it’s consistent. And if you do want to change something, you change the playbook, and in doing so you change the process for everyone. You don’t have different people on different shifts, doing things in different ways, with different outcomes.
  • Having approval chains is really useful. I’ve had people working for me in a SOC misread an alert. The advice was very unclear, and they ended up blocking the IP addresses belonging to LinkedIn rather than the impersonator’s addresses, because all of the context was jumbled together. It’s easy to do. However, if you’re running an orchestration, then you can have a two-stage approval process. If you want to block something on your corporate proxies, you probably need a Tier 3 or a management person to look at it and go: “Yeah, I agree with that.” If it’s mitigating malware on a box, maybe Tier 1 or Tier 2 can do it.
  • It is faster and cheaper to document a process for a computer to repeat than it is to write a giant Word document for a human and deal with all the changes around it. When you change the instructions for a computer, it obeys them. If you have people in your organisation who open up the instruction document every time they perform a task, and read it point-by-point, good job you… but there are very few of them out there. Most people drift from the written standard quite quickly.
  • When you add extra context, or an extra step, or an extra check, or an extra whatever, to an orchestration playbook, it’s there. You’ve done the work. It might take slightly longer to run. But if you expect people on your team to do seven things and you add another two, making it nine, all of a sudden they’ve got nearly 29% more things to do every day. That’s pretty rough. Whereas computers don’t care.
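
The two-stage approval idea above can be expressed as a small policy check. This is a minimal sketch, assuming a hypothetical mapping from action names to the minimum analyst tier allowed to approve them; the action names, tiers and the `Analyst` type are illustrative, not taken from any SOAR product.

```python
from dataclasses import dataclass

# Hypothetical policy: the minimum tier allowed to approve each action.
# Management sign-off is treated here as tier 3 or above (an assumption).
APPROVAL_POLICY = {
    "mitigate_malware": 1,  # Tier 1 or Tier 2 can approve
    "block_on_proxy": 3,    # needs Tier 3 or management
}


@dataclass(frozen=True)
class Analyst:
    name: str
    tier: int


def can_execute(action: str, requester: Analyst, approver: Analyst) -> bool:
    """Second stage of a two-stage approval: a different, senior-enough person agrees."""
    required = APPROVAL_POLICY.get(action)
    if required is None:
        return False  # deny actions that are not in the policy
    if approver.name == requester.name:
        return False  # nobody approves their own request
    return approver.tier >= required
```

Denying unknown actions by default, and refusing self-approval, keeps the check conservative: a new action has to be added to the policy deliberately before anyone can run it.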

Orchestration vs. Automation

Automation is when you take a thing that you do, and you teach a computer how to do it. Like a robot playing guitar. If you want it to play the ‘Stairway to Heaven’ guitar solo, you can teach a robot how to do that. That is automation.

However, orchestration involves looking at multiple systems at the same time and making them work nicely together, like an orchestra.

You’ll find that most organisations have a reasonable amount of automation, or at least automation tooling, somewhere by now, but it’s getting all those individual things to work together (orchestration) that’s tough.

I think orchestration is obviously the right thing to do. But I think one of the main reasons that people aren’t doing it, or haven’t achieved it, can be seen in actual musical orchestration. Hans Zimmer sits down at a piano… and writing a song is pretty hard. You need a melody, you need a rhythm, all of this stuff takes a lot of effort to put together. It’s quite challenging. But you can divide the piano parts into sections and have different instruments playing the bass notes, or the melody. Maybe the cello plays the bass notes, and the violins play the melody. So the difference between playing something on a piano and having four people play it, in terms of writing the music, is surprisingly small.

Then when you start writing for a chamber orchestra, you’ve just got more of each instrument. The hard bit is sitting down at a piano and writing the song. After that, you leverage all of the knowledge and the systems you’ve put together to expand it out. The difference between writing for a chamber orchestra and a full orchestra is not that much, even though you’re writing for 200 people instead of 20. It’s just more of the same instruments.

The hardest bit you will ever do, when it comes to orchestration, is writing the first couple of playbooks. From that point, you build on it, and it snowballs.

The difference between SOAR and Infra Orchestration

  • Security tends to be more event driven, and generally deals with more events. Full disks don’t happen all the time in most people’s fleets, whereas security events or security warnings are a little bit more common.
  • There are more different systems to talk to, usually. If you’re standing up infrastructure on AWS, there are not that many bits and pieces that you need to plug into. Whereas security is probably touching everything.
  • There are more zones of control: more teams and owners whose systems your response needs to touch.
  • There are more externalities. You’re probably going outside for context more often. If you’re standing up a system inside your organisation, then you’re probably really just looking at your configuration database, looking at the instructions for the box, what its name should be, and so on. If you’re doing security response, you’re thinking about: this IP address in Panama, how long has it had that domain name? What other things are over here? There’s a lot more external stuff you need to care about.
  • Even though the systems are different, they are also quite complementary. I wouldn’t use your security orchestration tooling to stand up a box, or possibly even to reformat one. You can probably rely on the existing processes inside your organisation to do that.

One thing I don’t see too often in traditional orchestration is the ability to debounce an alert. If you have a warning about a certain domain five times, coming through different systems, you don’t want to run the same workflow five times, trying to do the same thing, all within ten seconds. That is something that a lot of the traditional tooling doesn’t support that well.
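
The debouncing idea above can be sketched in a few lines. This is a minimal, illustrative Python sketch, not how any particular SOAR product implements it: it assumes alerts have already been reduced to an indicator string (such as a domain) plus a timestamp, and the ten-second window is an arbitrary example value.

```python
class Debouncer:
    """Leading-edge debounce of alerts, keyed by indicator (e.g. a domain).

    The first alert for an indicator starts the workflow. Any repeat that
    arrives within `window` seconds of the previous sighting is suppressed,
    and each repeat extends the quiet period, so a burst of duplicates
    triggers exactly one run.
    """

    def __init__(self, window: float = 10.0) -> None:
        self.window = window
        self._last_seen: dict[str, float] = {}
        self.suppressed: dict[str, int] = {}

    def should_run(self, indicator: str, now: float) -> bool:
        last = self._last_seen.get(indicator)
        self._last_seen[indicator] = now  # every sighting resets the timer
        if last is not None and now - last < self.window:
            self.suppressed[indicator] = self.suppressed.get(indicator, 0) + 1
            return False
        return True


# Five alerts about the same domain within ten seconds, from different systems.
debouncer = Debouncer(window=10.0)
runs = [debouncer.should_run("bad.example", t) for t in (0.0, 2.0, 4.0, 6.0, 8.0)]
print(runs)  # [True, False, False, False, False]
```

In a real deployment you would pass something like `time.monotonic()` as `now`, and keep the state somewhere all workers share; otherwise two workers could each start the same workflow. Acting on the leading edge (the first alert) rather than waiting for the burst to end is a deliberate choice here, because in security response you usually want to start gathering context straight away.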

Hand-off points

Unless you work in a very small organisation, or one that just had a major security incident and everyone thinks that security is super important, you will have a zone of responsibility, and a zone of control, and other people will have their own. As a former Sysadmin I can totally appreciate this. You need to work with that. If you want to have a desktop re-imaged, you probably don’t want to have your security tooling going and doing that out from underneath your IT outsourcer.

(Though if you’re paying per ticket to an IT outsourcer, you can probably make a really good use case for SOAR. If you pay $40 per ticket you create, or $30, or $20, add up how many of those tickets you can automate so they never go to the outsourcer. That’s how much money you can ask for in your budget. You can say: “I’m not costing you anything. I’m saving you $15k, and I want to use $15k for the time to do this. Let’s do it.”)
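
The budget argument above is simple arithmetic. This is a minimal sketch with hypothetical figures: the ticket volume and the share of tickets you can automate are assumptions you would replace with your own numbers.

```python
def outsourcer_savings(tickets: int, cost_per_ticket: float, automatable_share: float) -> float:
    """Fees avoided when a share of tickets is handled by SOAR instead of the outsourcer."""
    if not 0.0 <= automatable_share <= 1.0:
        raise ValueError("automatable_share must be between 0 and 1")
    return tickets * automatable_share * cost_per_ticket


# Hypothetical example: 1,000 tickets a year at $30 each, half of them automatable.
print(outsourcer_savings(1_000, 30.0, 0.5))  # 15000.0
```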

Be careful of hand-off points and don’t try to ride roughshod over your IT people, because that will usually go very poorly for you, or them, or your relationship.

SOAR Options

There are lots. I realised when looking at some of the licensing stuff that some of this is a little bit more fuzzy than I thought. In terms of Open Source, StackStorm is out there, and it’s being used a lot. It’s from Extreme Networks, who do a lot of traditional networking stuff and ended up owning StackStorm. They appear to be good custodians of it, so that’s nice. (I always look for that in an Open Source project: who’s paying for it? Is it just a passionate individual doing it on their own time? Because passionate individuals burn out. Relying on passionate individuals is no way to build software that a bunch of really large companies depend on. There needs to be some sort of model that lets people spend time on it, I think.) StackStorm is something that we use a fair bit, and recommend. It’s got a good engine. It’s got good bones.

There’s also NSA’s WALKOFF. I don’t know for sure, but WALKOFF certainly feels like a skunkworks proof of concept. I would imagine the NSA doesn’t use it at really large scale. If you compare it to some of the other projects they’ve open sourced, like Ghidra and NiFi, those look like properly engineered projects. WALKOFF, not so much. But they’re re-architecting some of the backend, so it could turn out all right. Once again, both of these are free, though StackStorm has a paid option: a drag-and-drop tool for joining your actions together.

Closed and Commercial

Phantom, Demisto, and Swimlane are the main commercial options for SOAR. They all have a similar approach, with different features and integrations. (Demisto sits roughly halfway between closed-source commercial software and Open Source. Even though the engine is closed source and commercial, all of the integrations, playbooks, and everything else that they publish are under an MIT license. They also publish a standard for their playbooks, so if you wanted to take all of the hard work that Demisto and their partners have done in terms of integrations and examples, you could port it across to one of the other engines. I’ll talk a little bit about that later. It would be an option, and it would be legal: the MIT license means you can do what you want, how you want, even commercially. I’m not a lawyer, but that’s the short version.)

Platforms with SOAR Options as Features

A lot of other platforms have SOAR as a feature.

Most services that generate alerts have some kind of orchestration, because when you generate alerts, you want to do something with them. This next part is my own personal view: these companies might tell you that they are a dedicated SOAR solution, but I see orchestration as an additional element of what they do, rather than their core intent.

What is best for you?

It really depends on…

  • The integrations that you need. You might be a Cisco shop, or a Palo Alto shop, or some other kind of shop. Maybe everyone has a Mac laptop and you all wear berets to work, so you don’t need to worry about Windows. There are a bunch of things that might impact what works best for you.
  • Existing vendors. This is not my techy view; this is my very pragmatic “business guy” view. If you have a very strong pre-existing relationship with one of the vendors, they will probably be more than happy to cut you a really good deal on this stuff.
  • Your level of engineering talent. StackStorm and WALKOFF are pretty cool, but you need a certain level of in-house engineering to utilise them, which I’ll talk about in a bit. If you don’t have that, you’re probably going to need to rely on support more heavily.
  • Whether your budget is CapEx or OpEx. Maybe you can buy something upfront. Maybe you can’t, but you’ve got some guerrilla budget in terms of your staffing that you can use to build something. Office politics also play a part. Your relationship with your vendors is quite often not purely technical, and you might need to look at what’s good, what’s bad, and which way the wind is blowing inside your IT function.

Next: Read part 2 of this series on security orchestration.