This article will explain the steps to build a modern fraud detection system for organisations using the ELK Stack (Elasticsearch, Logstash, Kibana).
The ELK Stack is a powerful open source toolchain for feature-rich log analysis.
While fraud detection and prevention is one of the biggest challenges faced by modern organisations, the good news is that the ELK stack, if properly configured, is up to the challenge. That's because it includes robust anomaly detection features suitable for pinpointing potential fraud.
This comprehensive guide focuses on equipping you with the knowledge and techniques necessary to detect and prevent fraud using the ELK stack.
Yes! Many large organisations are successfully using ELK for fraud detection and prevention.
As you may have noticed, there's a missing step 0, which is collecting data for ingestion with Logstash. You'll need a means to collect fraud telemetry data from your systems. A common strategy for data collection is to use a lightweight fraud detection agent like Antifraud.
The 3 stages of fraud detection
In general, fraud detection can be thought of as a "pipeline" incorporating three separate stages:
Fraud detection features in the ELK Stack
Here are some of the ELK stack's features that make it a great fit for handling automated analysis and alerting for fraud teams:
The ELK stack's robust capabilities in data aggregation, real-time monitoring, customisation, machine learning, scalability, integration, compliance support, and historical data analysis make it a fit-for-purpose toolchain for fraud detection and prevention.
It’s important to note that the ELK stack's performance is only as good as the quality of the data it ingests.
Without enough relevant data on your users' behaviour and devices, it will be difficult to pinpoint possible fraud because your analysts may lack essential context needed to make decisions about risk.
The good news? This problem is easily solved if you use a fraud detection agent purpose-built to collect a multitude of varied fraud signals for easy ingestion into ELK.
A fraud detection agent is a lightweight script that runs in the background of user sessions (typically in the browser or embedded in an app), quietly gathering and sending fraud telemetry to your log analysis platform.
While there are many fraud detection tools on the market combining both data collection and analysis capabilities, these are often not suitable for organisations already effectively using the ELK stack.
Rather than paying for and learning a new tool, it makes sense to work with the log analytics platform you've already invested time, energy, and labour into.
We suggest pairing your existing ELK setup with a fraud detection agent designed to integrate painlessly with ELK, such as Antifraud.
A fraud risk assessment involves reviewing historical data to understand which types of fraud pose the biggest risks to your organisation.
Consider questions like:
What types of fraud has your organisation typically faced most often?
How have these types of fraud usually been identified? What were the signals and data that allowed the fraud to be discovered? (Or, if the fraud wasn't discovered until reported by a victim, what signals or data were missing that might otherwise have allowed your team to detect the fraud?)
A fraud risk assessment will help you direct your fraud prevention efforts and investment toward the types of fraud that present the greatest risk to your organisation.
The collection stage of your fraud prevention pipeline is typically handled by a fraud detection agent. This agent gathers relevant fraud detection data from your website or app, and sends this data to Logstash in JSON format.
It makes sense to collect data that relates to your top fraud risks. For example, if account takeover (ATO) attacks are a risk for your organisation, then collecting user interaction behaviour and device signals will be particularly useful.
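As a rough illustration of what a single JSON telemetry event might look like, here is a minimal Python sketch. The field names (user_id, ip_address, user_agent, typing_speed_cpm) and the function names are hypothetical, chosen only to reflect the behaviour and device signals mentioned above; a real agent will define its own schema, and how the event is shipped depends on your collection method.

```python
import json
from datetime import datetime, timezone


def build_fraud_event(user_id, ip_address, user_agent, typing_speed_cpm):
    """Assemble one telemetry event as a plain dict (hypothetical schema)."""
    return {
        # Timestamp in UTC, ISO 8601 format
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "fraud_telemetry",
        "user_id": user_id,
        # Device signals
        "ip_address": ip_address,
        "user_agent": user_agent,
        # Behavioural signal: typing speed in characters per minute
        "typing_speed_cpm": typing_speed_cpm,
    }


def to_json_line(event):
    """Serialise the event as a single line of JSON, ready to be shipped."""
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    event = build_fraud_event("u-123", "203.0.113.7", "Mozilla/5.0", 210)
    print(to_json_line(event))
```

Keeping each event on one line makes it straightforward to verify later that individual events were ingested intact.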
The steps to configure your fraud detection data source will depend on the method you’re using to collect fraud data.
You are likely already using Splunk for log analysis, so once you ship data from your fraud detection agent into Splunk you’ll need to verify the new data is being ingested correctly alongside your existing logs.
Now that you’ve verified that Splunk is correctly ingesting your fraud detection data, it’s time to leverage Splunk’s fraud detection and response capabilities to set your fraud prevention program in motion.
Anomaly detection is the process of finding outliers in your data set. These outliers can present themselves in two different ways:
An anomaly detection example:
Imagine a user with an account at an online computer and electronics retailer. Once or twice a year the user makes a purchase, usually under $500, from an IP address located in Melbourne, Australia.
Kibana will begin to associate these features (purchase habits and location) with the user.
One day, logs are ingested for the same user showing a purchase of over $10,000 of computer equipment from an IP address in Johannesburg.
This is likely to be picked up as an anomaly because these features don’t match the typical pattern associated with that user.
What is less clear is the cause of this anomaly. Does this indicate an account compromise, or has the user simply travelled to Johannesburg on business?
This is where oversight from human fraud analysts is essential. Often, this will mean contacting the user to verify the behaviour is legitimate.
This example can also be extended to demonstrate the power of combining multiple behavioural and device signals to form a full portrait of the user.
For example, if the account was not only being used for an unusually large purchase from an unfamiliar location, but also using an unfamiliar browser and device, and a faster than typical typing speed, the scales may tip toward a possible account compromise.
Ultimately, this is why fraud detection software that combines multiple device and behavioural signals is especially helpful. Building a risk profile based on a broad range of signals can help human analysts to more precisely differentiate legitimate but unusual behaviour from potentially fraudulent activity.
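One simple way to picture how several signals might tip the scales is a weighted score. The sketch below is illustrative only: the signal names and weights are assumptions made for this example, not values used by any particular product, and in practice they would come out of your own fraud risk assessment.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    """Anomaly flags for one session (hypothetical signal names)."""
    unusual_amount: bool
    unfamiliar_location: bool
    unfamiliar_device: bool
    unusual_typing_speed: bool


# Illustrative weights only; tune these to your own risk assessment.
WEIGHTS = {
    "unusual_amount": 3,
    "unfamiliar_location": 2,
    "unfamiliar_device": 2,
    "unusual_typing_speed": 1,
}


def risk_score(signals):
    """Return the weighted share of anomalous signals, from 0.0 to 1.0."""
    total = sum(WEIGHTS.values())
    triggered = sum(w for name, w in WEIGHTS.items() if getattr(signals, name))
    return triggered / total


# The travelling user: large purchase from a new location, familiar device.
traveller = SessionSignals(True, True, False, False)
# The possible compromise: every signal is out of pattern.
suspect = SessionSignals(True, True, True, True)

print(risk_score(traveller))  # 0.625
print(risk_score(suspect))    # 1.0
```

A score like this does not replace analyst judgement; it only helps rank which sessions deserve attention first.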
How to configure anomaly detection in Splunk
Splunk provides its own app for creating, training, and applying anomaly detection models to your data without requiring your team to have an ML or data science skill set.
One of the benefits of this app is that it uses an anomaly detection algorithm called ADESCA which is well-suited for use with time series data (such as logs).
To get started, download and install the Splunk App for Anomaly Detection from Splunkbase.
Next, create a new job using the app. Add your fraud detection dataset and select the field you want to mark for anomaly detection. You can also configure the detection sensitivity level for this field. For stable fields that don’t change often (such as the user’s operating system) you may want a high sensitivity. For fields with a large amount of variance, such as time, you may want to select a lower sensitivity.
The best way to check the appropriateness of the sensitivity level you’ve selected is to click ‘Detect Anomalies’ and review the resulting data, observing how many false positives are generated.
Note that while false positives are typically much more visible than missed detections, missed detections are just as important to consider, if not more so.
Ideally, you will run a test detection on a known dataset where you've previously identified fraudulent activity. This will help you avoid both missed detections and false positives.
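If you do have a labelled dataset, comparing the flagged events against the known fraud cases can be done outside the app with a few lines of Python. This is a minimal sketch that assumes you can export the IDs of flagged events and already hold a list of known-fraud IDs; the function and variable names are hypothetical.

```python
def evaluate_detections(flagged_ids, known_fraud_ids):
    """Compare flagged events with known fraud and summarise the result."""
    flagged = set(flagged_ids)
    fraud = set(known_fraud_ids)

    true_positives = flagged & fraud   # fraud that was flagged
    false_positives = flagged - fraud  # flagged, but not fraud
    missed = fraud - flagged           # fraud that was not flagged

    # Guard against division by zero when nothing was flagged or known.
    precision = len(true_positives) / len(flagged) if flagged else 0.0
    recall = len(true_positives) / len(fraud) if fraud else 0.0

    return {
        "true_positives": len(true_positives),
        "false_positives": len(false_positives),
        "missed_detections": len(missed),
        "precision": precision,
        "recall": recall,
    }


result = evaluate_detections(
    flagged_ids=["t1", "t2", "t3"],
    known_fraud_ids=["t2", "t3", "t4"],
)
print(result["false_positives"], result["missed_detections"])  # 1 1
```

Re-running the comparison after each sensitivity change shows whether you are trading false positives for missed detections, or the reverse.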
Finally, you can ‘Save Job’ and schedule it to run at set intervals from the Job Dashboard.
Splunk UBA
For more complex anomaly detections you may want to consider Splunk’s User Behaviour Analytics (UBA) product, which can stitch multiple anomalies together to accelerate the detection of common fraud profiles. This tool automates aspects of fraud detection which might otherwise require custom development using ML techniques.
Machine Learning Toolkit (MLTK)
Splunk also offers a free Machine Learning Toolkit app where you can configure your own custom machine learning pipelines and detections for fraud detection, such as outlier detection. However, using this app will require knowledge of ML techniques.
It's easy to create alerts based on detected anomalies and outliers, either in real-time as they come in, or on a scheduled basis (batching anomalies together). As a general rule, fraud detection and response is best done in real-time where possible.
There are three main aspects to consider when configuring an alert:
A fraud detection alerting use case
A common field for anomaly detection in banking is transaction amount.
That's because most of us make transactions of a similar size, at a similar cadence. For example, these might include our rent or mortgage payments, utility bills, or recurring subscriptions.
Fraudulent transactions often deviate from the user’s typical transaction pattern. In particular, they may be much larger than the user’s typical transaction amount, as fraudsters attempt to quickly move large amounts of money out of the account. This makes fraudulent transactions a good candidate for anomaly detection.
Imagine that we have set up anomaly or outlier detection on the "transaction amount" field. Next, we could create two different alert rules based on how much the outlier deviates from what we expect for the user:
Alert 1: For outliers less than two standard deviations from the mean, this alert will trigger a Splunk message intended for human analyst review.
Alert 2: For outliers greater than two standard deviations from the mean, this alert will trigger a script that sends an SMS to the user notifying them of the transfer.
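The routing logic behind these two alerts can be sketched in Python. This is only an illustration of the decision, not how the alerts are configured in practice: the function names and return labels are hypothetical, the transaction is assumed to have already been flagged as an outlier, and a deviation of exactly two standard deviations is assumed to go to analyst review since the rules above do not cover that case.

```python
from statistics import mean, stdev


def deviation_in_std(amount, history):
    """How many sample standard deviations `amount` sits from the mean."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical amounts
    if sigma == 0:
        # Every past amount is identical, so any difference is extreme.
        return 0.0 if amount == mu else float("inf")
    return abs(amount - mu) / sigma


def route_alert(amount, history, threshold=2.0):
    """Choose a response for a transaction already flagged as an outlier."""
    if deviation_in_std(amount, history) > threshold:
        return "sms_user"        # Alert 2: notify the user by SMS
    return "analyst_review"      # Alert 1: message for human review


history = [100, 110, 90, 105, 95]  # the user's past transaction amounts
print(route_alert(112, history))   # analyst_review
print(route_alert(5000, history))  # sms_user
```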
As you can see, the power and flexibility of Splunk alerts means they’re capable of forming the basis of both your manual and automated fraud response strategy.
Talk to us about fraud detection with Splunk
We are a full service consultancy with deep experience building fraud detection and response workflows using Splunk.
Reach out to us for a no-obligation initial chat to discuss your fraud prevention goals and get advice on the best way to leverage Splunk as part of your fraud detection program.
We can also provide you with more information on Antifraud, our fraud detection agent designed to integrate seamlessly with Splunk.