What is a Malware Repository?

A malware repository is a database or archive designed to safely store samples of malware. Learn how malware repositories are used for research and analysis, threat intelligence, education, and testing and benchmarking. Plus, learn about the best open-source and commercial malware repositories available today.

A malware repository is a purpose-built database or file archive used to store malware samples safely and securely. Security researchers, analysts, and reverse engineers use malware repositories to store and study samples and, ultimately, to better understand the malware they're dealing with.

What are the main use cases of a malware repository?

One day, while analysing traffic logs, a security researcher named Paige notices unusual outbound connections linked to a document that initially looked innocuous. She analyses the network traffic and begins to suspect that the document may be malicious (a.k.a. a maldoc). However, the organisation's security tools haven't flagged the document as suspicious, suggesting that it may be a new or sophisticated variant of a commodity maldoc, or a targeted maldoc crafted specifically to infiltrate Paige's organisation.

Paige decides to upload the maldoc sample to a malware repository to learn more about it, since repositories usually run a suite of automated analyses on uploaded samples. Which type of repository she chooses depends on whether the maldoc contains sensitive information or is appropriate to share with the broader security community.
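Before uploading anything, it is common to first check whether a repository already holds a report for the sample by looking it up by its hash, which does not disclose the file's contents. The Python sketch below hashes a file in chunks and prepares (without sending) a lookup request against VirusTotal's v3 `files` endpoint. The API key is a placeholder assumption; actually sending the request would need your own key and a call such as `urllib.request.urlopen`.

```python
import hashlib
import os
import tempfile
import urllib.request

# VirusTotal API v3 file-report endpoint; the ID can be an MD5, SHA-1 or SHA-256.
VT_FILES_ENDPOINT = "https://www.virustotal.com/api/v3/files/"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a sample in chunks so large files are never loaded fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_hash_lookup(sha256: str, api_key: str) -> urllib.request.Request:
    """Prepare, but do not send, a report lookup for an already-hashed sample."""
    return urllib.request.Request(
        VT_FILES_ENDPOINT + sha256,
        headers={"x-apikey": api_key},
        method="GET",
    )


# Usage: hash a stand-in "sample" (an empty temporary file) and build the request.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    sample_path = tmp.name
sample_hash = sha256_of_file(sample_path)
os.remove(sample_path)
request = build_hash_lookup(sample_hash, api_key="YOUR-API-KEY")  # placeholder key
```

If the lookup returns an existing report, there may be no need to upload the file at all, which matters most when the sample could be sensitive.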

When to use a public malware repository
In this first scenario, let's say that the maldoc is a fake PDF invoice that contains no PII or other data that could be traced back to Paige's organisation. Since the maldoc appears to be generic, the sample may be suitable for uploading to a public repository like VirusTotal.

When to use a private malware repository
In this scenario, let's say that the maldoc is a fake PDF invoice addressed by name to the CEO of Paige's organisation. This is an example of sensitive malware, meaning malware that contains information about your organisation or systems, your defences and security controls, or even data about your employees, customers, or vendors. The risk is particularly high with unique malware and maldocs targeted at your organisation. In this case, Paige may want to share the sample with trusted members of her internal security team, but not with the broader security community. A sensitive, targeted maldoc sample like this is best uploaded to a private malware repository.
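The public-versus-private decision in these two scenarios can be sketched as a simple triage rule. The helper below is hypothetical, not a feature of any particular repository: it assumes the document's text has already been extracted and that your team maintains a list of organisation-specific terms (names, domains, project code names). Any match routes the sample to a private repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageResult:
    destination: str                # "public" or "private"
    matched_terms: tuple[str, ...]  # which sensitive terms triggered the decision


def choose_repository(extracted_text: str, sensitive_terms: list[str]) -> TriageResult:
    """Hypothetical rule: any organisation-specific term routes the sample to private."""
    haystack = extracted_text.casefold()
    matches = tuple(t for t in sensitive_terms if t.casefold() in haystack)
    return TriageResult("private" if matches else "public", matches)


# Usage with made-up organisation details.
org_terms = ["example.com", "Example Corp", "Jane Doe"]
generic_invoice = "INVOICE #4411. Payment overdue, please open the attached file."
targeted_invoice = "Dear Jane Doe, please review the attached invoice for Example Corp."

generic_result = choose_repository(generic_invoice, org_terms)
targeted_result = choose_repository(targeted_invoice, org_terms)
```

Keyword matching will miss sensitive details that are not on the list, so a rule like this is best treated as a first filter ahead of a human decision, with private as the default when in doubt.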

Automated analysis

Malware repositories like VirusTotal (public) and MalwareZoo (private) run the sample through dozens of automated analysis tools and return a detailed analysis report. In this case, the analysis identifies the file as a trojan that communicates with command-and-control (C2) servers. The report includes indicators of compromise (IoCs) and signatures that Paige can use to detect and prevent similar threats in future. For example, Paige could implement network rules to block traffic to the identified C2 servers.
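Turning the report's IoCs into network rules can be partly automated. The sketch below assumes, hypothetically, that the C2 indicators arrive as a plain list of strings (real report formats vary by repository) and converts the IP addresses among them into `iptables`/`ip6tables` rules that drop outbound traffic. It only generates the rule strings; applying them is left to your firewall tooling.

```python
import ipaddress


def c2_block_rules(c2_indicators: list[str]) -> list[str]:
    """Build iptables/ip6tables rules that drop outbound traffic to C2 IP addresses.

    Indicators that are not IP literals (for example domain names) are skipped;
    those are usually better handled with DNS-layer blocking.
    """
    rules: list[str] = []
    seen = set()
    for indicator in c2_indicators:
        try:
            addr = ipaddress.ip_address(indicator.strip())
        except ValueError:
            continue  # not an IP address, e.g. "bad-c2.example"
        if addr in seen:
            continue  # avoid emitting duplicate rules
        seen.add(addr)
        tool = "iptables" if addr.version == 4 else "ip6tables"
        rules.append(f"{tool} -A OUTPUT -d {addr} -j DROP")
    return rules


# Usage with hypothetical indicators (documentation-range addresses).
report_c2 = ["203.0.113.10", "bad-c2.example", "198.51.100.7", "203.0.113.10", "2001:db8::1"]
rules = c2_block_rules(report_c2)
```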

What are the technical features of a malware repository?

Malware repositories typically have two main technical features:

  1. Isolation. Malware samples can be dangerous if handled poorly. For this reason, malware repositories store samples in highly secure, isolated environments with no direct exposure to the internet or external networks. This isolated architecture helps to prevent accidental release or execution of samples.
  2. Encryption. Malware samples are stored in encrypted form to prevent unauthorised access. This also provides protection if the repository's storage media is compromised, since the samples cannot be extracted and used maliciously while encrypted.
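To illustrate encryption at rest, here is a minimal in-memory sketch built on Fernet from the third-party `cryptography` package (AES-128 in CBC mode with an HMAC-SHA256 integrity check). The `EncryptedSampleStore` class is hypothetical, and a real repository would keep the key in a key-management service rather than alongside the stored data.

```python
import hashlib

from cryptography.fernet import Fernet, InvalidToken  # pip install cryptography


class EncryptedSampleStore:
    """Hypothetical store that keeps only ciphertext, keyed by the sample's SHA-256."""

    def __init__(self, key: bytes) -> None:
        self._fernet = Fernet(key)
        self._blobs: dict[str, bytes] = {}  # stands in for disk or object storage

    def put(self, sample: bytes) -> str:
        sample_id = hashlib.sha256(sample).hexdigest()  # content address of the plaintext
        self._blobs[sample_id] = self._fernet.encrypt(sample)
        return sample_id

    def get(self, sample_id: str) -> bytes:
        # Raises InvalidToken if the ciphertext was tampered with or the key is wrong.
        return self._fernet.decrypt(self._blobs[sample_id])

    def raw_blob(self, sample_id: str) -> bytes:
        """What someone holding a copy of the storage media would see."""
        return self._blobs[sample_id]


# Usage: store a harmless stand-in for a malware sample.
store = EncryptedSampleStore(Fernet.generate_key())
sample = b"This is a harmless stand-in for a malware sample"
sample_id = store.put(sample)

# Without the right key, a stolen copy of the blob cannot be decrypted.
try:
    Fernet(Fernet.generate_key()).decrypt(store.raw_blob(sample_id))
    stolen_copy_readable = True
except InvalidToken:
    stolen_copy_readable = False
```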

What's the difference between a malware repository and a malware sandbox?

A malware repository is designed to store, manage, and catalog multiple malware samples. It includes functionality like secure storage, metadata indexing for easy search and retrieval, and strict access controls to ensure that only authorised individuals can access the malware samples. Repositories generally also offer analysis tools or links to external analysis resources.

On the other hand, a malware sandbox is an isolated environment specifically designed to safely execute and analyse malware without putting the host system or network at risk. The primary purpose of a sandbox is to observe malware in a safe, controlled setting.

To use a real-world analogy, a malware repository is more like a library, while a malware sandbox is more like a study desk. Books are stored and organised in the library, then taken to a library study desk for reading.

What kind of analyses can be conducted via a malware repository?

Malware repositories typically support a comprehensive suite of analysis services. For example, Assemblyline 4 (an open-source malware repository) comes bundled with 40 analysis services, and 10 additional community-created services are also available. These services fall into a few main categories:

  • Anti-virus and malware detection. Services in this category focus on identifying known malware and viruses using various detection techniques.
  • Dynamic analysis. These services analyse the behaviour of files in a simulated environment to observe malicious activities.
  • Static analysis. Static analysis services examine the contents of files without executing them, looking for suspicious patterns or characteristics.
  • File extraction and unpacking. These services decompress files and extract embedded content for further analysis.
  • Metadata and artefact extraction. These services extract metadata and artefacts from files, which are often necessary to understand the context and origin of the data.
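As a rough illustration of the static analysis and metadata extraction categories, the sketch below computes a hash, identifies the file type from its leading magic bytes, measures byte entropy (values near 8 bits per byte often suggest compressed, packed, or encrypted content), and pulls out embedded URLs. It is far simpler than the services bundled with Assemblyline 4 and does not reflect their implementation; the short magic-byte table and the URL pattern are assumptions chosen for brevity.

```python
import hashlib
import math
import re
from collections import Counter

# A deliberately small magic-byte table; real file-type identification covers far more.
MAGIC_SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"MZ", "Windows PE executable"),
    (b"PK\x03\x04", "ZIP archive (incl. OOXML Office files)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office)"),
]

# Simplified URL pattern (ASCII only); it stops at spaces, quotes and brackets.
URL_PATTERN = re.compile(rb"https?://[\w.\-/:?=&%]+")


def identify_type(data: bytes) -> str:
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (uniform) up to 8.0 (random-looking)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def static_report(data: bytes) -> dict:
    """Collect a few static properties without ever executing the sample."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "file_type": identify_type(data),
        "entropy": round(shannon_entropy(data), 3),
        "urls": sorted({m.decode("ascii") for m in URL_PATTERN.findall(data)}),
    }


# Usage: a tiny, harmless PDF-like byte string with an embedded link.
pdf_sample = b"%PDF-1.7\n1 0 obj\n<< /URI (http://203.0.113.10/payload.bin) >>\nendobj\n"
report = static_report(pdf_sample)
```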