Technische Universität Braunschweig

The Drebin Dataset

The dataset contains 5,560 files from 179 different malware families. The samples were collected between August 2010 and October 2012. You can find more details on the dataset in the paper describing Drebin and its evaluation.

We have packaged the malware samples in chunks of 1,000 applications: 00, 01, 02, 03, 04, 05

Family Labels

Additionally, we provide the SHA256 hashes of all malware samples in the dataset and their corresponding AV family labels. The labels were created manually by unifying the output of different AV scanners: download
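As a minimal sketch (not part of the released files), a sample's SHA256 can be computed with Python's `hashlib` and looked up in the label list. The CSV layout assumed here, a header line `sha256,family` followed by one row per sample, is an assumption; check it against the downloaded file before relying on it.

```python
import csv
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Return the lowercase hex SHA256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # Read fixed-size blocks until read() returns b"" (end of stream).
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def load_family_labels(csv_text):
    """Map lowercase SHA256 -> family name.

    ASSUMPTION: the label file is a CSV with the header 'sha256,family'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["sha256"].strip().lower(): row["family"].strip() for row in reader}
```

Usage, with hypothetical file names: read the label file into `labels = load_family_labels(text)`, then open an APK in binary mode and call `labels.get(sha256_of_stream(f), "unknown")`. Hashes are normalized to lowercase so that upper- and lowercase hex strings match.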


To reproduce our experiments, we also provide all features extracted from each of the 123,453 benign applications and 5,560 malicious applications. Each feature is prefixed with a string indicating its feature set. download
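The exact prefix syntax is not stated here. The sketch below assumes each line of an application's feature file holds one feature written as `prefix::value` (for example `permission::android.permission.SEND_SMS`) and groups the features by prefix. Both the `::` separator and the example prefixes are assumptions to verify against the downloaded files.

```python
from collections import defaultdict


def group_features(lines, sep="::"):
    """Group feature strings by their feature-set prefix.

    ASSUMPTION: each non-empty line looks like '<prefix><sep><value>'.
    Lines without the separator are collected under the empty prefix "".
    """
    groups = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # partition() splits on the first separator only, so values may contain it.
        prefix, found, value = line.partition(sep)
        if found:
            groups[prefix].add(value)
        else:
            groups[""].add(line)
    return dict(groups)
```

Keeping each feature set separate makes it easy to inspect or ablate one set at a time, for example all permission features of an application.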

Dataset Splits

For the evaluation, we split the dataset into a known and an unknown partition, repeat this procedure 10 times, and average the results. The lists of SHA256 hashes for each split can be downloaded here: download
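The repeated-split protocol can be sketched as follows. The `evaluate` callable is hypothetical and stands in for training a detector on the known partition and scoring it on the unknown one; the sketch only checks that the two partitions of each split are disjoint and reports the mean and sample standard deviation of the per-split scores.

```python
from statistics import mean, stdev


def summarize_splits(splits, evaluate):
    """Evaluate each (known, unknown) split and average the scores.

    splits:   list of (known_hashes, unknown_hashes) pairs, each a set of SHA256 strings.
    evaluate: HYPOTHETICAL callable(known, unknown) -> float, e.g. a detection rate.
    Returns (mean score, sample standard deviation across splits).
    """
    scores = []
    for i, (known, unknown) in enumerate(splits):
        # A sample must never appear in both partitions of the same split.
        overlap = known & unknown
        if overlap:
            raise ValueError(f"split {i}: {len(overlap)} hashes in both partitions")
        scores.append(evaluate(known, unknown))
    # stdev() needs at least two values; report 0.0 for a single split.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread
```

Reporting the spread alongside the mean shows how much the result depends on the particular random split.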