The Drebin Dataset
The dataset contains 5,560 files from 179 different malware families. The samples were collected between August 2010 and October 2012. More details on the dataset can be found in the paper describing Drebin and the corresponding evaluation.
Additionally, we provide the SHA256 hashes of all malware samples in the dataset and the corresponding AV family labels. The labels were created manually by unifying the output of different AV scanners: download
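As an illustration of how the hashes can be used, the following minimal Python sketch computes the SHA256 digest of a local file and looks it up in a table of family labels. The layout of the label file is not described here, so the sketch assumes the labels have already been loaded into a dictionary mapping lowercase hexadecimal hashes to family names. The function names and that dictionary are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lookup_family(path, labels):
    """Return the family label for a file, or None if its hash is not listed.

    `labels` is an assumed dict: lowercase hex SHA256 -> family name.
    """
    return labels.get(sha256_of_file(path))
```

If the published hashes use uppercase hexadecimal, they would need to be lowercased before being used as dictionary keys.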
For reproducing our experiments, we also provide all features extracted from each of the 123,453 benign applications and 5,560 malicious applications. Each feature is prefixed with a string indicating the feature set. download
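Because every feature carries a prefix naming its feature set, the features of one application can be grouped by set with a few lines of Python. The sketch below is only an illustration. It assumes one feature per line and a `::` separator between prefix and value, neither of which is specified in this description. The function name and the default separator are hypothetical and should be adjusted to the actual file format.

```python
from collections import defaultdict


def group_features(lines, separator="::"):
    """Group feature strings by their feature-set prefix.

    Assumes each line has the form '<prefix><separator><value>'.
    Lines without the separator are collected under the key 'unknown'.
    """
    grouped = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        prefix, found, value = line.partition(separator)
        if found:
            grouped[prefix].add(value)
        else:
            grouped["unknown"].add(line)
    return dict(grouped)
```

Called on the lines of one feature file, this returns a dictionary from feature-set name to the set of values seen for that application.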
For the evaluation, we split the dataset into a known and an unknown partition, repeat this procedure 10 times, and average the results. The lists of SHA256 hashes for each split can be downloaded here: download
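The repeated-split procedure can be sketched as follows. This is a minimal illustration, not the original evaluation code. It assumes that each downloaded list names the hashes of the unknown partition for one split, which is not stated here, and that a user-supplied `evaluate` function returns one numeric score per split. All names are hypothetical.

```python
from statistics import mean


def partition(samples, unknown_hashes):
    """Split `samples` (SHA256 strings) into (known, unknown) lists.

    Assumption: `unknown_hashes` lists the unknown partition of one split.
    """
    unknown_set = set(unknown_hashes)
    known = [s for s in samples if s not in unknown_set]
    unknown = [s for s in samples if s in unknown_set]
    return known, unknown


def average_over_splits(samples, splits, evaluate):
    """Run `evaluate(known, unknown)` once per split and average the scores."""
    scores = [evaluate(*partition(samples, unknown)) for unknown in splits]
    return mean(scores)
```

With the 10 split lists loaded into `splits`, a single call to `average_over_splits` yields the averaged result.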