Hash Database Help

Overview

Hash databases are used to quickly identify known good and known bad files using the MD5 or SHA-1 checksum value. Autopsy uses three types of hash databases to help the investigator reduce the number of files that they have to look at.

The NIST National Software Reference Library (NSRL) contains hashes of files that are found in operating systems and software distributions. These files are known to be good in that they came from trusted sources and are typically on authorized systems. When processing files in the image, this database can be used to ignore files because they are assumed to be known and therefore uninteresting. The location of this database is configured when Autopsy is installed. The NSRL must be obtained from NIST at www.nsrl.nist.gov.

The Ignore Database is a database that the investigator must create. It is similar to the NIST NSRL in that it contains files that are known to be good and can be ignored if the user chooses to do so (only applicable when in File Type Category Analysis). Examples of files in this category include system binaries for standard builds. See Database Creation for information on creating this database. Its location is configured when the host is created and can be edited in the host configuration file.

The Alert Database is a database that the investigator must create. It contains hashes of known bad files. These are the files that an investigator wants to know about if they exist on the system. Examples of this include rootkits or unauthorized photographs. When using the File Type Category Analysis, these files will be saved in a special file. See Database Creation for information on creating this database. Its location is configured when the host is created and can be edited in the host configuration file.

Database Uses

Autopsy uses the hash databases in three ways.

Database Creation

Currently, Autopsy only allows one to look up entries in a hash database. It does not provide an easy way to create a database, but this section describes the process (it is quite simple).

Autopsy uses the hfind tool from The Sleuth Kit to do the lookups. This tool requires the database to be indexed so that it can perform a fast lookup using a binary search algorithm (instead of the slower sequential search that a tool like grep would do). Whenever a hash database is created or updated, it must be indexed. This can be done in Autopsy from the Hash Database Manager, provided that the database has already been configured.
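
The difference between the two search strategies can be illustrated with a small Bash sketch. It is not hfind's implementation: the lookup function is hypothetical and the sample hash values are arbitrary. It sorts a handful of hashes once and then finds one by repeatedly halving the search range, which is why sorted, indexed data can be searched much faster than a line-by-line scan.

```shell
#!/usr/bin/env bash
# Illustration only: binary search over a sorted list of MD5 hashes.
# This is NOT hfind's code; it only shows why sorted data allows fast lookups.

export LC_ALL=C   # byte-wise ordering for both sort and the [[ < ]] comparison

# Arbitrary sample hashes (assumed values, in no particular order).
hashes=(
  c4a6761b486de3c6abf7cf2c554289e5
  0cc175b9c0f1b6a831c399e269772661
  92eb5ffee6ae2fec3ad71c777531578f
  4a8a08f09d37b73795649038408b5f33
)

# Sort once, up front. This plays the role of the indexing step.
mapfile -t sorted < <(printf '%s\n' "${hashes[@]}" | sort)

# lookup HASH: return 0 if HASH is in the sorted list, 1 otherwise.
lookup() {
  local target=$1 lo=0 hi=$(( ${#sorted[@]} - 1 )) mid
  while (( lo <= hi )); do
    mid=$(( (lo + hi) / 2 ))
    if [[ ${sorted[mid]} == "$target" ]]; then
      return 0                      # found
    elif [[ ${sorted[mid]} < "$target" ]]; then
      lo=$(( mid + 1 ))             # target sorts after the midpoint
    else
      hi=$(( mid - 1 ))             # target sorts before the midpoint
    fi
  done
  return 1                          # not found
}

if lookup c4a6761b486de3c6abf7cf2c554289e5; then
  echo "known hash"
fi
```

Each pass through the loop discards half of the remaining entries, so the number of comparisons grows with the logarithm of the database size rather than with the size itself.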

The NIST NSRL obviously does not have to be created, but it does have to be indexed before it is used.

To make a hash database, we will create a file with the same format as the md5sum command uses. This is just the MD5 hash, some white space, and the file name. For example:
    c4a6761b486de3c6abf7cf2c554289e5     /bin/ps

Make a file with one entry in this format per line (it does not have to be sorted). For example, if you have a trusted system, you can make a hash database of its system binaries using the following:
    # md5sum /bin/* /sbin/* > bin-md5.db
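
The glob in the command above covers only the top level of each directory. As a hedged sketch, a recursive variant could use find instead. The make_hash_db function name and the demo files are hypothetical, and the md5sum command from GNU coreutils is assumed to be installed.

```shell
#!/usr/bin/env bash
# Sketch: build an md5sum-format hash database from a whole directory tree.
# make_hash_db is a hypothetical helper, not part of Autopsy or The Sleuth Kit.

# make_hash_db OUTPUT_FILE DIR...
make_hash_db() {
  local out=$1
  shift
  # -type f skips directories and special files; "{} +" batches the file names
  # so that md5sum is started as few times as possible.
  find "$@" -type f -exec md5sum {} + > "$out"
}

# Example on a throwaway directory tree (made-up contents).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/bin/sub"
printf 'first\n'  > "$demo_dir/bin/tool-a"
printf 'second\n' > "$demo_dir/bin/sub/tool-b"

demo_db=$(mktemp)
make_hash_db "$demo_db" "$demo_dir/bin"
cat "$demo_db"
```

On a trusted system, the same helper could be called as make_hash_db bin-md5.db /bin /sbin to cover both directories and everything beneath them.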

After creation, hash databases must be indexed and sorted (this includes the NSRL). Databases are indexed with the hfind tool. The NSRL database would be indexed with:
    # hfind -i nsrl-md5 PATH_TO_NSRL/NSRLFile.txt

A database made by md5sum would be indexed with:
    # hfind -i md5sum bin-md5.db
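
Once the index exists, a hash can be looked up from the command line by passing the database file and the hash to hfind. The sketch below assumes that hfind from The Sleuth Kit and md5sum are installed, and it skips the hfind steps when hfind is not on the PATH. The sample file and its contents are made up.

```shell
#!/usr/bin/env bash
# Sketch: create a tiny md5sum-format database, index it, and look up a hash.
# Assumes hfind (The Sleuth Kit) and md5sum are installed; sample data is made up.

work_dir=$(mktemp -d)
printf 'sample contents\n' > "$work_dir/sample-file"

db="$work_dir/sample-md5.db"
md5sum "$work_dir/sample-file" > "$db"

# The hash is the first whitespace-separated field of the md5sum line.
hash=$(awk '{ print $1; exit }' "$db")

if command -v hfind >/dev/null 2>&1; then
  hfind -i md5sum "$db"     # build the index for an md5sum-format database
  hfind "$db" "$hash"       # look up one hash in the indexed database
else
  echo "hfind not found; skipping index and lookup" >&2
fi
```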

Or, if Autopsy has this file configured from when the host was added, then it can be re-indexed from the Hash Database Manager.

Autopsy Configuration

The alert and ignore databases are stored in the host configuration file under the headers 'alert_db' and 'exclude_db', respectively. For example:
    alert_db    '/usr/local/hash/bad.db'

These entries can be edited at any time.
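
Because these entries are plain text, a script can read them back, for example to confirm that a configured database file still exists. The sketch below assumes the entry format shown above (a header, white space, and a single-quoted path). The get_db_path helper and the host.aut file name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: read a hash database path out of a host configuration file.
# get_db_path and the file name host.aut are assumptions for illustration.

# get_db_path KEY CONFIG_FILE: print the single-quoted value stored under KEY.
get_db_path() {
  local key=$1 config=$2
  sed -n "s/^${key}[[:space:]][[:space:]]*'\(.*\)'[[:space:]]*\$/\1/p" "$config"
}

# Demo configuration containing the example entry from this page.
config_dir=$(mktemp -d)
config="$config_dir/host.aut"
printf "alert_db    '/usr/local/hash/bad.db'\n" > "$config"

alert_path=$(get_db_path alert_db "$config")
echo "alert database: $alert_path"

# Warn if the configured database file is missing on this machine.
if [ -n "$alert_path" ] && [ ! -f "$alert_path" ]; then
  echo "warning: $alert_path does not exist" >&2
fi
```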

The NSRL database is configured in the conf.pl file in the directory where Autopsy was installed. Its location is set with the $NSRLDB variable. For example:
    $NSRLDB = '/usr/local/hash/nsrl/NSRLFile.txt';

This entry can be edited, added, or removed at any time, but Autopsy must be restarted afterwards.

References

Issues 6 and 7 of The Sleuth Kit Informer discussed hash databases.
Brian Carrier