The Sleuth Kit Informer

Brian Carrier

Issue #4
May 15, 2003



Few people would disagree that data reduction is critical to file system forensics. The fourth edition of The Sleuth Kit Informer contains the second article in a series of three on the 'sorter' tool. This article shows how to create custom rule sets so that an investigator can more quickly identify suspect data. Feedback and suggestions are always appreciated (carrier at

What's New?

The Sleuth Kit and Autopsy were used in part of a 1-day Linux-based Forensics course and lab that I taught during the 2-week SEARCH Advanced Internet Investigations Course in April.

Paul Bakker has asked for testers for a new indexed search function for Autopsy and The Sleuth Kit. Email him if you are interested (bakker at

Did You Know?

UFS and EXTxFS file systems are organized into groups, each of which has its own set of inodes and data blocks. All files in a directory allocate inodes and blocks from the same group. So, if you have a deleted file name but not the corresponding inode and blocks, then you can restrict your manual file recovery to just that group to find the data more efficiently.

To identify the group, look at the inode for the parent directory (i.e. the '.' entry in the directory). In Autopsy, go to the metadata mode for the directory and the group number will be given. Use the 'Image Details' mode to identify the inode and block range for the group to examine. The block range can be given to 'dls' to extract just the unallocated space from the group.
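
Autopsy reports the group number and ranges for you, but the arithmetic behind them is simple. The following is a minimal sketch for the EXT2FS layout only, where inodes are numbered from 1; the function names and all of the numbers are hypothetical and would normally come from the superblock values shown in 'Image Details'.

```python
def ext2_group_of_inode(inode, inodes_per_group):
    """Return the group that holds an inode (EXT2FS numbers inodes from 1)."""
    return (inode - 1) // inodes_per_group


def ext2_block_range(group, blocks_per_group, first_data_block, total_blocks):
    """Return the first and last block address of a group."""
    start = first_data_block + group * blocks_per_group
    # The last group may be shorter than the others.
    end = min(start + blocks_per_group, total_blocks) - 1
    return start, end


# Hypothetical values for illustration only.
group = ext2_group_of_inode(32770, inodes_per_group=16384)
print(group)
print(ext2_block_range(group, blocks_per_group=32768,
                       first_data_block=0, total_blocks=1000000))
```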

Creating Custom sorter Rule Sets (Part 2 of 3)


Last month's article, 'Sorting Out The Sorter', showed how to run the 'sorter' tool from the command line. 'sorter' is a tool that sorts files based on their file type and also does hash database lookups and extension matching. This article is the second in a series of three and shows how to customize the tool. The last article in the series will describe how the tool was designed and how it works.

The current trend in "customizing" an application is to apply different skins so that its appearance is different. In the command line forensic tool world though, it means that we are going to create custom rule sets. Rule sets are used by sorter to define what category a file should belong in and what extensions a file should have. Any file that is not defined in one of the rule sets will be placed in the 'unknown' category and any unknown extensions will be alerted with an Extension Mismatch.

This article shows how to move the files from the 'unknown' category to a specific category and how to add extensions to a file type to reduce Extension Mismatch errors. As with all open source software, support from The Sleuth Kit users is needed to improve it. When new rule sets are created, please send them to me (carrier at so that they can be included in the next distribution.

Loading Configuration Files

Before we discuss how to create custom rule sets, let's cover how the rule sets are loaded. 'sorter' currently comes with seven configuration files. Each file contains rule sets that define categories and extensions. The files are located in the 'share/sorter/' directory where The Sleuth Kit was installed. By default, two of the seven configuration files are used in a given run: one generic and one operating system specific. The 'default.sort' file contains generic rule sets that are consistent across platforms and is used each time sorter is run. Examples include graphic images, executables, and documents. Five files are for the file systems that The Sleuth Kit supports:

Each file system has been matched with its parent operating system and contains definitions that are specific to that OS.

Finally, there is a configuration file called 'images.sort' that contains only the rule sets needed to save graphic images.

The generic configuration file is loaded first, followed by the operating system specific configuration file. If an additional configuration file was specified on the command line with the '-c' flag, then it will be loaded last. If two files have overlapping rule sets, the last one to be loaded will be used. In other words, the custom rule sets have priority over the OS rule sets, which have priority over the generic rule set.
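
The priority order can be pictured with a small model. This is only a sketch and not the 'sorter' source: it assumes that "overlapping" means two rules with the identical regular expression, it silently lets the later file win, and the contents of both configuration strings are made up for the example.

```python
def load_rules(*config_texts):
    """Merge category rules in load order; a later file wins on the same regex."""
    categories = {}  # regular expression -> category name
    for text in config_texts:
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and commented out rules
            keyword, name, regex = line.split(None, 2)
            if keyword == "category":
                categories[regex] = name
    return categories


# Hypothetical file contents, loaded in the order generic -> custom.
generic_rules = "category data ^data$\ncategory text ASCII(.*?)text"
custom_rules = "category ignore ^data$"

rules = load_rules(generic_rules, custom_rules)
print(rules["^data$"])
```

In this model the custom file's 'ignore' rule replaces the generic 'data' rule, while the untouched 'text' rule survives.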

For example, the following would load 'default.sort' and 'windows.sort':

     # sorter -f ntfs -d save_dir -m "C:/" hda1.dd

The next example would load 'default.sort', 'windows.sort', and 'foo.sort':

     # sorter -f ntfs -d save_dir -m "C:/" -c /usr/local/forensics/sorter/foo.sort hda1.dd

The '-C' flag can be used to load only one configuration file. An example of this is with the 'images.sort' file. When it is specified with '-C', then only graphic files are saved and everything else is ignored. The following only loads 'images.sort' and not 'default.sort' or 'windows.sort':

     # sorter -f ntfs -d save_dir -m "C:/" -C /usr/local/sleuthkit/share/sorter/images.sort hda1.dd

Category Rule Sets

Do you have too many entries in your 'unknown.txt' file? Creating category rule sets can move many of those entries to the correct category. Recall that 'sorter' runs the 'file' command on each file and uses the category rule sets to examine the 'file' output and identify which category the file belongs to. A category rule set has the following form:

     category CATEGORY_NAME FILE_REGULAR_EXPRESSION


The CATEGORY_NAME is replaced with the category name. It must be a valid directory name and can only contain letters and numbers. Examples include 'images', 'text', and 'exec'. The FILE_REGULAR_EXPRESSION is a Perl regular expression that is applied to the output of the 'file' command. When writing the regular expression, the ^ and $ symbols can be used to anchor the match to the start and end of the output. The comparison is case insensitive. Examples include:

     category text HTML document text
     category text ASCII(.*?)text

The first entry matches any 'file' output that has the string "HTML document text" in it. The second entry allows the file type of "ASCII English text" as well as just "ASCII TEXT". If you have never used Perl regular expressions before, refer to a Perl book for details or use 'man perlre'.
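
If you want to try a rule before adding it, the matching can be approximated outside of 'sorter'. The sketch below uses Python's 're' module as a stand-in for Perl (the two behave the same for patterns this simple); the 'categorize' function and the sample 'file' output strings are illustrative assumptions, not part of the tool.

```python
import re

# (category, regular expression) pairs taken from the two example rules.
rules = [
    ("text", r"HTML document text"),
    ("text", r"ASCII(.*?)text"),
]


def categorize(file_output):
    """Return the category of the first matching rule, or 'unknown'."""
    for category, regex in rules:
        # Unanchored, case insensitive search, as the article describes.
        if re.search(regex, file_output, re.IGNORECASE):
            return category
    return "unknown"


for sample in ("ASCII English text", "ASCII TEXT", "HTML document text", "data"):
    print(sample, "->", categorize(sample))
```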

When 'file' does not know the file type, it calls it 'data'. The 'default.sort' file has a rule set for the following:

     category data ^data$

This saves files with a type of 'data' in the 'data' category, but does not save file types such as 'Oracle database'.
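
The effect of the anchors is easy to check. In this sketch (again using Python's 're' in place of Perl, with the unanchored pattern added purely for comparison), the anchored rule accepts only the bare 'data' output, while an unanchored 'data' would also catch the "data" inside "database".

```python
import re

anchored = re.compile(r"^data$", re.IGNORECASE)
unanchored = re.compile(r"data", re.IGNORECASE)  # for comparison only

for output in ("data", "Oracle database"):
    print(output,
          "anchored:", bool(anchored.search(output)),
          "unanchored:", bool(unanchored.search(output)))
```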

Internally, the 'sorter' tool uses a Perl hash for the rules. The key to the hash is the regular expression and the value is the category. Therefore, categories can have multiple rules. If two categories have the same regular expression though, a warning will be generated.

Note that it is very easy to create regular expressions for two categories that will match the same output. For example:

     category text (.*?)text
     category documents Rich Text Format

The first expression was originally designed to match "ASCII text", "ISO-8859 text", and "HTML document text" and save them to the 'text' category. Rich Text documents though are not standard text and should be saved to the 'documents' category. Unfortunately, both expressions match the "Rich Text Format" string, and because the rules are stored in a hash there is no guarantee which rule will be tried first. The solution to this is to provide specific rule sets:

     category text ASCII(.*?)text
     category text ISO\-8859(.*?)text
     category text HTML document text
     category documents Rich Text Format
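
A quick way to spot this kind of overlap is to list every category whose rule matches a given 'file' output. The following sketch does that for the general and the specific rule sets; the helper name and the sample output string are assumptions made for the example.

```python
import re


def matching_categories(rules, file_output):
    """Return every category with at least one rule matching the output."""
    return sorted({category for category, regex in rules
                   if re.search(regex, file_output, re.IGNORECASE)})


general = [
    ("text", r"(.*?)text"),
    ("documents", r"Rich Text Format"),
]
specific = [
    ("text", r"ASCII(.*?)text"),
    ("text", r"ISO\-8859(.*?)text"),
    ("text", r"HTML document text"),
    ("documents", r"Rich Text Format"),
]

sample = "Rich Text Format data, version 1, ANSI"  # illustrative 'file' output
print(matching_categories(general, sample))   # two categories compete
print(matching_categories(specific, sample))  # only one category remains
```

More than one category in the result means the outcome depends on the order in which the rules happen to be tried.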

There is a special category called 'ignore' that is used for any file type that is not needed. Any file in this category is ignored and will not fill up the unknown category. As an example, the 'windows.sort' file has a commented out rule set for 'raw G3 data'. There are many files of this type and it was unknown what category they should belong to:

     # category ignore raw G3 data, byte\-padded

Extension Rule Sets

Now that the 'unknown' category has been made smaller by creating custom category rule sets, it is time to remove the false positive Extension Mismatch errors. This is done with extension rule sets. Recall that 'sorter' uses the output of the 'file' command and the extension rule sets to identify a list of valid extensions for the file. An extension rule set has the following format:

     ext EXTENSION_LIST FILE_REGULAR_EXPRESSION


The EXTENSION_LIST is replaced with a comma separated list of valid extensions for the file type. The FILE_REGULAR_EXPRESSION has the same properties as with category rule sets previously described. It is also used against the 'file' output. An example includes:

     ext txt,log ASCII(.*?)text

This will ensure that any file type that matches 'ASCII(.*?)text' has a '.txt' or '.log' extension. The extension comparison is case insensitive. Unlike category rule sets, the extension rule sets can have multiple entries for the same file type. For example:

     ext txt,log ASCII(.*?)text
     ext c,cpp,h,js ASCII(.*?)text

This will check ASCII files for 'txt', 'log', 'c', 'cpp', 'h', and 'js'. If the file does not have any of these extensions, it will be flagged as an Extension Mismatch. You can also have the same extensions for different file types:

     ext txt,log ASCII(.*?)text
     ext txt,log ISO\-8859(.*?)text
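
The way multiple entries combine can be sketched as a union of extension lists. This is a model of the behavior described above, not the 'sorter' code; the function names are hypothetical, and treating a file type with no extension rule as "not a mismatch" is an assumption of the sketch.

```python
import re

# (extension list, regular expression) pairs from the examples above.
ext_rules = [
    ("txt,log", r"ASCII(.*?)text"),
    ("c,cpp,h,js", r"ASCII(.*?)text"),
    ("txt,log", r"ISO\-8859(.*?)text"),
]


def valid_extensions(file_output):
    """Union of the extension lists of every rule matching the output."""
    allowed = set()
    for ext_list, regex in ext_rules:
        if re.search(regex, file_output, re.IGNORECASE):
            allowed.update(ext_list.lower().split(","))
    return allowed


def is_mismatch(filename, file_output):
    """True when extension rules exist for the type and none fits the name."""
    allowed = valid_extensions(file_output)
    if not allowed:
        return False  # assumption: no rule for this type, so nothing to flag
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return extension not in allowed


print(sorted(valid_extensions("ASCII C program text")))
print(is_mismatch("notes.TXT", "ASCII English text"))
print(is_mismatch("photo.jpg", "ASCII text"))
```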

In many cases, you will want the category regular expression to be very general and the extension regular expressions to be more specific. For example:

     category images image data
     ext jpg,jpeg,jpe JPEG image data
     ext gif GIF image data

This allows any image to be placed into the 'images' category, but requires a GIF to have '.gif'.
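
To see the general category rule and the specific extension rules working together, the sketch below returns both the category and the list of valid extensions for a 'file' output. The 'classify' function and the sample output strings are assumptions for illustration only.

```python
import re

category_rules = [("images", r"image data")]
ext_rules = [
    ("jpg,jpeg,jpe", r"JPEG image data"),
    ("gif", r"GIF image data"),
]


def classify(file_output):
    """Return (category, sorted valid extensions) for a 'file' output."""
    category = next((name for name, regex in category_rules
                     if re.search(regex, file_output, re.IGNORECASE)),
                    "unknown")
    allowed = sorted(ext
                     for ext_list, regex in ext_rules
                     if re.search(regex, file_output, re.IGNORECASE)
                     for ext in ext_list.split(","))
    return category, allowed


for sample in ("GIF image data, version 89a, 32 x 32",
               "JPEG image data, JFIF standard 1.01",
               "PNG image data, 640 x 480, 8-bit/color RGB"):
    print(sample, "->", classify(sample))
```

The PNG sample still lands in 'images' through the general rule even though these three rules define no extension list for it.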

Steps to Adding Rule Sets

At some point, you will go through the 'unknown.txt' file and come across an entry for a known file type. For example:

       Macromedia Flash data, version 5
       Image: hda1.dd Inode: 11211395

The first step is to identify if an existing category rule set needs to be modified to accept this file type. In the above example, none can be found and the 'file' output is fairly specific so a new rule should be generated. The regular expression should be general enough so that similar Flash files are also identified.

     category video Macromedia Flash

The extension rule set is similar:

     ext swf Macromedia Flash

If other extensions are known, then they could be added. As this file type is fairly generic across platforms, it could be added to the 'default.sort' file (and emailed to me so that it is included in future versions...).
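
Before adding the new rules to a configuration file, it can be worth checking that the expression is neither too narrow nor too broad. This is a rough sketch of such a check; apart from the 'unknown.txt' entry shown above, the sample strings are assumed 'file' outputs chosen for the example.

```python
import re

new_regex = r"Macromedia Flash"  # shared by the category and ext rules

samples = [
    "Macromedia Flash data, version 5",               # entry from unknown.txt
    "Macromedia Flash data (compressed), version 6",  # assumed similar output
    "ASCII English text",                             # should not match
]

results = {sample: bool(re.search(new_regex, sample, re.IGNORECASE))
           for sample in samples}

for sample, matched in results.items():
    print("match   " if matched else "no match", sample)
```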

Isolating a Specific File Type

Some investigations focus on a specific file type and may require a custom rule set. The 'images.sort' file that comes with 'sorter' is an example of a rule set that only saves data about graphic images. Its contents include:

     category images image data
     ext jpg,jpeg,jpe JPEG image data
     ext gif GIF image data
     ext tif TIFF image data
     ext png PNG image data

     category images bitmap data
     ext bmp PC bitmap data

These entries were copied from the 'default.sort' file. Any 'file' output that does not match the above rules will be placed in the 'unknown' category. As this is likely not desired, the '-U' flag can be given to ignore the 'unknown' category.

In summary, the steps are as follows:

  1. Identify the required rule sets for the target files or create custom ones.
  2. Create a configuration file with the rule sets from step 1.
  3. Run 'sorter' with -C and -U.
  4. Read the output.
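
Steps 2 and 3 can be scripted. The sketch below writes a small configuration file and assembles a 'sorter' command line using only the flags shown in this article; it prints the command rather than running it. The file name 'flash.sort', the temporary directory, and the image and output names are placeholders.

```python
import shlex
import tempfile
from pathlib import Path

# Step 2: a configuration file holding only the rule sets of interest.
rule_sets = (
    "category video Macromedia Flash\n"
    "ext swf Macromedia Flash\n"
)
config_path = Path(tempfile.mkdtemp()) / "flash.sort"  # hypothetical name
config_path.write_text(rule_sets)

# Step 3: load only that file (-C) and ignore the 'unknown' category (-U).
command = ["sorter", "-f", "ntfs", "-d", "save_dir", "-m", "C:/",
           "-C", str(config_path), "-U", "hda1.dd"]

# Printed for review; run it from a shell once the paths are correct.
print(shlex.join(command))
```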


'sorter' allows investigators to reduce the amount of data they need to process in an investigation. Every system is different and will have different types of files. The rule sets in 'sorter' allow its behavior to be customized so that suspect data can be identified efficiently.

Updates to the standard rule sets will be provided with new Sleuth Kit releases. In between releases, feel free to send rules to the sleuthkit-users email list on SourceForge.

Copyright © 2003 by Brian Carrier. All Rights Reserved