Brian Carrier
Issue #5
June 15, 2003
This issue of The Sleuth Kit Informer is different from previous issues, which focused on how to use the tools. This issue dives into the architectural details of one of The Sleuth Kit tools and documents the procedures it uses. This allows a non-programmer to understand the process and allows others to read the source code along with the written description.
Two of the Daubert Guidelines for scientific evidence in US courts are that the procedures behind the theories and techniques used be published and that they be generally accepted. It is still under debate whether digital evidence qualifies as scientific evidence, but it is important to start documenting the procedures used in digital forensics. This allows for a better general understanding and for consistency between different tools.
As always, comments and feedback are welcome (carrier at sleuthkit.org).
The Sleuth Kit 1.62 and Autopsy 1.73 were released on June 10, 2003. The Sleuth Kit release includes some bug fixes, and 'mactime' can now export data in a comma-delimited format (see Did You Know?). Autopsy has a new mode called "Event Sequencer" that allows one to sort the events of an incident to help identify what the attacker did. Autopsy also saves search results to a file so that they can be easily recalled.
Have you ever tried to attach a timeline from 'mactime' to an incident report? Or to make graphs of file activity? It is much easier now with the '-d' flag, which forces 'mactime' to output the timeline with the fields separated by commas instead of white space, with each line carrying its own time entry. This makes it easy to import the timeline into a spreadsheet, delete the columns that are not needed, and attach it to a report.
# mactime -b body -d > tl.txt
If you want to graph the activity on a daily or hourly basis to find trends, use the '-i' flag to get a summary index. Combined with the '-d' flag, the data can be imported into a spreadsheet and a bar graph created.
# mactime -b body -d -i day sum.day.txt > tl.txt
or
# mactime -b body -d -i hour sum.hour.txt > tl.txt
In the past two issues of The Sleuth Kit Informer we showed how to use the 'sorter' tool from the command line and how to create custom rule sets for new file types or to identify only specific file types. This last part in the series on the 'sorter' tool focuses on its internal design.
The goal of documenting the design at this level of detail is to allow users who do not read Perl to learn exactly how the program works. This allows "acceptance" of the procedure (as noted in the Daubert Guidelines for scientific evidence) and builds general knowledge that can be applied to other analysis techniques. For those who have forgotten what the 'sorter' tool actually does, refer back to Issue #3 of The Sleuth Kit Informer. This description applies to the version of 'sorter' distributed with The Sleuth Kit v1.62 (June 10, 2003).
The first steps involve setting up the environment for the tool. This includes reading the arguments from the command line and verifying that they are compatible. The second major step verifies that the needed binaries from The Sleuth Kit can be found.
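As a minimal sketch of that binary check (the directory name and the exact list of tools here are assumptions for illustration, not sorter's actual code):

my $tsk_bin_dir = '/usr/local/sleuthkit/bin';   # assumed location
foreach my $bin ('fls', 'ils', 'icat', 'file') {
    die "Cannot find required binary: $bin\n"
        unless (-x "$tsk_bin_dir/$bin");
}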
The configuration files are processed next. If the exclusive configuration file flag, '-C', was given, then only that file is loaded. Otherwise, the 'default.sort' file is loaded first, followed by the operating system file ('windows.sort', for example), followed by a local customization of the operating system file ('windows.sort.lcl', for example). If a file was specified with the '-c' flag, it is loaded last. As will be shown, rules in the configuration file loaded last override conflicting rules from files loaded earlier.
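The load order can be sketched as follows (only the order matters; '$opt_C', '$opt_c', '$dir', and '$os' are assumed names, not necessarily those used in sorter):

my @configs;
if ($opt_C) {
    @configs = ($opt_C);               # exclusive: only this file
} else {
    push @configs, "$dir/default.sort";
    push @configs, "$dir/$os.sort"     if (-e "$dir/$os.sort");
    push @configs, "$dir/$os.sort.lcl" if (-e "$dir/$os.sort.lcl");
    push @configs, $opt_c              if ($opt_c);   # loaded last
}
read_config($_) foreach (@configs);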
The 'read_config()' procedure loads the configuration files. For categories, the category name is converted to lowercase and examined to see if it is one of the reserved names. A hash variable (%file_to_cat) stores the category information. The key of the hash is the regular expression for the 'file' output and the value is the category name. If the regular expression already exists in the hash, a warning is produced and the existing category is replaced with the new one. For example:
$file_to_cat{'image data'} = 'images'
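The replace-and-warn behavior can be sketched like this (a simplified reading of the code; '$regexp' and '$category' are assumed names):

# If a rule for this regular expression already exists, warn and let
# the new category win (later files override earlier ones)
if (exists $file_to_cat{$regexp}) {
    print STDERR "Duplicate rule for '$regexp': replacing category " .
        "'$file_to_cat{$regexp}' with '$category'\n";
}
$file_to_cat{$regexp} = $category;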
For extension rules, the extensions are first converted to lowercase. As with the category rules, the extension information is stored in a hash variable (%file_to_ext). The 'file' regular expression is the key and the value is a comma-separated list of valid extensions. If extensions already exist for the regular expression, the new ones are appended to the end of the list. For example:
$file_to_ext{'JPEG image data'} = 'jpg,jpeg,jpe'
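In sketch form (again with assumed names), the append behavior looks like:

# New extensions are appended to any existing comma-separated list
# for the same 'file' regular expression
if (exists $file_to_ext{$regexp}) {
    $file_to_ext{$regexp} .= ",$ext_list";
} else {
    $file_to_ext{$regexp} = $ext_list;
}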
Lastly, the category files are opened and initialized if the '-l' listing flag was not specified. The file handles are stored in a hash variable that has the category name as the key.
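A minimal sketch of that setup, assuming the output files are named after the category ('%cat_fh' and '$save_dir' are assumed names for illustration):

# Open one output file per category unless '-l' (list-only) was given
unless ($opt_l) {
    my %seen;
    foreach my $cat (grep { !$seen{$_}++ } values %file_to_cat) {
        open(my $fh, '>', "$save_dir/$cat.txt")
            or die "Cannot open $save_dir/$cat.txt\n";
        $cat_fh{$cat} = $fh;
    }
}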
After the tool has been set up, it is time to process and sort the data. A loop cycles through each image specified on the command line, sets up some static values, and calls the 'analyze_img()' method.
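In outline (with assumed variable names), the main loop is simply:

# One pass per image given on the command line
foreach my $img (@images) {
    $cur_img = $img;       # static values used by the later routines
    analyze_img($img);
}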
The 'analyze_img()' method runs 'fls -rp' on the image to get a listing of all allocated and deleted files with the full path and the corresponding meta data address. The file listings are processed, and any entry that is not a regular file, or that has a type of "-", is ignored (directories and devices, for example). The 'analyze_file()' method is then called for each file.
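A simplified sketch of this pass (the line format shown and the variable names are assumptions; sorter's actual parsing handles more cases):

# Run 'fls -rp' and hand each regular file to analyze_file()
open(my $fls, '-|', 'fls', '-rp', $img)
    or die "Cannot run fls on $img\n";
while (my $line = <$fls>) {
    chomp $line;
    # Simplified: expect lines such as "r/r 1234:<tab>dir/file.txt",
    # with a "*" before the address for deleted files
    next unless ($line =~ /^(\S+)\s+(?:(\*)\s+)?(\S+?):\t(.+)$/);
    my ($type, $del, $meta, $path) = ($1, $2, $3, $4);
    next unless ($type eq 'r/r');      # keep only regular files
    analyze_file($path, $meta, defined($del));
}
close($fls);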
After the 'fls' output is processed, 'ils' is run to identify the unused meta data structures. These are sometimes called "Lost Files". The 'ils' tool is given the '-m' flag so that part of the name is found for NTFS and FAT files. Any file that has a size of zero or a missing definition is ignored. Valid files are processed with the same 'analyze_file()' method that was used for the 'fls' output.
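A rough sketch of this pass (the field positions and the placeholder name for lost files are assumptions for illustration; the actual parsing in sorter is more involved):

# Run 'ils -m' and process meta data structures with a non-zero size
open(my $ils, '-|', 'ils', '-m', $img)
    or die "Cannot run ils on $img\n";
while (my $line = <$ils>) {
    chomp $line;
    my @fields = split(/\|/, $line);
    my ($meta, $size) = @fields[1, 6];        # positions assumed
    next if (!defined($size) || $size == 0);
    analyze_file("dead-$meta", $meta, 1);     # placeholder name
}
close($ils);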
After all of the files and unallocated meta data structures have been sorted, the open file handles are closed. A summary of the number of categories and of the files in each category is also displayed.
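In sketch form (the handle and counter hashes '%cat_fh' and '%cat_count' are assumed names):

# Close the per-category output files and print a summary
foreach my $cat (sort keys %cat_fh) {
    close($cat_fh{$cat});
    printf "%-12s %d files\n", $cat, ($cat_count{$cat} || 0);
}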
This has been a fairly low-level overview of the process that 'sorter' uses. The ordering of the analysis steps is important: for example, if the exclude hashes were checked before the alert hashes, then a hash that happens to be in both databases would never generate an alert (see the sketch below). This was likely not good reading for a summer day on the beach, but it was hopefully informative. Suggestions for how to improve this process are welcome.
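To make that ordering concrete, here is a sketch of the safe order of checks; the function names are assumptions for illustration:

sub check_hashes {
    my ($md5, $path) = @_;
    # The alert database is checked first so that a hash present in
    # both databases still raises an alert
    if (is_alert_hash($md5))   { raise_alert($path); return; }
    if (is_exclude_hash($md5)) { return; }    # safe to skip only now
    # ... otherwise continue with category sorting ...
}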
The Sleuth Kit: http://www.sleuthkit.org/sleuthkit
The Sleuth Kit Informer Issue #3: http://www.sleuthkit.org/informer/sleuthkit-informer-3.html
The Sleuth Kit Informer Issue #4: http://www.sleuthkit.org/informer/sleuthkit-informer-4.html