The Sleuth Kit Informer

http://www.sleuthkit.org/informer
http://sleuthkit.sourceforge.net/informer

Brian Carrier

Issue #1
February 15, 2003

Introduction
What's New?
A High-Level Design Overview of Autopsy and TASK
Placing HTML in Jail

Introduction

Welcome to the first issue of The Sleuth Kit Informer. My goal is to provide monthly updates to increase awareness, knowledge, and documentation about TASK, Autopsy, and other related tools. Topics will range from practical uses of the tools to low-level tool design.

As this is the first issue, it is going to start with the basics of TASK and Autopsy. The first section covers the history, high-level design, and interaction of TASK and Autopsy. The second section describes how Autopsy sanitizes HTML pages from a suspect system in order to protect the analysis station. The March issue will examine the new case management features of Autopsy 1.70. Please feel free to send comments and suggestions (carrier at sleuthkit.org).

What's New?

On January 31, 2003, the latest versions of TASK and Autopsy were released. TASK 1.60 included two new tools, new features for an existing tool, and some bug fixes. The new tools and features are:

hfind: Performs binary search lookups in hash databases created by md5sum and the NIST NSRL.
sorter: Organizes the allocated and unallocated files in a file system image into categories and utilizes hash databases.
ifind: New features that map the path of a file to its meta data address.

Two major bugs in the NTFS code were also fixed. One reported incorrect time values on some files and the other caused 'MFT Magic' errors. It is recommended that all users upgrade.

Autopsy 1.70 contains a much-improved graphical interface. With some help from Samir Kapuria, Autopsy now has less of an HTML 101 look to it. Check out the screen shots on the website for more details. Autopsy also has more advanced Case Management features including multiple host support with multiple time zones and clock skews (which will be discussed in the next issue) and integrates the new tools from TASK.

A High-Level Design Overview of Autopsy and TASK

History

The design of TASK and Autopsy is based on the designs of their predecessors, namely The Coroner's Toolkit (TCT) and TCTUTILs. To best understand TASK and Autopsy, it is important to understand their roots, so their history will be given first.

In 2000, Dan Farmer and Wietse Venema released the first version of The Coroner's Toolkit (TCT). TCT is a collection of UNIX-based command line tools for collecting data from a live system and analyzing FFS and EXT2FS file systems.

The data acquisition tools in TCT include 'grave-robber' and 'pcat', which allow one to copy important data from a live system. The file system analysis tools include 'ils' and 'icat', which allow one to analyze the inode structures of the UNIX file systems, and 'mactime', which makes timelines of file activity. The 'unrm' tool extracts the unallocated data blocks from the file system, and the 'lazarus' tool analyzes "chunks" of data and guesses the content type to help recover deleted files.

TCT was groundbreaking since the general public finally had access to quality digital forensic tools that were free and open source. It also provided UNIX investigators with native forensic tools. To make a complete investigation easier, though, additional tools were needed. One limitation of TCT was that the file system tools only operated at the block and inode layers. Therefore, there was no notion of file and directory names during a post-mortem analysis. Another limitation was platform dependence. The analysis system had to run the same OS as the system being analyzed. For example, if a Solaris system was being investigated, then the analysis platform also had to be Solaris and could not be Linux. Lastly, support for common file systems such as FAT and NTFS did not exist.

To fix the first limitation, the lack of file name support, I developed TCTUTILs. TCTUTILs was an add-on to TCT and provided tools that would list the file and directory name structures in the file system and map structures in different layers of the file system. For example, the tools could identify which file has allocated a given inode and which inode has allocated a given block.

At the same time, Autopsy 1.00 was released. Autopsy was developed because TCT and TCTUTILs were all command line and were tedious to use when analyzing an entire file system image. Autopsy was HTML-based and would execute the appropriate TCT and TCTUTILs commands, parse the output, and present the results in an HTML browser.

At @stake, development of the tools continued and The @stake Sleuth Kit (TASK) was released. TASK 1.00 merged the file system analysis tools in TCT and TCTUTILs into one package and added new features. Mainly, platform dependence was removed so that, for example, one could analyze a Solaris system on any UNIX system that TASK compiled on. The second feature of version 1.00 was support for the FAT file system. Version 1.50 added support for the NTFS file system and the recent release of 1.60 added two new tools for using hash databases and sorting files based on the file type.

Autopsy 1.50 was upgraded to support the platform independence of TASK 1.00, Autopsy 1.60 was upgraded to support TASK 1.50, and the new Autopsy 1.70 was upgraded to support TASK 1.60 and got a major facelift.

Several other tools have been based on TCT. Knut Eckstein has ported TCT and TCTUTILs to work on HP-UX. Rob Lee wrote 'mac-daddy', which is a variation of 'grave-robber', but only makes file activity timelines. I wrote 'mac-robber', which is similar to 'mac-daddy', but is written in C instead of Perl.

TASK Design

As previously mentioned, the design of the file system tools in TASK is based on the original design of TCT. The basic principle of the tools is to manually process the file system. Typically, when a file system is mounted, the Operating System (OS) reads the different on-disk structures itself. With TASK, and other forensic tools, this is done by the tool itself and native OS support is not needed. Therefore, even though Solaris does not have NTFS support in its kernel, we can still analyze an NTFS image. File systems flag certain data that should not be displayed to a user under normal circumstances. Since we want to see all possible data, even flagged data will be displayed.

The goal of TASK is to provide the investigator with all data in the file system. TASK is a data extraction tool that analyzes a large file system image and outputs specific parts of it in an easier to use format (since most people can't manually parse the contents of a 1024-byte NTFS MFT structure). In some cases, you will get more information than you need, but it is all presented because someday you may need it.

To provide access to all data, a layer-based model is used.

Data Unit Layer: This is where the file and directory content is stored. In general, disk space is allocated from here when the OS needs it. The data units are a fixed size, and most file systems require the size to be a power of 2, such as 1024 or 4096 bytes. The data units are called different things in different file system types, such as fragments and clusters, but they serve the same function.
Meta Data Layer: This is where the descriptive data about files and directories are stored. This layer includes the inode structures in UNIX, MFT entries in NTFS, and directory entry structures in FAT. This layer contains information such as last access times, permissions, and pointers to the data units that were allocated by the file or directory. This layer completely describes a file, but it is typically given a numeric address that is difficult to remember.
File Name Layer: This is where the actual name of the file or directory is saved. In general, this is a different structure than the meta data structure. The exception to this is the FAT file system. The file names are typically saved in the data units that the parent directory allocates. The file name structure contains the name and the address of the corresponding meta data structure.
File System Layer: This is where all of the other file system specific data are stored. For example, how big each data unit is and how many inode structures there are. This layer contains the values that identify how this file system is different from another file system of the same type. Management structures, such as bitmaps, are also in this category.

There are currently 11 file system tools that operate at these levels. Each has a name based on what layer it is in and its function in that layer. The first letter of the name corresponds to the layer it belongs to:

First Letter	Layer
d	Data Unit
i	Meta Data
f	File Name
fs	File System

The ending of the tool name corresponds to its function:

Ending	Function
ls	Lists information in the layer
cat	Displays content in the layer
stat	Displays details about a given object in the layer
find	Maps other layers to its layer
calc	Calculates "something" in the layer

That gives us the following 11 tools:

Data Unit: dls, dcat, dstat, dcalc
Meta Data: ils, icat, istat, ifind
File Name: fls, ffind
File System: fsstat
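
The naming convention can be expressed as a small lookup. The following is only an illustrative sketch built from the two tables above; the function name describe_tool is hypothetical and is not part of TASK.

```python
# Layer prefixes and function endings, copied from the two tables above.
LAYERS = {"d": "Data Unit", "i": "Meta Data", "f": "File Name", "fs": "File System"}
FUNCTIONS = {
    "ls": "Lists information in the layer",
    "cat": "Displays content in the layer",
    "stat": "Displays details about a given object in the layer",
    "find": "Maps other layers to its layer",
    "calc": 'Calculates "something" in the layer',
}


def describe_tool(name):
    """Split a tool name into (layer, function). Hypothetical helper."""
    # Try the longest prefix first so that 'fsstat' resolves to 'fs' + 'stat'
    # instead of 'f' + 'sstat'.
    for prefix in sorted(LAYERS, key=len, reverse=True):
        ending = name[len(prefix):]
        if name.startswith(prefix) and ending in FUNCTIONS:
            return LAYERS[prefix], FUNCTIONS[ending]
    raise ValueError(f"{name!r} does not follow the naming convention")


if __name__ == "__main__":
    for tool in ("dls", "icat", "ffind", "fsstat"):
        print(tool, describe_tool(tool))
```

Tools outside the four layers, such as 'mactime', do not fit the pattern, so the sketch raises ValueError for them.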

New to the 1.60 version of TASK is the 'hfind' tool. This allows one to do fast lookups from a hash database. Details of it will be covered in a future issue. All tools that work on hash databases will begin with an 'h'.
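
To show why a binary search makes the lookups fast, here is a minimal sketch of the general technique. It is not the 'hfind' implementation: it keeps the hashes in an in-memory sorted list rather than in a database file, and the names build_index and lookup are hypothetical.

```python
import bisect
import hashlib


def build_index(hashes):
    """Return the hashes sorted; sorted order is what enables binary search."""
    return sorted(h.lower() for h in hashes)


def lookup(index, digest):
    """Return True if digest is present in the sorted index."""
    digest = digest.lower()
    # bisect_left finds the insertion point in O(log n) comparisons.
    pos = bisect.bisect_left(index, digest)
    return pos < len(index) and index[pos] == digest


if __name__ == "__main__":
    # Made-up "known file" contents, used only to produce sample MD5 values.
    known = build_index(hashlib.md5(d).hexdigest() for d in (b"ls", b"cat", b"ps"))
    print(lookup(known, hashlib.md5(b"cat").hexdigest()))
    print(lookup(known, hashlib.md5(b"unknown").hexdigest()))
```

With a sorted index, each lookup needs only about log2(n) comparisons, which matters when the database holds millions of entries.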

The 'mactime' tool creates timelines of data. It uses data that is generated by the extraction tools, namely 'fls' and 'ils'. It does not generate new data; it just presents the data in a format that is more useful in some circumstances. The 'mactime' tool has changed from the TCT version: it does not read the local password and group files, and it also displays the day of the week and the inode value for each entry.

Also included with TASK are some application-level tools. For hashing, the MD5 and SHA-1 binaries are given. The latest version of 'file' is also given so that file types can be quickly identified.

The 'sorter' tool uses the extraction tools, such as 'fls' and 'icat', and application-level tools, such as 'file' and 'md5', to organize a file system image by the content type of the files. For example, all graphical images are identified. The 'sorter' tool itself knows nothing about file systems; it just knows how to run the TASK tools.

In conclusion, TASK is modular and provides one with access to all layers of the file system. The tools are also useful when learning about different file system structures. Future development work will focus on integrating the tools and reducing the amount of data for certain situations. An example of this is the recent 'sorter' tool.

Autopsy Design

Autopsy has been designed to be an efficiency tool to TASK (and previously TCT and TCTUTILs). Autopsy knows nothing about file system structures or hash database lookup algorithms. It just knows what tools exist, what they need for input, how to parse the output, and how to display the output with HTML hyperlinks.

In its early days, Autopsy was a CGI script that required a separate HTTP server to run, Apache for example. But, before the first public release, a mini-server was built into it. The internal design of Autopsy still reflects its CGI roots. Any HTML browser can connect to Autopsy, and a random cookie value in the URL and an access control IP address are used to restrict access.
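
The two access checks can be sketched as follows. This is an illustration of the idea only, not Autopsy's code: it assumes the cookie is the first path segment of the URL, and the names new_cookie and is_allowed, the port, and the path are hypothetical.

```python
import hmac
import secrets
from urllib.parse import urlsplit


def new_cookie():
    """Generate a random value to embed in the URL (assumed format: hex)."""
    return secrets.token_hex(16)


def is_allowed(url, client_ip, cookie, allowed_ip):
    """Accept a request only from the allowed IP and with the right cookie."""
    # Assumption: the cookie is the first path segment, e.g. /<cookie>/page.
    first_segment = urlsplit(url).path.lstrip("/").split("/", 1)[0]
    cookie_ok = hmac.compare_digest(first_segment.encode(), cookie.encode())
    return client_ip == allowed_ip and cookie_ok


if __name__ == "__main__":
    cookie = new_cookie()
    url = f"http://localhost:9999/{cookie}/autopsy"
    print(is_allowed(url, "127.0.0.1", cookie, "127.0.0.1"))
    print(is_allowed(url, "10.0.0.5", cookie, "127.0.0.1"))
```

Both checks must pass: a request from the right address without the cookie is rejected, as is a request with the cookie from another address.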

Many people ask why Autopsy is separate from TASK. The main reason is flexibility. There will always be circumstances when a GUI does not do something that you need it to. Therefore, the command line tools are separate so that one is not restricted to using a GUI.

For example, a recent analysis involved a file system image that was not complete, and many of the automated functions provided by Autopsy, and by Windows-based tools, would generate an error and stop. By using the command line tools, though, we were able to restart when the tools aborted and collect additional data.

The separate tool design also allows for a different interface to be developed for the tools. Similarly, other command line tools, such as foremost, can (and will) be integrated.

Autopsy provides the user with access to the same layers as TASK uses and also gives the higher-level functionality, such as timelines and file type sorting. The case management details will be covered in the next issue.

In conclusion, Autopsy tries to make life easier by performing the tedious operations that TASK can require. Take, for example, the case when a keyword is found in a given fragment. Autopsy will execute the needed mapping tools to identify the file name that allocated it and provide the allocation status of the fragment automatically. Future versions will integrate additional tools and make the process more efficient.
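
The keyword example above amounts to chaining two mappings across the layers. The sketch below uses a made-up, in-memory toy image with one dictionary per layer; the addresses, file names, and function names are all hypothetical, and the real tools read these structures from a file system image instead.

```python
# Hypothetical toy image, one structure per layer.
ALLOCATED_UNITS = {100, 101}                 # allocation status of data units
META = {12: [100, 101], 27: [340]}           # meta data address -> data units
FILE_NAMES = {"/etc/passwd": 12, "/tmp/notes.txt": 27}  # name -> meta data address


def meta_for_data_unit(unit):
    """Data unit -> meta data address (a 'find' into the meta data layer)."""
    for addr, units in META.items():
        if unit in units:
            return addr
    return None


def names_for_meta(addr):
    """Meta data address -> file names (a 'find' into the file name layer)."""
    return [name for name, a in FILE_NAMES.items() if a == addr]


def explain_hit(unit):
    """Report the allocation status and owning file for a keyword hit."""
    addr = meta_for_data_unit(unit)
    return {
        "unit": unit,
        "allocated": unit in ALLOCATED_UNITS,
        "meta": addr,
        "names": names_for_meta(addr) if addr is not None else [],
    }


if __name__ == "__main__":
    print(explain_hit(101))
    print(explain_hit(340))
```

In this toy data, unit 340 is unallocated but still referenced by meta data entry 27, which is the kind of result that points an investigator toward a deleted file.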

Placing HTML in Jail

One of the key, and best known, concepts for digital forensics and incident response is to never trust the suspect system. For this reason, trusted CDs of binaries are used when the system is still running, and the system is typically shut down and rebooted into a trusted kernel before an acquisition is performed. The same logic is applied to executable files on suspect systems. They are only run in controlled environments so that they do not destroy the analysis station.

This concept has also been applied in Autopsy when viewing HTML files, from a browser cache for example. HTML files can do two things to cause havoc during an investigation. First, they could contain malicious scripts (JavaScript or Java, for example) that could cause harm to a vulnerable analysis system. Second, the HTML could cause a browser to connect to an external site for an image or other file. This could alert someone about the investigation (although in an ideal world, all forensic labs are air-gapped from the Internet).

By default, Autopsy will not interpret file content. For example, when an HTML file is selected, then the ASCII tags and text will be displayed and not the formatted document (even though you are using an HTML browser). It does this by setting the content-type to text instead of html.
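
The effect of the content type can be sketched with a small response builder. This is not Autopsy's code; the function name build_response is hypothetical, and the exact MIME values text/plain and text/html are assumptions based on the description above.

```python
def build_response(body, interpret=False):
    """Return a raw HTTP/1.0 response for file content taken from an image.

    With interpret=False (the default) the body is labeled as plain text, so
    a browser shows the tags themselves instead of rendering them.
    """
    content_type = "text/html" if interpret else "text/plain"  # assumed values
    headers = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + body


if __name__ == "__main__":
    print(build_response(b"<b>suspect page</b>").decode("ascii"))
```

The body bytes are identical in both cases; only the header decides whether the browser renders them.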

For certain file types, namely HTML and images, Autopsy will allow the user to view the interpreted data using a "view" link. This link will open the interpreted file content in a "cell". The "cell", by default, will parse the HTML code and make the scripts and links ineffective. This is performed by modifying the HTML in the following ways:

SRC= is changed to SRC=AutopsySanitized
HREF= is changed to HREF=AutopsySanitized
<script is changed to <AutopsySanitized-script
BACKGROUND= is changed to BACKGROUND=AutopsySanitized
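
The four substitutions listed above can be sketched with regular expressions. This is an approximation for illustration, not Autopsy's code: matching case-insensitively and keeping the original letter case are assumptions, and the function name sanitize is hypothetical.

```python
import re

MARKER = "AutopsySanitized"

# Attribute rules: insert the marker directly after 'NAME='.
ATTR_RE = re.compile(r"\b(SRC|HREF|BACKGROUND)=", re.IGNORECASE)
# Tag rule: '<script' becomes '<AutopsySanitized-script'.
SCRIPT_RE = re.compile(r"<(script)", re.IGNORECASE)


def sanitize(html):
    """Apply the four substitutions so links and scripts become ineffective."""
    html = ATTR_RE.sub(lambda m: m.group(0) + MARKER, html)
    return SCRIPT_RE.sub(lambda m: "<" + MARKER + "-" + m.group(1), html)


if __name__ == "__main__":
    page = '<body background="bg.gif"><script>run()</script><a href="x.html">x</a>'
    print(sanitize(page))
```

Because the marker is placed at the start of the attribute value, any request the browser does make carries 'AutopsySanitized' and can be recognized by the server. The sketch does not handle whitespace around the equals sign.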

The web server part of Autopsy is configured to replace image requests that contain 'AutopsySanitized' with a placeholder graphic. This allows the investigator to see that an image should exist in that location without connecting to an external site. Also, if the investigator clicks on a URL, Autopsy will report that it does not allow the investigator to follow external links while in the cell. After verifying the content of the HTML page, the user can exit the cell by selecting the "Normal" button, and the page can be exported with the "Export Contents" button.

It is good practice to disable the scripting languages in HTML browsers on analysis stations. Autopsy will warn you if your browser has scripting enabled. The sanitizing that Autopsy performs when viewing HTML pages adds an additional layer of protection in case scripting was accidentally left on, there is a security vulnerability in the browser, or the system is connected to the Internet. After the investigator has identified the page as safe, it can be viewed in its native format.
