Brian Carrier
carrier at sleuthkit dot org
Issue #19
March 15, 2005
In the last issue of the Informer, I mentioned that I was no longer going to make the text version because it took a lot of time to manually convert between the HTML and text versions. Alexander Ehlert e-mailed me to tell me that lynx can be used to dump an HTML page to text, so I will be using that for this and future issues (until I find a better document management system).
New versions of TSK and Autopsy are being released soon. TSK v2 has many new features, including disk and split image support (as discussed later in this issue), automatic detection of file system types, and a new internal design. Several new features were also added to existing tools. The new disk_sreset tool was added to remove an HPA from an ATA disk, and the diskstat tool was renamed to disk_stat to make the tool names clearer. Autopsy has been updated to version 2.04 and supports the new disk and split images.
The 5th Annual Digital Forensic Research Workshop (DFRWS) announced its Call for Papers in January. One of the areas that we are interested in is general tool design or testing, so it could be a good place to publish papers on tools based on TSK/Autopsy or other open source tools.
The Sleuth Kit Informer is looking for articles on open source tools and techniques for digital investigations (computer / digital forensics) and incident response. Articles that discuss The Sleuth Kit and Autopsy are appreciated, but not required. Example topics include (but are not limited to):
http://www.sleuthkit.org/informer/cfp.html
Version 2 of The Sleuth Kit (TSK) has (finally) added support for image files other than raw partition images. TSK now supports raw disk images and split images, and future versions will support non-raw and compressed formats. This article describes how to use the new features and gives a high-level description of how they were implemented.
There are two new things that you must consider with TSK. One is the image file format and the second is the offset location of a specific partition or file system in a disk image. Accordingly, there are also two new command line flags. The -i flag is optional and is used to specify the image file format. If it is not given, then the tool will try to detect the format. The second flag is -o and it is used to specify the offset where a specific partition starts.
When specified, the image type argument is a list of one or more format types separated by commas. Currently only one type is needed, but future versions may require multiple types. The currently supported types are raw and split, so a basic image would use -i raw or -i split. In the future, the tools may support, for example, ACME Company's image file format with embedded data; if such an image were split among several files, you would use -i acme,split.
The offset argument is, by default, in sectors. For example, to specify that the partition starts at sector 63 you would use -o 63. If you want to specify the offset using a different block size, then the block size can be given with the format of offset@blocksize. For example, to specify that the partition starts at block 1000 and each block is 2,048 bytes then you would use -o 1000@2048.
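A minimal sketch of how an offset or offset@blocksize argument could be turned into a byte offset, assuming the default unit is a 512-byte sector. The function name parse_offset is hypothetical and not taken from the TSK source.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the TSK source): convert an -o argument
 * of the form "offset" or "offset@blocksize" into a byte offset.
 * Without "@blocksize", the offset is assumed to be in 512-byte sectors.
 * Returns 0 on success and -1 if the argument is malformed. */
int parse_offset(const char *arg, uint64_t *bytes)
{
    char *end;
    unsigned long long offset, blocksize = 512;

    if (!isdigit((unsigned char)arg[0]))
        return -1;                  /* reject signs, spaces, empty input */
    errno = 0;
    offset = strtoull(arg, &end, 10);
    if (errno != 0)
        return -1;

    if (*end == '@') {
        const char *bs = end + 1;
        if (!isdigit((unsigned char)bs[0]))
            return -1;              /* "@" must be followed by a number */
        blocksize = strtoull(bs, &end, 10);
        if (errno != 0 || blocksize == 0)
            return -1;
    }
    if (*end != '\0')
        return -1;                  /* trailing characters */

    if (offset > UINT64_MAX / blocksize)
        return -1;                  /* product would overflow 64 bits */
    *bytes = (uint64_t)offset * blocksize;
    return 0;
}
```

Under these assumptions, -o 63 corresponds to byte 32256 (63 × 512) and -o 1000@2048 to byte 2048000. A 64-bit result type is used because byte offsets on large disks easily exceed the 32-bit range.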
The position of the image file names on each command line has not changed. If split images are used, the names must be given in order, for example: fls image.dd.01 image.dd.02 image.dd.03 image.dd.04 .... You can use the * wildcard to specify a large number of files: fls image.dd.*. This works because the shell expands the wildcard in sorted order, provided the numbered suffixes all have the same number of digits.
Here is an example with all possible arguments:
# fls -i split -o 63 -f ntfs disk1.dd.*
Here is an example using the new autodetect features:
# fls -o 63 disk1.dd.*
If you have a raw partition image, then you can skip the -o argument (and take advantage of the new file system type autodetect feature):
# fls part1.dd
There is a new tool to help with image file formats. The img_stat tool displays details about an image file, such as the sector range covered by each file of a split image; for future file formats, it will also show other embedded data. The -t flag can be used to display only the image format type.
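The per-file sector ranges that img_stat reports for a split image follow from the sizes of the files. The sketch below computes them and maps a sector of the combined image back to the file that holds it. It assumes 512-byte sectors and files whose sizes are multiples of 512 bytes, and all names are hypothetical. Ranges are stored half-open, so the inclusive last sector of a file is end_sect - 1.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed sector size */

/* Hypothetical description of one file of a split image. */
struct split_part {
    uint64_t size;        /* file size in bytes (assumed a multiple of 512) */
    uint64_t first_sect;  /* first sector of the whole image in this file */
    uint64_t end_sect;    /* one past the last sector in this file */
};

/* Assign sector ranges in command-line order: each file continues
 * where the previous one ended. */
void split_compute_ranges(struct split_part *parts, size_t n)
{
    uint64_t next = 0;
    for (size_t i = 0; i < n; i++) {
        parts[i].first_sect = next;
        next += parts[i].size / SECTOR_SIZE;
        parts[i].end_sect = next;
    }
}

/* Find the file holding sector `sect` of the combined image and the
 * byte offset of that sector within the file. Returns -1 if out of range. */
int split_locate(const struct split_part *parts, size_t n, uint64_t sect,
                 size_t *index, uint64_t *byte_off)
{
    for (size_t i = 0; i < n; i++) {
        if (sect >= parts[i].first_sect && sect < parts[i].end_sect) {
            *index = i;
            *byte_off = (sect - parts[i].first_sect) * SECTOR_SIZE;
            return 0;
        }
    }
    return -1;
}
```

With files of 1024, 2048 and 512 bytes, for example, the ranges are sectors 0-1, 2-5 and 6, and sector 3 is found 512 bytes into the second file.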
For those interested in code-level information about the new image support, this section will fill you in. The new features were added by creating a new imgtools library. This library is used to read the data from the image. The file system code never knows which image format is used.
Before the file system is processed, the image file is opened using the img_open function. This function determines the format type and initializes an IMG_INFO data structure. That data structure is passed to the file system and media management code and is used to read all data. The imgtools library contains all of the code to read the image files.
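The arrangement described above, in which the file system code reads only through a handle that img_open fills in, can be sketched as a structure holding a read function pointer. This is an illustration only: struct img_sketch, raw_mem_read, img_sketch_open_raw_mem and fs_read_sector are hypothetical names, not the actual IMG_INFO definition, and the in-memory "raw" backend stands in for reading from files.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical image handle: the file system code sees only this. */
struct img_sketch {
    uint64_t size;  /* size of the image in bytes */
    /* Format-specific read: copy up to len bytes at offset off into buf.
     * Returns the number of bytes copied (0 at end of image). */
    ssize_t (*read_at)(struct img_sketch *img, uint64_t off, void *buf, size_t len);
    void *priv;     /* format-specific state */
};

/* "raw" backend over an in-memory buffer, standing in for a raw file. */
static ssize_t raw_mem_read(struct img_sketch *img, uint64_t off, void *buf, size_t len)
{
    if (off >= img->size)
        return 0;                          /* past the end of the image */
    if (len > img->size - off)
        len = (size_t)(img->size - off);   /* clamp to the image end */
    memcpy(buf, (const unsigned char *)img->priv + off, len);
    return (ssize_t)len;
}

/* Plays the role of img_open(): choose a backend and fill in the handle. */
void img_sketch_open_raw_mem(struct img_sketch *img, void *data, uint64_t size)
{
    img->size = size;
    img->read_at = raw_mem_read;
    img->priv = data;
}

/* File system side: read one 512-byte sector of a file system that starts
 * fs_offset bytes into the image, without knowing the image format. */
ssize_t fs_read_sector(struct img_sketch *img, uint64_t fs_offset,
                       uint64_t sector, void *buf)
{
    return img->read_at(img, fs_offset + sector * 512, buf, 512);
}
```

Supporting another format would mean writing another read function (for example one that maps offsets across split files) and selecting it when the image is opened; fs_read_sector would stay unchanged.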
Version 2 of TSK has finally introduced disk and split image support, which will make setting up a case much easier. This article has described the basics of using the new features and tools.
Often when analysing hard disk images, the image may be provided in a slightly different format to the expected partition dd image. This may happen because the image was split into multiple files, or because it was acquired using Encase(tm), which uses its own proprietary image file format.
Many forensic tools require the image to be in a specific format. For example, previous versions of the Sleuthkit required the image to be an uncompressed partition image, such as one obtained with the dd command::
dd if=/dev/hda1 of=image.dd
If the whole raw disk was imaged (i.e. /dev/hda), the investigator was forced to use dd to "slice" the original image into partitions according to the partition table (note that the 63-sector skip is normally found from the partition table, using sfdisk, mmls or a similar tool)::
dd if=disk_image.dd of=partition_image.dd bs=512 skip=63
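The start sector used for skip= comes from the partition table. As a rough sketch of what tools such as sfdisk and mmls decode, limited to the four primary entries of a DOS (MBR) partition table: the 16-byte entries begin at byte 446 of sector 0, each with a type byte at offset 4, a little-endian start sector at offset 8 and a sector count at offset 12, and the sector ends with the bytes 0x55 0xAA. Extended partitions are not followed, and the names are hypothetical.

```c
#include <stdint.h>

/* One primary partition entry from a DOS (MBR) partition table. */
struct mbr_part {
    uint8_t  type;    /* partition type, e.g. 0x83 = Linux */
    uint32_t start;   /* first sector (LBA) */
    uint32_t length;  /* number of sectors */
};

/* Read a 32-bit little-endian value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the four primary entries from the first 512-byte sector of a disk
 * image. Returns -1 if the 0x55 0xAA boot signature is missing. */
int mbr_parse(const uint8_t sector0[512], struct mbr_part parts[4])
{
    if (sector0[510] != 0x55 || sector0[511] != 0xAA)
        return -1;
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector0 + 446 + 16 * i;
        parts[i].type = e[4];
        parts[i].start = le32(e + 8);
        parts[i].length = le32(e + 12);
    }
    return 0;
}
```

Multiplying a start sector by 512 gives the partition's byte offset; a start of 63 corresponds to the bs=512 skip=63 arguments above.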
If the original disk was very large, this was a time-consuming operation. It would be nice to have an abstraction layer which converts between the different image formats (a partition image vs. a disk image) on the fly, without having to copy the image again.
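The core of such an on-the-fly conversion is small: rather than copying the partition out, each read is redirected into the disk image at the partition's byte offset. A minimal sketch using POSIX open() and pread(); the struct and function names are hypothetical, and this is not PyFlag's implementation.

```c
#define _XOPEN_SOURCE 700   /* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical read-only view of one partition inside a disk image. */
struct part_view {
    int   fd;       /* open disk image */
    off_t start;    /* byte offset of the partition in the image */
    off_t length;   /* partition length in bytes */
};

/* Open the disk image read-only; nothing is copied. */
int part_view_open(struct part_view *v, const char *image, off_t start, off_t length)
{
    v->fd = open(image, O_RDONLY);
    v->start = start;
    v->length = length;
    return v->fd < 0 ? -1 : 0;
}

/* Read len bytes at offset off *within the partition*. The caller sees a
 * partition image; the data comes straight from the disk image. */
ssize_t part_view_read(const struct part_view *v, void *buf, size_t len, off_t off)
{
    if (off < 0 || off >= v->length)
        return 0;                               /* outside the partition */
    if ((off_t)len > v->length - off)
        len = (size_t)(v->length - off);       /* stop at the partition end */
    return pread(v->fd, buf, len, v->start + off);
}
```

A partition starting at sector 63 would use start = 63 * 512. Reads past the end of the partition return 0, just as they would at the end of a sliced partition image.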
This functionality becomes even more desirable when considering the analysis of images which have been stored using compression. For example, the popular forensic package Encase(tm) stores images in a proprietary format called `The Expert Witness Compression Format`[1]. This format provides compression as well as splitting large images into manageable parts. By providing a transparent abstraction layer it is possible to enable any tool to automatically support the image format.
The PyFlag[2] forensic package used to have an IO Subsystem patch for the Sleuthkit which enabled it to operate on a number of different file formats. Although the Sleuthkit is an excellent tool, it soon became obvious that other tools, like strings and sfdisk, needed the same functionality.
Modifying an application's source code meant extra maintenance, because the IO subsystem patch had to be retrofitted to each new release of the Sleuthkit. The developers of PyFlag had to find a better way. Ideally, the solution would require no source code modification and would allow arbitrary programs to handle the supported file formats transparently.
The obvious solution to this problem was an abstraction layer based on library hooking techniques.
When a program wishes to perform an IO operation on a file (for example, to open, read or write it), it rarely issues the kernel system call directly. Instead, most programs call the C library's open(), read() and write() functions as required. Since most programs are dynamically linked rather than statically linked, the C library code is linked at run time by the dynamic linker.
Most dynamic linker implementations (and in particular the GNU libc dynamic loader) allow a library to be loaded first, before loading other system libraries. Also, if a library provides a required symbol, the linker will stop searching for that symbol in other libraries. This property allows a library to "hook" a library function by simply masking the library function with a locally defined function.
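An example serves to illustrate the technique. Assume we have the following program, written in C::
#include <fcntl.h>
#include <unistd.h>
int main(void) {
    char buffer[512];
    int fd = open("somefile", O_RDONLY);   /* C library open() */
    if (fd < 0) return 1;
    read(fd, buffer, sizeof(buffer));      /* C library read() */
    close(fd);                             /* C library close() */
    return 0;
}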
When this program is executed, it calls the C library's open function (which actually makes the system call). The program then reads some data from the file handle by calling the C library's read function, and finally calls the library's close function to close the file handle.
In the glibc implementation of the dynamic loader (the one used on most Linux systems), the environment variable LD_PRELOAD tells the linker to load the named library before any other libraries. If the desired symbol is present in the named library, it masks functions with the same name in other libraries.
In our case, we wish to hook the open(), read() and close() functions, so we need to create a shared object (a library, which we shall call the hooker object) that defines these functions. After LD_PRELOAD is set to the location of the hooker object, our library will trap all calls to the specified functions::
External program ---> Hooker object ---> real libc functions
As a result, as far as the external program is concerned, it is operating on a simple partition image such as one obtained using dd. In practice, however, the hooker object is able to read more complex images, emulating a simple partition image to the external program.
The PyFlag iohooker tool implements this technique. It hooks not only open, read, write etc., but also the stream functions fopen, fread, fwrite etc. It currently supports many different external programs, such as dd, sfdisk, all Sleuthkit executables, strings and many more.
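The hooking technique can be sketched for a single function. This is a minimal illustration assuming a dynamic linker that supports dlsym(RTLD_NEXT, ...), as glibc's does; it is not PyFlag's libio_hooker.so. It hooks only fopen(), one of the stream functions listed above, because fopen() has a fixed signature while open() is variadic. It performs no format conversion: it merely redirects file names that start with a given prefix to another file, loosely mirroring iowrapper's wrapped-filename option. The environment variable names HOOK_WRAPPED_PREFIX and HOOK_IMAGE_PATH are hypothetical.

```c
#define _GNU_SOURCE          /* for RTLD_NEXT and setenv() */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Type of the C library's fopen(). */
typedef FILE *(*fopen_fn)(const char *, const char *);

/* This fopen() masks the C library's fopen() when the file is built as a
 * shared object and loaded with LD_PRELOAD. Hypothetical control variables:
 *   HOOK_WRAPPED_PREFIX  names starting with this string are redirected
 *   HOOK_IMAGE_PATH      the file that is opened instead */
FILE *fopen(const char *path, const char *mode)
{
    static fopen_fn real_fopen = NULL;

    if (real_fopen == NULL) {
        /* Look up the next definition of fopen() after this object,
         * normally the one in the C library. */
        real_fopen = (fopen_fn)dlsym(RTLD_NEXT, "fopen");
        if (real_fopen == NULL) {
            errno = ENOSYS;
            return NULL;
        }
    }

    const char *prefix = getenv("HOOK_WRAPPED_PREFIX");
    const char *image = getenv("HOOK_IMAGE_PATH");
    if (path != NULL && prefix != NULL && prefix[0] != '\0' && image != NULL &&
        strncmp(path, prefix, strlen(prefix)) == 0)
        path = image;        /* hand the real fopen() the image instead */

    return real_fopen(path, mode);
}
```

Built as a shared object (gcc -shared -fPIC -o libhook_sketch.so hook_sketch.c -ldl; the -ldl is only needed with glibc versions before 2.34), it could be tried with HOOK_WRAPPED_PREFIX=foobar HOOK_IMAGE_PATH=/tmp/test.dd LD_PRELOAD=./libhook_sketch.so some_program foobar, where some_program stands for any program that reads its input through fopen(). Programs that call open() directly are unaffected, which is why a complete hooker has to trap both families of functions.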
IOHooker is distributed in two components. The main component is a shared object called libio_hooker.so. In order to control this object, environment variables are set by a wrapper program: iowrapper.
For the purposes of demonstration we download the `binary version of PyFlag`[3]. We untar the distribution in our home directory, and change directory into it.
The first step, before the iowrapper can be used, is to set the LD_LIBRARY_PATH environment variable. This allows the dynamic linker to find libio_hooker.so. If it is not set properly, the linker cannot run the iowrapper::
~/pyflag$ ./bin/iowrapper -h
./bin/iowrapper: error while loading shared libraries: libio_hooker.so: cannot open shared object file: No such file or directory
After setting the LD_LIBRARY_PATH environment variable, we are able to run the iowrapper normally::
~/pyflag$ export LD_LIBRARY_PATH=`pwd`/libs/
~/pyflag$ ./bin/iowrapper
This program wraps library calls to enable binaries to operate on images with various formats.
NOTE: Ensure that libio_hooker.so is in your LD_LIBRARY_PATH before running this wrapper.
Usage: ./bin/iowrapper -i subsys -o option prog arg1 arg2 arg3...
-i subsys: The name of a subsystem to use (help for a list)
-o optionstr: The option string for the subsystem (help for an example)
-f wrapped filename: All wrapped filenames will start with this string. This is useful for programs that need to open other files as well as the target file (for example /usr/bin/file needs to open magic files as well).
Loading library now for hooking
The final message "Loading library now for hooking" confirms that the hooker object is properly initialised and ready. Let us first check to see what IO Subsystems are supported by the iowrapper::
~/pyflag$ ./bin/iowrapper -i help
Loading library now for hooking
Available Subsystems:
standard - Standard Sleuthkit IO Subsystem
advanced - Advanced Sleuthkit IO Subsystem
sgzip - Seekable Gzip format
ewf - Expert Witness Compression format
raid - Raid 5 implementation
Unhandled Exception(IO Error): No such IO subsystem: help
Each subsystem accepts its own set of options. The advanced subsystem allows users to specify arbitrary offsets, as well as images split across multiple files. We can get a more detailed explanation of these options::
~/pyflag$ ./bin/iowrapper -i advanced -o help
Loading library now for hooking
Advanced io subsystem options
offset=bytes     Number of bytes to seek to in the image file. Useful if there is some extra data at the start of the dd image (e.g. partition table/other partitions)
file=filename    Filename to use for split files. If your dd image is split across many files, specify this parameter in the order required as many times as needed for seamless integration
A single word without an = sign represents a filename to use
For our first example, we use the Sleuthkit's fls tool to list the files present in partition 6 of a hard disk image. The version of fls used here does not provide an option for selecting an offset into the image for the start of the file system, hence we need to wrap it. First we find where the partition starts::
/pyflag# sfdisk -uS -l /tmp/test.dd
Disk /tmp/test.dd: cannot get geometry
Disk /tmp/test.dd: 0 cylinders, 0 heads, 0 sectors/track
read: Inappropriate ioctl for device
Warning: The partition table looks like it was made for C/H/S=*/255/63 (instead of 0/0/0).
For this listing I'll assume that geometry.
Units = sectors of 512 bytes, counting from 0
       Device Boot   Start       End   #sectors  Id  System
/tmp/test.dd1           63     96389      96327  de  Dell Utility
/tmp/test.dd2 *      96390  19647494   19551105   7  HPFS/NTFS
/tmp/test.dd3     19647495  58733639   39086145   c  W95 FAT32 (LBA)
/tmp/test.dd4     58733640 117210239   58476600   5  Extended
/tmp/test.dd5     58733703  59328044     594342  82  Linux swap
/tmp/test.dd6     59328108 117210239   57882132  83  Linux
The start of partition 6 is at 59328108 sectors * 512 bytes = 30375991296. We can therefore use the wrapper to force fls to read the file system located at that offset::
~/pyflag$ ./bin/iowrapper -i advanced -o offset=30375991296,filename=/tmp/test.dd fls \
-f linux-ext3 foobar
Set file to read from as /tmp/test.dd
d/d 11: lost+found
d/d 32769: etc
l/l 12: cdrom
d/d 131073: var
...
d/d 3211272: opt
d/d 3555336: initrd
l/l 16: vmlinuz
Note that as far as fls is concerned it is opening and reading the file foobar. It does not realise that foobar does not exist, since the wrapper provides it with valid data.
For the next example, we used Encase(tm) to create an evidence file of a floppy disk. The file command is unable to determine what is stored inside the image, because it is encoded in the proprietary EWF format::
~/pyflag$ file test.e01
test.e01: data
~/pyflag$ hexdump -C test.e01 | head
00000000  45 56 46 09 0d 0a ff 00  01 01 00 00 00 68 65 61  |EVF...ÿ......hea|
00000010  64 65 72 00 00 00 00 00  00 00 00 00 00 b2 00 00  |der..........²..|
00000020  00 00 00 00 00 a5 00 00  00 00 00 00 00 80 00 10  |.....¥..........|
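The first eight bytes of this dump, 45 56 46 09 0d 0a ff 00 ("EVF" followed by 09 0d 0a ff 00), are the signature at the start of an EWF evidence file, so an image format detector can recognise the format by comparing them. A minimal sketch; the function name looks_like_ewf is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The eight signature bytes at the start of the test.e01 file above:
 * "EVF" 09 0d 0a ff 00 */
static const unsigned char EWF_SIGNATURE[8] = {
    0x45, 0x56, 0x46, 0x09, 0x0d, 0x0a, 0xff, 0x00
};

/* Return 1 if the buffer starts with the EWF signature, 0 otherwise. */
int looks_like_ewf(const unsigned char *buf, size_t len)
{
    return len >= sizeof EWF_SIGNATURE &&
           memcmp(buf, EWF_SIGNATURE, sizeof EWF_SIGNATURE) == 0;
}
```

A raw image, such as the FAT boot sector that appears once the file is wrapped, does not start with these bytes and is rejected.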
Let's wrap the hexdump program to show the contents of the raw image::
~/pyflag$ ./bin/iowrapper -i ewf -o filename=test.e01 hexdump -C test.e01 | head
00000000  eb 3c 90 4d 53 44 4f 53  35 2e 30 00 02 01 01 00  |ë<.MSDOS5.0.....|
00000010  02 e0 00 40 0b f0 09 00  12 00 02 00 00 00 00 00  |.à.@.ð..........|
00000020  00 00 00 00 00 00 29 fc  02 29 08 4e 4f 20 4e 41  |......)ü.).NO NA|
00000030  4d 45 20 20 20 20 46 41  54 31 32 20 20 20 33 c9  |ME    FAT12   3É|
From this hexdump it looks like the image is that of a FAT12 floppy disk. To confirm this, we can run the file command on the image. Since file opens files other than the image (it needs to open its magic file), we need to prevent the hooker from hooking those other files (otherwise, when the file program tries to open its magic file, it will get the image instead). To this end, we can use the -f flag to restrict hooking to files whose names start with a given string::
~/pyflag$ ./bin/iowrapper -i ewf -f test.e01 -o filename=test.e01 file test.e01
test.e01: x86 boot sector, code offset 0x3c, OEM-ID "MSDOS5.0", root entries 224, sectors 2880 (volumes <=32 mb) , sectors/fat 9, serial number 0x82902fc, unlabeled, fat (12 bit)
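The values reported by file can be read directly from the boot sector bytes in the wrapped hexdump. This sketch decodes a few BIOS Parameter Block fields at their standard FAT12/16 offsets (bytes per sector at 0x0B, sectors per cluster at 0x0D, number of FATs at 0x10, root entries at 0x11, total sectors at 0x13, sectors per FAT at 0x16, multi-byte values little-endian). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* A few BIOS Parameter Block fields from a FAT12/16 boot sector. */
struct fat_bpb {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint8_t  num_fats;
    uint16_t root_entries;
    uint16_t total_sectors;     /* 16-bit count; 0 means a 32-bit field is used */
    uint16_t sectors_per_fat;
};

/* Read a 16-bit little-endian value. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the fields from the first bytes of a boot sector (at least 0x18). */
void fat_parse_bpb(const uint8_t *bs, struct fat_bpb *out)
{
    out->bytes_per_sector    = le16(bs + 0x0B);
    out->sectors_per_cluster = bs[0x0D];
    out->num_fats            = bs[0x10];
    out->root_entries        = le16(bs + 0x11);
    out->total_sectors       = le16(bs + 0x13);
    out->sectors_per_fat     = le16(bs + 0x16);
}
```

Given the first 32 bytes of the dump above, it yields 512 bytes per sector, 224 root entries, 2880 sectors and 9 sectors per FAT, matching the file output; 2880 × 512 = 1,474,560 bytes, the size of a 1.44 MB floppy.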
The Sleuthkit's fls can also be used on this Encase image::
~/pyflag# ./bin/iowrapper -i ewf -f test.e01 -o filename=test.e01 ./bin/fls -f fat12 test.e01
r/r 9: gunzip.exe
r/r 11: Hiew.exe
r/r 12: tar.exe
r/r 22: cygwin1.dll
..
Finally we wish to extract the Encase image into a standard dd image. We wrap dd and redirect the output to a file::
~/pyflag$ ./bin/iowrapper -i ewf -f test.e01 -o filename=test.e01 dd if=test.e01 > /tmp/test.dd
Sometimes we wish to analyse a live Unix system remotely, for example to quickly see whether the system is compromised without having to acquire the entire image first. We can use our forensic tools to examine the remote raw device by using the remote IO subsystem.
.. note:: This type of analysis is quite fragile because the system is still live and using its file system. The forensic tools access the raw device while it is being modified, which makes them susceptible to race conditions. For example, if a file is removed just as the forensic utility is accessing its directory inode, inconsistent data may be obtained.
As a result, forensic tools may crash or provide inconsistent results. It is impossible, however, for the IO subsystem to alter the live system in any way, since the raw device is opened read-only.
One of the common problems with accessing a remote system is authentication and encryption. Access to the raw device over the network could easily lead to a root compromise by disclosing sensitive system information (e.g. the shadow file). Authentication and encryption are best left to dedicated programs, such as Secure Shell (ssh), and this is the approach taken by the remote access IO subsystem. The only requirements on the live system are an ssh server and the remote_server program (which may be compiled statically).
These are the steps required to access remote raw devices over the network:
The following is an example session that examines a remote target machine::
~/pyflag$ ./bin/iowrapper -i remote -o host=target,\
server_path=/path/to/remote_server,device=/dev/hda \
mmls -t dos foo
DOS Partition Table
Units are in 512-byte sectors
     Slot    Start        End          Length       Description
00:  -----   0000000000   0000000000   0000000001   Primary Table (#0)
01:  -----   0000000001   0000000062   0000000062   Unallocated
02:  00:00   0000000063   0000096389   0000096327   Dell Utilities FAT (0xde)
03:  00:01   0000096390   0019647494   0019551105   NTFS (0x07)
04:  00:02   0019647495   0058733639   0039086145   Win95 FAT32 (0x0C)
05:  00:03   0058733640   0117210239   0058476600   DOS Extended (0x05)
06:  -----   0058733640   0058733640   0000000001   Extended Table (#1)
07:  -----   0058733641   0058733702   0000000062   Unallocated
08:  01:00   0058733703   0059328044   0000594342   Linux Swap / Solaris x86 (0x82)
09:  01:01   0059328045   0117210239   0057882195   DOS Extended (0x05)
10:  -----   0059328045   0059328045   0000000001   Extended Table (#2)
11:  -----   0059328046   0059328107   0000000062   Unallocated
12:  02:00   0059328108   0117210239   0057882132   Linux (0x83)
We can now list the contents of the Windows (NTFS) partition, which mmls shows starting at sector 96390::
~/pyflag$ ./bin/iowrapper -i remote -o host=target,\
server_path=/path/to/remote_server,device=/dev/hda,\
offset=0000096390s fls -f ntfs foo
d/d 12763-144-4: Documents and Settings
d/d 6672-144-3: DRIVERS
d/d 6941-144-6: I386
r/r 6915-128-3: IO.SYS
d/d 62628-144-5: LDIR
r/r 6916-128-3: MSDOS.SYS
d/d 16844-144-1: My Music
r/r 6671-128-3: NTDETECT.COM
r/r 6670-128-3: NTLDR
d/d 13231-144-4: Program Files
...
In the above analysis we use the following parameters:
host
    The host we should try to log on to.
server_path
    The path to the remote_server program. This program must reside on the remote machine.
device
    The raw device to export.
offset
    An offset to use on the remote device. This can be specified in sectors (s), kilobytes (k) or megabytes (m) depending on the suffix.
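The offset suffixes could be interpreted as in the following sketch. This is an assumption, not the remote subsystem's actual parser: s is taken as 512-byte sectors, k as 1024 bytes, m as 1024 × 1024 bytes, and a bare number as bytes. The value used above, 0000096390s, is the start sector of the NTFS partition (slot 03) in the mmls listing. Parsing in base 10 matters here, because with base 0 strtoull() would treat the leading zero as an octal prefix. The name parse_remote_offset is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for offsets such as "0000096390s", "64k" or "2m".
 * Assumed units: s = 512-byte sector, k = 1024 bytes, m = 1024*1024 bytes,
 * no suffix = bytes. Returns 0 on success, -1 on error. */
int parse_remote_offset(const char *arg, uint64_t *bytes)
{
    char *end;
    uint64_t unit;

    if (!isdigit((unsigned char)arg[0]))
        return -1;
    errno = 0;
    /* Base 10 on purpose: base 0 would read the leading zeros as octal. */
    unsigned long long n = strtoull(arg, &end, 10);
    if (errno != 0)
        return -1;

    switch (*end) {
    case '\0': unit = 1; break;
    case 's':  unit = 512; break;
    case 'k':  unit = 1024; break;
    case 'm':  unit = 1024 * 1024; break;
    default:   return -1;              /* unknown suffix */
    }
    if (*end != '\0' && end[1] != '\0')
        return -1;                     /* trailing characters after the suffix */
    if (n > UINT64_MAX / unit)
        return -1;                     /* product would overflow 64 bits */
    *bytes = (uint64_t)n * unit;
    return 0;
}
```

Under these assumptions, 0000096390s corresponds to byte 49351680 (96390 × 512) on the remote device.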
.. note:: This analysis would easily reveal hidden files or directories, even where kernel-level rootkits are installed. This is because most kernel-level rootkits trap the system calls that access files on the file system, but do not filter access to raw devices. Since fls reads the file system structures from the raw device, it is independent of the kernel's file system driver and of file-system-related system calls.
Although it is conceivable that rootkits could filter the raw device to hide files, doing so would dramatically increase the complexity of the rootkit.
Library hooking is a powerful technique which enables a wrapper to be inserted between an arbitrary executable and the image. PyFlag has developed an image abstraction layer which allows arbitrary programs to support a variety of forensic image formats transparently.
The remote IO subsystem allows for the remote access and analysis of raw devices by forensic tools, making it possible to detect some kernel level rootkits remotely.
[1] The Expert Witness Compression Format: http://www.asrdata.com/SMART/whitepaper.html
[2] PyFlag: http://pyflag.sourceforge.net/
[3] binary version of PyFlag: http://pyflag.sourceforge.net/Downloads/index.html