The Sleuth Kit Informer
                                      
                     http://www.sleuthkit.org/informer 
                                      
                               Brian Carrier
                        carrier at sleuthkit dot org
   
                                 Issue #23
                                May 17, 2006
     _________________________________________________________________
   
Contents

     * Introduction
     * What's New?
     * Call For Papers
     * Expert Witness and AFF Support 
     * An Introduction To The libewf Expert Witness Library 
     _________________________________________________________________
   
Introduction

   In the 23rd issue of the Informer, we look at some of the new features
   in version 2.04 of The Sleuth Kit. Specifically, we focus on new image
   file formats. The first article is about how to use TSK with the
   formats and the second is about the library that was developed by
   Joachim Metz and Robert-Jan Mora to support the Expert Witness Format.
     _________________________________________________________________
   
What's New?

   New versions of The Sleuth Kit (TSK) and Autopsy were released on May
   11, 2006. TSK 2.04 includes bug fixes and new features, including
   support for the EWF and AFF image file formats, ISO 9660 file system,
   and an updated error handling system. There is also a new 'img_cat'
   tool to display the raw contents of an image file. Autopsy 2.07 adds
   support for the new TSK file formats and file systems. It also adds a
   hex view for files.
   
   The DFRWS 2006 Challenge was released and it is on file carving. The
   data set contains files and fragments and the challenge is to develop
   tools and techniques to recover as many files as possible with few
   false positives.
   
   http://www.dfrws.org/2006/challenge/
     _________________________________________________________________
   
Call For Papers

   The Sleuth Kit Informer is looking for articles on open source tools
   and techniques for digital investigations (computer / digital
   forensics) and incident response. Articles that discuss The Sleuth Kit
   and Autopsy are appreciated, but not required. Example topics include
   (but are not limited to):
     * Tutorials on open source tools
     * User experiences and reviews of open source tools
     * New investigation techniques using open source tools
     * Open source tool testing results
       
   More details can be found at:
   http://www.sleuthkit.org/informer/cfp.html
     _________________________________________________________________
   
Expert Witness and AFF Support

   By: Brian Carrier 
   
   The 2.04 release of The Sleuth Kit (TSK) included support for two new
   image file formats. This integration was relatively straightforward
   because of the designs of the image layer in TSK [1] and the libraries
   that were developed to support the formats. Articles in this and
   future issues of the Informer will describe the details of the image
   formats and the libraries. This article describes the basics and how
   to use the formats in TSK and Autopsy.
   
  Expert Witness Support
  
   The Expert Witness format (EWF) is the basis of the image file format
   created by EnCase and other tools. Support for it was added to TSK
   using libewf [2], a library by Joachim Metz and Robert-Jan Mora of
   Hoffmann Investigations that is described in the next article in this
   issue of the Informer.
   
   EWF files have a signature in the header that can be used to identify
   them, so TSK can automatically detect EWF files when they are given as
   input and the image file format type is not specified. If you want to
   specify the image file format, provide the '-i ewf' flag on the
   command line. The normal '-o' flag can still be used to specify
   sector offsets in the image file. For example, if you wanted to
   specify the image file type and list the files in the partition at
   sector 63 then you would use:
    # fls -i ewf -o 63 disk1.E01

   The libewf library will convert the sector offset to the correct
   location in a compressed image file. Using EWF with TSK will be no
   different from using raw files.
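
   As a rough illustration of the arithmetic involved, the sketch below
   converts a sector offset into a byte offset and a chunk index. The
   512-byte sector size, the 32 KiB chunk size, and the function name are
   assumptions made for this example; it is not libewf's actual code.

```python
SECTOR_SIZE = 512        # assumed sector size in bytes
CHUNK_SIZE = 32 * 1024   # assumed chunk size (32 KiB)

def locate_sector(sector, sector_size=SECTOR_SIZE, chunk_size=CHUNK_SIZE):
    """Return (byte_offset, chunk_index, offset_in_chunk) for a sector."""
    byte_offset = sector * sector_size
    chunk_index, offset_in_chunk = divmod(byte_offset, chunk_size)
    return byte_offset, chunk_index, offset_in_chunk

print(locate_sector(63))   # (32256, 0, 32256)
```

   Under these assumptions, the partition at sector 63 starts near the
   end of the first chunk, so reading it means decompressing that chunk
   and skipping the first 32256 bytes.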
   
   The acquisition of large disks may cause multiple EWF files to be
   created if the acquisition tool had a maximum file size set. For
   example, it is common to use a maximum file size of 2 GB if the files
   are going to be stored on a FAT32 drive. TSK and libewf support
   segmented EWF files, but all of the files must be specified on the
   command line. For example, you could use either of the following to
   list the files in the file system that starts in sector 63:
    # fls -o 63 disk1.E01 disk1.E02

    # fls -o 63 disk1.E0?

   The 'img_stat' tool for an EWF file currently displays the size of the
   file and its MD5 hash value. Future releases will likely display
   additional metadata that are stored in the file.
   
  AFF Support
  
   Support for the Advanced Forensic Format (AFF) was provided by AFFlib
   [3] from Simson Garfinkel and Basis Technology (disclaimer: I work for
   Basis Technology). A future issue of the Informer will provide more
   details than this article does, and the AFFlib website describes the
   image format in detail.
   
   AFF is an image file format that has been designed to be open and
   extensible. Its data structures are documented and the format consists
   of a collection of name and value pairs, which are called segments.
   Some segments contain the evidence data and other segments contain
   metadata. The value in each of the segments can be compressed.
   
   There are three different types of AFF files: AFF, AFD, and AFM, all
   of which use the same data structures. The AFF format uses a single
   file to store all of the segments. The AFD format can have multiple
   files and they are all stored in a directory, whose name has an
   extension of '.afd'. This format is used when a maximum file size is
   needed and each of the files contains a collection of segments. The
   AFM format stores the data in one or more raw files and stores the
   metadata in a separate file that has the standard segment
   structure. The AFM format allows metadata to be stored about the
   data and it allows the data to be imported into tools that support
   the raw format. With AFM, there is a file with a '.afm' extension
   and the raw files have the same base name with numeric extensions.

   TSK can automatically detect AFF, AFD, and AFM files, or you can
   specify the type on the command line with '-i aff', '-i afd', or '-i
   afm'. With any of the three types, only one name needs to be given on
   the command line. With AFF, it is the AFF file name. With AFD, it is
   the directory name, and the tools will find the files in the
   directory. With AFM, it is the AFM file name, and the tools will
   locate the associated raw files. The 'img_stat' tool will display the
   MD5 and SHA-1 values, the image size, and many other segments that
   could be defined by the 'aimage' tool, which is the acquisition tool
   in AFFlib.

  Conclusion

   Support for AFF and EWF should be transparent to users because TSK
   can automatically detect the image format types. Because of the work
   of Joachim, Robert, Simson, and others, TSK now can be used in more
   situations. Future versions of TSK will include tools to verify the
   integrity of the image files and the 'img_stat' tool will display
   more metadata from the files.

  References

   [1] The Sleuth Kit Informer Issue #19:
   http://www.sleuthkit.org/informer/sleuthkit-informer-19.html

   [2] libewf: http://www.uitwisselplatform.nl/projects/libewf/

   [3] AFFlib: http://www.afflib.org
     _________________________________________________________________

An Introduction To The libewf Expert Witness Library

   By: Joachim Metz, Robert-Jan Mora
   
   Hoffmann Investigations <forensics at hoffmannbv.nl> [1]
   
  Description
  
   The Expert Witness Compression format (EWF) is used by EnCase
   (Guidance) and FTK (AccessData) to create bit-copies. EWF is currently
   the de facto evidence file standard within the forensic community. The
   file format itself is not fully documented and previously could not be
   used natively within the Sleuth Kit.
   
   The Sleuth Kit now includes support for reading Expert Witness Format
   (EWF) image files. This was accomplished using libewf, which is an
   open source C library that we developed. In addition to the library,
   the libewf distribution also comes with tools to export data from EWF
   files (ewfexport), show the metadata stored in the EWF files
   (ewfinfo), and verify the integrity of the EWF files (ewfverify). This
   article describes the history of the library, its capabilities, and an
   overview of how to use it in a program.
   
  A short history
  
   In 2005, Michael Cohen used the Expert Witness format specification
   [4] to create a library called 'iowrapper' that allowed PyFlag [7] to
   read EnCase 4 evidence files. The 'iowrapper' tool uses library
   hooking techniques to allow programs to access data in image file
   formats [6]. The 'iowrapper' tool supports EnCase 5 evidence files
   only when the default chunk size is used. Although 'iowrapper' was a
   big breakthrough, we preferred native EWF support within the Sleuth
   Kit. Therefore, we created libewf for current and future open source
   tools.
   
   In the beginning, libewf was developed in C and for i386 Linux
   machines. We initially used Michael Cohen's code as a reference, but
   early on decided to do a complete rewrite of his implementation.
   
   It became apparent that both the Expert Witness specification and the
   code in 'iowrapper' were not complete for current versions of the
   format. For example, image files created by EnCase 5 contained new
   metadata that was not previously documented.
   
   In our tests we analyzed different versions of EWF evidence files and
   found that they differ from the original specification. We therefore
   concluded that the original specification by Andrew Rosen [4] was
   outdated.
   
   The more noteworthy differences are described in this article. A
   document containing the detailed specification of the format has been
   released on the project site [3]. This document contains findings and
   assumptions about the EWF format. Please note that this is a working
   document and may contain errors. If you find errors or if you have
   additional information regarding the format, please send us an e-mail.
   
  Specification
  
   EWF can span the bit-copy over multiple files, called segment files
   (E01, E02, etc). The data within the file is stored in little endian
   ordering. The format uses zlib for compression.
   
   Each segment file starts with an 8 byte signature:
   0x455646090d0aff00. The first three bytes of this signature are "EVF"
   in ASCII. This signature is unique to EWF files.
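
   A minimal sketch of how a tool might test for the signature is shown
   below. It uses the byte sequence 45 56 46 09 0d 0a ff 00 and an
   in-memory stream in place of a real segment file; the function name is
   made up for the example.

```python
import io

EWF_SIGNATURE = bytes.fromhex("455646090d0aff00")  # "EVF" + 09 0d 0a ff 00

def has_ewf_signature(stream):
    """Return True if the stream starts with the 8-byte EWF signature."""
    return stream.read(len(EWF_SIGNATURE)) == EWF_SIGNATURE

# In-memory data standing in for the start of a segment file
print(has_ewf_signature(io.BytesIO(EWF_SIGNATURE + b"\x01\x01\x00")))  # True
print(has_ewf_signature(io.BytesIO(b"not an image")))                  # False
```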
   
   After the signature there is a part that contains the segment file
   number. Within the files there are different sections specified. Each
   section starts with a header, which will be referred to as section
   start for clarity. The section start contains fields that represent:
     * the section type
     * the size of the section (64 bit)
     * the offset of the next section (64 bit)
     * a cyclical redundancy check (CRC) of the data within the section
       start
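
   The fields above could be read with a fixed-size structure, as in the
   sketch below. The field widths, their order, the 40 bytes of padding,
   and the use of Adler-32 as the check value are assumptions made for
   this example and are not stated in this article; consult the detailed
   specification for the real layout.

```python
import struct
import zlib

# Assumed layout: 16-byte type, 64-bit next offset, 64-bit size,
# 40 bytes of padding, 32-bit check value; all little endian.
SECTION_START = struct.Struct("<16sQQ40xI")

def parse_section_start(raw):
    """Parse a section start and report whether its check value matches."""
    type_bytes, next_offset, size, checksum = SECTION_START.unpack(raw)
    expected = zlib.adler32(raw[:-4]) & 0xFFFFFFFF
    return {
        "type": type_bytes.rstrip(b"\x00").decode("ascii"),
        "next_offset": next_offset,
        "size": size,
        "checksum_ok": checksum == expected,
    }

# Build a synthetic section start to exercise the parser
body = struct.pack("<16sQQ40x", b"header", 500, 487)
raw = body + struct.pack("<I", zlib.adler32(body) & 0xFFFFFFFF)
print(parse_section_start(raw))
```

   Because the check value only covers the section start itself, it
   protects the type, size, and offset fields but says nothing about the
   data that follows.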
       
   These sections contain different types of information regarding the
   EWF format. These different sections are marked by the section type in
   the section start. These sections are:
     * the "header" and "header2" sections, which contain the case data
     * the "volume" or "disk" and "data" sections, which contain
       information about the acquired media
     * the "sectors" section, which contains the actual bit-copied data
     * the "table" and "table2" sections, which contain information about
       the bit-copied data in the sectors section
     * the "next" and "done" sections, which are used to mark the end of
       a segment file
     * the "hash" section, which contains an MD5 hash of the bit-copied
       data
     * the "error2" section, which contains information about the read
       errors during acquisition
       
   The original specification only contained the "header", "volume",
   "table", "table2", "next" and "done" sections. A short overview of two
   important sections is provided below.
   
    The header sections
    
   Both the header and header2 sections contain information regarding the
   case, like the examiner name, case number, evidence number, and the
   password hash.
   
   The header sections consist of a compressed text string.
     * In the header section the string contains ASCII values.
     * In the header2 section the string contains UTF-16 Unicode values.
       
   These strings contain tab and line separated values. The values are
   represented by different identifiers. For example, the values in the
   header section within EnCase 4 and 5 contain the following:
     * Case number (identified by c)
     * Evidence number (identified by n)
     * Unique description (identified by a)
     * Examiner name (identified by e)
     * Notes (identified by t)
     * The EnCase (software) version used to acquire the media
       (identified by av)
     * The platform/operating system used to acquire the media
       (identified by ov)
     * Acquired date (identified by m)
     * System date (identified by u)
     * Password Hash (identified by p)
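
   The sketch below shows how such a string could be turned into a
   lookup table. It assumes a simplified layout of one tab-separated line
   of identifiers followed by one tab-separated line of values; the real
   header may contain additional lines, and the sample values are made up.

```python
import zlib

def parse_header(compressed):
    """Decompress a header payload and map identifiers to values."""
    text = zlib.decompress(compressed).decode("ascii")
    lines = [line for line in text.splitlines() if line]
    identifiers = lines[0].split("\t")
    values = lines[1].split("\t")
    return dict(zip(identifiers, values))

# Synthetic payload; the values are invented for illustration
sample = "c\tn\ta\te\tt\n2006-001\t1\tLaptop disk\tJ. Doe\tnone\n"
fields = parse_header(zlib.compress(sample.encode("ascii")))
print(fields["c"], fields["e"])   # 2006-001 J. Doe
```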
       
   Only the first segment file contains the header section. The header
   section is the first section in the segment file after the signature
   and the field part. Which header sections are used differs between
   versions of EnCase, as does the data within them. The details can be
   found in the detailed specification.
   
    The hash section
    
   The hash section contains an MD5 hash of the acquired data. In EnCase
   4 and 5, this section also contains 16 bytes of additional data whose
   meaning we have not yet determined.
   
   The hash section is found in the last segment file, before the done
   section. The hash section is optional, so it can be left out. The hash
   only allows verification of the acquired data, not of the additional
   metadata such as the case information within the header. A stronger
   design could, for example, also calculate the hash over the data in
   the header, media information, and table sections.
   
  Evidence integrity?
  
   Setting a password within EnCase does not always protect the data in
   the segment files. The password is only used by the EnCase
   application. So only EnCase may read the password hash from the header
   sections. Within the libewf implementation in the Sleuth Kit, the
   password is ignored.
   
   The EWF format does allow for protection against several forms of
   corruption. But it does not enforce protection against manipulation.
   The most recent version of EnCase still allows for the hash value to
   be optional.
   
   This allows for ways of manipulating the evidence. The following is a
   theoretical approach to manipulating the data. We think it is possible
   but we have not actually verified it. This requires further analysis,
   but we think people should be aware of the fact that one cannot rely
   entirely on the integrity features provided by the EWF format.
   
   One could adjust the case information in the header sections. The
   header contains several values that could be targets for
   manipulation. The approach would be to:
     * extract the payload of a header section
     * decompress the payload
     * adjust the values
     * compress the modified data
     * write the compressed data back to the file
       
   The amount of data that needs to be modified depends on the EWF
   version. For example, for EnCase 5 the header data should be adjusted
   in both the header and header2 sections.
   
   If the new header is larger than the previous one, all section offsets
   in the first segment file need to be corrected. If the new header is
   smaller, then simple padding would most likely suffice. Most of the
   time, however, manipulation of a single value could suffice. For
   example:
     * One could adjust the password hash, which would render the file
       unreadable in EnCase.
     * One could adjust the acquisition date, which is just a date string
       in the header section and a time stamp string in the header2
       section.
       
   This is not very serious unless weight is given to the additional
   case data. But how will the evidence hold up in court if any of these
   values have been manipulated?
   
   The data of specific chunks could be manipulated. The data within a
   chunk can be adjusted as long as the CRC of the specific chunk is
   recalculated and also adjusted. This approach is more difficult for
   compressed chunks, because one should also adjust the offsets in the
   table sections and the offsets of the sections. An approach of
   manipulating an uncompressed chunk is to:
     * read a chunk from the file
     * adjust the data
     * calculate the new CRC
     * write the data and CRC back to the file
       
   However, if a hash section is present then the manipulation can be
   detected. That is not an obstacle, though, because the hash section is
   optional and so can be removed completely. This is very easy to do:
     * find the start offset of the hash section in the last segment file
     * remove it
     * adjust the offset in the done section to point to itself, the
       value can be obtained from the section prior to the former hash
       section
       
   Apart from removing the hash section, in theory it is possible to
   adjust the MD5 hash directly in the hash section. If one is able to
   determine the new MD5 hash for the manipulated data all one has to do
   is to write it to the hash section.
   
   Another, more difficult approach is to find data that has the same MD5
   hash as the original data (a collision). The only manipulation
   required is then that of the chunks and their CRC values. Collision
   attacks on MD5 are possible.
   
   If you want to be thorough, you can also hash the individual segment
   (EWF) files using tools like md5sum and/or sha1sum to be able to
   verify the integrity of the segment files.
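
   For readers who prefer a script over md5sum and sha1sum, the sketch
   below computes both digests for each segment file in a single pass.
   The function names and the block size are choices made for this
   example.

```python
import hashlib

def hash_file(path, block_size=1024 * 1024):
    """Return (md5_hex, sha1_hex) for a file, reading it in blocks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            md5.update(block)
            sha1.update(block)
    return md5.hexdigest(), sha1.hexdigest()

def hash_segments(paths):
    """Map each segment file name to its digests, in sorted order."""
    return {path: hash_file(path) for path in sorted(paths)}
```

   Calling hash_segments with the list of segment files (for example, the
   result of a glob for 'disk1.E??') at acquisition time and again before
   analysis lets the two sets of digests be compared. Because these
   digests cover the whole file, they also cover the header and hash
   sections.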
   
  The library
  
   Because we found differences in the EWF image files that were created
   by different programs, we analyzed the image files from EnCase version
   1.991, 2.17a, 3.21b, 4.22, and 5.04a as well as from FTK Imager 2.3.
   
   We also tested several different features for each program including
   different compression levels, password settings, hash settings, disk
   drive sizes, and both split and single evidence files. Extra attention
   was paid to EnCase 5 and additional test cases included different
   block sizes and error block sizes.
   
   These test cases allowed us to determine the details of the different
   versions of the image file format. From this work, libewf can read and
   verify:
     * Single and multiple segment EWF files
     * Non compressed and compressed EWF files
     * EWF files with or without password
     * EWF files with or without a hash
     * EWF files with chunk sizes other than 32k
     * EWF files containing large images (tested up to 300 gigabytes)
       
   Libewf was developed to support multiple platforms so that it could be
   integrated into The Sleuth Kit. Therefore, it supports different
   endian orderings, 64-bit systems, and platforms other than Linux.
   
   Libewf supports the following platforms and compilers:
     * Linux Fedora Core 4, i386, gcc 4.0.2
     * Linux Fedora Core 5, x86_64, gcc 4.1.0
     * Linux Ubuntu 5.10, i386, gcc 4.0.2
     * Mac OS X Tiger, G4 Motorola, gcc
     * Cygwin/Windows XP, i386, gcc
     * FreeBSD 6.0, i386, gcc 3.4.4
     * OpenBSD 3.8, i386, gcc 3.3.5
     * NetBSD 3.0, i386, gcc 3.3.3
     * SunOS 5.11 / Solaris 11, i386, gcc 3.3.2
     * Debian 3.1, i386, gcc 2.9.5 and gcc 3.3
       
    Library Features
    
   Libewf provides an easy to use interface that allows programs to open
   EWF files, read data from them, and verify the data integrity. When
   reading image files, libewf performs various integrity checks. It
   validates:
     * that the structure of the segment files is correct and that no
       offset or size values have been corrupted;
     * that the CRC value for the sections is correct and that no file
       structure data was corrupted;
     * that the CRC value for the media data is correct;
     * that the offset values of the media data in the table and the
       table2 sections are equal and that the table sections are not
       corrupt.
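
   The media data check can be pictured with the sketch below. It assumes
   that an uncompressed chunk is stored with a 4-byte little endian
   Adler-32 value appended to it; that detail and both function names are
   assumptions made for this example, not something stated in this
   article.

```python
import struct
import zlib

def store_chunk(data):
    """Append the check value to chunk data (helper for the demonstration)."""
    return data + struct.pack("<I", zlib.adler32(data) & 0xFFFFFFFF)

def chunk_is_valid(stored):
    """Check a stored chunk whose last 4 bytes are assumed to be a check value."""
    data, trailer = stored[:-4], stored[-4:]
    (expected,) = struct.unpack("<I", trailer)
    return (zlib.adler32(data) & 0xFFFFFFFF) == expected

good = store_chunk(b"\x00" * 32768)
damaged = b"\x01" + good[1:]          # flip one byte of the media data
print(chunk_is_valid(good), chunk_is_valid(damaged))   # True False
```

   A check of this kind catches accidental corruption, but anyone who can
   rewrite the chunk can also rewrite the check value, which is why it
   does not protect against deliberate manipulation.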
       
   When a CRC error is encountered in one of the chunks, a warning
   message is displayed. If other validation errors occur though, the
   code terminates. One of the future goals is to implement a more user
   friendly method of error handling. In addition to providing access to
   the data stored in the image format, the library also provides access
   to the metadata. Example metadata includes case information and MD5
   values.
   
    Source and dependencies
    
   The libewf source code is organized into several files. The following
   naming convention is used:
     * Files that start with ewf_ contain the implementation of EWF
       specific definitions, such as the layout of the different
       sections.
     * The remaining files contain code regarding the workings of the
       library.  For example, the main file handling code is in file.c
       and the reading code is in file_read.c.
       
   The "libewf.h" file should be included when using libewf with other
   tools.
   
   Libewf is dependent on two libraries, which are found on most systems
   by default:
     * zlib for (un)compression support
     * openssl libcrypto for MD5 calculation support
       
    The clockwork
    
   In this section we take a look at how to develop tools using libewf
   and describe some of its internal clockwork.
   
   To read from an image file, the image file must first be opened using
   libewf_open, which will dynamically create an instance of the
   LIBEWF_HANDLE structure. This structure is used by other libewf
   functions to store and retrieve data. Some functions allocate memory
   in the structure and therefore it should be closed with libewf_close.
   
   The libewf_open function builds a dynamic array of the segment files
   in a split image, referred to as the segment table, and verifies:
     * That each file contains the EWF file signature
     * That the segment number in the field part is specified only once
       
   After each segment file has been added to the segment table the offset
   table is built. The offset table is a dynamic array containing the
   offset of every chunk within all the segment files. This process:
     * Verifies that no segment files are missing.
     * Builds a list of the sections in each segment.
     * Verifies the CRC value of each section. Currently it warns you if
       a CRC is invalid but keeps processing.
     * Compares backup copies of data in the image format to ensure that
       they are consistent; however, the code does not currently correct
       for errors in one of the structures.
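
   The segment number checks can be sketched as follows. The function is
   hypothetical and assumes that segment numbers start at 1 and that the
   numbers have already been read from the field part of each file.

```python
def check_segment_numbers(numbers):
    """Return (duplicates, missing) for a list of segment numbers."""
    seen, duplicates = set(), set()
    for number in numbers:
        if number in seen:
            duplicates.add(number)
        seen.add(number)
    expected = set(range(1, max(seen) + 1)) if seen else set()
    return sorted(duplicates), sorted(expected - seen)

print(check_segment_numbers([1, 2, 3]))      # ([], [])
print(check_segment_numbers([1, 3, 3, 5]))   # ([3], [2, 4])
```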
       
   To read the entire media to a file descriptor, for example to export
   the media data, the function libewf_read_to_file_descriptor can be
   used. This function reads the data from the EWF files, and writes this
   to a file descriptor. This function allows for a callback function to
   be specified which can be used to provide for a progress indicator as
   in ewfexport.
   
   For tools like TSK, reading data from random locations is required.
   For this purpose, the libewf_read_random function can be used. The
   function will convert an offset value to the segment file that it is
   located in.
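
   Conceptually, the lookup resembles the sketch below. The offset table
   contents, the 32 KiB chunk size, and the function name are all
   hypothetical; the sketch only illustrates the idea of mapping a media
   offset to a segment file through a per-chunk table.

```python
CHUNK_SIZE = 32 * 1024  # assumed chunk size

# Hypothetical offset table: one (segment_file, file_offset) entry per chunk
OFFSET_TABLE = [
    ("disk1.E01", 1000),
    ("disk1.E01", 20000),
    ("disk1.E02", 100),
]

def locate(media_offset, table=OFFSET_TABLE, chunk_size=CHUNK_SIZE):
    """Map a media offset to (segment_file, chunk_file_offset, offset_in_chunk)."""
    if media_offset < 0:
        raise ValueError("offset is outside the media")
    chunk_index, within = divmod(media_offset, chunk_size)
    if chunk_index >= len(table):
        raise ValueError("offset is outside the media")
    segment, file_offset = table[chunk_index]
    return segment, file_offset, within

print(locate(70000))   # ('disk1.E02', 100, 4464)
```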
   
   The libewf_read_random function caches the data of a single chunk to
   improve the performance of a large number of small reads. As mentioned
   before, libewf currently only warns about a CRC mismatch within a
   chunk. It will only do so when reading the actual chunk from file, not
   when using the cached version.
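
   The idea of a single-chunk cache can be sketched as below. The class,
   its read_chunk callable, and the 32 KiB chunk size are hypothetical
   and only illustrate the technique; they do not reproduce the libewf
   code.

```python
class ChunkCache:
    """Keep the most recently read chunk to serve repeated small reads."""

    def __init__(self, read_chunk, chunk_size=32 * 1024):
        self._read_chunk = read_chunk   # hypothetical callable: index -> bytes
        self._chunk_size = chunk_size
        self._index = None
        self._data = b""
        self.loads = 0                  # number of real chunk reads

    def read(self, offset, length):
        """Read bytes that lie within a single chunk."""
        index, within = divmod(offset, self._chunk_size)
        if index != self._index:
            # A real reader would verify the chunk's check value here
            self._data = self._read_chunk(index)
            self._index = index
            self.loads += 1
        return self._data[within:within + length]

cache = ChunkCache(lambda index: bytes([index]) * 32768)
cache.read(0, 16)
cache.read(512, 16)      # served from the cached chunk
cache.read(32768, 16)    # different chunk, so it is loaded
print(cache.loads)       # 2
```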
   
   Libewf provides other functions to access the MD5 stored within the
   EWF files and to calculate the MD5 hash over the media data. The
   ewfverify tool uses this functionality.
   
   To access the stored MD5 hash value, the function libewf_data_md5hash
   can be used. To calculate the MD5 hash value, the function
   libewf_calculate_md5hash can be used. Both return a string containing
   the MD5 hash value.
   
  The future
  
   The immediate future for libewf entails publishing it online. We have
   created a project page for it on the uitwisselplatform (which
   translates to "exchange platform"). The link to the project page and
   working document can be found below in the references. The source code
   will be uploaded shortly.
   
   The future goals regarding libewf are to:
     * Implement write support
     * Implement friendlier error handling instead of immediate
       termination
     * Figure out more of the unknown values in the new versions of the
       format
     * Add more data verification to test that the data sections match
       the volume or disk section
     * Allow certain validation checks to be overridden so that corrupted
       or manipulated image files can be processed
       
   We would appreciate feedback on the specification we published on EWF
   and libewf, both the successes and the failures. We are also
   interested in feedback and reports on the EWF files created by other
   tools that we have not been able to test and analyze.
   
  References
  
   [1] Hoffmann Investigations: http://www.hoffmannbv.nl/
   
   [2] Libewf project website:
   https://www.uitwisselplatform.nl/projects/libewf/
   
   [3] EWF specification:
   https://www.uitwisselplatform.nl/docman/view.php/53/62/ewf.pdf
   
   [4] Expert Witness Compression Format Specification:
   http://www.asrdata.com/SMART/whitepaper.html
   
   [5] Forensic Wiki EnCase specification:
   http://www.forensicswiki.org/wiki/EnCase
   
   [6] Sleuth Kit Informer article about iowrapper:
   http://www.sleuthkit.org/informer/sleuthkit-informer-19.html#hook
   
   [7] PyFlag: http://pyflag.sourceforge.com/
     _________________________________________________________________
   
   Copyright (c) 2006 by Brian Carrier. All Rights Reserved
   This work is licensed under the Creative Commons
   Attribution-NonCommercial-ShareAlike 2.5 License.