This page provides some tips about how you might implement some commonly needed modules. As outlined in File Analysis vs. Post-Processing Pipelines, modules can be written for file analysis pipelines or post-processing pipelines. The main difference between the two types of module is how often and when they are run. File analysis modules are run on each individual file and post-processing modules are run once, after all the files in an image have been analyzed. In general, if your module will only examine a very small subset of the files in an image, then it is more efficient to make it a post-processing module. If your module will analyze every file in an image, then it should be a file analysis module.

MD5 Hash Calculation Module

It is common to calculate the MD5 hash for each file in an image. Because we want it to calculate the hash of every file, an MD5 hash calculation module would be a file analysis module.

Libraries for calculating MD5 hash sums usually do not have setup or cleanup requirements, so we most likely would write initialize() and finalize() functions that do little more than return a status of TskModule::OK.

The run() function of the module would need code to read the contents of the file using TskFile::read(), and to push the file contents into the hash calculation functions of the MD5 library. The calculated MD5 value would be added to the image database by making a call to TskFile::setHash().

NOTE: The source code for a hash calculation module that produces either or both MD5 and SHA-1 hashes is included with the framework.

MD5 Hash Lookup Module

It is also common to look up the MD5 hash values of files in hash databases to identify known files and detect duplicate files. Because this kind of lookup is done for all files, it would be implemented in a file analysis module.

The initialize() function of a hash lookup module would probably open the hash database the module uses, and the module's finalize() function would close the database connection.

The run() function would get the MD5 hash of the given file using TskFile::getHash(). It would then look up the hash value in the hash database. Hits would be posted to the blackboard as TSK_HASHSET_HIT artifacts with a TSK_SET_NAME attribute identifying the hash database. The module would also use TskImgDB::updateKnownStatus() to set the status of the file on hits, and if the module was doing known good lookups, a hit would make the function return TskModule::STOP to terminate further analysis of the file.

NOTE: The source code for an MD5 hash lookup module that uses TSK's hash database support is included with the framework.

Registry Analysis Module

It would be common for a disk image analysis system to have a module that reads registry data. Because there are only a few registry hives on a Windows system, the module would probably be implemented as a post-processing module.

The initialize() and finalize() functions of a registry analysis module would probably simply return TskModule::OK.

The report() method would use TskImgDB::getFileIds() with a condition argument formulated to get the file ids of the registry hive files. The file ids would then be used to get TskFile objects to support analysis of the hives.

The results of the registry analysis might be used to generate a report.

NOTE: The source code for a registry analysis module that uses the open source RegRipper tool is included with the framework.

Zip File Extraction Module

A disk image analysis system would probably need a module to extract files zipped up in archive files. Because a zip file extraction module would need to examine each file in an image to determine if it was an archive file, it would be a file analysis module.

A zip file extraction module would probably have initialize() and finalize() functions that immediately return a status of TskModule::OK.

The module's run() function would need to check the given file to see if it was a zip file. If so, the module would need to process the file and then use calls to TskFileManager::addFile() and Scheduler::schedule() to add and schedule analysis for the extracted files.

NOTE: The source code for a zip file extraction module that uses the ZipArchive class in the Poco Zip library is included with the framework.