Autopsy User Documentation  3.1
Graphical digital forensics platform for The Sleuth Kit and other tools.
Keyword Search

Autopsy ships a keyword search module, which provides the ingest capability and also supports a manual text search mode.

The keyword search ingest module extracts text from the files on the image being ingested and adds them to the index that can then be searched.

Autopsy tries its best to extract maximum amount of text from the files being indexed. First, the indexing will try to extract text from supported file formats, such as pure text file format, MS Office Documents, PDF files, Email files, and many others. If the file is not supported by the standard text extractor, Autopsy will fallback to string extraction algorithm. String extraction on unknown file formats or arbitrary binary files can often still extract a good amount of text from the file, often good enough to provide additional clues. However, string extraction will not be able to extract text strings from binary files that have been encrypted.

Autopsy ships with some built-in lists that define regular expressions and enable user to search for Phone Numbers, IP addresses, URLs and E-mail addresses. However, enabling some of these very general lists can produce a very large number of hits, many of them can be false-positives.

Once files are in the index, they can be searched quickly for specific keywords, regular expressions, or using keyword search lists that can contain a mixture of keywords and regular expressions. Search queries can be executed automatically by the ingest during the ingest run, or at the end of the ingest, depending on the current settings and the time it takes to ingest the image.

Search queries can also be executed manually by the user at any time, as long as there are some files already indexed and ready to be searched.

Keyword search module will save the search results regardless whether the search is performed by the ingest process, or manually by the user. The saved results are available in the Directory Tree in the left hand side panel.

To see keyword search results in real-time while ingest is running, add keyword lists using the Keyword Search Configuration Dialog and select the "Use during ingest" check box. You can select "Send messages to inbox during ingest" per list, if the hits on that list should be reported in the Inbox, which is recommended for very specific searches.

See (Ingest) for more information on ingest in general.

Once there are files in the index, the Keyword Search Bar will be available for use to manually search at any time.

Keyword Search Configuration Dialog

The keyword search configuration dialog has three tabs, each with it's own purpose:

To create a list, select the 'New List' button and choose a name for the new Keyword List. Once the list has been created, keywords can be added to it. Regular expressions are supported using Java Regex Syntax. Lists can be added to the keyword search ingest process; searches will happen at regular intervals as content is added to the index.

List Import and Export
Autopsy supports importing Encase tab-delimited lists as well as lists created previously with Autopsy. For Encase lists, folder structure and hierarchy is currently ignored. This will be fixed in a future version. There is currently no way to export lists for use with Encase. This will also be added in future releases.

String extraction setting
The string extraction setting defines how strings are extracted from files from which text cannot be extracted because their file formats are not supported. This is the case with arbitrary binary files (such as the page file) and chunks of unallocated space that represent deleted files. When we extract strings from binary files we need to interpet sequences of bytes as text differently, depending on the possible text encoding and script/language used. In many cases we don't know what the specific encoding / language the text is be encoded in in advance. However, it helps if the investigator is looking for a specific language, because by selecting less languages the indexing performance will be improved and a number of false positives will be reduced. The default setting is to search for English strings only, encoded as either UTF8 or UTF16. This setting has the best performance (shortest ingest time). The user can also use the String Viewer first and try different script/language settings, and see which setting gives satisfactory results for the type of text relevant to the investigation. Then the same setting that works for the investigation can be applied to the keyword search ingest.

The hash database ingest service can be configured to use the NIST NSRL hash database of known files. The keyword search advanced configuration dialog "General" tab contains an option to skip keyword indexing and search on files that have previously marked as "known" and uninteresting files. Selecting this option can greatly reduce size of the index and improve ingest performance. In most cases, user does not need to keyword search for "known" files.

Result update frequency during ingest
To control how frequently searches are executed during ingest, user can adjust the timing setting available in the keyword search advanced configuration dialog "General" tab. Setting the number of minutes lower will result in more frequent index updates and searches being executed and the user will be able to see results more in real-time. However, more frequent updates can affect the overall performance, especially on lower-end systems, and can potentially lengthen the overall time needed for the ingest to complete.

Lists tab


String Extraction tab


General tab


Keyword Search Bar

The keyword search bar is used to search for keywords in the manual mode (outside of ingest). The existing index will be searched for matching words, phrases, lists, or regular expressions. Results will be opened in a separate Results Viewer for every search executed and they will also be saved in the Directory Tree.

Individual Keyword Search
Individual keyword or regular expressions can be quickly searched using the search text box widget. To toggle between keyword and regular expression mode, use the down arrow in the search box.

Keyword List Search
Lists created using the Keyword Search Configuration Dialog can be manually searched by the user by pressing on the 'Keyword Lists' button, selecting the check boxes corresponding to the lists to be searched, and pressing the 'Search' button.

Searching during ingest
The manual search for individual keywords or regular expressions can be executed also during the ongoing ingest on the current index using the search text box widget. Note however, that you may miss some results if not entire index has yet been populated. Autopsy enables you to perform the search on an incomplete index in order to retrieve some preliminary results in real-time.

During the ingest, the manual search by keyword list is deactivated. A newly selected list can instead be added to the ongoing ingest, and it will be searched in the background instead.

Keywords and lists can be managed during ingest.


Copyright © 2012-2015 Basis Technology. Generated on Tue Mar 17 2015
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.