Autopsy User Documentation  3.1
Graphical digital forensics platform for The Sleuth Kit and other tools.
Keyword Search Module

What Does It Do

The Keyword Search module facilitates both the ingest portion of searching and also supports manual text searching after ingest has completed. It extracts text from the files being ingested and adds them to a Solr index that can then be searched.

Autopsy tries its best to extract the maximum amount of text from the files being indexed. First, the indexing will try to extract text from supported file formats, such as pure text file format, MS Office Documents, PDF files, Email, and many others. If the file is not supported by the standard text extractor, Autopsy will fall back to a string extraction algorithm. String extraction on unknown file formats or arbitrary binary files can often extract a sizeable amount of text from a file, often enough to provide additional clues to reviewers. String extraction will not extract text strings from encrypted files.

Configuration

Autopsy ships with some built-in lists that define regular expressions and enable the user to search for Phone Numbers, IP addresses, URLs and E-mail addresses. However, enabling some of these very general lists can produce a very large number of hits, and many of them can be false-positives. Regular expressions involving backtracking can potentially take a long time to complete.

Once files are placed in the Solr index, they can be searched quickly for specific keywords, regular expressions, or keyword search lists that can contain a mixture of keywords and regular expressions. Search queries can be executed automatically during the ingest run or at the end of the ingest, depending on the current settings and the time it takes to ingest the image.

Keyword Search Configuration Dialog

The keyword search configuration dialog has three tabs, each with it's own purpose:

To create a list, select the 'New List' button and choose a name for the new Keyword List. Once the list has been created, keywords can be added to it. Regular expressions are supported using Java Regex Syntax. Lists can be added to the keyword search ingest process; searches will happen at regular intervals as content is added to the index.

List Import and Export
Autopsy supports importing Encase tab-delimited lists as well as lists created previously with Autopsy. For Encase lists, folder structure and hierarchy is currently ignored. This will be fixed in a future version. There is currently no way to export lists for use with Encase. This will also be added in future releases.

Lists tab

keyword-search-configuration-dialog.PNG


String extraction setting
The string extraction setting defines how strings are extracted from files from which text cannot be extracted because their file formats are not supported. This is the case with arbitrary binary files (such as the page file) and chunks of unallocated space that represent deleted files. When we extract strings from binary files we need to interpet sequences of bytes as text differently, depending on the possible text encoding and script/language used. In many cases we don't know in advance what the specific encoding/language the text is encoded in. However, it helps if the investigator is looking for a specific language, because by selecting less languages the indexing performance will be improved and the number of false positives will be reduced.

The default setting is to search for English strings only, encoded as either UTF8 or UTF16. This setting has the best performance (shortest ingest time). The user can also use the String Viewer first and try different script/language settings, and see which settings give satisfactory results for the type of text relevant to the investigation. Then the same setting that works for the investigation can be applied to the keyword search ingest.
String Extraction tab

keyword-search-configuration-dialog-string-extraction.PNG



General Settings

NIST NSRL Support
The hash database ingest service can be configured to use the NIST NSRL hash database of known files. The keyword search advanced configuration dialog "General" tab contains an option to skip keyword indexing and search on files that have previously marked as "known" and uninteresting files. Selecting this option can greatly reduce size of the index and improve ingest performance. In most cases, user does not need to keyword search for "known" files.

Result update frequency during ingest
To control how frequently searches are executed during ingest, the user can adjust the timing setting available in the keyword search advanced configuration dialog "General" tab. Setting the number of minutes lower will result in more frequent index updates and searches being executed and the user will be able to see results more in real-time. However, more frequent updates can affect the overall performance, especially on lower-end systems, and can potentially lengthen the overall time needed for the ingest to complete.

One can also choose to have no periodic searches. This will speed up the ingest. Users choosing this option can run their keyword searches once the entire keyword search index is complete.

General tab

keyword-search-configuration-dialog-general.PNG


Using the Module

Search queries can be executed manually by the user at any time, as long as there are some files already indexed and ready to be searched. Searching before indexing is complete will naturally only search indexes that are already compiled.

See Ingest for more information on ingest in general.

Once there are files in the index, the Keyword Search Bar will be available for use to manually search at any time.

Ingest Settings

The Ingest Settings for the Keyword Search module allow the user to enable or disable the specific built-in search expressions, Phone Numbers, IP Addresses, Email Addresses, and URLs. Using the Advanced button (covered below), one can add custom keyword groups.

keyword-search-ingest-settings.PNG


Keyword Search Bar

The keyword search bar is used to search for keywords in the manual mode (outside of ingest). The existing index will be searched for matching words, phrases, lists, or regular expressions.

Individual Keyword Search
Individual keyword or regular expressions can quickly be searched using the search text box widget. You can select "Exact Match", "Substring Match" and "Regular Expression" match.

keyword-search-bar.PNG


Results will be opened in a separate Results Viewer for every search executed and they will also be saved in the Directory Tree as shown in the screenshot below.

keyword-search-hits.PNG


Keyword List Search
Lists created using the Keyword Search Configuration Dialog can be manually searched by the user by pressing on the 'Keyword Lists' button, selecting the check boxes corresponding to the lists to be searched, and pressing the 'Search' button.

keyword-search-list.PNG


The results of the keyword list search are shown in the tree, as shown below.

keyword-search-list-results.PNG


Searching during ingest
Manual search for individual keywords or regular expressions can be executed while ingest is ongoing, using the current index. Note however, that you may miss some results if the entire index has not yet been populated. Autopsy enables you to perform the search on an incomplete index in order to retrieve some preliminary results in real-time.

During the ingest, the manual search by keyword list is deactivated. A newly selected list can instead be added, and it will be searched in the background instead.

Keywords and lists can be managed during ingest.

Seeing Results

The Keyword Search module will save the search results regardless whether the search is performed by the ingest process, or manually by the user. The saved results are available in the Directory Tree in the left hand side panel.

To see keyword search results in real-time while ingest is running, add keyword lists using the Keyword Search Configuration Dialog and select the "Use during ingest" check box. You can select "Send messages to inbox during ingest" per list, if the hits on that list should be reported in the Inbox, which is recommended for very specific searches.


Copyright © 2012-2015 Basis Technology. Generated on Mon Oct 19 2015
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.