Autopsy User Documentation  4.20.0
Graphical digital forensics platform for The Sleuth Kit and other tools.
Ad Hoc Keyword Search

Table of Contents

Overview

The ad hoc keyword search features allows you to run single keyword terms or lists of keywords against all images in a case. Both options are located in the top right of the main Autopsy window.

keyword-search-ad-hoc.PNG

The Keyword Search Module must be selected during ingest before doing an ad hoc keyword search. If you don't want to search for any of the existing keyword lists, you can deselect everything to just index the files for later searching.

Limitations of Ad Hoc Keyword Search

With the release of Autopsy 4.21.0, two types of keyword searching are supported: Solr search with full text indexing and the built-in Autopsy "In-Line" Keyword Search.

Enabling full text indexing with Solr during the ingest process allows for comprehensive ad-hoc manual text searching, encompassing all of the extracted text from files and artifacts.

On the other hand, the In-Line Keyword Search conducts the search during ingest, specifically at the time of text extraction. It only indexes small sections of the files that contain keyword matches (for display purposes). Consequently, unless full text indexing with Solr is enabled, the ad-hoc search will be restricted to these limited sections of the files that had keyword hits. This limitation significantly reduces the amount of searchable text available for ad-hoc searches.

Other situations which will result in not being able to search all of the text extracted from all of the files and artifacts include:

Creating Keywords

The following sections will give a description of each keyword type, then will show some sample text and how various search terms would work against it.

Exact match

Exact match should be used in cases where the search term is expected to always be surrounded by non-word characters (typically whitespace or punctuation). Spaces/punctuation are allowed in the search term, and capitalization is ignored.

The quick reddish-brown fox jumps over the lazy dog.

Substring match

Substring match should be used where the search term is just part of a word, or to allow for different word endings. Capitalization is ignored but spaces and other punctuation can not appear in the search term.

The quick reddish-brown fox jumps over the lazy dog.

Regex match

Regex match can be used to search for a specific pattern. Regular expressions are supported using Lucene Regex Syntax which is documented here: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-regexp-query.html#regexp-syntax. Wildcards are automatically added to the beginning and end of the regular expressions to ensure all matches are found. Additionally, the resulting hits are split on common token separator boundaries (e.g. space, newline, colon, exclamation point etc.) to make the resulting keyword hit more amenable to highlighting. As of Autopsy 4.9, regex searches are no longer case sensitive. This includes literal characters and character classes.

Note: Since Autopsy 4.4, boundary characters ('^' and '$') no longer work as word boundaries. Previously a search for "^[0-9]{5}$" would return all five digit strings surrounded by some type of non-word characters. For example, "The number 12345 is.." would contain a match, while "123456789 people" would not. This was because the regex was compared against each "word" in the document. In newer versions, the text is not split into words internally so this type of search no longer works. To get similar results, replace the boundary characters with the specific characters that should represent a word break. For example, "^[0-9]{5}$" could become "[ \.\-\,][0-9]{5}[ \.\-\,]".

There is some validation on the regex but it's best to test on a sample image to make sure your regexes are correct and working as expected. One simple way to test is by creating a sample text file that your expression should match, ingesting it as a Logical File Set and then running the regex query.

In the year 1885 in an article titled Current Notes, the quick brown fox first jumped over the lazy dog.

Other notes

Built-in keywords

The Keyword Search Module has several built-in searches that can not be edited. The ones that are most prone to false hits (IP Address and Phone Number) require that the matching text is surrounded by boundary characters, such as spaces or certain punctuation. For example:

If you want to override this default behavior:

Non-Latin text

In general all three types of keyword searches will work as expected but the feature has not been thoroughly tested with all character sets. For example, the searches may no longer be case-insensitive. As with regex above, we suggest testing on a sample file.

Differences between "In-Line" and Solr regular expression search

It's important to be aware that there might be occasional differences in the results of regular expression searches between the "In-Line" Keyword Search and Solr search. This is because the "In-Line" Keyword Search uses Java regular expressions, while Solr search employs Lucene regular expressions.

Keyword Search

Individual keyword or regular expressions can quickly be searched using the search text box widget. You can select "Exact Match", "Substring Match" and "Regular Expression" match. See the earlier Creating Keywords section for information on each keyword type, as well as Limitations of Ad Hoc Keyword Search. The search can be restricted to only certain data sources by selecting the checkbox near the bottom and then highlighting the data sources to search within. Multiple data sources can be selected used shift+left click or control+left click. The "Save search results" checkbox determines whether the search results will be saved to the case database.

keyword-search-bar.PNG

Results will be opened in a separate Results Viewer for every search executed. If the "Save search results" checkbox was enabled, the results will also be saved in the Directory Tree as shown in the screenshot below.

keyword-search-hits.PNG

Keyword Lists

In addition to being selected during ingest, keyword lists can also be run through the Keyword Lists button. For information on setting up these keyword lists, see the Lists tab section of the ingest module documentation.

Lists created using the Keyword Search Configuration Dialog can be manually searched by the user by pressing on the 'Keyword Lists' button and selecting the check boxes corresponding to the lists to be searched. The search can be restricted to only certain data sources by selecting the checkbox near the bottom and then highlighting the data sources to search within. Multiple data sources can be selected used shift+left click or control+left click. Once everything has been configured, press "Search" to begin the search. The "Save search results" checkbox determines whether the search results will be saved to the case database.

keyword-search-list.PNG

If the "Save search results" checkbox was enabled, the results of the keyword list search will be shown in the tree, as shown below.

keyword-search-list-results.PNG

Copyright © 2012-2022 Basis Technology. Generated on Tue Aug 1 2023
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.