Keyword Search and Indexing

Autopsy uses the Apache Solr text indexing engine to provide fast keyword searching. Pre-defined lists of keywords and regular expressions can be configured to run while the image is being ingested. By default, Autopsy includes regular expression searches for:

  • Email Addresses
  • Phone Numbers
  • IP Addresses
  • URLs
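
To make the idea of a regular expression list concrete, the following is a minimal Python sketch of what such a list might look like. The patterns and the `find_hits` helper are illustrative assumptions and are not the expressions that ship with Autopsy; real-world patterns for these categories are usually stricter.

```python
import re

# Illustrative patterns only; these are NOT the expressions Autopsy ships with.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://[^\s<>]+"),
}

def find_hits(text):
    """Return a dict mapping each pattern name to the list of matches in text."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

sample = ("Contact bob@example.com or 555-867-5309; "
          "server 192.168.1.10 served http://example.com/login today.")
for name, matches in find_hits(sample).items():
    print(name, matches)
```

Note that the simple IP pattern here also accepts out-of-range octets such as 999.1.1.1; a production list would likely validate the range.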

In addition, ad-hoc keyword searches can be run directly from the content bar during an investigation after (or even during) ingest of the disk image. Multiple concurrent keyword searches can be run against the search index.

Instead of searching the raw data, Autopsy performs keyword searches on the output of its text extraction modules. Autopsy uses Tika and other libraries to extract text from HTML, Microsoft Office, PDF, RTF, and other formats. This approach is more effective than byte-level searching for files such as non-English PDF files and docx files, where the text is stored in an encoded or compressed form.
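
The difference between byte-level searching and searching extracted text can be sketched in a few lines of Python. This is a simplified illustration and not how Autopsy or Tika work internally: `make_docx_like` and `extract_text` are hypothetical helpers, and the in-memory ZIP only mimics the way a docx file stores its body text in a compressed XML part.

```python
import io
import re
import zipfile

def make_docx_like(text):
    """Build an in-memory ZIP that mimics how a .docx stores its body text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml",
                    f"<w:document><w:t>{text}</w:t></w:document>")
    return buf.getvalue()

def extract_text(data):
    """Rough stand-in for a text extractor: unzip, then strip XML tags."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        xml = zf.read("word/document.xml").decode("utf-8")
    return re.sub(r"<[^>]+>", " ", xml)

keyword = "ransom"
raw = make_docx_like("Deliver the ransom payment by Friday. " * 20)

# Byte-level search looks at the compressed container as stored on disk.
found_in_bytes = keyword.encode("utf-8") in raw
# Extraction-based search looks at the decompressed, tag-free text.
found_in_text = keyword in extract_text(raw)

print("byte-level search:", found_in_bytes)
print("extracted-text search:", found_in_text)
```

Because the body text is deflate-compressed inside the container, the keyword does not appear as a literal byte sequence in the file, while the extracted text contains it.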

Results from any enabled searches (regular expressions or user-defined lists) appear in the 'Keyword hits' node in the Autopsy navigation tree.

Thanks to Apache Solr, every file in which Autopsy identifies text content is indexed and can be searched with the pre-defined regular expression lists, user-defined keyword lists, or ad-hoc queries.
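
Solr is built on Lucene, which stores text in an inverted index: a mapping from each term to the documents that contain it. The toy Python sketch below shows why indexing once makes repeated keyword lookups cheap. The `build_index` and `search` functions and the file names are hypothetical, and the sketch omits what a real engine adds, such as analyzers, phrase queries, and regular expression support.

```python
import re
from collections import defaultdict

def build_index(files):
    """Map each lowercase token to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(name)
    return index

def search(index, keyword):
    """Look up a single keyword; returns a sorted list of matching file names."""
    return sorted(index.get(keyword.lower(), set()))

# Hypothetical extracted text, keyed by file name.
files = {
    "a.txt": "Invoice for wire transfer",
    "b.txt": "Meeting notes from Tuesday",
    "c.txt": "Second invoice attached",
}
index = build_index(files)
print(search(index, "Invoice"))
```

Each lookup is a single dictionary access rather than a scan of every file, which is the property that lets many searches run against the same index.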