Autopsy  4.21.0
Graphical digital forensics platform for The Sleuth Kit and other tools.
org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule Class Reference

Inherits org.sleuthkit.autopsy.ingest.FileIngestModule.


Classes

enum  IngestStatus
enum  StringsExtractOptions

Public Member Functions

ProcessResult process (AbstractFile abstractFile)
void shutDown ()
void startUp (IngestJobContext context) throws IngestModuleException

Private Member Functions

BlackboardAttribute checkAttribute (String key, String value)
void cleanup ()
void createMetadataArtifact (AbstractFile aFile, Map< String, String > metadata)
boolean extractStringsAndIndex (AbstractFile aFile)
boolean extractTextAndSearch (Optional< TextExtractor > extractorOptional, AbstractFile aFile, Map< String, String > extractedMetadata) throws IngesterException
Optional< TextExtractor > getExtractor (AbstractFile abstractFile)
Reader getTikaOrTextExtractor (Optional< TextExtractor > extractorOptional, AbstractFile aFile, Map< String, String > extractedMetadata) throws TextExtractor.InitReaderException
boolean isLimitedOCRFile (AbstractFile aFile, String mimeType)
void postIndexSummary ()
void searchFile (Optional< TextExtractor > extractor, AbstractFile aFile, String mimeType, boolean indexContent)
boolean searchTextFile (AbstractFile aFile)

Static Private Member Functions

static void putIngestStatus (long ingestJobId, long fileId, IngestStatus status)

Private Attributes

IngestJobContext context
FileTypeDetector fileTypeDetector
Ingester ingester = null
boolean initialized = false
int instanceNum = 0
long jobId
final IngestServices services = IngestServices.getInstance()
final KeywordSearchJobSettings settings
Lookup stringsExtractionContext

Static Private Attributes

static final String IMAGE_MIME_TYPE_PREFIX = "image/"
static final Map< Long, Map< Long, IngestStatus > > ingestStatus = new HashMap<>()
static final AtomicInteger instanceCount = new AtomicInteger(0)
static final int LIMITED_OCR_SIZE_MIN = 100 * 1024
static final Logger logger = Logger.getLogger(KeywordSearchIngestModule.class.getName())
static final List< String > METADATA_DATE_TYPES
static final Map< String, BlackboardAttribute.ATTRIBUTE_TYPE > METADATA_TYPES_MAP
static final ImmutableSet< String > OCR_DOCUMENTS
static final IngestModuleReferenceCounter refCounter = new IngestModuleReferenceCounter()

Detailed Description

An ingest module that operates at the file level. It indexes allocated files that Solr supports; extracts and indexes strings from unallocated files and from files that Solr does not support; commits the index periodically, at the user-set ingest update interval; runs a periodic keyword / regular expression search on the lists currently configured for ingest and writes the results to the blackboard; and reports interesting events to the Inbox and to viewers.

Definition at line 90 of file
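The periodic "commit, then search" cycle described above can be pictured with a `ScheduledExecutorService`. This is a minimal, hypothetical sketch: `PeriodicCommitSketch`, `commitAndSearch`, and the millisecond interval are illustrative stand-ins, not Autopsy APIs, and the real module may schedule this work elsewhere.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a periodic commit-and-search cycle; not Autopsy's implementation.
final class PeriodicCommitSketch {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger cycles = new AtomicInteger();

    /** Runs one commit-and-search cycle every intervalMillis (stand-in for the user-set update interval). */
    void start(long intervalMillis) {
        scheduler.scheduleWithFixedDelay(this::commitAndSearch, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops scheduling new cycles and waits briefly for a running cycle to finish. */
    void stop() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void commitAndSearch() {
        // Placeholder: commit pending index updates, run the configured keyword lists,
        // and write any hits to the blackboard.
        cycles.incrementAndGet();
    }
}
```

A fixed delay (rather than a fixed rate) is used so that a slow search never overlaps the next cycle.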

Member Function Documentation

BlackboardAttribute org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.checkAttribute ( String  key,
String  value
)

Definition at line 656 of file

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.cleanup ( )

Common cleanup code when module stops or final searcher completes

Definition at line 445 of file
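The class also keeps a static `refCounter` (an `IngestModuleReferenceCounter`) and a per-instance `jobId`. A common pattern, assumed here rather than confirmed by this page, is that `startUp()` increments a per-job count, `shutDown()` decrements it, and only the instance that brings the count to zero runs the shared, job-level cleanup. `JobRefCounter` and `ModuleInstance` are hypothetical stand-ins for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-job reference counter (stand-in for IngestModuleReferenceCounter).
final class JobRefCounter {
    private final Map<Long, Integer> counts = new HashMap<>();

    synchronized int incrementAndGet(long jobId) {
        return counts.merge(jobId, 1, Integer::sum);
    }

    synchronized int decrementAndGet(long jobId) {
        Integer current = counts.get(jobId);
        if (current == null) {
            throw new IllegalStateException("No instances registered for job " + jobId);
        }
        int next = current - 1;
        if (next == 0) {
            counts.remove(jobId);
        } else {
            counts.put(jobId, next);
        }
        return next;
    }
}

// Hypothetical module instance: one is created per ingest thread for the same job.
final class ModuleInstance {
    static final JobRefCounter REF_COUNTER = new JobRefCounter();
    static int cleanups = 0; // counts job-level cleanups, for illustration only
    private final long jobId;

    ModuleInstance(long jobId) {
        this.jobId = jobId;
    }

    void startUp() {
        REF_COUNTER.incrementAndGet(jobId);
    }

    void shutDown() {
        // Only the last instance for this job performs the shared cleanup.
        if (REF_COUNTER.decrementAndGet(jobId) == 0) {
            cleanup();
        }
    }

    private static synchronized void cleanup() {
        cleanups++;
    }
}
```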

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.createMetadataArtifact ( AbstractFile  aFile,
Map< String, String >  metadata
)

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractStringsAndIndex ( AbstractFile  aFile)

Extract strings using heuristics from the file and add to index.

Parameters
  aFile  File to extract strings from, divide into chunks, and index.
Returns
  true if the file's strings were indexed, false otherwise.

Definition at line 708 of file

References org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.context, org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.STRINGS_INGESTED.
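The "heuristics" used for string extraction are not spelled out here. As a stand-in, the sketch below uses the classic `strings`-style heuristic: keep runs of printable ASCII bytes, and separately runs of UTF-16LE code units whose high byte is zero. `StringsSketch` and the minimum run length are hypothetical; Autopsy's own strings extractor may handle more scripts and encodings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic string extraction, similar in spirit to the Unix `strings` tool.
final class StringsSketch {
    /** Returns runs of printable ASCII bytes (0x20..0x7E) that are at least minLength long. */
    static List<String> asciiRuns(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF;
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                flush(current, minLength, runs);
            }
        }
        flush(current, minLength, runs);
        return runs;
    }

    /** Returns runs of 2-byte-aligned UTF-16LE units with a zero high byte and a printable ASCII low byte. */
    static List<String> utf16leRuns(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i + 1 < data.length; i += 2) {
            int lo = data[i] & 0xFF;
            int hi = data[i + 1] & 0xFF;
            if (hi == 0 && lo >= 0x20 && lo <= 0x7E) {
                current.append((char) lo);
            } else {
                flush(current, minLength, runs);
            }
        }
        flush(current, minLength, runs);
        return runs;
    }

    // Emits the current run if it is long enough, then resets it.
    private static void flush(StringBuilder current, int minLength, List<String> runs) {
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        current.setLength(0);
    }
}
```

Running both scans matters because UTF-16 text looks like isolated one-character runs to an ASCII-only scanner.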

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractTextAndSearch ( Optional< TextExtractor >  extractorOptional,
AbstractFile  aFile,
Map< String, String >  extractedMetadata 
) throws IngesterException

File indexer. Processes and indexes known/allocated files, unknown/unallocated files, and directories accordingly: extracts text from the file with Tika or another text extraction module (by streaming), divides the text into chunks, and indexes the chunks.

Parameters
  extractorOptional  The text extractor to use with this file, or empty.
  aFile  File to extract text from, divide into chunks, and index.
  extractedMetadata  Map that will be populated with the file's metadata.
Returns
  true if the file's text was indexed, false otherwise.
Exceptions
  IngesterException  Thrown if indexing failed.

Definition at line 566 of file
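Streaming text into fixed-size chunks, as described for extractTextAndSearch, can be sketched with a plain `Reader`. `ChunkSketch` and the chunk size are hypothetical; production chunkers often overlap adjacent chunks so that a keyword spanning a boundary is not missed, which this minimal version does not do.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical streaming chunker: reads text and emits chunks of at most chunkSize chars.
final class ChunkSketch {
    static List<String> chunk(Reader reader, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        char[] buffer = new char[chunkSize];
        try {
            int filled = 0;
            int n;
            // read() may return fewer chars than requested, so keep filling until the buffer is full.
            while ((n = reader.read(buffer, filled, chunkSize - filled)) != -1) {
                filled += n;
                if (filled == chunkSize) {
                    chunks.add(new String(buffer, 0, filled));
                    filled = 0;
                }
            }
            if (filled > 0) {
                chunks.add(new String(buffer, 0, filled)); // trailing partial chunk
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }
}
```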

Optional<TextExtractor> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.getExtractor ( AbstractFile  abstractFile)

Reader org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.getTikaOrTextExtractor ( Optional< TextExtractor >  extractorOptional,
AbstractFile  aFile,
Map< String, String >  extractedMetadata 
) throws TextExtractor.InitReaderException

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.isLimitedOCRFile ( AbstractFile  aFile,
String  mimeType
)

Returns true if the file should have OCR performed on it when the limited OCR setting is specified.

Parameters
  aFile  The abstract file.
  mimeType  The file MIME type.
Returns
  True if the file should have text extracted when the limited OCR setting is on.

Definition at line 460 of file
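The static attributes IMAGE_MIME_TYPE_PREFIX, LIMITED_OCR_SIZE_MIN (100 * 1024 bytes), and OCR_DOCUMENTS suggest how the limited-OCR decision might be made: always OCR listed document types, and OCR images only above a size threshold. The sketch below is a hypothetical reconstruction under those assumptions; it takes a size instead of an `AbstractFile`, and its `OCR_DOCUMENTS` contents are invented because the real initializer is truncated on this page.

```java
import java.util.Set;

// Hypothetical reconstruction of a limited-OCR filter; the real method takes an AbstractFile.
final class LimitedOcrSketch {
    static final String IMAGE_MIME_TYPE_PREFIX = "image/";
    static final int LIMITED_OCR_SIZE_MIN = 100 * 1024; // 100 KiB
    // Hypothetical contents; the real OCR_DOCUMENTS set is not shown in full here.
    static final Set<String> OCR_DOCUMENTS = Set.of("application/pdf");

    static boolean isLimitedOCRFile(long sizeBytes, String mimeType) {
        if (OCR_DOCUMENTS.contains(mimeType)) {
            return true; // listed document types are always candidates
        }
        // Small images (icons, thumbnails) are skipped to save OCR time.
        return mimeType.startsWith(IMAGE_MIME_TYPE_PREFIX) && sizeBytes > LIMITED_OCR_SIZE_MIN;
    }
}
```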

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.postIndexSummary ( )

ProcessResult org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.process ( AbstractFile  file)

Processes a file. Called between calls to startUp() and shutDown(). Will be called for each file in a data source.

IMPORTANT: In addition to returning ProcessResult.OK or ProcessResult.ERROR, modules should log all errors using methods provided by the org.sleuthkit.autopsy.coreutils.Logger class. Log messages should include the name and object ID of the data being processed and any other information that would be useful for debugging. If an exception has been caught by the module, the exception should be sent to the logger along with the log message so that a stack trace will appear in the application log.

Parameters
  file  The file to analyze.
Returns
  A result code indicating success or failure of the processing.

Implements org.sleuthkit.autopsy.ingest.FileIngestModule.

Definition at line 346 of file

References org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.modules.filetypeid.FileTypeDetector.getMIMEType(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchJobSettings.isOCREnabled(), org.sleuthkit.autopsy.ingest.IngestModule.ProcessResult.OK, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING.
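The logging guidance above (log every error with the file's name and object ID, and pass any caught exception so the stack trace is recorded) might look like the sketch below. It uses `java.util.logging` directly; `ProcessResult`, `FileInfo`, and `index` are hypothetical stand-ins so the example compiles without Autopsy on the classpath.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of error handling and logging inside process().
final class ProcessLoggingSketch {
    enum ProcessResult { OK, ERROR }

    // Stand-in for AbstractFile: just an object ID and a name.
    static final class FileInfo {
        final long id;
        final String name;

        FileInfo(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private static final Logger logger = Logger.getLogger(ProcessLoggingSketch.class.getName());

    static ProcessResult process(FileInfo file) {
        try {
            index(file);
            return ProcessResult.OK;
        } catch (Exception ex) {
            // Name and object ID in the message; the exception argument preserves the stack trace.
            logger.log(Level.SEVERE, String.format("Error indexing %s (id = %d)", file.name, file.id), ex);
            return ProcessResult.ERROR;
        }
    }

    // Hypothetical indexing step that fails on an empty name so the error path can be exercised.
    private static void index(FileInfo file) throws Exception {
        if (file.name.isEmpty()) {
            throw new Exception("empty file name");
        }
    }
}
```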

static void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.putIngestStatus ( long  ingestJobId,
long  fileId,
IngestStatus  status
)

Records the ingest status for a given file for a given ingest job. Used for final statistics at the end of the job.

Parameters
  ingestJobId  ID of the ingest job.
  fileId  ID of the file.
  status  Ingest status of the file.

Definition at line 208 of file
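The `ingestStatus` attribute is a `Map<Long, Map<Long, IngestStatus>>` keyed by job ID and then file ID. A minimal sketch of recording statuses and tallying them for an end-of-job summary follows; the enum lists only the constants referenced on this page, and the synchronization strategy and `summarize` helper are assumptions for illustration.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-job, per-file ingest status bookkeeping.
final class IngestStatusSketch {
    // Constants referenced on this page; the real enum may define more.
    enum IngestStatus { TEXT_INGESTED, STRINGS_INGESTED, METADATA_INGESTED, SKIPPED_ERROR_INDEXING, SKIPPED_ERROR_TEXTEXTRACT }

    private static final Map<Long, Map<Long, IngestStatus>> ingestStatus = new HashMap<>();

    /** Records (or overwrites) the status of one file within one ingest job. */
    static void putIngestStatus(long ingestJobId, long fileId, IngestStatus status) {
        synchronized (ingestStatus) {
            ingestStatus.computeIfAbsent(ingestJobId, k -> new HashMap<>()).put(fileId, status);
        }
    }

    /** Counts files per status for one job, e.g. for a final summary. */
    static Map<IngestStatus, Integer> summarize(long ingestJobId) {
        Map<IngestStatus, Integer> counts = new EnumMap<>(IngestStatus.class);
        synchronized (ingestStatus) {
            for (IngestStatus s : ingestStatus.getOrDefault(ingestJobId, Map.of()).values()) {
                counts.merge(s, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Keying the inner map by file ID means a later status for the same file replaces the earlier one, so each file is counted once.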

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.searchFile ( Optional< TextExtractor >  extractor,
AbstractFile  aFile,
String  mimeType,
boolean  indexContent
)

Adds the file to the index. Detects file type, calls extractors, etc.

Parameters
  extractor  The text extractor to use with this file, or empty if no extractor was found.
  aFile  File to analyze.
  mimeType  The file MIME type.
  indexContent  False if only metadata should be indexed; true if both content and metadata should be indexed.

Extracts Unicode strings from unallocated and unused blocks and from carved text files. String extraction is used for these files because they may contain multiple encodings, which can cause the more specialized text extractors used later in this method to miss text.

Definition at line 734 of file

References org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.METADATA_INGESTED, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_TEXTEXTRACT, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.TEXT_INGESTED.
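The routing described for searchFile can be summarized as a small decision function: unallocated blocks, unused blocks, and carved text files go to string extraction; otherwise `indexContent` selects between metadata-only and full text extraction. `FileKind`, `Route`, and the order of the two checks are hypothetical simplifications; the real method also handles directories, empty files, and extractor failures.

```java
// Hypothetical sketch of the routing decision in searchFile.
final class SearchFileRoutingSketch {
    // Stand-in for the file categories Autopsy distinguishes with its own file-type enum.
    enum FileKind { UNALLOCATED_BLOCKS, UNUSED_BLOCKS, CARVED_TEXT, REGULAR }

    enum Route { STRINGS_ONLY, METADATA_ONLY, FULL_TEXT_EXTRACTION }

    static Route route(FileKind kind, boolean indexContent) {
        switch (kind) {
            case UNALLOCATED_BLOCKS:
            case UNUSED_BLOCKS:
            case CARVED_TEXT:
                // These may mix several encodings, so generic string extraction is used.
                return Route.STRINGS_ONLY;
            default:
                return indexContent ? Route.FULL_TEXT_EXTRACTION : Route.METADATA_ONLY;
        }
    }
}
```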

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.searchTextFile ( AbstractFile  aFile)

Adds the text file to the index given an encoding. Returns true if indexing was successful and false otherwise.

Parameters
  aFile  Text file to analyze.

Definition at line 848 of file

References org.sleuthkit.autopsy.textextractors.TextFileExtractor.getReader(), and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.TEXT_INGESTED.
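searchTextFile obtains a `Reader` from `TextFileExtractor.getReader()`, whose encoding detection is not documented on this page. As a swapped-in, simpler technique, the sketch below picks a charset from a byte-order mark (UTF-8, UTF-16LE, UTF-16BE), defaults to UTF-8 otherwise, and skips the BOM before decoding. `TextFileReaderSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical BOM-based encoding selection for plain text files.
final class TextFileReaderSketch {
    /** Opens a Reader over the bytes, choosing the charset from a BOM (UTF-8 if none). */
    static Reader openReader(byte[] data) {
        Charset charset = StandardCharsets.UTF_8; // default when no BOM is present (assumption)
        int bomLength = 0;
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            bomLength = 3;
        } else if (startsWith(data, 0xFF, 0xFE)) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (startsWith(data, 0xFE, 0xFF)) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        }
        // Skip the BOM so it does not appear as U+FEFF in the decoded text.
        return new InputStreamReader(new ByteArrayInputStream(data, bomLength, data.length - bomLength), charset);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the whole Reader into a String and closes it. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (reader) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```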

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.shutDown ( )

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.startUp ( IngestJobContext  context) throws IngestModuleException

Member Data Documentation

IngestJobContext org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.context

FileTypeDetector org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.fileTypeDetector

Definition at line 176 of file

final String org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IMAGE_MIME_TYPE_PREFIX = "image/"

Definition at line 152 of file

Ingester org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.ingester = null

Definition at line 175 of file

final Map<Long, Map<Long, IngestStatus> > org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.ingestStatus = new HashMap<>()

Definition at line 198 of file

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.initialized = false

Definition at line 182 of file

final AtomicInteger org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.instanceCount = new AtomicInteger(0)

Definition at line 184 of file

int org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.instanceNum = 0

Definition at line 185 of file

long org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.jobId

Definition at line 183 of file

final int org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.LIMITED_OCR_SIZE_MIN = 100 * 1024

Definition at line 92 of file

final Logger org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.logger = Logger.getLogger(KeywordSearchIngestModule.class.getName())

Definition at line 173 of file

final List<String> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.METADATA_DATE_TYPES
Initial value:
= ImmutableList.of(

Definition at line 134 of file

final Map<String, BlackboardAttribute.ATTRIBUTE_TYPE> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.METADATA_TYPES_MAP
Initial value:
= ImmutableMap.<String, BlackboardAttribute.ATTRIBUTE_TYPE>builder()
.put("Last-Save-Date", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DATETIME_MODIFIED)
.put("Last-Author", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_USER_ID)
.put("Creation-Date", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DATETIME_CREATED)
.put("Company", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_ORGANIZATION)
.put("Author", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_OWNER)
.put("Application-Name", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_PROG_NAME)
.put("Last-Printed", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_LAST_PRINTED_DATETIME)
.put("Producer", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_PROG_NAME)
.put("Title", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DESCRIPTION)
.put("pdf:PDFVersion", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_VERSION)

Definition at line 139 of file
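checkAttribute(key, value) and createMetadataArtifact are undocumented here, but METADATA_TYPES_MAP suggests how extracted document metadata could be turned into blackboard attributes: look up each metadata key, skip unknown keys and empty values, and pair the value with its attribute type. The sketch below assumes that behavior; `AttrType` is a hypothetical stand-in for `BlackboardAttribute.ATTRIBUTE_TYPE`, and values are kept as strings even though the date keys (see METADATA_DATE_TYPES) likely need conversion in the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of mapping extracted metadata to attribute types.
final class MetadataMappingSketch {
    // Stand-in for BlackboardAttribute.ATTRIBUTE_TYPE, limited to the constants used in METADATA_TYPES_MAP.
    enum AttrType { TSK_DATETIME_MODIFIED, TSK_USER_ID, TSK_DATETIME_CREATED, TSK_ORGANIZATION, TSK_OWNER,
        TSK_PROG_NAME, TSK_LAST_PRINTED_DATETIME, TSK_DESCRIPTION, TSK_VERSION }

    // Same key-to-type pairs as METADATA_TYPES_MAP above.
    static final Map<String, AttrType> METADATA_TYPES_MAP = Map.of(
            "Last-Save-Date", AttrType.TSK_DATETIME_MODIFIED,
            "Last-Author", AttrType.TSK_USER_ID,
            "Creation-Date", AttrType.TSK_DATETIME_CREATED,
            "Company", AttrType.TSK_ORGANIZATION,
            "Author", AttrType.TSK_OWNER,
            "Application-Name", AttrType.TSK_PROG_NAME,
            "Last-Printed", AttrType.TSK_LAST_PRINTED_DATETIME,
            "Producer", AttrType.TSK_PROG_NAME,
            "Title", AttrType.TSK_DESCRIPTION,
            "pdf:PDFVersion", AttrType.TSK_VERSION);

    /** Returns (type, value) pairs for recognized keys with non-blank values, in input order. */
    static List<Map.Entry<AttrType, String>> toAttributes(Map<String, String> metadata) {
        List<Map.Entry<AttrType, String>> attributes = new ArrayList<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            AttrType type = METADATA_TYPES_MAP.get(e.getKey());
            String value = e.getValue();
            if (type != null && value != null && !value.isBlank()) {
                attributes.add(Map.entry(type, value.trim()));
            }
        }
        return attributes;
    }
}
```

A list of pairs is returned rather than a map because two keys ("Application-Name" and "Producer") share the same attribute type.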

final ImmutableSet<String> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.OCR_DOCUMENTS
Initial value:
= ImmutableSet.of(

Definition at line 155 of file

final IngestModuleReferenceCounter org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.refCounter = new IngestModuleReferenceCounter()

Definition at line 186 of file

final IngestServices org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.services = IngestServices.getInstance()

Definition at line 174 of file

final KeywordSearchJobSettings org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.settings

Definition at line 181 of file

Lookup org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.stringsExtractionContext

Definition at line 180 of file

The documentation for this class was generated from the following file:

Copyright © 2012-2022 Basis Technology. Generated on: Tue Feb 6 2024
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.