Autopsy  4.21.0
Graphical digital forensics platform for The Sleuth Kit and other tools.
org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule Class Reference

Inherits org.sleuthkit.autopsy.ingest.FileIngestModule.

Classes

enum  IngestStatus
 
enum  StringsExtractOptions
 

Public Member Functions

ProcessResult process (AbstractFile abstractFile)
 
void shutDown ()
 
void startUp (IngestJobContext context) throws IngestModuleException
 

Private Member Functions

BlackboardAttribute checkAttribute (BlackboardAttribute.ATTRIBUTE_TYPE attrType, String value)
 
void cleanup ()
 
void createMetadataArtifact (AbstractFile aFile, Map< String, String > metadata)
 
boolean extractStringsAndIndex (AbstractFile aFile)
 
boolean extractTextAndSearch (Optional< TextExtractor > extractorOptional, AbstractFile aFile, Map< String, String > extractedMetadata) throws IngesterException
 
Optional< TextExtractor > getExtractor (AbstractFile abstractFile)
 
Reader getTikaOrTextExtractor (Optional< TextExtractor > extractorOptional, AbstractFile aFile, Map< String, String > extractedMetadata) throws TextExtractor.InitReaderException
 
boolean isLimitedOCRFile (AbstractFile aFile, String mimeType)
 
void postIndexSummary ()
 
void searchFile (Optional< TextExtractor > extractor, AbstractFile aFile, String mimeType, boolean indexContent)
 
boolean searchTextFile (AbstractFile aFile)
 

Static Private Member Functions

static void putIngestStatus (long ingestJobId, long fileId, IngestStatus status)
 

Private Attributes

IngestJobContext context
 
FileTypeDetector fileTypeDetector
 
Ingester ingester = null
 
boolean initialized = false
 
int instanceNum = 0
 
long jobId
 
final IngestServices services = IngestServices.getInstance()
 
final KeywordSearchJobSettings settings
 
Lookup stringsExtractionContext
 

Static Private Attributes

static final String IMAGE_MIME_TYPE_PREFIX = "image/"
 
static final Map< Long, Map< Long, IngestStatus > > ingestStatus = new HashMap<>()
 
static final AtomicInteger instanceCount = new AtomicInteger(0)
 
static final int LIMITED_OCR_SIZE_MIN = 100 * 1024
 
static final Logger logger = Logger.getLogger(KeywordSearchIngestModule.class.getName())
 
static final Map< String, Pair< BlackboardAttribute.ATTRIBUTE_TYPE, Integer > > METADATA_TYPES_MAP
 
static final ImmutableSet< String > OCR_DOCUMENTS
 
static final IngestModuleReferenceCounter refCounter = new IngestModuleReferenceCounter()
 

Detailed Description

A file-level ingest module. Performs indexing of allocated and Solr-supported files, and string extraction and indexing of unallocated and non-Solr-supported files. The index commit is done periodically (determined by the user-set ingest update interval). Runs a periodic keyword / regular expression search on the lists currently configured for ingest and writes the results to the blackboard. Reports interesting events to the Inbox and to viewers.

Definition at line 105 of file KeywordSearchIngestModule.java.

Member Function Documentation

BlackboardAttribute org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.checkAttribute ( BlackboardAttribute.ATTRIBUTE_TYPE  attrType,
String  value 
)
private

Create a metadata blackboard attribute based on specified content.

Parameters
attrType  The attribute type.
key  The key for the attribute.
value  The value of the attribute.
Returns

Definition at line 752 of file KeywordSearchIngestModule.java.

References org::sleuthkit::datamodel::BlackboardAttribute::TSK_BLACKBOARD_ATTRIBUTE_VALUE_TYPE.DATETIME.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.cleanup ( )
private

Common cleanup code when module stops or final searcher completes

Definition at line 510 of file KeywordSearchIngestModule.java.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.createMetadataArtifact ( AbstractFile  aFile,
Map< String, String >  metadata 
)
private
boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractStringsAndIndex ( AbstractFile  aFile)
private
boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractTextAndSearch ( Optional< TextExtractor >  extractorOptional,
AbstractFile  aFile,
Map< String, String >  extractedMetadata 
) throws IngesterException
private

File indexer. Processes and indexes known/allocated files, unknown/unallocated files, and directories accordingly. Extracts text from the file with Tika or other text extraction modules (by streaming), divides the file into chunks, and indexes the chunks.

Parameters
extractorOptional  The textExtractor to use with this file or empty.
aFile  file to extract strings from, divide into chunks and index
extractedMetadata  Map that will be populated with the file's metadata.
Returns
true if the file was text_ingested, false otherwise
Exceptions
IngesterException  exception thrown if indexing failed

Definition at line 631 of file KeywordSearchIngestModule.java.
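
The "divide the file into chunks" step described above can be illustrated with a minimal, self-contained sketch. It is not the module's actual chunker: the class name ChunkingSketch, the method readChunks, and the fixed character-count chunk size are assumptions made for illustration, and a real indexer would likely prefer to break chunks at whitespace or sentence boundaries.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Autopsy.
class ChunkingSketch {

    // Streams text from the reader and splits it into chunks of at most chunkSize characters.
    static List<String> readChunks(Reader reader, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        char[] buf = new char[chunkSize];
        int filled = 0;
        try {
            int n;
            // read() may return fewer characters than requested, so keep filling the buffer.
            while ((n = reader.read(buf, filled, chunkSize - filled)) != -1) {
                filled += n;
                if (filled == chunkSize) {
                    chunks.add(new String(buf, 0, filled));
                    filled = 0;
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        if (filled > 0) {
            // Final, partially filled chunk.
            chunks.add(new String(buf, 0, filled));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Each chunk would be handed to the indexer in turn.
        for (String chunk : readChunks(new StringReader("abcdefgh"), 3)) {
            System.out.println(chunk);
        }
    }
}
```

Streaming through a fixed buffer keeps memory use bounded regardless of file size, which is presumably why the description stresses extraction "by streaming".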

Optional<TextExtractor> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.getExtractor ( AbstractFile  abstractFile)
private
Reader org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.getTikaOrTextExtractor ( Optional< TextExtractor >  extractorOptional,
AbstractFile  aFile,
Map< String, String >  extractedMetadata 
) throws TextExtractor.InitReaderException
private
boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.isLimitedOCRFile ( AbstractFile  aFile,
String  mimeType 
)
private

Returns true if file should have OCR performed on it when limited OCR setting is specified.

Parameters
aFile  The abstract file.
mimeType  The file mime type.
Returns
True if file should have text extracted when limited OCR setting is on.

Definition at line 525 of file KeywordSearchIngestModule.java.

References org::sleuthkit::datamodel::TskData::TSK_DB_FILES_TYPE_ENUM.DERIVED, org::sleuthkit::datamodel::AbstractFile.getSize(), and org::sleuthkit::datamodel::AbstractFile.getType().
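
A possible reading of this check, built only from the constants and references listed on this page, is sketched below. The way the conditions are combined is an assumption: documents in OCR_DOCUMENTS always qualify, and images qualify when they are larger than LIMITED_OCR_SIZE_MIN or are derived files. The class LimitedOcrSketch and the plain sizeBytes / isDerived parameters are stand-ins for AbstractFile.getSize() and AbstractFile.getType().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; the combination of conditions is assumed.
class LimitedOcrSketch {

    static final String IMAGE_MIME_TYPE_PREFIX = "image/";
    static final int LIMITED_OCR_SIZE_MIN = 100 * 1024;

    // Same MIME types as the OCR_DOCUMENTS initial value shown on this page.
    static final Set<String> OCR_DOCUMENTS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "application/pdf",
            "application/msword",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "application/vnd.ms-powerpoint",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation",
            "application/vnd.ms-excel",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")));

    // sizeBytes stands in for getSize(); isDerived stands in for getType() == DERIVED.
    static boolean isLimitedOCRFile(long sizeBytes, boolean isDerived, String mimeType) {
        if (OCR_DOCUMENTS.contains(mimeType)) {
            return true;
        }
        if (mimeType.startsWith(IMAGE_MIME_TYPE_PREFIX)) {
            // Assumed rule: large images, or images extracted from another file.
            return sizeBytes > LIMITED_OCR_SIZE_MIN || isDerived;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isLimitedOCRFile(2048, false, "application/pdf"));
        System.out.println(isLimitedOCRFile(2048, false, "image/png"));
        System.out.println(isLimitedOCRFile(2048, true, "image/png"));
    }
}
```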

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.postIndexSummary ( )
private
ProcessResult org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.process ( AbstractFile  file)

Processes a file. Called between calls to startUp() and shutDown(). Will be called for each file in a data source.

IMPORTANT: In addition to returning ProcessResult.OK or ProcessResult.ERROR, modules should log all errors using methods provided by the org.sleuthkit.autopsy.coreutils.Logger class. Log messages should include the name and object ID of the data being processed and any other information that would be useful for debugging. If an exception has been caught by the module, the exception should be sent to the logger along with the log message so that a stack trace will appear in the application log.

Parameters
file  The file to analyze.
Returns
A result code indicating success or failure of the processing.

Implements org.sleuthkit.autopsy.ingest.FileIngestModule.

Definition at line 411 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org::sleuthkit::datamodel::AbstractContent.getId(), org::sleuthkit::datamodel::AbstractFile.getKnown(), org.sleuthkit.autopsy.modules.filetypeid.FileTypeDetector.getMIMEType(), org::sleuthkit::datamodel::AbstractContent.getName(), org::sleuthkit::datamodel::AbstractFile.getType(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchJobSettings.isOCREnabled(), org::sleuthkit::datamodel::TskData::FileKnown.KNOWN, org.sleuthkit.autopsy.ingest.IngestModule.ProcessResult.OK, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING, and org::sleuthkit::datamodel::TskData::TSK_DB_FILES_TYPE_ENUM.VIRTUAL_DIR.
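
The logging guidance above can be sketched as follows. This is a self-contained illustration, not code from the module: java.util.logging.Logger stands in for org.sleuthkit.autopsy.coreutils.Logger, the local ProcessResult enum stands in for the real result type, and the FileWork interface and the fileName / objectId parameters are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; stand-in types replace the Autopsy classes.
class ProcessLoggingSketch {

    // Stand-in for the ProcessResult values named in the description above.
    enum ProcessResult { OK, ERROR }

    // Hypothetical unit of per-file work that may fail.
    interface FileWork {
        void run() throws Exception;
    }

    private static final Logger logger = Logger.getLogger(ProcessLoggingSketch.class.getName());

    static ProcessResult process(String fileName, long objectId, FileWork work) {
        try {
            work.run();
            return ProcessResult.OK;
        } catch (Exception ex) {
            // Include the name and object ID, and pass the exception so the stack trace is logged.
            logger.log(Level.SEVERE,
                    String.format("Error processing file '%s' (object ID = %d)", fileName, objectId),
                    ex);
            return ProcessResult.ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(process("report.docx", 42L, () -> { }));
        System.out.println(process("broken.bin", 43L, () -> {
            throw new IllegalStateException("simulated extraction failure");
        }));
    }
}
```

Passing the exception as the last argument to log(), rather than concatenating its message into the string, is what makes the stack trace appear in the log.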

static void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.putIngestStatus ( long  ingestJobId,
long  fileId,
IngestStatus  status 
)
static private

Records the ingest status for a given file for a given ingest job. Used for final statistics at the end of the job.

Parameters
ingestJobId  id of ingest job
fileId  id of file
status  ingest status of the file

Definition at line 273 of file KeywordSearchIngestModule.java.
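
One way such a per-job status table could be maintained is sketched below, using the same nested map type as the ingestStatus attribute. The synchronization strategy is an assumption (a plain HashMap shared between module instances needs some form of locking), and the summarize helper is hypothetical, added only to show how end-of-job statistics might be derived. The enum lists only the status values named in this page's reference lists.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; locking strategy and summarize() are assumed.
class IngestStatusSketch {

    // Status values that appear in this page's reference lists.
    enum IngestStatus {
        TEXT_INGESTED,
        METADATA_INGESTED,
        SKIPPED_ERROR_TEXTEXTRACT,
        SKIPPED_ERROR_INDEXING
    }

    // Ingest job ID -> (file ID -> status), matching the declared type of ingestStatus.
    private static final Map<Long, Map<Long, IngestStatus>> ingestStatus = new HashMap<>();

    static void putIngestStatus(long ingestJobId, long fileId, IngestStatus status) {
        synchronized (ingestStatus) {
            // Create the per-job map on first use; a later status for the same file overwrites the earlier one.
            ingestStatus.computeIfAbsent(ingestJobId, id -> new HashMap<>()).put(fileId, status);
        }
    }

    // Hypothetical helper: counts files per status for one job.
    static Map<IngestStatus, Integer> summarize(long ingestJobId) {
        synchronized (ingestStatus) {
            Map<IngestStatus, Integer> counts = new EnumMap<>(IngestStatus.class);
            Map<Long, IngestStatus> perFile =
                    ingestStatus.getOrDefault(ingestJobId, Collections.<Long, IngestStatus>emptyMap());
            for (IngestStatus status : perFile.values()) {
                counts.merge(status, 1, Integer::sum);
            }
            return counts;
        }
    }

    public static void main(String[] args) {
        putIngestStatus(1L, 100L, IngestStatus.TEXT_INGESTED);
        putIngestStatus(1L, 101L, IngestStatus.METADATA_INGESTED);
        System.out.println(summarize(1L));
    }
}
```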

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.searchFile ( Optional< TextExtractor >  extractor,
AbstractFile  aFile,
String  mimeType,
boolean  indexContent 
)
private

Adds the file to the index. Detects file type, calls extractors, etc.

Parameters
extractor  The textExtractor to use with this file or empty if no extractor found.
aFile  File to analyze.
mimeType  The file mime type.
indexContent  False if only metadata should be text_ingested. True if content and metadata should be indexed.

Unicode strings are extracted from unallocated blocks, unused blocks, and carved text files. String extraction is used for these because they may contain multiple encodings, which can cause the more specialized text extractors to miss text.

Definition at line 830 of file KeywordSearchIngestModule.java.

References org::sleuthkit::datamodel::TskData::TSK_DB_FILES_TYPE_ENUM.CARVED, org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org::sleuthkit::datamodel::AbstractContent.getId(), org::sleuthkit::datamodel::AbstractContent.getName(), org::sleuthkit::datamodel::AbstractFile.getNameExtension(), org::sleuthkit::datamodel::AbstractFile.getSize(), org::sleuthkit::datamodel::AbstractFile.getType(), org::sleuthkit::datamodel::AbstractFile.isDir(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.METADATA_INGESTED, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_TEXTEXTRACT, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.TEXT_INGESTED, org::sleuthkit::datamodel::TskData::TSK_DB_FILES_TYPE_ENUM.UNALLOC_BLOCKS, and org::sleuthkit::datamodel::TskData::TSK_DB_FILES_TYPE_ENUM.UNUSED_BLOCKS.

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.searchTextFile ( AbstractFile  aFile)
private

Adds the text file to the index given an encoding. Returns true if indexing was successful and false otherwise.

Parameters
aFile  Text file to analyze

Definition at line 944 of file KeywordSearchIngestModule.java.

References org::sleuthkit::datamodel::AbstractContent.getId(), org::sleuthkit::datamodel::AbstractContent.getName(), org.sleuthkit.autopsy.textextractors.TextFileExtractor.getReader(), and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.TEXT_INGESTED.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.shutDown ( )
void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.startUp ( IngestJobContext  context) throws IngestModuleException

Member Data Documentation

IngestJobContext org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.context
private
FileTypeDetector org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.fileTypeDetector
private

Definition at line 241 of file KeywordSearchIngestModule.java.

final String org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IMAGE_MIME_TYPE_PREFIX = "image/"
static private

Definition at line 217 of file KeywordSearchIngestModule.java.

Ingester org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.ingester = null
private

Definition at line 240 of file KeywordSearchIngestModule.java.

final Map<Long, Map<Long, IngestStatus> > org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.ingestStatus = new HashMap<>()
static private

Definition at line 263 of file KeywordSearchIngestModule.java.

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.initialized = false
private

Definition at line 247 of file KeywordSearchIngestModule.java.

final AtomicInteger org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.instanceCount = new AtomicInteger(0)
static private

Definition at line 249 of file KeywordSearchIngestModule.java.

int org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.instanceNum = 0
private

Definition at line 250 of file KeywordSearchIngestModule.java.

long org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.jobId
private

Definition at line 248 of file KeywordSearchIngestModule.java.

final int org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.LIMITED_OCR_SIZE_MIN = 100 * 1024
static private

Definition at line 107 of file KeywordSearchIngestModule.java.

final Logger org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.logger = Logger.getLogger(KeywordSearchIngestModule.class.getName())
static private

Definition at line 238 of file KeywordSearchIngestModule.java.

final Map<String, Pair<BlackboardAttribute.ATTRIBUTE_TYPE, Integer> > org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.METADATA_TYPES_MAP
static private

A mapping of the Tika metadata key to the corresponding attribute type and the priority of that key versus other related keys (lower integer value is higher priority).

Definition at line 153 of file KeywordSearchIngestModule.java.
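
The priority rule described above (a lower integer wins when several Tika keys map to the same attribute type) can be sketched as follows. The map entries, the attribute type names given as plain strings, and the resolve method are hypothetical and chosen only for illustration; the actual contents of METADATA_TYPES_MAP are not shown on this page. Map.Entry stands in for the Pair type.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; keys, attribute names and resolve() are assumed.
class MetadataPrioritySketch {

    // Metadata key -> (attribute type name, priority); lower priority value wins.
    static final Map<String, Map.Entry<String, Integer>> TYPES = new HashMap<>();
    static {
        TYPES.put("example:primary-author", new SimpleImmutableEntry<>("EXAMPLE_OWNER", 1));
        TYPES.put("example:fallback-author", new SimpleImmutableEntry<>("EXAMPLE_OWNER", 2));
        TYPES.put("example:title", new SimpleImmutableEntry<>("EXAMPLE_TITLE", 1));
    }

    // Picks, for each attribute type, the value whose key has the best (lowest) priority.
    static Map<String, String> resolve(Map<String, String> metadata) {
        Map<String, Integer> bestPriority = new HashMap<>();
        Map<String, String> chosen = new HashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            Map.Entry<String, Integer> mapping = TYPES.get(entry.getKey());
            String value = entry.getValue();
            if (mapping == null || value == null || value.isEmpty()) {
                continue; // Unmapped key or no usable value.
            }
            String attrType = mapping.getKey();
            int priority = mapping.getValue();
            Integer current = bestPriority.get(attrType);
            if (current == null || priority < current) {
                bestPriority.put(attrType, priority);
                chosen.put(attrType, value);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("example:fallback-author", "bob");
        metadata.put("example:primary-author", "alice");
        metadata.put("example:title", "Quarterly report");
        System.out.println(resolve(metadata));
    }
}
```

Tracking the best priority seen so far per attribute type makes the result independent of the order in which the metadata keys are iterated.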

final ImmutableSet<String> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.OCR_DOCUMENTS
static private
Initial value:
= ImmutableSet.of(
"application/pdf",
"application/msword",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"application/vnd.ms-powerpoint",
"application/vnd.openxmlformats-officedocument.presentationml.presentation",
"application/vnd.ms-excel",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
)

Definition at line 220 of file KeywordSearchIngestModule.java.

final IngestModuleReferenceCounter org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.refCounter = new IngestModuleReferenceCounter()
static private

Definition at line 251 of file KeywordSearchIngestModule.java.

final IngestServices org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.services = IngestServices.getInstance()
private

Definition at line 239 of file KeywordSearchIngestModule.java.

final KeywordSearchJobSettings org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.settings
private

Definition at line 246 of file KeywordSearchIngestModule.java.

Lookup org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.stringsExtractionContext
private

Definition at line 245 of file KeywordSearchIngestModule.java.


The documentation for this class was generated from the following file:

Copyright © 2012-2024 Sleuth Kit Labs. Generated on: Mon Mar 17 2025
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.