Autopsy  4.19.3
Graphical digital forensics platform for The Sleuth Kit and other tools.
org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule Class Reference

Inherits org.sleuthkit.autopsy.ingest.FileIngestModule.

Classes

enum  IngestStatus
 
enum  StringsExtractOptions
 

Public Member Functions

ProcessResult process (AbstractFile abstractFile)
 
void shutDown ()
 
void startUp (IngestJobContext context) throws IngestModuleException
 

Private Member Functions

BlackboardAttribute checkAttribute (String key, String value)
 
void cleanup ()
 
void createMetadataArtifact (AbstractFile aFile, Map< String, String > metadata)
 
boolean extractStringsAndIndex (AbstractFile aFile)
 
boolean extractTextAndSearch (Optional< TextExtractor > extractorOptional, AbstractFile aFile, Map< String, String > extractedMetadata) throws IngesterException
 
Optional< TextExtractor > getExtractor (AbstractFile abstractFile)
 
Reader getTikaOrTextExtractor (Optional< TextExtractor > extractorOptional, AbstractFile aFile, Map< String, String > extractedMetadata) throws TextExtractor.InitReaderException
 
boolean isLimitedOCRFile (AbstractFile aFile, String mimeType)
 
void postIndexSummary ()
 
void searchFile (Optional< TextExtractor > extractor, AbstractFile aFile, String mimeType, boolean indexContent)
 
boolean searchTextFile (AbstractFile aFile)
 

Static Private Member Functions

static void putIngestStatus (long ingestJobId, long fileId, IngestStatus status)
 

Private Attributes

IngestJobContext context
 
FileTypeDetector fileTypeDetector
 
Ingester ingester = null
 
boolean initialized = false
 
int instanceNum = 0
 
long jobId
 
final IngestServices services = IngestServices.getInstance()
 
final KeywordSearchJobSettings settings
 
Lookup stringsExtractionContext
 

Static Private Attributes

static final String IMAGE_MIME_TYPE_PREFIX = "image/"
 
static final Map< Long, Map< Long, IngestStatus > > ingestStatus = new HashMap<>()
 
static final AtomicInteger instanceCount = new AtomicInteger(0)
 
static final int LIMITED_OCR_SIZE_MIN = 100 * 1024
 
static final Logger logger = Logger.getLogger(KeywordSearchIngestModule.class.getName())
 
static final List< String > METADATA_DATE_TYPES
 
static final Map< String, BlackboardAttribute.ATTRIBUTE_TYPE > METADATA_TYPES_MAP
 
static final ImmutableSet< String > OCR_DOCUMENTS
 
static final IngestModuleReferenceCounter refCounter = new IngestModuleReferenceCounter()
 

Detailed Description

A file-level ingest module. It indexes allocated files that Solr supports, and extracts and indexes strings from unallocated files and files that Solr does not support. The index is committed periodically, at the user-set ingest update interval. The module also runs a periodic keyword / regular expression search against the lists currently configured for ingest and writes the results to the blackboard. It reports interesting events to the Inbox and to viewers.

Definition at line 90 of file KeywordSearchIngestModule.java.

Member Function Documentation

BlackboardAttribute org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.checkAttribute ( String  key,
String  value 
)
private

Definition at line 656 of file KeywordSearchIngestModule.java.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.cleanup ( )
private

Common cleanup code, run when the module stops or the final searcher completes.

Definition at line 445 of file KeywordSearchIngestModule.java.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.createMetadataArtifact ( AbstractFile  aFile,
Map< String, String >  metadata 
)
private

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractStringsAndIndex ( AbstractFile  aFile)
private

Extract strings using heuristics from the file and add to index.

Parameters
aFile  File to extract strings from, divide into chunks, and index.
Returns
true if the file was indexed, false otherwise.

Definition at line 708 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.context, org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.STRINGS_INGESTED.

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractTextAndSearch ( Optional< TextExtractor > extractorOptional,
AbstractFile  aFile,
Map< String, String >  extractedMetadata 
) throws IngesterException
private

File indexer. Processes and indexes known/allocated files, unknown/unallocated files, and directories accordingly. Extracts text from the file with Tika or other text extraction modules (by streaming), divides the file into chunks, and indexes the chunks.

Parameters
extractorOptional  The TextExtractor to use with this file, or empty.
aFile  File to extract strings from, divide into chunks, and index.
extractedMetadata  Map that will be populated with the file's metadata.
Returns
true if the file was indexed, false otherwise.
Exceptions
IngesterException  Thrown if indexing failed.

Definition at line 566 of file KeywordSearchIngestModule.java.

Optional<TextExtractor> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.getExtractor ( AbstractFile  abstractFile)
private

Reader org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.getTikaOrTextExtractor ( Optional< TextExtractor > extractorOptional,
AbstractFile  aFile,
Map< String, String >  extractedMetadata 
) throws TextExtractor.InitReaderException
private

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.isLimitedOCRFile ( AbstractFile  aFile,
String  mimeType 
)
private

Returns true if file should have OCR performed on it when limited OCR setting is specified.

Parameters
aFile  The abstract file.
mimeType  The file mime type.
Returns
True if file should have text extracted when limited OCR setting is on.

Definition at line 460 of file KeywordSearchIngestModule.java.
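
One plausible reading of the limited OCR check, built only from the constants documented on this page (OCR_DOCUMENTS, IMAGE_MIME_TYPE_PREFIX, LIMITED_OCR_SIZE_MIN), is sketched below. This is not the module's source and the exact rule may differ. In particular, treating only images strictly larger than the threshold as eligible is an assumption, the class name LimitedOcrSketch is hypothetical, and the AbstractFile parameter is reduced to a plain size in bytes so that the sketch is self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, self-contained stand-in for the limited OCR check. */
final class LimitedOcrSketch {

    static final String IMAGE_MIME_TYPE_PREFIX = "image/";
    static final int LIMITED_OCR_SIZE_MIN = 100 * 1024;

    // Same MIME types as the OCR_DOCUMENTS initial value listed on this page.
    static final Set<String> OCR_DOCUMENTS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "application/pdf",
            "application/msword",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "application/vnd.ms-powerpoint",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation",
            "application/vnd.ms-excel",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")));

    /**
     * Assumed rule: listed document types always qualify; images qualify
     * only when they are larger than LIMITED_OCR_SIZE_MIN bytes.
     */
    static boolean isLimitedOcrFile(long sizeInBytes, String mimeType) {
        if (mimeType == null) {
            return false;
        }
        if (OCR_DOCUMENTS.contains(mimeType)) {
            return true;
        }
        return mimeType.startsWith(IMAGE_MIME_TYPE_PREFIX)
                && sizeInBytes > LIMITED_OCR_SIZE_MIN;
    }
}
```

Under these assumptions a small PDF qualifies, while a thumbnail-sized image does not.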

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.postIndexSummary ( )
private

ProcessResult org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.process ( AbstractFile  file)

Processes a file. Called between calls to startUp() and shutDown(). Will be called for each file in a data source.

IMPORTANT: In addition to returning ProcessResult.OK or ProcessResult.ERROR, modules should log all errors using methods provided by the org.sleuthkit.autopsy.coreutils.Logger class. Log messages should include the name and object ID of the data being processed and any other information that would be useful for debugging. If an exception has been caught by the module, the exception should be sent to the logger along with the log message so that a stack trace will appear in the application log.

Parameters
file  The file to analyze.
Returns
A result code indicating success or failure of the processing.

Implements org.sleuthkit.autopsy.ingest.FileIngestModule.

Definition at line 346 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.modules.filetypeid.FileTypeDetector.getMIMEType(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchJobSettings.isOCREnabled(), org.sleuthkit.autopsy.ingest.IngestModule.ProcessResult.OK, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING.
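
The logging guidance above can be illustrated with a minimal sketch. It uses java.util.logging.Logger as a stand-in for org.sleuthkit.autopsy.coreutils.Logger so that it runs without Autopsy on the classpath. The class ProcessLoggingSketch, the FileWork interface, and the reduced ProcessResult enum are hypothetical names introduced for illustration only.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of error logging inside a process()-style method. */
final class ProcessLoggingSketch {

    /** Reduced stand-in for IngestModule.ProcessResult. */
    enum ProcessResult { OK, ERROR }

    /** Hypothetical unit of per-file work that may fail. */
    interface FileWork {
        void run() throws Exception;
    }

    static final Logger logger = Logger.getLogger(ProcessLoggingSketch.class.getName());

    static ProcessResult process(String fileName, long objectId, FileWork work) {
        try {
            work.run();
            return ProcessResult.OK;
        } catch (Exception ex) {
            // Include the name and object ID of the data being processed, and
            // pass the exception itself so the stack trace reaches the log.
            logger.log(Level.SEVERE,
                    String.format("Error processing file '%s' (object ID = %d)", fileName, objectId),
                    ex);
            return ProcessResult.ERROR;
        }
    }
}
```

Passing the exception as the third argument, rather than concatenating its message into the string, is what makes the stack trace appear in the application log.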

static void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.putIngestStatus ( long  ingestJobId,
long  fileId,
IngestStatus  status 
)
static private

Records the ingest status for a given file for a given ingest job. Used for final statistics at the end of the job.

Parameters
ingestJobId  ID of ingest job
fileId  ID of file
status  Ingest status of the file

Definition at line 208 of file KeywordSearchIngestModule.java.
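
The per-job bookkeeping described above can be sketched as a nested map keyed first by ingest job ID and then by file ID. This is a minimal, self-contained sketch rather than the module's actual source. Only the map shape (Map< Long, Map< Long, IngestStatus > >) and the enum constant names come from this page; the class name IngestStatusTracker, the countFor helper, the use of synchronization, and the trimmed IngestStatus enum are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the module's status bookkeeping. */
final class IngestStatusTracker {

    /** Subset of the status values referenced on this page. */
    enum IngestStatus {
        TEXT_INGESTED, STRINGS_INGESTED, METADATA_INGESTED,
        SKIPPED_ERROR_TEXTEXTRACT, SKIPPED_ERROR_INDEXING
    }

    // Job ID -> (file ID -> last recorded status), the same shape as ingestStatus.
    private static final Map<Long, Map<Long, IngestStatus>> ingestStatus = new HashMap<>();

    /** Records the status of one file for one ingest job. */
    static void putIngestStatus(long ingestJobId, long fileId, IngestStatus status) {
        // The map is static and therefore shared, so guard it with a lock (assumption).
        synchronized (ingestStatus) {
            ingestStatus.computeIfAbsent(ingestJobId, id -> new HashMap<>())
                        .put(fileId, status);
        }
    }

    /** Hypothetical helper: counts the files of a job whose last status matches. */
    static long countFor(long ingestJobId, IngestStatus status) {
        synchronized (ingestStatus) {
            Map<Long, IngestStatus> forJob = ingestStatus.get(ingestJobId);
            if (forJob == null) {
                return 0;
            }
            return forJob.values().stream().filter(s -> s == status).count();
        }
    }
}
```

Because each file ID maps to a single value, a later status for the same file replaces the earlier one, which is what end-of-job statistics need.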

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.searchFile ( Optional< TextExtractor > extractor,
AbstractFile  aFile,
String  mimeType,
boolean  indexContent 
)
private

Adds the file to the index. Detects file type, calls extractors, etc.

Parameters
extractor  The TextExtractor to use with this file, or empty if no extractor was found.
aFile  File to analyze.
mimeType  The file mime type.
indexContent  False if only metadata should be indexed. True if both content and metadata should be indexed.

Extract Unicode strings from unallocated and unused blocks and from carved text files. String extraction is performed on these because they may contain multiple encodings, which can cause text to be missed by the more specialized text extractors used below.

Definition at line 734 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.METADATA_INGESTED, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_TEXTEXTRACT, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.TEXT_INGESTED.

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.searchTextFile ( AbstractFile  aFile)
private

Adds the text file to the index given an encoding. Returns true if indexing was successful and false otherwise.

Parameters
aFile  Text file to analyze

Definition at line 848 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.textextractors.TextFileExtractor.getReader(), and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.TEXT_INGESTED.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.shutDown ( )

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.startUp ( IngestJobContext  context) throws IngestModuleException

Member Data Documentation

IngestJobContext org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.context
private

FileTypeDetector org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.fileTypeDetector
private

Definition at line 176 of file KeywordSearchIngestModule.java.

final String org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IMAGE_MIME_TYPE_PREFIX = "image/"
static private

Definition at line 152 of file KeywordSearchIngestModule.java.

Ingester org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.ingester = null
private

Definition at line 175 of file KeywordSearchIngestModule.java.

final Map<Long, Map<Long, IngestStatus> > org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.ingestStatus = new HashMap<>()
static private

Definition at line 198 of file KeywordSearchIngestModule.java.

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.initialized = false
private

Definition at line 182 of file KeywordSearchIngestModule.java.

final AtomicInteger org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.instanceCount = new AtomicInteger(0)
static private

Definition at line 184 of file KeywordSearchIngestModule.java.

int org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.instanceNum = 0
private

Definition at line 185 of file KeywordSearchIngestModule.java.

long org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.jobId
private

Definition at line 183 of file KeywordSearchIngestModule.java.

final int org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.LIMITED_OCR_SIZE_MIN = 100 * 1024
static private

Definition at line 92 of file KeywordSearchIngestModule.java.

final Logger org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.logger = Logger.getLogger(KeywordSearchIngestModule.class.getName())
static private

Definition at line 173 of file KeywordSearchIngestModule.java.

final List<String> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.METADATA_DATE_TYPES
static private
Initial value:
= ImmutableList.of(
"Last-Save-Date",
"Last-Printed",
"Creation-Date")

Definition at line 134 of file KeywordSearchIngestModule.java.

final Map<String, BlackboardAttribute.ATTRIBUTE_TYPE> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.METADATA_TYPES_MAP
static private
Initial value:
= ImmutableMap.<String, BlackboardAttribute.ATTRIBUTE_TYPE>builder()
.put("Last-Save-Date", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DATETIME_MODIFIED)
.put("Last-Author", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_USER_ID)
.put("Creation-Date", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DATETIME_CREATED)
.put("Company", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_ORGANIZATION)
.put("Author", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_OWNER)
.put("Application-Name", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_PROG_NAME)
.put("Last-Printed", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_LAST_PRINTED_DATETIME)
.put("Producer", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_PROG_NAME)
.put("Title", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DESCRIPTION)
.put("pdf:PDFVersion", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_VERSION)
.build()

Definition at line 139 of file KeywordSearchIngestModule.java.
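
How a lookup table like this might be applied to extracted metadata is sketched below. This page does not document the bodies of createMetadataArtifact() or checkAttribute(), so the filtering rule here (skip unknown keys and blank values) is an assumption, as are the class name MetadataMappingSketch and the toAttributes helper. Attribute types are represented by their names as plain strings so that the sketch runs without The Sleuth Kit on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of mapping extracted metadata keys to attribute types. */
final class MetadataMappingSketch {

    // Keys whose values are dates, as listed in METADATA_DATE_TYPES.
    static final List<String> METADATA_DATE_TYPES =
            Arrays.asList("Last-Save-Date", "Last-Printed", "Creation-Date");

    // Metadata key -> attribute type name, mirroring METADATA_TYPES_MAP.
    static final Map<String, String> METADATA_TYPES_MAP = new LinkedHashMap<>();
    static {
        METADATA_TYPES_MAP.put("Last-Save-Date", "TSK_DATETIME_MODIFIED");
        METADATA_TYPES_MAP.put("Last-Author", "TSK_USER_ID");
        METADATA_TYPES_MAP.put("Creation-Date", "TSK_DATETIME_CREATED");
        METADATA_TYPES_MAP.put("Company", "TSK_ORGANIZATION");
        METADATA_TYPES_MAP.put("Author", "TSK_OWNER");
        METADATA_TYPES_MAP.put("Application-Name", "TSK_PROG_NAME");
        METADATA_TYPES_MAP.put("Last-Printed", "TSK_LAST_PRINTED_DATETIME");
        METADATA_TYPES_MAP.put("Producer", "TSK_PROG_NAME");
        METADATA_TYPES_MAP.put("Title", "TSK_DESCRIPTION");
        METADATA_TYPES_MAP.put("pdf:PDFVersion", "TSK_VERSION");
    }

    /**
     * Assumed rule: keep an entry only if its key is in the table and its
     * value is non-blank. Each result is formatted as "TYPE=value".
     */
    static List<String> toAttributes(Map<String, String> metadata) {
        List<String> attributes = new ArrayList<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String type = METADATA_TYPES_MAP.get(entry.getKey());
            String value = entry.getValue();
            if (type != null && value != null && !value.trim().isEmpty()) {
                attributes.add(type + "=" + value.trim());
            }
        }
        return attributes;
    }
}
```

Note that both "Application-Name" and "Producer" map to TSK_PROG_NAME, so a list is used for the result instead of a map keyed by attribute type.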

final ImmutableSet<String> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.OCR_DOCUMENTS
static private
Initial value:
= ImmutableSet.of(
"application/pdf",
"application/msword",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"application/vnd.ms-powerpoint",
"application/vnd.openxmlformats-officedocument.presentationml.presentation",
"application/vnd.ms-excel",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
)

Definition at line 155 of file KeywordSearchIngestModule.java.

final IngestModuleReferenceCounter org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.refCounter = new IngestModuleReferenceCounter()
static private

Definition at line 186 of file KeywordSearchIngestModule.java.

final IngestServices org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.services = IngestServices.getInstance()
private

Definition at line 174 of file KeywordSearchIngestModule.java.

final KeywordSearchJobSettings org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.settings
private

Definition at line 181 of file KeywordSearchIngestModule.java.

Lookup org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.stringsExtractionContext
private

Definition at line 180 of file KeywordSearchIngestModule.java.


The documentation for this class was generated from the following file:

Copyright © 2012-2022 Basis Technology. Generated on: Tue Jun 27 2023
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.