Inherits org.sleuthkit.autopsy.ingest.FileIngestModule.

Classes
enum	IngestStatus

enum	StringsExtractOptions

Public Member Functions
ProcessResult	process (AbstractFile abstractFile)

void	shutDown ()

void	startUp (IngestJobContext context) throws IngestModuleException

Private Member Functions
BlackboardAttribute	checkAttribute (String key, String value)

void	cleanup ()

void	createMetadataArtifact (AbstractFile aFile, Map< String, String > metadata)

boolean	extractStringsAndIndex (AbstractFile aFile)

boolean	extractTextAndSearch (Optional< TextExtractor > extractorOptional, AbstractFile aFile, Map< String, String > extractedMetadata) throws IngesterException

Optional< TextExtractor >	getExtractor (AbstractFile abstractFile)

Reader	getTikaOrTextExtractor (Optional< TextExtractor > extractorOptional, AbstractFile aFile, Map< String, String > extractedMetadata) throws TextExtractor.InitReaderException

boolean	isLimitedOCRFile (AbstractFile aFile, String mimeType)

void	postIndexSummary ()

void	searchFile (Optional< TextExtractor > extractor, AbstractFile aFile, String mimeType, boolean indexContent)

boolean	searchTextFile (AbstractFile aFile)

Static Private Member Functions
static void	putIngestStatus (long ingestJobId, long fileId, IngestStatus status)

Private Attributes
IngestJobContext	context

FileTypeDetector	fileTypeDetector

Ingester	ingester = null

boolean	initialized = false

int	instanceNum = 0

long	jobId

final IngestServices	services = IngestServices.getInstance()

final KeywordSearchJobSettings	settings

Lookup	stringsExtractionContext

Static Private Attributes
static final String	IMAGE_MIME_TYPE_PREFIX = "image/"

static final Map< Long, Map< Long, IngestStatus > >	ingestStatus = new HashMap<>()

static final AtomicInteger	instanceCount = new AtomicInteger(0)

static final int	LIMITED_OCR_SIZE_MIN = 100 * 1024

static final Logger	logger = Logger.getLogger(KeywordSearchIngestModule.class.getName())

static final List< String >	METADATA_DATE_TYPES

static final Map< String, BlackboardAttribute.ATTRIBUTE_TYPE >	METADATA_TYPES_MAP

static final ImmutableSet< String >	OCR_DOCUMENTS

static final IngestModuleReferenceCounter	refCounter = new IngestModuleReferenceCounter()

Detailed Description

A file-level ingest module. It indexes allocated files that Solr supports, and extracts and indexes strings from unallocated files and files that Solr does not support. The index is committed periodically, at the user-set ingest update interval. The module periodically runs keyword and regular-expression searches against the lists currently configured for ingest and writes the results to the blackboard. It reports interesting events to the Inbox and to viewers.

Definition at line 90 of file KeywordSearchIngestModule.java.

Member Function Documentation

BlackboardAttribute org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.checkAttribute	(	String	key,
		String	value
	)

private

Definition at line 656 of file KeywordSearchIngestModule.java.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.cleanup ( )

private

Common cleanup code when module stops or final searcher completes

Definition at line 445 of file KeywordSearchIngestModule.java.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.createMetadataArtifact	(	AbstractFile	aFile,
		Map< String, String >	metadata
	)

private

Definition at line 621 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.casemodule.Case.getCurrentCaseThrows(), and org.sleuthkit.autopsy.casemodule.Case.getSleuthkitCase().

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractStringsAndIndex ( AbstractFile aFile )

private

Extract strings using heuristics from the file and add to index.

Parameters

aFile file to extract strings from, divide into chunks and index

Returns: true if the file was indexed, false otherwise

Definition at line 708 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.context, org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.STRINGS_INGESTED.

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractTextAndSearch	(	Optional< TextExtractor >	extractorOptional,
		AbstractFile	aFile,
		Map< String, String >	extractedMetadata
	)		throws IngesterException

private

File indexer. Processes and indexes known/allocated files, unknown/unallocated files, and directories accordingly. Extracts text from the file with Tika or other text extraction modules (by streaming), divides the file into chunks, and indexes the chunks.

Parameters

extractorOptional	The textExtractor to use with this file or empty.
aFile	file to extract strings from, divide into chunks and index
extractedMetadata	Map that will be populated with the file's metadata.

Returns: true if the file was indexed, false otherwise

Exceptions

IngesterException exception thrown if indexing failed

Definition at line 566 of file KeywordSearchIngestModule.java.

Optional<TextExtractor> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.getExtractor ( AbstractFile abstractFile )

private

Definition at line 535 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.textextractors.TextExtractorFactory.getExtractor(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchJobSettings.isOCREnabled(), and org.sleuthkit.autopsy.textextractors.configs.ImageConfig.setOCREnabled().

Reader org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.getTikaOrTextExtractor	(	Optional< TextExtractor >	extractorOptional,
		AbstractFile	aFile,
		Map< String, String >	extractedMetadata
	)		throws TextExtractor.InitReaderException

private

Definition at line 587 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.textextractors.TextExtractor.getMetadata(), and org.sleuthkit.autopsy.textextractors.TextExtractor.getReader().

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.isLimitedOCRFile	(	AbstractFile	aFile,
		String	mimeType
	)

private

Returns true if file should have OCR performed on it when limited OCR setting is specified.

Parameters

aFile	The abstract file.
mimeType	The file mime type.

Returns: True if file should have text extracted when limited OCR setting is on.

Definition at line 460 of file KeywordSearchIngestModule.java.
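
The description above does not say how the decision is made. The following is a minimal sketch of one plausible rule, assuming that documents listed in OCR_DOCUMENTS always qualify and that images qualify only when larger than LIMITED_OCR_SIZE_MIN. The class name, the method name, and the rule itself are assumptions for illustration; the real method takes an AbstractFile and may apply other conditions.

```java
import java.util.Set;

// Hypothetical stand-in; not the Autopsy implementation.
final class LimitedOcrSketch {
    private static final String IMAGE_MIME_TYPE_PREFIX = "image/";
    private static final int LIMITED_OCR_SIZE_MIN = 100 * 1024;
    // Subset of the OCR_DOCUMENTS set documented on this page, for brevity.
    private static final Set<String> OCR_DOCUMENTS = Set.of(
            "application/pdf",
            "application/msword");

    /** Returns true if a file of this size and MIME type would be sent to OCR. */
    static boolean isLimitedOcrCandidate(long fileSize, String mimeType) {
        if (OCR_DOCUMENTS.contains(mimeType)) {
            return true;
        }
        // ASSUMPTION: images qualify only above the minimum size.
        return mimeType.startsWith(IMAGE_MIME_TYPE_PREFIX)
                && fileSize > LIMITED_OCR_SIZE_MIN;
    }
}
```

Under these assumptions a 200 KiB image/png file qualifies, while a small icon or a text/plain file does not.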

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.postIndexSummary ( )

private

Posts an inbox message with a summary of the indexed files.

Definition at line 476 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.ingest.IngestMessage.createMessage(), org.sleuthkit.autopsy.coreutils.MessageNotifyUtil.Notify.error(), org.sleuthkit.autopsy.ingest.IngestMessage.MessageType.INFO, org.sleuthkit.autopsy.ingest.IngestServices.postMessage(), and org.sleuthkit.autopsy.coreutils.MessageNotifyUtil.Notify.warn().

ProcessResult org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.process ( AbstractFile file )

Processes a file. Called between calls to startUp() and shutDown(). Will be called for each file in a data source.

IMPORTANT: In addition to returning ProcessResult.OK or ProcessResult.ERROR, modules should log all errors using methods provided by the org.sleuthkit.autopsy.coreutils.Logger class. Log messages should include the name and object ID of the data being processed and any other information that would be useful for debugging. If an exception has been caught by the module, the exception should be sent to the logger along with the log message so that a stack trace will appear in the application log.

Parameters

file	The file to analyze.

Returns: A result code indicating success or failure of the processing.

Implements org.sleuthkit.autopsy.ingest.FileIngestModule.

Definition at line 346 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.modules.filetypeid.FileTypeDetector.getMIMEType(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchJobSettings.isOCREnabled(), org.sleuthkit.autopsy.ingest.IngestModule.ProcessResult.OK, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING.
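
The logging guidance above can be illustrated with a short sketch. It uses java.util.logging directly so that it is self-contained; the class, the helper names, and the boolean return value (standing in for ProcessResult.OK and ProcessResult.ERROR) are hypothetical and are not part of the Autopsy API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a module's per-file processing.
final class ProcessLoggingSketch {
    private static final Logger LOGGER =
            Logger.getLogger(ProcessLoggingSketch.class.getName());

    /** Builds a log message that names the file and its object ID. */
    static String describeFailure(String fileName, long objectId) {
        return "Error processing file '" + fileName + "' (objId=" + objectId + ")";
    }

    /** Returns true on success (think ProcessResult.OK), false on error. */
    static boolean processOne(String fileName, long objectId) {
        try {
            if (fileName.isEmpty()) {
                throw new IllegalArgumentException("empty file name");
            }
            return true;
        } catch (IllegalArgumentException ex) {
            // Pass the exception to the logger so the stack trace is recorded.
            LOGGER.log(Level.SEVERE, describeFailure(fileName, objectId), ex);
            return false;
        }
    }
}
```

Passing the exception as the last argument to log() is what makes the stack trace appear alongside the message.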

static void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.putIngestStatus	(	long	ingestJobId,
		long	fileId,
		IngestStatus	status
	)

static private

Records the ingest status for a given file for a given ingest job. Used for final statistics at the end of the job.

Parameters

ingestJobId	id of ingest job
fileId	id of file
status	ingest status of the file

Definition at line 208 of file KeywordSearchIngestModule.java.
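
A minimal sketch of how a per-job, per-file status table of the shape Map< Long, Map< Long, IngestStatus > > can be maintained. The enum lists only the constants referenced on this page (the real enum may have more), and the synchronization and the count() helper are assumptions added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not the Autopsy implementation.
final class IngestStatusSketch {
    enum IngestStatus {
        TEXT_INGESTED, STRINGS_INGESTED, METADATA_INGESTED,
        SKIPPED_ERROR_TEXTEXTRACT, SKIPPED_ERROR_INDEXING
    }

    // Job ID -> (file ID -> latest status for that file).
    private static final Map<Long, Map<Long, IngestStatus>> ingestStatus = new HashMap<>();

    static void putIngestStatus(long ingestJobId, long fileId, IngestStatus status) {
        // ASSUMPTION: guard the shared map, since several module instances may write to it.
        synchronized (ingestStatus) {
            ingestStatus.computeIfAbsent(ingestJobId, id -> new HashMap<>())
                    .put(fileId, status);
        }
    }

    /** Hypothetical helper: counts files of a job that ended in the given status. */
    static long count(long ingestJobId, IngestStatus status) {
        synchronized (ingestStatus) {
            return ingestStatus.getOrDefault(ingestJobId, Map.of())
                    .values().stream()
                    .filter(s -> s == status)
                    .count();
        }
    }
}
```

Because the inner map is keyed by file ID, a later status for the same file overwrites the earlier one, so each file is counted once in the final statistics.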

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.searchFile	(	Optional< TextExtractor >	extractor,
		AbstractFile	aFile,
		String	mimeType,
		boolean	indexContent
	)

private

Adds the file to the index. Detects file type, calls extractors, etc.

Parameters

extractor	The textExtractor to use with this file or empty if no extractor found.
aFile	File to analyze.
mimeType	The file mime type.
indexContent	False if only metadata should be indexed. True if content and metadata should be indexed.

Extract unicode strings from unallocated and unused blocks and carved text files. The reason for performing string extraction on these is because they all may contain multiple encodings which can cause text to be missed by the more specialized text extractors used below.

Definition at line 734 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.METADATA_INGESTED, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_INDEXING, org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.SKIPPED_ERROR_TEXTEXTRACT, and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.TEXT_INGESTED.

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.searchTextFile ( AbstractFile aFile )

private

Adds the text file to the index given an encoding. Returns true if indexing was successful and false otherwise.

Parameters

aFile Text file to analyze

Definition at line 848 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.textextractors.TextFileExtractor.getReader(), and org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IngestStatus.TEXT_INGESTED.

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.shutDown ( )

After all files are ingested, executes the final index commit and the final search. Cleans up resources, threads, and timers.

Implements org.sleuthkit.autopsy.ingest.IngestModule.

Definition at line 401 of file KeywordSearchIngestModule.java.

References org.sleuthkit.autopsy.ingest.IngestModuleReferenceCounter.decrementAndGet(), org.sleuthkit.autopsy.ingest.IngestJobContext.fileIngestIsCancelled(), org.sleuthkit.autopsy.ingest.IngestJobContext.getJobId(), org.sleuthkit.autopsy.keywordsearch.KeywordSearch.getServer(), org.sleuthkit.autopsy.keywordsearch.Server.queryNumIndexedChunks(), and org.sleuthkit.autopsy.keywordsearch.Server.queryNumIndexedFiles().
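
shutDown() references IngestModuleReferenceCounter.decrementAndGet(). A per-job reference counter is commonly used so that only the last module instance to shut down performs the job-wide final work; whether this module uses it exactly that way is an assumption. The sketch below is a hypothetical stand-in that shows the pattern and is not the Autopsy class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-job reference counter.
final class JobRefCounterSketch {
    private final Map<Long, Long> counts = new HashMap<>();

    /** Called once per module instance at start-up; returns the new count. */
    synchronized long incrementAndGet(long jobId) {
        return counts.merge(jobId, 1L, Long::sum);
    }

    /** Called once per module instance at shutdown; 0 means "last one out". */
    synchronized long decrementAndGet(long jobId) {
        Long current = counts.get(jobId);
        if (current == null) {
            return 0L;
        }
        long next = current - 1L;
        if (next <= 0L) {
            counts.remove(jobId); // forget finished jobs
            return 0L;
        }
        counts.put(jobId, next);
        return next;
    }
}
```

With this pattern, an instance that sees 1 from incrementAndGet() can do one-time setup, and an instance that sees 0 from decrementAndGet() can do the final commit and cleanup.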

void org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.startUp ( IngestJobContext context ) throws IngestModuleException

Initializes the module for a new ingest run. Sets up threads and timers, and retrieves the settings and the keyword lists to run on.

Implements org.sleuthkit.autopsy.ingest.IngestModule.

Definition at line 237 of file KeywordSearchIngestModule.java.

Member Data Documentation

IngestJobContext org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.context

private

Definition at line 187 of file KeywordSearchIngestModule.java.

Referenced by org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.extractStringsAndIndex().

FileTypeDetector org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.fileTypeDetector

private

Definition at line 176 of file KeywordSearchIngestModule.java.

final String org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.IMAGE_MIME_TYPE_PREFIX = "image/"

static private

Definition at line 152 of file KeywordSearchIngestModule.java.

Ingester org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.ingester = null

private

Definition at line 175 of file KeywordSearchIngestModule.java.

final Map<Long, Map<Long, IngestStatus> > org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.ingestStatus = new HashMap<>()

static private

Definition at line 198 of file KeywordSearchIngestModule.java.

boolean org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.initialized = false

private

Definition at line 182 of file KeywordSearchIngestModule.java.

final AtomicInteger org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.instanceCount = new AtomicInteger(0)

static private

Definition at line 184 of file KeywordSearchIngestModule.java.

int org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.instanceNum = 0

private

Definition at line 185 of file KeywordSearchIngestModule.java.

long org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.jobId

private

Definition at line 183 of file KeywordSearchIngestModule.java.

final int org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.LIMITED_OCR_SIZE_MIN = 100 * 1024

static private

Definition at line 92 of file KeywordSearchIngestModule.java.

final Logger org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.logger = Logger.getLogger(KeywordSearchIngestModule.class.getName())

static private

Definition at line 173 of file KeywordSearchIngestModule.java.

final List<String> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.METADATA_DATE_TYPES

static private

Initial value:

= ImmutableList.of(
                    "Last-Save-Date", 
                    "Last-Printed", 
                    "Creation-Date")

Definition at line 134 of file KeywordSearchIngestModule.java.

final Map<String, BlackboardAttribute.ATTRIBUTE_TYPE> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.METADATA_TYPES_MAP

static private

Initial value:

= ImmutableMap.<String, BlackboardAttribute.ATTRIBUTE_TYPE>builder()
            .put("Last-Save-Date", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DATETIME_MODIFIED)
            .put("Last-Author", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_USER_ID)
            .put("Creation-Date", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DATETIME_CREATED)
            .put("Company", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_ORGANIZATION)
            .put("Author", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_OWNER)
            .put("Application-Name", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_PROG_NAME)
            .put("Last-Printed", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_LAST_PRINTED_DATETIME)
            .put("Producer", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_PROG_NAME)
            .put("Title", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_DESCRIPTION)
            .put("pdf:PDFVersion", BlackboardAttribute.ATTRIBUTE_TYPE.TSK_VERSION)
            .build()

Definition at line 139 of file KeywordSearchIngestModule.java.
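
To illustrate how a lookup table like this can be applied, the sketch below keeps only the extracted metadata entries whose key has a mapping. It uses plain strings in place of BlackboardAttribute.ATTRIBUTE_TYPE so that it runs without Autopsy on the classpath. The class and method names are hypothetical, and skipping empty values is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in; not the Autopsy implementation.
final class MetadataMappingSketch {
    // Subset of METADATA_TYPES_MAP, with attribute types written as strings.
    private static final Map<String, String> METADATA_TYPES_MAP = Map.of(
            "Author", "TSK_OWNER",
            "Title", "TSK_DESCRIPTION",
            "Company", "TSK_ORGANIZATION");

    /** Maps extracted metadata to attribute type -> value, dropping unknown keys. */
    static Map<String, String> toAttributes(Map<String, String> metadata) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String type = METADATA_TYPES_MAP.get(entry.getKey());
            String value = entry.getValue();
            // ASSUMPTION: unmapped keys and empty values produce no attribute.
            if (type != null && value != null && !value.isEmpty()) {
                attributes.put(type, value);
            }
        }
        return attributes;
    }
}
```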

final ImmutableSet<String> org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.OCR_DOCUMENTS

static private

Initial value:

= ImmutableSet.of(
            "application/pdf",
            "application/msword",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "application/vnd.ms-powerpoint",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation",
            "application/vnd.ms-excel",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    )

Definition at line 155 of file KeywordSearchIngestModule.java.

final IngestModuleReferenceCounter org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.refCounter = new IngestModuleReferenceCounter()

static private

Definition at line 186 of file KeywordSearchIngestModule.java.

final IngestServices org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.services = IngestServices.getInstance()

private

Definition at line 174 of file KeywordSearchIngestModule.java.

final KeywordSearchJobSettings org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.settings

private

Definition at line 181 of file KeywordSearchIngestModule.java.

Lookup org.sleuthkit.autopsy.keywordsearch.KeywordSearchIngestModule.stringsExtractionContext

private

Definition at line 180 of file KeywordSearchIngestModule.java.

The documentation for this class was generated from the following file:

/home/carriersleuth/repos/autopsy/KeywordSearch/src/org/sleuthkit/autopsy/keywordsearch/KeywordSearchIngestModule.java
