Use Cases

Goal

The primary goal of tq-asr-documizer is to convert documents into machine-processable resources. Text documents are processed in these steps:

Create an instance of IDocument, which serves as a container for all metadata, text resources, and processing state records.
Where paragraphs exist, create collections of IParagraph objects, which serve as containers for metadata, processing state records, and sentences.
Create an ISentence object for every sentence. Each serves as a container for all metadata, the sentence itself, and all processing state records.
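
The container hierarchy described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the classes Document, Paragraph, and Sentence are hypothetical stand-ins for IDocument, IParagraph, and ISentence, and the field names, the blank-line paragraph rule, and the naive punctuation-based sentence splitter are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """Hypothetical stand-in for ISentence: the sentence text plus its records."""
    text: str
    metadata: dict = field(default_factory=dict)
    processing_state: list = field(default_factory=list)


@dataclass
class Paragraph:
    """Hypothetical stand-in for IParagraph: a container of sentences."""
    sentences: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    processing_state: list = field(default_factory=list)


@dataclass
class Document:
    """Hypothetical stand-in for IDocument: the top-level container."""
    metadata: dict = field(default_factory=dict)
    paragraphs: list = field(default_factory=list)
    processing_state: list = field(default_factory=list)


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    # A real pipeline would use a proper sentence detector.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_document(raw_text, metadata=None):
    doc = Document(metadata=dict(metadata or {}))
    # Assumption: paragraphs are separated by blank lines.
    for block in re.split(r"\n\s*\n", raw_text.strip()):
        if not block.strip():
            continue
        paragraph = Paragraph()
        # Collapse internal line breaks before splitting into sentences.
        for sentence_text in split_sentences(" ".join(block.split())):
            paragraph.sentences.append(Sentence(text=sentence_text))
        paragraph.processing_state.append("sentences-split")
        doc.paragraphs.append(paragraph)
    doc.processing_state.append("paragraphs-split")
    return doc
```

Calling build_document on a two-paragraph string yields a Document with two Paragraph objects, each holding its own Sentence objects, and every level carries its own metadata and processing state records.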

Use Cases

Batch Processing

Static Collections

A core function is to process document collections, including but not limited to:

PubMed abstracts
PubMed full text documents
Textbooks from PDF files (e.g. open textbooks)
Other documents from PDF and other files

Dynamic Collections

Dynamic collections are those driven by:

Web spiders
Carrot2 clustered searches

On Demand Processing

On demand processing is essentially a local web service: other components in the OpenSherlock ecosystem can request a search on a topic or the processing of a particular URL.

For a web search, the system performs the search and harvests the documents it receives.
For a particular URL, the system fetches and harvests the page.

In all cases, the system maintains a record of every document it has already fetched. Unless otherwise instructed, it will not re-fetch a document it already has on record.
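
The on demand behavior and the fetch record can be sketched as below. This is only an illustration under stated assumptions: FetchRegistry, handle_request, and the force flag are hypothetical names, the fetcher and search callables are supplied by the caller, and the record is kept in memory here, whereas the real system presumably persists it.

```python
class FetchRegistry:
    """Hypothetical record of fetched URLs, so nothing is fetched twice."""

    def __init__(self, fetcher):
        self._fetcher = fetcher   # callable: url -> page text (assumed)
        self._fetched = set()     # in-memory record; persistence is out of scope

    def harvest(self, url, force=False):
        """Fetch and return the page, or None when it is already on record."""
        if url in self._fetched and not force:
            return None
        text = self._fetcher(url)
        self._fetched.add(url)
        return text


def handle_request(registry, search, topic=None, url=None, force=False):
    """Dispatch an on demand request: a topic search or a single URL."""
    if url is not None:
        urls = [url]
    elif topic is not None:
        urls = search(topic)      # callable: topic -> list of result URLs (assumed)
    else:
        raise ValueError("either topic or url is required")
    harvested = {}
    for candidate in urls:
        text = registry.harvest(candidate, force=force)
        if text is not None:      # skip documents already on record
            harvested[candidate] = text
    return harvested
```

Passing force=True models the "unless otherwise instructed" case: the document is fetched again even though it is already on record.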
