Post Processors
A post processor service is a type that, once configured in a SearchContext, processes a sequence of ResultInfo and produces a new one. It must be a subtype of PostProcessor and must override the following abstract method:
IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)
It's also mandatory to define a constructor that accepts a single object
parameter, used for the settings of the specific post processor.
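As an illustration of the contract above, here is a minimal sketch of a custom post processor. To keep it self-contained, ResultInfo and PostProcessor are simplified stand-ins declared locally (the real PickAll types have more members), and KeywordFilter and KeywordFilterSettings are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the PickAll types (assumption: only the
// members needed by this sketch are declared).
public class ResultInfo
{
    public string Url { get; set; }
    public string Description { get; set; }
}

public abstract class PostProcessor
{
    public abstract IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results);
}

// Hypothetical settings type for the example.
public class KeywordFilterSettings
{
    public string Keyword { get; set; }
}

// Hypothetical post processor: keeps only the results whose description
// contains the configured keyword (case-insensitive).
public class KeywordFilter : PostProcessor
{
    private readonly KeywordFilterSettings _settings;

    // Constructor with a single object parameter carrying the settings.
    public KeywordFilter(object settings)
    {
        _settings = settings as KeywordFilterSettings
            ?? throw new ArgumentException("KeywordFilterSettings expected.", nameof(settings));
    }

    public override IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)
    {
        return results.Where(r =>
            r.Description != null &&
            r.Description.IndexOf(_settings.Keyword, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

With the real library, a type like this would presumably be registered in the same way as the built-in post processors, by passing its settings object to With<T>.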
PickAll comes with the following built-in post processors:
- Uniqueness: removes duplicate results by URL.
- Order: orders results so that those with the same index number are placed next to each other.
- FuzzyMatch: compares a string against result descriptions.
- Improve: improves results by computing word frequencies and using them to perform a subsequent search.
- Textify: extracts all text from the documents at result URLs.
The FuzzyMatch post processor computes the Levenshtein distance between a given string and each result description. If the distance falls outside the specified range, the result is excluded. It is configured as follows:
var context = SearchContext.Default
    .With<FuzzyMatch>(new FuzzyMatchSettings {
        Text = options.FuzzyMatch,
        MaximumDistance = 10 }); // MinimumDistance default is 0
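To make the filtering rule concrete, the sketch below computes the Levenshtein distance with the standard dynamic-programming recurrence and checks it against a range. It is not PickAll's implementation: FuzzySketch and its methods are hypothetical names, and treating both bounds as inclusive is an assumption.

```csharp
using System;

public static class FuzzySketch
{
    // Levenshtein distance using two rolling rows of the DP table.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++) previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1), // insertion, deletion
                    previous[j - 1] + cost);                       // substitution
            }
            // The row just filled becomes the previous row of the next iteration.
            var swap = previous; previous = current; current = swap;
        }
        return previous[b.Length];
    }

    // Assumption: a result is kept when minimum <= distance <= maximum.
    public static bool IsInRange(string text, string description, int minimum, int maximum)
    {
        var distance = Levenshtein(text, description);
        return distance >= minimum && distance <= maximum;
    }
}
```

Under these assumptions, a description at distance 3 from the given text would be kept with MaximumDistance = 10 and excluded with MaximumDistance = 2.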
The Improve post processor splits result descriptions into words, then finds the most frequent ones and uses them in the query of a subsequent search. It is configured as follows:
var context = SearchContext.Default
    .With<Improve>(
        new ImproveSettings {
            WordCount = 2,
            NoiseLength = 3 });
In this case it will consider only the two most frequent words. All words with a length of 3 characters or less will be excluded from the computation.
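The word selection described above can be sketched with LINQ as follows. This is only an illustration of the idea, not the library's code: ImproveSketch and MostFrequentWords are hypothetical names, and the tokenization (splitting on whitespace and basic punctuation, lowercasing) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImproveSketch
{
    private static readonly char[] Separators =
        { ' ', '\t', '\n', '\r', ',', '.', ';', ':', '!', '?' };

    // Returns the wordCount most frequent words, ignoring every word
    // whose length is noiseLength characters or less.
    public static IReadOnlyList<string> MostFrequentWords(
        IEnumerable<string> descriptions, int wordCount, int noiseLength)
    {
        return descriptions
            .SelectMany(d => d.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            .Select(w => w.ToLowerInvariant())
            .Where(w => w.Length > noiseLength)
            .GroupBy(w => w)
            .OrderByDescending(g => g.Count())
            .Take(wordCount)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Called with WordCount = 2 and NoiseLength = 3, a short word such as "the" never reaches the frequency count, however often it occurs.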
Textify extracts all text from the document at each result URL. A sample configuration follows:
var context = SearchContext.Default
    .With<Textify>(
        new TextifySettings {
            IncludeTitle = true }); // Textify doesn't support NoiseLength
The extracted data is exposed through result.Data, whose type depends on the post processor:
ResultInfo result = results.First();
// Wordify
IEnumerable<string> words = ((WordifyData)result.Data).Words;
// Textify
string text = ((TextifyData)result.Data).Text;
By default, Textify doesn't sanitize text. Setting TextifySettings.SanitizeText to true guarantees that the output contains only alphanumeric text.
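As a rough idea of what such sanitization could look like, the sketch below replaces every run of non-alphanumeric characters with a single space. It is an assumption, not Textify's actual behavior: SanitizeSketch is a hypothetical name, and keeping single spaces as word separators is a choice made for this example.

```csharp
using System.Text.RegularExpressions;

public static class SanitizeSketch
{
    // Replaces each run of characters that are neither letters nor decimal
    // digits with one space, then trims the ends.
    public static string Sanitize(string text)
    {
        return Regex.Replace(text, @"[^\p{L}\p{Nd}]+", " ").Trim();
    }
}
```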