Skip to content

Aho Corasick pattern matching algorithm implementation

License

Notifications You must be signed in to change notification settings

Immortale-dev/ahoq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Aho Corasick JS

Aho Corasick pattern search algorithm that provides linear time search searching of bunch of patterns.

Time complexity: O(m) for preparation and O(n+r) for searching where

  • n - length of the searching text
  • m - the summ of length of all searching patterns
  • r - search result count

Requirements

  • NodeJS: 13.2+
  • NPM: 6.13+

Install

npm install ahoq

Docs

Package provides couple of classes:

  • AhoQ - main class
  • AhoQSearch - class returned from AhoQ::search public method and provides ability for advanced usage of searching algorithm described below.
  • AhoQSearchResult - search result item type
  • AhoQOptions - search options typeˇ

AhoQ


Main class used for pattern searching

constructor(patterns: string[])

Accepts the list of patterns as a string array and creates a AhoQ instance.

add(patterns: string[]): void

Accepts the list of patterns, adds new pattern list to the existing one and rebuild internal searching structure.

remove(patterns: string[]): void

Accepts the list of patterns, removes passed patterns from the existing lit and rebuild internal searching structure.

search(text: string, options: AhoQOptions): AhoQSearch

Main method to search the patterns in the text. It requires text to be passed as the first parameter, and options - optional parameter where you can set the search options.

It returns AhoQSearch instance that could be used to get the search results.

Example:

const ahoq = new AhoQ(['a', 'b', 'c']);
const ahoqSearch = ahoq.search('asdzxcqweb');
const results = [...ahoqSearch];

find(text: string): AhoQSearchResult[]

Another method to search the patterns in text. This method is much simpler. It accepts the text that will be searched, and it returns the array of search results.

Example:

const ahoq =  new  AhoQ(['a',  'b',  'c']);
const results = ahoq.find('asdzxcqweb');

AhoQSearch


Class for retrieving search results Constructing the object manually is not allowed. You can get the instance of AhoQSearch class by calling Ahoq::search method.

This class manage internal states of Aho Corasick data structure, and you can lazy search the patterns and manage the states and position in text which gives you a lot of flexibility and potential performance optimisation (depends on the tasks you want to perform)

Class implements iterator methods, so you can use for of and spread operator.

Notice: You can find couple of examples of using advanced searching in tests

getPos(): number

returns position in text searching is performing now

setPos(pos: number): void

accepts the number, that will be set as the new position in the text.

reset(): void

Resets the internal searching state, e.g. if the word stating to match, it will reset the current match.

reload(): void

Reloads the searching, e.g. resets the state and sets text position to 0.

next(): {value: AhoQSearchResult[], done: boolean}

Proceed the search and returns the next search results. This method used as an iterator next method.

AhoQOptions


Data class that represents search options

Contains one property: report: AhoQ.REPORT

There is 2 options in AhoQ.REPORT enum:

  • MATCH
  • STEP

MATCH used by default.

In case of MATCH report property used, the AhoQSearch::next method will return search results in case of any match found, and in case of STEP report value used, AhoQSearch::next will return on every character in the searching text.

AhoQSearchResult


Search result data class returned from AhoQ::find and AhoQSearch::next methods.

Class contain 2 properties:

  • pattern: String
  • index: number

representing the pattern found in the text (notice, pattern returned not as primitive type, but as a String object) and index in text where the pattern starts.

Example

import {AhoQ} from 'ahoq'

const ahoq = new AhoQ(['ab', 'aa', 'bc', 'abc', 'aaa', 'aab']);
let result = ahoq.find('aabbcaabc');

/**
 * result:
 * [
 *  {index:0, pattern: 'aa'},
 *  {index:0, pattern: 'aab'},
 *  {index:1, pattern: 'ab'},
 *  {index:3, pattern: 'bc'},
 *  {index:5, pattern: 'aa'},
 *  {index:5, pattern: 'aab'},
 *  {index:6, pattern: 'ab'},
 *  {index:6, pattern: 'abc'},
 *  {index:7, pattern: 'bc'}
 * ]
 */

ahoq.remove(['ab', 'bc', 'aaa', 'aa']);
const search = ahoq.search('asdasabcsdaabbbasd');

for (const it of search) {
	// it - contains next results
	// you can change position in the text, or reset the state
	// or whatever
}

Time complexity

  • O(n + r) - for searching, where n corresponds to the length of searching text, and r represents the number of found
  • O(m) - for preparing (building internal structures), where m represents the sum of all the pattern lengths used in search.

Internal structure rebuilds on constructing the AhoQ instance, and every time AhoQ::add and AhoQ::remove used.

License

MIT

Have fun! :)

About

Aho Corasick pattern matching algorithm implementation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published