Azdegar NLP provides an advanced, high-level API for natural language processing on top of the Stanford CoreNLP library.
- Splits raw text into main clauses and subclauses.
- Recognizes the tense, voice and person of each clause.
- Identifies the subject, object, verbs and adverbs of each sentence.
- Fixes POS tagger bugs. Example:
  This plane lands in Tehran.
  CoreNLP mistakenly tags "lands" as a plural noun (NNS), while it should be tagged as a third-person singular present verb (VBZ).
- Fixes lemmatizer bugs.
- Supports adding an ontology by implementing the OntologyRepository interface.
- Allows defining word combinations (multi-word expressions) by implementing the MultiWordRepository interface. Examples: "Georg Wilhelm Friedrich Hegel" or "theory of types".
- Allows defining phrasal verbs.
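
To make the idea of word combinations concrete, here is a minimal sketch of how known multi-word expressions could be merged into single tokens with a greedy longest-match scan. It is an illustration only and does not show Azdegar NLP's implementation or the methods of MultiWordRepository; the class MultiWordMergeSketch, the method merge, and the maxLength parameter are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: hypothetical class, not part of Azdegar NLP or CoreNLP.
class MultiWordMergeSketch {

    // Merges the longest known multi-word expression starting at each position
    // into a single token. Entries in multiWords are expected in lower case.
    static List<String> merge(List<String> tokens, Set<String> multiWords, int maxLength) {
        List<String> merged = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchedLength = 1;
            // Try the longest candidate first so longer expressions win over shorter ones.
            for (int len = Math.min(maxLength, tokens.size() - i); len > 1; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (multiWords.contains(candidate.toLowerCase(Locale.ROOT))) {
                    matchedLength = len;
                    break;
                }
            }
            merged.add(String.join(" ", tokens.subList(i, i + matchedLength)));
            i += matchedLength;
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> multiWords = new HashSet<>(Arrays.asList(
                "georg wilhelm friedrich hegel", "theory of types"));
        List<String> tokens = Arrays.asList("Russell", "proposed", "the", "theory", "of", "types");
        // prints [Russell, proposed, the, theory of types]
        System.out.println(merge(tokens, multiWords, 4));
    }
}
```

Treating such an expression as one token lets later stages, such as tagging and clause detection, handle it as a single unit.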
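Example usage:
import java.util.*;
import edu.stanford.nlp.parser.common.ParserGrammar;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
// Imports for Parser, Clause, MultiWordRepository and FarzanegiMultiWordRepository are omitted.

// Load the caseless CoreNLP tagger and parser models.
Properties properties = new Properties();
MaxentTagger tagger = new MaxentTagger("english-caseless-left3words-distsim.tagger", properties);
ParserGrammar grammar = LexicalizedParser.loadModel("englishPCFG.caseless.ser.gz");
Parser parser = new Parser(tagger, grammar);
// Register the multi-word expressions.
MultiWordRepository multiWordRepository = new FarzanegiMultiWordRepository();
parser.setMultiWordRepository(multiWordRepository);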
// Map each verb to the particles that form phrasal verbs with it.
Map<String, Set<String>> phrasalVerbs = new TreeMap<>();
phrasalVerbs.put("break", new TreeSet<>(Arrays.asList("in", "up", "out")));
phrasalVerbs.put("look", new TreeSet<>(Arrays.asList("after", "for", "into", "out")));
...
parser.setPhrasalVerbs(phrasalVerbs);
// text holds the raw input to analyze.
Map<Integer, Clause> clauses = parser.parse(text, null);
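
As a rough illustration of how a verb-to-particles map like phrasalVerbs can be consulted, the sketch below checks whether a verb lemma and a following particle are registered together. It is not Azdegar NLP's internal logic; PhrasalVerbLookupSketch and isPhrasalVerb are hypothetical names, and the map's generic type is an assumption.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: hypothetical helper, not part of Azdegar NLP or CoreNLP.
class PhrasalVerbLookupSketch {

    // Returns true when the verb lemma is registered together with the given particle.
    static boolean isPhrasalVerb(Map<String, Set<String>> phrasalVerbs, String verbLemma, String particle) {
        Set<String> particles = phrasalVerbs.get(verbLemma);
        return particles != null && particles.contains(particle);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> phrasalVerbs = new TreeMap<>();
        phrasalVerbs.put("break", new TreeSet<>(Arrays.asList("in", "up", "out")));
        phrasalVerbs.put("look", new TreeSet<>(Arrays.asList("after", "for", "into", "out")));

        System.out.println(isPhrasalVerb(phrasalVerbs, "look", "after")); // true
        System.out.println(isPhrasalVerb(phrasalVerbs, "look", "up"));    // false: "up" is not registered for "look"
        System.out.println(isPhrasalVerb(phrasalVerbs, "land", "in"));    // false: "land" has no entry
    }
}
```

Keying the map by the verb lemma rather than the inflected form means "looks after" and "looked after" can both be resolved through the single entry for "look", provided the caller lemmatizes the verb first.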