Getting started
Lara is a lightweight Python 3 NLP library for chatbots that communicate in Hungarian. Its parser class can match inflected forms of keywords in Hungarian text messages. Lara also comes with a collection of common NLP functions for text processing and can identify common small-talk topics in chatbot interactions.
Due to the complexity of the Hungarian language, most known stemmers and lemmatisers either fail to find the correct lemmas or require a lot of computational power while relying on large dictionaries. Lara provides a smart workaround by tackling the problem the other way around: the user provides a set of root words and their word classes, and Lara automatically creates complex regular expressions that match most of the possible inflected forms of those root words. The user can then check a given text against any root word to see whether any of its inflected forms are present. Note, however, that this method can also return false positives for certain words.
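The idea can be illustrated with a deliberately simplified sketch. This is not Lara's implementation: `build_pattern`, `fold` and the short `VERB_PREFIXES` list are hypothetical, and real Hungarian inflection is far richer. The sketch folds accented vowels into character classes (so unaccented typing still matches), lets verbs take an optional prefix (igekötő), and accepts any ending after the stem.

```python
import re

# Hungarian vowel "families": unaccented typing tends to collapse each family
# to its first letter, so every member is matched by the same character class.
VOWEL_FAMILIES = ["aá", "eé", "ií", "oóöő", "uúüű"]
ACCENT_CLASS = {ch: f"[{family}]" for family in VOWEL_FAMILIES for ch in family}

# Hypothetical, incomplete list of verb prefixes (igekötők).
VERB_PREFIXES = ["meg", "el", "ki", "be", "fel", "le", "vissza", "át", "rá", "össze"]


def fold(word: str) -> str:
    """Turn a word into an accent-insensitive regex fragment."""
    return "".join(ACCENT_CLASS.get(ch, re.escape(ch)) for ch in word.lower())


def build_pattern(stem: str, wordclass: str) -> re.Pattern:
    """Build a regex for `stem` plus any ending (and an optional prefix for verbs)."""
    prefix = ""
    if wordclass == "verb":
        prefix = "(?:" + "|".join(fold(p) for p in VERB_PREFIXES) + ")?"
    # \w* greedily accepts any inflectional ending after the stem.
    return re.compile(r"\b" + prefix + fold(stem) + r"\w*", re.IGNORECASE)


verb = build_pattern("csinál", "verb")
print(verb.search("Megcsináltatták a berendezést.").group())  # Megcsináltatták
print(verb.search("Visszacsinalnad az ekezeteket?").group())  # Visszacsinalnad

# The price of accepting "any ending": the noun "fal" (wall) also hits "falu" (village).
print(build_pattern("fal", "noun").search("A falu csendes.").group())  # falu
```

The last line shows exactly the kind of false positive mentioned above: generating patterns from the root word is cheap and dictionary-free, but it cannot tell a genuine inflection from a different word that happens to start with the same letters.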
Lara is well suited for developing Hungarian-language chatbots, where certain keywords trigger certain answers. The class allows developers to easily match almost every possible inflected form of any Hungarian keyword. For example:
```python
from lara import parser

# Each intent maps to a list of root words, each given with its word class.
igekoto_intents = {
    "to_do": [{"stem": "csinál", "wordclass": "verb"}],
}
igekoto_test = parser.Intents(igekoto_intents)
```
This will match the intent "to_do" in each of the following sentences:
- Ő mit csinál a szobában?
- Mit fogok még csinálni?
- Mikor csináltad meg a szekrényt?
- Megcsináltatták a berendezést.
- Teljesen kicsinálva érzem magamat ettől a melegtől.
- Csinálhatott volna mást is.
- Visszacsinalnad az ekezeteket a billentyuzetemen, kerlek?
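To see why all seven sentences match, here is a minimal sketch of matching an intent dictionary shaped like `igekoto_intents` against a message. It is an illustration under stated assumptions, not Lara's API: `strip_accents`, `compile_intents`, `match_intents` and the prefix list are hypothetical. It strips diacritics from both the stem and the message with `unicodedata`, so "csinál" and "csinal" compare equal, allows an optional verb prefix such as "meg-", "ki-" or "vissza-", and accepts any ending after the stem.

```python
import re
import unicodedata

# Hypothetical, incomplete list of verb prefixes (igekötők), already accent-free.
VERB_PREFIXES = ["meg", "el", "ki", "be", "fel", "le", "vissza", "at", "ra", "ossze"]


def strip_accents(text: str) -> str:
    """Drop diacritics, e.g. 'csinál' -> 'csinal', 'őrült' -> 'orult'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def compile_intents(intents: dict) -> dict:
    """Turn {intent: [{"stem": ..., "wordclass": ...}]} into one regex per intent."""
    compiled = {}
    for name, words in intents.items():
        parts = []
        for word in words:
            stem = re.escape(strip_accents(word["stem"].lower()))
            # Only verbs may carry a prefix such as "meg-" or "vissza-".
            prefix = ""
            if word["wordclass"] == "verb":
                prefix = "(?:" + "|".join(VERB_PREFIXES) + ")?"
            # \w* accepts any inflectional ending after the stem.
            parts.append(r"\b" + prefix + stem + r"\w*")
        compiled[name] = re.compile("|".join(parts))
    return compiled


def match_intents(compiled: dict, text: str) -> set:
    """Return the names of all intents with at least one match in `text`."""
    normalized = strip_accents(text.lower())
    return {name for name, pattern in compiled.items() if pattern.search(normalized)}


intents = compile_intents({"to_do": [{"stem": "csinál", "wordclass": "verb"}]})
print(match_intents(intents, "Megcsináltatták a berendezést."))  # {'to_do'}
print(match_intents(intents, "Holnap moziba megyünk."))          # set()
```

Stripping accents up front keeps the patterns short, at the cost of conflating words that differ only in their accents (for example "kor", age, and "kór", disease).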
The class also comes with some basic NLP functions that are most useful for processing short Hungarian texts. Please note that despite being an NLP class, Lara is currently incompatible with languages other than Hungarian. It was developed with all the quirks and specialties of Hungarian grammar in mind and was not meant to be an equally useful processing tool for all languages. Beyond keyword matching, Lara focuses on:
- detecting common expressions, small-talk topics and even pop-culture references in users' chat messages
- extracting named entities and meta information from messages
- extracting time, date and other numerical information from messages and converting these to uniform formats
- providing NLP functions that easily process Hungarian text for information retrieval and create features for machine learning models
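As a rough illustration of the third point, the sketch below normalises Hungarian-style dates and times into uniform formats: ISO 8601 dates (YYYY-MM-DD) and 24-hour HH:MM times. The helper names `extract_dates` and `extract_times`, and the small set of formats they accept ("2019. március 5.", "2019.03.05", "8:30", "8 óra 30"), are assumptions made for this example; they are not Lara's actual functions.

```python
import re
from datetime import date

# Hungarian month names mapped to month numbers 1-12.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["január", "február", "március", "április", "május", "június",
         "július", "augusztus", "szeptember", "október", "november", "december"],
        start=1,
    )
}

# Year first, as in Hungarian: "2019. március 5." or "2019.03.05".
DATE_RE = re.compile(
    r"\b(\d{4})\.?\s*(" + "|".join(MONTHS) + r"|\d{1,2})\.?\s*(\d{1,2})\.?",
    re.IGNORECASE,
)
# "8:30" or "8 óra 30".
TIME_RE = re.compile(r"\b(\d{1,2})(?::|\s*óra\s*)(\d{2})\b", re.IGNORECASE)


def extract_dates(text: str) -> list:
    """Return every valid date in `text` as an ISO 8601 string (YYYY-MM-DD)."""
    found = []
    for m in DATE_RE.finditer(text):
        year, month, day = m.group(1), m.group(2).lower(), m.group(3)
        # Month is either a Hungarian month name or a number.
        month_number = MONTHS.get(month) or int(month)
        try:
            found.append(date(int(year), month_number, int(day)).isoformat())
        except ValueError:
            pass  # e.g. "2019. 02. 30." looks like a date but does not exist
    return found


def extract_times(text: str) -> list:
    """Return every valid time in `text` as a zero-padded 24-hour HH:MM string."""
    found = []
    for m in TIME_RE.finditer(text):
        hour, minute = int(m.group(1)), int(m.group(2))
        if hour < 24 and minute < 60:
            found.append(f"{hour:02d}:{minute:02d}")
    return found


msg = "Találkozzunk 2019. március 5-én 8:30-kor!"
print(extract_dates(msg))  # ['2019-03-05']
print(extract_times(msg))  # ['08:30']
```

Validating through `datetime.date` rather than the regex alone keeps the patterns simple while still rejecting impossible dates.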