This is the official Python client for Crowlingo. It gives access to all the NLP and NLU services, which analyze text regardless of its language.
You can use pip to install the library:
$ pip install PyCrowlingo
Alternatively, you can just clone the repository and run the setup.py script:
$ python setup.py install
First of all, you will need to instantiate a client of Crowlingo. You can do it using your API token:
from PyCrowlingo import Client
client = Client('<TOKEN>')
Or using your account credentials:
from PyCrowlingo import Client
client = Client(username='<EMAIL>', password='<PASSWORD>')
You can call every endpoint available on Crowlingo. All of them are described, with examples, in the documentation.
text = "Est-il recommandé d'utiliser MongoDb pour indexer mes documents ?"
res = client.languages.detect(text)
print(res)
# => Detect(sentences=[Sentence(start=0, end=65, languages_confidence=[ConfidenceLang(name='French', code='fr', confidence=98.0)], text="Est-il recommandé d'utiliser MongoDb pour indexer mes documents ?")], languages_confidence=[ConfidenceLang(name='French', code='fr', confidence=98.0)])
The response is a Pydantic object, so you can read its values through the response's attributes:
print(client.languages.detect(text).languages_confidence)
# => [ConfidenceLang(name='French', code='fr', confidence=98.0)]
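Because the fields are plain attributes, selecting the most likely language takes very little code. The following is a minimal sketch: the `ConfidenceLang` dataclass is a stand-in that only mirrors the field names printed above (it is not the library's class), `most_confident` is a hypothetical helper, and the second candidate is made-up data for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConfidenceLang:
    """Stand-in mirroring the fields printed above; not the library's class."""
    name: str
    code: str
    confidence: float


def most_confident(languages: List[ConfidenceLang]) -> ConfidenceLang:
    """Return the candidate with the highest confidence score."""
    if not languages:
        raise ValueError("no language candidates in the response")
    return max(languages, key=lambda lang: lang.confidence)


# Hypothetical candidates shaped like `languages_confidence`.
candidates = [
    ConfidenceLang(name="French", code="fr", confidence=98.0),
    ConfidenceLang(name="Italian", code="it", confidence=1.5),
]
best = most_confident(candidates)
print(best.code, best.confidence)
```

With a real response, the same helper should work on `res.languages_confidence`, assuming that attribute is a list of objects exposing `confidence`, as the printed output suggests.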
If you need to analyze texts with several services, calling the API for every processing step can be cumbersome. You can gain speed and productivity by using a Pipeline, which lets you define a processing workflow for your data. To do so, use the ApiModels instead of the client functions.
from PyCrowlingo import Pipeline
from PyCrowlingo.ApiModels import *
text = "On 26 April 1986, Chernobyl suffered the world’s worst nuclear disaster. An experiment designed to test the safety of the power plant went wrong and caused a fire which spewed radiation for 10 days. Clouds carrying radioactive particles drifted for thousands of miles, releasing toxic rain all over Europe. Those living close to Chernobyl - about 116,000 people - were immediately evacuated. A 30 km exclusion zone was imposed around the damaged reactor. This was later expanded to cover more affected areas."
pipeline = Pipeline(client, text=text)
# Pass the client to the pipeline, along with the variables shared by every step as keyword arguments
pipeline.add(Concepts.Extract, precision=0.9).add(Entities.Extract, visualize=True).add(Entities.Duckling)
# Add each step using pipeline.add(EndpointModel, <step-specific keyword arguments>)
res = pipeline.call()
# Execute the pipeline
print(res)
# => responses={'[POST] /entities/duckling': {'duckling': [{'body': 'On 26 April 1986', 'start': 0, 'value': {'values': [{'value': '1986-04-26T00:00:00.000-08:00', 'grain': 'day', 'type': 'value'}], 'value': '1986-04-26T00:00:00.000-08:00', 'grain': 'day', 'type': 'value'}, 'end': 16, 'dim': 'time', 'latent': False}, {'body': '10 days', 'start': 190, 'value': {'value': 10, 'day': 10, 'type': 'value', 'unit': 'day', 'normalized': {'value': 864000, 'unit': 'second'}}, 'end': 197, 'dim': 'duration', 'latent': False}, {'body': 'thousands', 'start': 249, 'value': {'value': 1000, 'type': 'value'}, 'end': 258, 'dim': 'number', 'latent': False}, {'body': '116,000', 'start': 347, 'value': {'value': 116000, 'type': 'value'}, 'end': 354, 'dim': 'number', 'latent': False}, {'body': 'immediately', 'start': 369, 'value': {'values': [{'value': '2020-05-25T15:57:30.724-07:00', 'grain': 'second', 'type': 'value'}], 'value': '2020-05-25T15:57:30.724-07:00', 'grain': 'second', 'type': 'value'}, 'end': 380, 'dim': 'time', 'latent': False}, {'body': '30 km', 'start': 394, 'value': {'value': 30, 'type': 'value', 'unit': 'kilometre'}, 'end': 399, 'dim': 'distance', 'latent': False}]}, '[POST] /entities/extract': {'entities': [{'start': 3, 'end': 16, 'ent_type': 'DATE', 'text': '26 April 1986'}, {'start': 18, 'end': 27, 'ent_type': 'GPE', 'text': 'Chernobyl'}, {'start': 190, 'end': 197, 'ent_type': 'DATE', 'text': '10 days'}, {'start': 249, 'end': 267, 'ent_type': 'QUANTITY', 'text': 'thousands of miles'}, {'start': 299, 'end': 305, 'ent_type': 'LOC', 'text': 'Europe'}, {'start': 329, 'end': 338, 'ent_type': 'GPE', 'text': 'Chernobyl'}, {'start': 341, 'end': 354, 'ent_type': 'CARDINAL', 'text': 'about 116,000'}, {'start': 394, 'end': 399, 'ent_type': 'QUANTITY', 'text': '30 km'}], 'visualization': '<div class="entities" style="line-height: 2.5; direction: ltr">On \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 
0.35em;">\n 26 April 1986\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">DATE</span>\n</mark>\n, \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n Chernobyl\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n suffered the world’s worst nuclear disaster. An experiment designed to test the safety of the power plant went wrong and caused a fire which spewed radiation for \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n 10 days\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">DATE</span>\n</mark>\n. Clouds carrying radioactive particles drifted for \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n thousands of miles\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">QUANTITY</span>\n</mark>\n, releasing toxic rain all over \n<mark class="entity" style="background: #ff9561; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n Europe\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">LOC</span>\n</mark>\n. 
Those living close to \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n Chernobyl\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n - \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n about 116,000\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">CARDINAL</span>\n</mark>\n people - were immediately evacuated. A \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n 30 km\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">QUANTITY</span>\n</mark>\n exclusion zone was imposed around the damaged reactor. 
This was later expanded to cover more affected areas.</div>'}, '[POST] /concepts/extract': {'concepts': [{'id': 'Q129677', 'weight': 0.19254024269693001, 'labels': [{'text': 'Chernobyl', 'mentions': [{'start': 18, 'end': 27}, {'start': 329, 'end': 338}]}]}, {'id': 'Q11448', 'weight': 0.13384788867053848, 'labels': [{'text': 'radioactive', 'mentions': [{'start': 215, 'end': 226}]}, {'text': 'radiation', 'mentions': [{'start': 176, 'end': 185}]}]}, {'id': 'Q46', 'weight': 0.11258210752213413, 'labels': [{'text': 'Europe', 'mentions': [{'start': 299, 'end': 305}]}]}, {'id': 'Q274160', 'weight': 0.07002172766602058, 'labels': [{'text': 'toxic', 'mentions': [{'start': 279, 'end': 284}]}]}, {'id': 'Q7925', 'weight': 0.06886892370214791, 'labels': [{'text': 'rain', 'mentions': [{'start': 285, 'end': 289}]}]}, {'id': 'Q101965', 'weight': 0.06562043143894636, 'labels': [{'text': 'experiment', 'mentions': [{'start': 76, 'end': 86}]}]}, {'id': 'Q3196', 'weight': 0.06482017292518794, 'labels': [{'text': 'fire', 'mentions': [{'start': 158, 'end': 162}]}]}, {'id': 'Q356936', 'weight': 0.06390318225879862, 'labels': [{'text': 'exclusion zone', 'mentions': [{'start': 400, 'end': 414}]}]}, {'id': 'Q486', 'weight': 0.06317545950269358, 'labels': [{'text': 'nuclear disaster', 'mentions': [{'start': 55, 'end': 71}]}, {'text': 'disaster', 'mentions': []}]}, {'id': 'Q11369', 'weight': 0.057931103203040506, 'labels': [{'text': 'particles', 'mentions': [{'start': 227, 'end': 236}]}]}, {'id': 'Q8074', 'weight': 0.05530684102502764, 'labels': [{'text': 'Clouds', 'mentions': [{'start': 199, 'end': 205}]}]}, {'id': 'Q11573', 'weight': 0.05138191938853427, 'labels': [{'text': 'km', 'mentions': [{'start': 397, 'end': 399}]}]}]}}
print(res.responses[Entities.Extract.eid()])
# => {'entities': [{'start': 3, 'end': 16, 'ent_type': 'DATE', 'text': '26 April 1986'}, {'start': 18, 'end': 27, 'ent_type': 'GPE', 'text': 'Chernobyl'}, {'start': 190, 'end': 197, 'ent_type': 'DATE', 'text': '10 days'}, {'start': 249, 'end': 267, 'ent_type': 'QUANTITY', 'text': 'thousands of miles'}, {'start': 299, 'end': 305, 'ent_type': 'LOC', 'text': 'Europe'}, {'start': 329, 'end': 338, 'ent_type': 'GPE', 'text': 'Chernobyl'}, {'start': 341, 'end': 354, 'ent_type': 'CARDINAL', 'text': 'about 116,000'}, {'start': 394, 'end': 399, 'ent_type': 'QUANTITY', 'text': '30 km'}], 'visualization': '<div class="entities" style="line-height: 2.5; direction: ltr">On \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n 26 April 1986\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">DATE</span>\n</mark>\n, \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n Chernobyl\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n suffered the world’s worst nuclear disaster. An experiment designed to test the safety of the power plant went wrong and caused a fire which spewed radiation for \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n 10 days\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">DATE</span>\n</mark>\n. 
Clouds carrying radioactive particles drifted for \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n thousands of miles\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">QUANTITY</span>\n</mark>\n, releasing toxic rain all over \n<mark class="entity" style="background: #ff9561; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n Europe\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">LOC</span>\n</mark>\n. Those living close to \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n Chernobyl\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n - \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n about 116,000\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">CARDINAL</span>\n</mark>\n people - were immediately evacuated. A \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n 30 km\n <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem">QUANTITY</span>\n</mark>\n exclusion zone was imposed around the damaged reactor. This was later expanded to cover more affected areas.</div>'}
# EndpointModel.eid() returns the ID of the endpoint, which is used as the key in the response
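Each per-endpoint response is a plain dictionary, so it can be post-processed with ordinary Python. The sketch below works on an excerpt of the `/entities/extract` response printed above; `texts_of_type` is a hypothetical helper, not part of PyCrowlingo.

```python
from typing import Any, Dict, List

# Excerpt of the '[POST] /entities/extract' response printed above.
entities_response: Dict[str, Any] = {
    "entities": [
        {"start": 3, "end": 16, "ent_type": "DATE", "text": "26 April 1986"},
        {"start": 18, "end": 27, "ent_type": "GPE", "text": "Chernobyl"},
        {"start": 190, "end": 197, "ent_type": "DATE", "text": "10 days"},
        {"start": 299, "end": 305, "ent_type": "LOC", "text": "Europe"},
    ]
}


def texts_of_type(response: Dict[str, Any], ent_type: str) -> List[str]:
    """Collect the text of every entity with the given type, in order."""
    return [e["text"] for e in response["entities"] if e["ent_type"] == ent_type]


dates = texts_of_type(entities_response, "DATE")
print(dates)

# In this output, start and end are character offsets into the analyzed text.
source = "On 26 April 1986, Chernobyl suffered"
first = entities_response["entities"][0]
print(source[first["start"]:first["end"]])
```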
Most of the time, you will need to apply this processing to a whole dataset. Again, you can gain speed by using a bulk request, which performs many operations at the same time. Here is an example:
from PyCrowlingo import Bulk, Pipeline
from PyCrowlingo.ApiModels import *
text = "Est-il recommandé d'utiliser MongoDb pour indexer mes documents ?"
pipelines = [Pipeline().add(Languages.Detect, text=text)] * 300
res = Bulk(client, pipelines).call()
assert len(res.responses) == 300 # True
You can also do it in an iterative way:
from PyCrowlingo import Bulk, Pipeline
from PyCrowlingo.ApiModels import *
text = "Est-il recommandé d'utiliser MongoDb pour indexer mes documents ?"
bulk = Bulk(client)
for i in range(300):
    bulk.add(Pipeline().add(Languages.Detect, text=text))
res = bulk.call()
assert len(res.responses) == 300 # True
A bulk automatically sends its API requests in batches (you can control their size with the batch_size argument), so you don't have to worry about managing the query size.
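To illustrate what batching means here, the sketch below splits a list of pending items into groups of at most `batch_size` elements. It is only a generic illustration of the idea, not the library's implementation; `chunked` is a hypothetical helper and the value 50 is an arbitrary example, not a documented default.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 300 placeholder items, standing in for 300 pipelines.
pending = list(range(300))
batches = list(chunked(pending, batch_size=50))
print(len(batches), len(batches[-1]))
```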
Sometimes you may get an error when you call an endpoint. Every error is identifiable by its ID and can easily be handled in a Pythonic way:
from PyCrowlingo.Errors import ModelNotFound, CrowlingoException
model_id = "AskUbuntu"
try:
    client.classifier.clear_model(model_id)
except ModelNotFound:
    client.classifier.create_model(model_id)
except CrowlingoException as e:
    print(e)
Here is the list of available exceptions:
Class | Error ID | Status code | Description |
---|---|---|---|
TrainingError | TRAINING_ERROR | 400 | An error happened during the training |
TokenNotFound | TOKEN_NOT_FOUND | 401 | Token not found. Insert your token in the query parameter with api_key=[YOUR_TOKEN] or in the headers with x-api-key:[YOUR TOKEN]. |
BadCredentials | BAD_CREDENTIALS | 401 | Could not validate credentials. There might be an error in your token or email/password. Maybe your account has been disabled. Please contact us if you do not understand the reason. |
TestModelForbidden | TEST_MODEL_FORBIDDEN | 403 | You do not have access to the test version of this model. Ask the owner of the model for access or use the prod version of this model. |
BadModelsPerms | BAD_MODELS_PERMS | 403 | You do not have the permissions to perform this action on this model. Ask the owner of this model to grant you more rights. |
BadModelCategory | BAD_MODEL_CATEGORY | 404 | This model cannot be used for this kind of request. Create a new model or use another endpoint. |
ModelNotDeployed | MODEL_NOT_DEPLOYED | 404 | This model is not deployed. Use the test model or deploy it first. |
CollaboratorNotFound | COLLABORATOR_NOT_FOUND | 404 | This collaborator was not found. Maybe they have already deleted the model or you did not add them as a collaborator on this model. |
ModelNotFound | MODEL_NOT_FOUND | 404 | We cannot find a model with this id. You have to create a model before using it. |
DocumentNotFound | DOCUMENT_NOT_FOUND | 404 | We cannot find a document with this id. You have to create a document before using it. |
DuplicateModelId | DUPLICATE_MODEL_ID | 409 | You already have a model with this id, please delete the model first if you want to overwrite it or use the endpoint update to create a new version of this model. |
ContentLengthRequired | CONTENT_LENGTH_REQUIRED | 411 | You need to provide a content length header for POST and PATCH requests. |
RequestEntityTooLarge | REQUEST_ENTITY_TOO_LARGE | 413 | The payload of your body is too large. Try splitting your request into smaller payloads. |
BadParametersQuery | BAD_PARAMETERS_QUERY | 422 | The parameters of the query do not correspond to the documentation description. The query cannot be processed. |
ModelNotTrained | MODEL_NOT_TRAINED | 423 | This model is not trained yet. You have to wait until it is trained or run the training before performing this action. |
MinuteLimitReached | MINUTE_LIMIT_REACHED | 429 | Minute limit reached, wait the number of seconds indicated by the header: x-minute-reset or change subscription plan. |
PeriodLimitReached | PERIOD_LIMIT_REACHED | 429 | Period limit reached, wait the number of seconds indicated by the header: x-period-reset or change subscription plan. |
ModelsLimitReached | MODELS_LIMIT_REACHED | 429 | You have reached the maximal number of custom models. If you want to create a new one, you have to delete one of your custom models first or change your subscription plan. |
InternalError | INTERNAL_ERROR | 500 | Internal Error, we have been notified and will fix the problem as soon as possible. Try again later and do not hesitate to contact us if you need help. |
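Rate-limit errors such as MinuteLimitReached are usually worth retrying after a pause. The sketch below shows one generic way to do that. Everything in it is an assumption made for illustration: `RateLimited` is a stand-in exception (in real code you would pass the class imported from PyCrowlingo.Errors), `call_with_retry` is a hypothetical helper, and the fixed `wait_seconds` replaces the reset delay that the API reports in its headers, since how the client exposes that value is not covered here.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for a rate-limit error such as MinuteLimitReached."""


def call_with_retry(
    func: Callable[[], T],
    retry_on: Type[Exception],
    attempts: int = 3,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying up to `attempts` times when `retry_on` is raised."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(wait_seconds)
    raise AssertionError("unreachable")


# Demo with a fake call that fails twice before succeeding.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited("limit reached")
    return "ok"


waits = []  # record the pauses instead of really sleeping
result = call_with_retry(flaky, RateLimited, attempts=5, wait_seconds=0.5, sleep=waits.append)
print(result, waits)
```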
If you want to build custom models, you will have to upload your dataset. You can do it automatically on a CSV by using the function classifier.upload_documents.
client.classifier.upload_csv(model_id, "data.csv", fieldnames=["text", "class_id"], delimiter="\t")
It will split the dataset into several parts to avoid exceeding the payload size limit. If you have a more specific dataset format, you can upload it with the functions listed in the API documentation.
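A file matching the call above can be produced with the standard csv module. This is a sketch under stated assumptions: the two rows are made-up examples, and the file is written without a header row because the column names are passed through `fieldnames` in the upload call; whether the uploader expects a header is not stated here, so check the documentation before relying on this layout.

```python
import csv
import io

# Hypothetical training examples: one text and one class ID per row.
rows = [
    {"text": "How do I install a package?", "class_id": "software"},
    {"text": "My screen stays black after boot", "class_id": "hardware"},
]
fieldnames = ["text", "class_id"]

# Write tab-delimited rows; an in-memory buffer stands in for "data.csv".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
writer.writerows(rows)  # no header row (assumption, see above)
content = buffer.getvalue()
print(content)

# Read the data back the same way to check the round trip.
parsed = list(csv.DictReader(io.StringIO(content), fieldnames=fieldnames, delimiter="\t"))
```

To write a real file instead, open it with `open("data.csv", "w", newline="", encoding="utf-8")` and pass the file object to `csv.DictWriter`.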
Some Crowlingo operations can take a long time, so they are asynchronous: the API sends you a response before the process has finished. For each of them, there is a function that watches the progress and waits until the task ends. Here are the wait functions for each task:
Async Function | Wait Function |
---|---|
client.model.train | client.model.wait_training |
client.model.deploy | client.model.wait_deploying |
client.search_engine.create_documents | client.search_engine.wait_indexing |
For example, use these lines to train a model and then wait until it is deployed:
client.model.train(model_id)
client.model.wait_training(model_id)
client.model.deploy(model_id)
client.model.wait_deploying(model_id)
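Conceptually, a wait function polls the task status until it reports completion. The sketch below shows that pattern in a generic form. It is not the library's implementation: `wait_until` and its `interval` and `timeout` parameters are hypothetical, and the status check is simulated with a fixed sequence of answers.

```python
import time
from typing import Callable


def wait_until(
    is_done: Callable[[], bool],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_done` every `interval` seconds; return False if `timeout` elapses first."""
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated task: not finished on the first two checks, finished on the third.
states = iter([False, False, True])
naps = []  # record the pauses instead of really sleeping
finished = wait_until(lambda: next(states), interval=0.1, timeout=5.0, sleep=naps.append)
print(finished, naps)
```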
Crowlingo services can be very useful for turning an existing chatbot into a polyglot one. The easiest way to do it is through Rasa: PyCrowlingo provides packages that integrate easily with Rasa.
To install the Rasa dependencies, simply run the following command:
pip install PyCrowlingo[rasa]
Follow the Rasa quick start guide to build your chatbot.
Open the file config.yml and modify the pipeline to integrate Crowlingo NLU components.
Here is an example for a chatbot created with the Rasa quick start guide:
language: en
pipeline:
  - name: PyCrowlingo.Rasa.EntitiesExtractor
    token: "<TOKEN>"
  - name: PyCrowlingo.Rasa.IntentClassifier
    token: "<TOKEN>"
    model_id: "intent_rasa"
Train the model:
rasa train
And now, enjoy your multilingual chatbot:
rasa shell
>>> Your input -> Bonjour !
<<< Hey! How are you ?
>>> Your input -> Va bene :)
<<< Great! Carry on!
>>> Your input -> Bist du ein Roboter oder ein Mensch?
<<< I am a bot powered by Rasa