Commit 93d93e8

update docs

1 parent: cfc906b

2 files changed: 19 additions, 18 deletions

readme.md

Lines changed: 10 additions & 10 deletions
@@ -20,9 +20,9 @@ If you are building chatbots using commercial models, open source frameworks or
 
 This project contains the:
 - [Online chatito IDE](https://rodrigopivi.github.io/Chatito/)
-- [Chatito DSL specification](https://github.com/rodrigopivi/Chatito/blob/master/spec.md)
+- [Chatito DSL specification](https://github.com/rodrigopivi/Chatito/blob/master/spec.md)
 - [DSL AST parser in pegjs format](https://github.com/rodrigopivi/Chatito/blob/master/parser/chatito.pegjs)
-- [Generator implemented in typescript + npm package](https://github.com/rodrigopivi/Chatito/tree/master/src)
+- [Generator implemented in typescript + npm package](https://github.com/rodrigopivi/Chatito/tree/master/src)
 
 ### Chatito language
 For the full language specification and documentation, please refer to the [DSL spec document](https://github.com/rodrigopivi/Chatito/blob/master/spec.md).
@@ -31,7 +31,7 @@ For the full language specification and documentation, please refer to the [DSL
 The language is independent from the generated output format and because each model can receive different parameters and settings, there are 3 data format adapters provided. This section describes the adapters, their specific behaviors and use cases:
 
 #### Default format
-Use the default format if you plan to train a custom model or if you are writting a custom adapter. This is the most flexible format because you can annotate `Slots` and `Intents` with custom entity arguments, and they all will be present at the generated output, so for example, you could also include dialog/response generation logic with the dsl. E.g.:
+Use the default format if you plan to train a custom model or if you are writing a custom adapter. This is the most flexible format because you can annotate `Slots` and `Intents` with custom entity arguments, and they all will be present at the generated output, so for example, you could also include dialog/response generation logic with the DSL. E.g.:
 
 ```
 %[some intent]('context': 'some annotation')
@@ -46,7 +46,7 @@ Custom entities like 'context', 'required' and 'type' will be available at the o
 
 #### [Rasa NLU](https://rasa.com/docs/nlu/)
 [Rasa NLU](https://rasa.com/docs/nlu/) is a great open source framework for training NLU models.
-One particular behavior of the Rasa adapter is that when a slot definition sentence only contains one alias, the generated rasa dataset will map the alias as a synonym. e.g.:
+One particular behavior of the Rasa adapter is that when a slot definition sentence only contains one alias, the generated Rasa dataset will map the alias as a synonym. e.g.:
 
 ```
 %[some intent]('training': '1')
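
The single-alias rule from the Rasa paragraph above can be sketched in TypeScript. This is a minimal illustration, not Chatito's adapter code. The `Token` shape and the `collectSynonyms` function are hypothetical. The output only mirrors the `value`/`synonyms` pairs that the README says end up in the Rasa `entity_synonyms` list, using the README's names (`some slot synonyms`, `synonym 1`, `synonym 2`).

```typescript
// Hypothetical, simplified token model for one slot definition sentence.
type Token =
  | { kind: "text"; value: string }
  | { kind: "alias"; key: string; variations: string[] };

interface EntitySynonym {
  value: string; // canonical value: the alias key
  synonyms: string[]; // variations that should map back to the key
}

// A sentence made of exactly one alias token maps that alias's variations
// as synonyms of the alias key; every other sentence is ignored.
function collectSynonyms(sentences: Token[][]): EntitySynonym[] {
  const result: EntitySynonym[] = [];
  for (const sentence of sentences) {
    if (sentence.length !== 1) continue;
    const token = sentence[0];
    if (token.kind !== "alias") continue;
    const key = token.key;
    const synonyms = token.variations.filter((v) => v !== key);
    if (synonyms.length > 0) {
      result.push({ value: key, synonyms: synonyms });
    }
  }
  return result;
}

// Slot sentence `~[some slot synonyms]`, whose alias defines two variations,
// plus a plain text sentence that produces no synonyms.
const example = collectSynonyms([
  [{ kind: "alias", key: "some slot synonyms", variations: ["synonym 1", "synonym 2"] }],
  [{ kind: "text", value: "plain text value" }],
]);
console.log(JSON.stringify(example));
// [{"value":"some slot synonyms","synonyms":["synonym 1","synonym 2"]}]
```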
@@ -60,14 +60,14 @@ One particular behavior of the Rasa adapter is that when a slot definition sente
 synonym 2
 ```
 
-In this example, the generated rasa dataset will contain the `entity_synonyms` of `synonym 1` and `synonym 2` mapping to `some slot synonyms`.
+In this example, the generated Rasa dataset will contain the `entity_synonyms` of `synonym 1` and `synonym 2` mapping to `some slot synonyms`.
 
 #### [LUIS](https://www.luis.ai/)
 [LUIS](https://www.luis.ai/) is part of Microsoft's Cognitive services. Chatito supports training a LUIS NLU model through its [batch add labeled utterances endpoint](https://westus.dev.cognitive.microsoft.com/docs/services/5890b47c39e2bb17b84a55ff/operations/5890b47c39e2bb052c5b9c09), and its [batch testing api](https://docs.microsoft.com/en-us/azure/cognitive-services/LUIS/luis-how-to-batch-test).
 
-To train a LUIS model, you will need to post the utterance in batches to the relevant api for training or testing.
+To train a LUIS model, you will need to post the utterance in batches to the relevant API for training or testing.
 
-Reference issue: [#61](https://github.com/rodrigopivi/Chatito/issues/)
+Reference issue: [#61](https://github.com/rodrigopivi/Chatito/issues/61)
 
 #### [Snips NLU](https://snips-nlu.readthedocs.io/en/latest/)
 [Snips NLU](https://snips-nlu.readthedocs.io/en/latest/) is another great open source framework for NLU. One particular behavior of the Snips adapter is that you can define entity types for the slots. e.g.:
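
The LUIS paragraph in the hunk above says utterances have to be posted in batches. The splitting step could look like the following sketch. It makes no HTTP calls. The `LabeledUtterance` shape and the limit of 100 utterances per request are assumptions for illustration, so check the LUIS API reference for the actual payload and limits.

```typescript
// Hypothetical utterance shape; the real LUIS payload may differ.
interface LabeledUtterance {
  text: string;
  intentName: string;
}

// Assumed per-request limit; verify against the LUIS documentation.
const ASSUMED_LUIS_BATCH_SIZE = 100;

// Split items into consecutive chunks of at most `size` elements.
function toBatches<T>(items: T[], size: number = ASSUMED_LUIS_BATCH_SIZE): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const utterances: LabeledUtterance[] = [];
for (let i = 0; i < 250; i++) {
  utterances.push({ text: `hello ${i}`, intentName: "greet" });
}
const luisBatches = toBatches(utterances);
console.log(JSON.stringify(luisBatches.map((b) => b.length))); // [100,100,50]
```

Each batch would then be sent as one request to the training or batch testing endpoint linked above.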
@@ -81,11 +81,11 @@ Reference issue: [#61](https://github.com/rodrigopivi/Chatito/issues/)
 ~[tomorrow]
 ```
 
-In the previous example, all `@[date]` values will be taged with the `snips/datetime` entity tag.
+In the previous example, all `@[date]` values will be tagged with the `snips/datetime` entity tag.
 
 ### NPM package
 
-Chatito is supports nodejs `v8.11.2 LTS` or higher.
+Chatito supports Node.js `v8.11.2 LTS` or higher.
 
 Install it globally:
 ```
@@ -120,7 +120,7 @@ npx chatito <pathToFileOrDirectory> --format=<format> --formatOptions=<formatOpt
 
 ### Notes to prevent overfitting
 
-Overfitting (https://en.wikipedia.org/wiki/Overfitting) is a problem that can be prevented if we use Chatito correctly. The idea behind this tool, is to have an intersection between data augmentation and having probabilistic description of possible sentences. It is not intended to generate deterministic datasets, you should avoid generating all possible combinations.
+[Overfitting](https://en.wikipedia.org/wiki/Overfitting) is a problem that can be prevented if we use Chatito correctly. The idea behind this tool, is to have an intersection between data augmentation and a probabilistic description of possible sentences combinations. It is not intended to generate deterministic datasets, you should avoid generating all possible combinations.
 
 ### Author and maintainer
 Rodrigo Pimentel
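
The overfitting note above advises against generating all possible combinations. The size of that space for one sentence is the product of the variation counts of the references it contains, so it grows multiplicatively with every alias or slot. This sketch shows how close a requested dataset size comes to enumerating everything. The helper names and the variation counts are hypothetical, and nested references are ignored.

```typescript
// Each alias or slot reference contributes its number of variations;
// literal text contributes a factor of 1 and can be left out.
function countCombinations(variationCounts: number[]): number {
  return variationCounts.reduce((product, n) => product * n, 1);
}

// Fraction of all combinations that a dataset of `requested` utterances would cover.
function coverage(requested: number, variationCounts: number[]): number {
  return Math.min(1, requested / countCombinations(variationCounts));
}

// A sentence like "~[hi] ~[how are you] @[name]" with 3, 4 and 5 variations.
const sentenceCounts = [3, 4, 5];
console.log(countCombinations(sentenceCounts)); // 60
console.log(coverage(30, sentenceCounts)); // 0.5
```

A coverage close to 1 means the dataset is essentially the full deterministic enumeration.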

spec.md

Lines changed: 9 additions & 8 deletions
@@ -60,6 +60,7 @@ non printable characters, this are the requirements of document source text and
 - Comments: Lines of text starting with '//' or '#' (no spaces before)
 - Imports: Lines of text starting with 'import' keyword followed by a relative filepath
 - Entity arguments: Optional key-values that can be declared at intents and slot definitions
+- Probability operator: an optional keyword declared at the start of sentences to control the probabilities.
 
 ### 2.1 - Entities
 Entities are the way to define keywords that wrap sentence variations and attach some properties to them.
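
The line-level rules listed above can be sketched as a small classifier. Comments and imports must start at the beginning of the line, and a sentence may begin with a probability operator. This is an illustration, not Chatito's PEG grammar. It assumes that intent, slot and alias definitions (`%[`, `@[`, `~[`) start at column 0 and that any other non-blank line is a sentence.

```typescript
type LineKind = "blank" | "comment" | "import" | "entity" | "sentence";

function classifyLine(line: string): LineKind {
  if (/^\s*$/.test(line)) return "blank";
  // Comments: '//' or '#' with no spaces before them.
  if (/^(\/\/|#)/.test(line)) return "comment";
  // Imports: the 'import' keyword followed by a relative file path.
  if (/^import\s+\S+/.test(line)) return "import";
  // Assumption: intent (%), slot (@) and alias (~) definitions start at column 0.
  if (/^[%@~]\[/.test(line)) return "entity";
  // Everything else is treated as a sentence; it may begin with `*[n]`.
  return "sentence";
}

const kinds = ["// note", "  // indented", "import ./slot1.chatito", "%[greet]", "    *[50] hi"].map(classifyLine);
console.log(kinds.join(", ")); // comment, sentence, import, entity, sentence
```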
@@ -83,7 +84,7 @@ added to the sentences defined inside. e.g.:
 hi
 ```
 
-The previous example will generate all possible unique examples for greet (in this case 2 utterances). But there are cases where there is no need to generate all utterances, or when we want to attach some extra properties to the genreated utterance, that is where entity arguments can help.
+The previous example will generate all possible unique examples for greet (in this case 2 utterances). But there are cases where there is no need to generate all utterances, or when we want to attach some extra properties to the generated utterance, that is where entity arguments can help.
 
 Entity arguments are comma separated key-values declared with the entity definition inside parenthesis. Each entity argument is composed of a key, followed by the `:` symbol and the value. The argument key or value are just strings wrapped with single or double quotes, optional spaces between the parenthesis and comma are allowed, the format is similar to ndjson but only for string values.
 
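
The entity-argument format described above is small enough to parse by hand. Keys and values are quoted strings separated by `:`, arguments are separated by commas, optional spaces are allowed, and the whole list is wrapped in parentheses. This is a minimal sketch that assumes strings contain no escaped quotes and that a repeated key overwrites the earlier one. `parseEntityArguments` is a hypothetical helper, not part of Chatito's API.

```typescript
type EntityArguments = { [key: string]: string };

function parseEntityArguments(source: string): EntityArguments {
  const text = source.trim();
  if (text.charAt(0) !== "(" || text.charAt(text.length - 1) !== ")") {
    throw new SyntaxError("entity arguments must be wrapped in parentheses");
  }
  const body = text.slice(1, -1);
  const args: EntityArguments = {};
  let pos = 0;

  const skipSpaces = (): void => {
    while (pos < body.length && /\s/.test(body.charAt(pos))) pos++;
  };
  // Read a string wrapped in matching single or double quotes (no escapes).
  const readQuoted = (): string => {
    skipSpaces();
    const quote = body.charAt(pos);
    if (quote !== "'" && quote !== '"') {
      throw new SyntaxError(`expected a quoted string at offset ${pos}`);
    }
    const end = body.indexOf(quote, pos + 1);
    if (end === -1) throw new SyntaxError("unterminated string");
    const value = body.slice(pos + 1, end);
    pos = end + 1;
    return value;
  };
  const expect = (symbol: string): void => {
    skipSpaces();
    if (body.charAt(pos) !== symbol) {
      throw new SyntaxError(`expected '${symbol}' at offset ${pos}`);
    }
    pos++;
  };

  skipSpaces();
  if (pos === body.length) return args; // "()" declares no arguments
  for (;;) {
    const key = readQuoted();
    expect(":");
    args[key] = readQuoted(); // a repeated key overwrites the earlier value
    skipSpaces();
    if (pos === body.length) return args;
    expect(",");
  }
}

console.log(JSON.stringify(parseEntityArguments(`('training': '2', "testing" : "2")`)));
// {"training":"2","testing":"2"}
```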
@@ -154,7 +155,7 @@ Nesting entities: Sentences defined inside a slot can only reference alias entit
 
 #### 2.1.3 - Alias
 The alias entity is defined by the `~[` symbols at the start of a line, following by the name of the alias and `]`.
-Alias are just variations of a word and does not generate any tag. By default if an alias is referenced but not defined (like in the next example for `how are you`, it just uses the alias key name, this is usefull for making a word optional but not having to add the extra lines of code defining a new alias. e.g.:
+Alias are just variations of a word and does not generate any tag. By default if an alias is referenced but not defined (like in the next example for `how are you`, it just uses the alias key name, this is useful for making a word optional but not having to add the extra lines of code defining a new alias. e.g.:
 
 ```
 %[greet]
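
The spec says that when an alias is referenced but not defined, its key name is used as the text. That fallback can be sketched as follows. The alias table, the `pick` callback and the regular expression for `~[...]` references are assumptions made for this illustration. The sketch handles alias references only and ignores slots and any other syntax.

```typescript
// Hypothetical alias table: alias key -> its variations.
type AliasTable = { [key: string]: string[] };

// `pick(n)` returns an index in [0, n); inject it to keep results reproducible.
function expandAlias(key: string, aliases: AliasTable, pick: (n: number) => number): string {
  const variations = Object.prototype.hasOwnProperty.call(aliases, key) ? aliases[key] : [];
  // Referenced but not defined: fall back to the alias key name.
  if (variations.length === 0) return key;
  return variations[pick(variations.length)];
}

// Replace every ~[key] reference in a sentence with one variation.
function expandSentence(sentence: string, aliases: AliasTable, pick: (n: number) => number): string {
  return sentence.replace(/~\[([^\]]+)\]/g, (_match: string, key: string) => expandAlias(key, aliases, pick));
}

const greetAliases: AliasTable = { hi: ["hello", "hey"] };
const firstChoice = (_n: number): number => 0;
console.log(expandSentence("~[hi] ~[how are you]", greetAliases, firstChoice)); // hello how are you

const randomChoice = (n: number): number => Math.floor(Math.random() * n);
console.log(expandSentence("~[hi] ~[how are you]", greetAliases, randomChoice));
```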
@@ -172,14 +173,14 @@ When an alias is referenced inside a slot definition, and it is the only token o
 
 Alias definitions are not allowed to declare entity arguments.
 
-Nesting entities: Sentences defined inside aliases can reference slots and other aliases but preventing recursive loops
+Nesting entities: Sentences defined inside aliases can reference slots and other aliases but preventing recursive loops.
 
 
 ### 2.2 - Sentence probability operator
 
-The way Chatito works, is like pulling samples from a cloud of possible combinations, but once the sentences definitions start getting more complex, the max possible combination possibilities increments exponentially, causing a problem where the generator will most likely pick sentences that have more possible combinations, and omit some sentences that may be more important at the dataset. To have some control of the generator principle, you can use the this operator.
+The way Chatito works, is like pulling samples from a cloud of possible combinations, but once the sentences definitions start getting more complex, the max possible combination possibilities increments exponentially, causing a problem where the generator will most likely pick sentences that have more possible combinations, and omit some sentences that may be more important at the dataset. To have some control of the generator principle, you can use the probability operator.
 
-The sentence probability operator is defined by the `*[` symbols at the start of a sentence, following by the probability of generating the sentence (max 100) and `]`. The value inside the probability operator must by an integer betwen 1 and 100.
+The sentence probability operator is defined by the `*[` symbols at the start of a sentence, following by a number, the probability of generating the sentence and `]`. The value inside the probability operator must be an integer between 1 and 100, and the sum of all probability operators inside an entity definition should never exceed 100.
 
 ```
 %[greet]('training': '2', 'testing': '2')
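
The probability operator constraints stated above can be checked with a sketch like this. The operator must be an integer between 1 and 100 placed at the start of a sentence, and the operators of one entity definition must add up to at most 100. `parseProbability` and `checkProbabilityTotal` are hypothetical helpers written for illustration, and the sample sentences are made up.

```typescript
interface WeightedSentence {
  probability: number | null; // value of the *[n] operator, or null when absent
  text: string; // the sentence without the operator
}

function parseProbability(line: string): WeightedSentence {
  const trimmed = line.replace(/^\s+/, "");
  const match = /^\*\[([^\]]*)\]\s*/.exec(trimmed);
  if (match === null) return { probability: null, text: trimmed };
  const raw = match[1];
  if (!/^\d+$/.test(raw)) {
    throw new SyntaxError(`probability must be an integer, got "${raw}"`);
  }
  const value = parseInt(raw, 10);
  if (value < 1 || value > 100) {
    throw new RangeError(`probability must be between 1 and 100, got ${value}`);
  }
  return { probability: value, text: trimmed.slice(match[0].length) };
}

// The operators declared inside one entity definition must not exceed 100 in total.
function checkProbabilityTotal(sentences: WeightedSentence[]): number {
  let total = 0;
  for (const sentence of sentences) {
    if (sentence.probability !== null) total += sentence.probability;
  }
  if (total > 100) {
    throw new RangeError(`probabilities add up to ${total}, which exceeds 100`);
  }
  return total;
}

const greetSentences = ["    *[50] ~[hi] ~[how are you]", "    *[30] ~[hi]", "    ~[greetings]"].map(parseProbability);
console.log(checkProbabilityTotal(greetSentences)); // 80
```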
@@ -190,11 +191,11 @@ The sentence probability operator is defined by the `*[` symbols at the start of
 
 This way, it is possible to declare that from the first sentence we want 5 testing and 5 training examples (50%). The second sentence will generate 30% of the utterances. And the 20% remaining will come from the remaining possibilities of all sentences.
 
-NOTE: Be carefull when using probability operator, because if the sentence reaches its max number of unique generated values, it will start producing duplicates and possibly slowing down the generator that may filter duplicates.
+NOTE: Be careful when using probability operator, because if the sentence reaches its max number of unique generated values, it will start producing duplicates and possibly slowing down the generator that may filter duplicates.
 
 ### 2.3 - Importing chatito files
 
-To allow reusing entity declarations. It is possible to import another chatito file using the import keyword. Importing another chatito file, only allows using the slots and aliases defined there, if the imported file defines intents, they will be ignored since intents are generation entry points.
+To allow reusing entity declarations. It is possible to import another chatito file using the import keyword. Importing another chatito file only allows using the slots and aliases defined there, if the imported file defines intents, they will be ignored since intents are generation entry points.
 
 As an example, given two chatito files:
 
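
The split described at the start of this hunk could be implemented as a weighted pick. Sentences with an operator take their share, and the remaining share comes from all sentences. In this sketch, the remaining share weights every sentence by its number of possible combinations. That is an assumption based on the generator behavior described in section 2.2, not Chatito's actual code, and the types and names are hypothetical.

```typescript
interface SentenceOption {
  text: string;
  probability: number | null; // value of *[n], or null when absent
  combinations: number; // how many unique utterances the sentence can produce
}

// `rand` returns a float in [0, 1), for example Math.random.
function pickSentence(options: SentenceOption[], rand: () => number): SentenceOption {
  if (options.length === 0) throw new Error("entity has no sentences");
  const roll = rand() * 100;
  let cumulative = 0;
  for (const option of options) {
    if (option.probability === null) continue;
    cumulative += option.probability;
    if (roll < cumulative) return option;
  }
  // Remaining share: weight every sentence by its combination count (assumed).
  const total = options.reduce((sum, o) => sum + o.combinations, 0);
  let target = rand() * total;
  for (const option of options) {
    target -= option.combinations;
    if (target < 0) return option;
  }
  return options[options.length - 1];
}

const options: SentenceOption[] = [
  { text: "~[hi] ~[how are you]", probability: 50, combinations: 4 },
  { text: "~[hi]", probability: 30, combinations: 2 },
  { text: "~[greetings]", probability: null, combinations: 10 },
];
const counts: { [text: string]: number } = {};
for (let i = 0; i < 1000; i++) {
  const picked = pickSentence(options, Math.random).text;
  counts[picked] = (counts[picked] || 0) + 1;
}
console.log(counts);
```

With these made-up numbers, the expected shares are 55%, 32.5% and 12.5%. The first two sentences keep their declared 50% and 30%, and the remaining 20% is split 4:2:10.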
@@ -216,7 +217,7 @@ import ./slot1.chatito
 ```
 
 The file `main.chatito` will import all alias and slot definitions from `./slot1.chatito`.
-The text next to the import statement should be a relative path from the main file to the imported file.
+The text next to the import statement should be a relative path from the main file to the imported file. Imports can be nested, and the path is always relative to the file that declares the reference.
 
 Note: Chatito will throw an exception if two imports define the same entity.
 
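
The import rules in this hunk can be sketched over an in-memory set of already parsed files. Paths are relative to the declaring file, imports can be nested, and two imports defining the same entity is an error. `ParsedFile`, `resolveImport` and `collectImportedEntities` are hypothetical. File reading and parsing are left out, and Chatito's real error messages will differ.

```typescript
// Hypothetical parsed form of a .chatito file: only what the import logic needs.
interface ParsedFile {
  imports: string[]; // paths written after the `import` keyword
  entities: string[]; // slot and alias names defined in the file (intents excluded)
}

const hasOwn = (obj: object, key: string): boolean => Object.prototype.hasOwnProperty.call(obj, key);

// Resolve `relative` against the directory of `fromFile` (POSIX-style paths).
// Note: ".." segments that climb above the first path segment are not guarded against.
function resolveImport(fromFile: string, relative: string): string {
  const parts = fromFile.split("/");
  parts.pop(); // drop the file name, keep its directory
  for (const segment of relative.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// Walk imports recursively and record which imported file defines each entity.
function collectImportedEntities(entry: string, files: { [path: string]: ParsedFile }): { [entity: string]: string } {
  const owner: { [entity: string]: string } = {};
  const visited: { [path: string]: boolean } = {};
  const visit = (fromFile: string): void => {
    if (!hasOwn(files, fromFile)) throw new Error(`cannot read ${fromFile}`);
    visited[fromFile] = true;
    for (const rel of files[fromFile].imports) {
      // Nested imports are resolved relative to the file that declares them.
      const target = resolveImport(fromFile, rel);
      if (hasOwn(visited, target)) continue; // already imported (or a cycle)
      if (!hasOwn(files, target)) throw new Error(`cannot read ${target}`);
      for (const entity of files[target].entities) {
        if (hasOwn(owner, entity)) {
          throw new Error(`${entity} is defined in both ${owner[entity]} and ${target}`);
        }
        owner[entity] = target;
      }
      visit(target);
    }
  };
  visit(entry);
  return owner;
}

const project: { [path: string]: ParsedFile } = {
  "proj/main.chatito": { imports: ["./slot1.chatito"], entities: [] },
  "proj/slot1.chatito": { imports: ["./shared/alias.chatito"], entities: ["@[slot1]"] },
  "proj/shared/alias.chatito": { imports: [], entities: ["~[alias1]"] },
};
console.log(JSON.stringify(collectImportedEntities("proj/main.chatito", project)));
// {"@[slot1]":"proj/slot1.chatito","~[alias1]":"proj/shared/alias.chatito"}
```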