-[DSL AST parser in pegjs format](https://github.com/rodrigopivi/Chatito/blob/master/parser/chatito.pegjs)
--[Generator implemented in typescript + npm package](https://github.com/rodrigopivi/Chatito/tree/master/src)
+-[Generator implemented in typescript + npm package](https://github.com/rodrigopivi/Chatito/tree/master/src)
### Chatito language
For the full language specification and documentation, please refer to the [DSL spec document](https://github.com/rodrigopivi/Chatito/blob/master/spec.md).
@@ -31,7 +31,7 @@ For the full language specification and documentation, please refer to the [DSL
The language is independent from the generated output format and because each model can receive different parameters and settings, there are 3 data format adapters provided. This section describes the adapters, their specific behaviors and use cases:
#### Default format
-Use the default format if you plan to train a custom model or if you are writting a custom adapter. This is the most flexible format because you can annotate `Slots` and `Intents` with custom entity arguments, and they all will be present at the generated output, so for example, you could also include dialog/response generation logic with the dsl. E.g.:
+Use the default format if you plan to train a custom model or if you are writing a custom adapter. This is the most flexible format because you can annotate `Slots` and `Intents` with custom entity arguments, and they all will be present at the generated output, so for example, you could also include dialog/response generation logic with the DSL. E.g.:
```
%[some intent]('context': 'some annotation')
@@ -46,7 +46,7 @@ Custom entities like 'context', 'required' and 'type' will be available at the o
#### [Rasa NLU](https://rasa.com/docs/nlu/)
[Rasa NLU](https://rasa.com/docs/nlu/) is a great open source framework for training NLU models.
-One particular behavior of the Rasa adapter is that when a slot definition sentence only contains one alias, the generated rasa dataset will map the alias as a synonym. e.g.:
+One particular behavior of the Rasa adapter is that when a slot definition sentence only contains one alias, the generated Rasa dataset will map the alias as a synonym. e.g.:
```
%[some intent]('training': '1')
@@ -60,14 +60,14 @@ One particular behavior of the Rasa adapter is that when a slot definition sente
synonym 2
```
-In this example, the generated rasa dataset will contain the `entity_synonyms` of `synonym 1` and `synonym 2` mapping to `some slot synonyms`.
+In this example, the generated Rasa dataset will contain the `entity_synonyms` of `synonym 1` and `synonym 2` mapping to `some slot synonyms`.
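The synonym mapping described in this hunk can be sketched roughly as follows. This is an illustration only, not the adapter's actual code: `RasaEntitySynonym`, `AliasDefinitions` and `buildEntitySynonyms` are hypothetical names, and the `{ value, synonyms }` entry shape is assumed from the legacy Rasa NLU JSON training data format.

```typescript
// Hypothetical shape of one `entity_synonyms` entry (assumed, not taken from Chatito).
interface RasaEntitySynonym {
    value: string;
    synonyms: string[];
}

// Assumed input: alias name -> sentences defined under that alias.
type AliasDefinitions = Record<string, string[]>;

// For every alias that is the only token of a slot sentence, emit one synonym
// entry whose canonical value is the alias name.
function buildEntitySynonyms(soleAliases: string[], aliases: AliasDefinitions): RasaEntitySynonym[] {
    const result: RasaEntitySynonym[] = [];
    for (const name of soleAliases) {
        const synonyms = aliases[name];
        // Aliases that are referenced but never defined produce no entry.
        if (synonyms !== undefined && synonyms.length > 0) {
            result.push({ value: name, synonyms: [...synonyms] });
        }
    }
    return result;
}

const entitySynonyms = buildEntitySynonyms(
    ["some slot synonyms"],
    { "some slot synonyms": ["synonym 1", "synonym 2"] }
);
console.log(JSON.stringify(entitySynonyms));
```

With the names from the README example, the single resulting entry maps `synonym 1` and `synonym 2` to `some slot synonyms`.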
#### [LUIS](https://www.luis.ai/)
[LUIS](https://www.luis.ai/) is part of Microsoft's Cognitive services. Chatito supports training a LUIS NLU model through its [batch add labeled utterances endpoint](https://westus.dev.cognitive.microsoft.com/docs/services/5890b47c39e2bb17b84a55ff/operations/5890b47c39e2bb052c5b9c09), and its [batch testing api](https://docs.microsoft.com/en-us/azure/cognitive-services/LUIS/luis-how-to-batch-test).
-To train a LUIS model, you will need to post the utterance in batches to the relevant api for training or testing.
+To train a LUIS model, you will need to post the utterance in batches to the relevant API for training or testing.
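As a rough sketch of the batching step, the helper below splits generated utterances into fixed-size groups before they are posted. `toBatches` is a hypothetical helper, the utterances are placeholders, and the batch size of 100 is an assumption that should be checked against the current LUIS documentation; no HTTP call is shown.

```typescript
// Splits items into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
        throw new RangeError("batchSize must be a positive integer");
    }
    const batches: T[][] = [];
    for (let i = 0; i < items.length; i += batchSize) {
        batches.push(items.slice(i, i + batchSize));
    }
    return batches;
}

// 250 placeholder utterances; real ones would come from the generated dataset.
const utterances = Array.from({ length: 250 }, (_, i) => `utterance ${i}`);

// Assumed batch size of 100; each batch would then be posted to the API separately.
const batches = toBatches(utterances, 100);
console.log(batches.map(batch => batch.length)); // batch sizes: 100, 100, 50
```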
[Snips NLU](https://snips-nlu.readthedocs.io/en/latest/) is another great open source framework for NLU. One particular behavior of the Snips adapter is that you can define entity types for the slots. e.g.:
-Overfitting(https://en.wikipedia.org/wiki/Overfitting) is a problem that can be prevented if we use Chatito correctly. The idea behind this tool, is to have an intersection between data augmentation and having probabilistic description of possible sentences. It is not intended to generate deterministic datasets, you should avoid generating all possible combinations.
+[Overfitting](https://en.wikipedia.org/wiki/Overfitting) is a problem that can be prevented if we use Chatito correctly. The idea behind this tool, is to have an intersection between data augmentation and a probabilistic description of possible sentences combinations. It is not intended to generate deterministic datasets, you should avoid generating all possible combinations.
spec.md (9 additions & 8 deletions)
@@ -60,6 +60,7 @@ non printable characters, this are the requirements of document source text and
- Comments: Lines of text starting with '//' or '#' (no spaces before)
- Imports: Lines of text starting with 'import' keyword followed by a relative filepath
- Entity arguments: Optional key-values that can be declared at intents and slot definitions
+- Probability operator: an optional keyword declared at the start of sentences to control the probabilities.
### 2.1 - Entities
Entities are the way to define keywords that wrap sentence variations and attach some properties to them.
@@ -83,7 +84,7 @@ added to the sentences defined inside. e.g.:
hi
```
-The previous example will generate all possible unique examples for greet (in this case 2 utterances). But there are cases where there is no need to generate all utterances, or when we want to attach some extra properties to the genreated utterance, that is where entity arguments can help.
+The previous example will generate all possible unique examples for greet (in this case 2 utterances). But there are cases where there is no need to generate all utterances, or when we want to attach some extra properties to the generated utterance, that is where entity arguments can help.
Entity arguments are comma separated key-values declared with the entity definition inside parenthesis. Each entity argument is composed of a key, followed by the `:` symbol and the value. The argument key or value are just strings wrapped with single or double quotes, optional spaces between the parenthesis and comma are allowed, the format is similar to ndjson but only for string values.
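To make the argument format concrete, here is a minimal parsing sketch. It is not the project's pegjs grammar: `parseEntityArguments` is a hypothetical function, quotes inside keys or values are not supported, and a trailing comma is tolerated for simplicity.

```typescript
// Parses an entity argument list such as `('training': '2', "testing": "2")`
// into a plain object of string keys and string values.
function parseEntityArguments(source: string): Record<string, string> {
    const trimmed = source.trim();
    if (!trimmed.startsWith("(") || !trimmed.endsWith(")")) {
        throw new SyntaxError("entity arguments must be wrapped in parentheses");
    }
    // One pair: quoted key, ':', quoted value, then a comma or the end of input.
    // The backreferences make sure each string closes with the quote that opened it.
    const pair = /^\s*(['"])([^'"]*)\1\s*:\s*(['"])([^'"]*)\3\s*(?:,|$)/;
    const result: Record<string, string> = {};
    let rest = trimmed.slice(1, -1);
    while (rest.trim().length > 0) {
        const match = pair.exec(rest);
        if (match === null) {
            throw new SyntaxError(`invalid entity argument near: ${rest.trim()}`);
        }
        result[match[2]] = match[4];
        rest = rest.slice(match[0].length);
    }
    return result;
}

const parsedArguments = parseEntityArguments(`('training': '2', "testing": "2")`);
console.log(parsedArguments); // training and testing both map to the string "2"
```

Values stay strings, matching the spec's note that the format only supports string values.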
@@ -154,7 +155,7 @@ Nesting entities: Sentences defined inside a slot can only reference alias entit
#### 2.1.3 - Alias
The alias entity is defined by the `~[` symbols at the start of a line, following by the name of the alias and `]`.
-Alias are just variations of a word and does not generate any tag. By default if an alias is referenced but not defined (like in the next example for `how are you`, it just uses the alias key name, this is usefull for making a word optional but not having to add the extra lines of code defining a new alias. e.g.:
+Alias are just variations of a word and does not generate any tag. By default if an alias is referenced but not defined (like in the next example for `how are you`, it just uses the alias key name, this is useful for making a word optional but not having to add the extra lines of code defining a new alias. e.g.:
```
%[greet]
@@ -172,14 +173,14 @@ When an alias is referenced inside a slot definition, and it is the only token o
Alias definitions are not allowed to declare entity arguments.
-Nesting entities: Sentences defined inside aliases can reference slots and other aliases but preventing recursive loops
+Nesting entities: Sentences defined inside aliases can reference slots and other aliases but preventing recursive loops.
### 2.2 - Sentence probability operator
-The way Chatito works, is like pulling samples from a cloud of possible combinations, but once the sentences definitions start getting more complex, the max possible combination possibilities increments exponentially, causing a problem where the generator will most likely pick sentences that have more possible combinations, and omit some sentences that may be more important at the dataset. To have some control of the generator principle, you can use the this operator.
+The way Chatito works, is like pulling samples from a cloud of possible combinations, but once the sentences definitions start getting more complex, the max possible combination possibilities increments exponentially, causing a problem where the generator will most likely pick sentences that have more possible combinations, and omit some sentences that may be more important at the dataset. To have some control of the generator principle, you can use the probability operator.
-The sentence probability operator is defined by the `*[` symbols at the start of a sentence, following by the probability of generating the sentence (max 100) and `]`. The value inside the probability operator must by an integer betwen 1 and 100.
+The sentence probability operator is defined by the `*[` symbols at the start of a sentence, following by a number, the probability of generating the sentence and `]`. The value inside the probability operator must be an integer between 1 and 100, and the sum of all probability operators inside an entity definition should never exceed 100.
```
%[greet]('training': '2', 'testing': '2')
@@ -190,11 +191,11 @@ The sentence probability operator is defined by the `*[` symbols at the start of
This way, it is possible to declare that from the first sentence we want 5 testing and 5 training examples (50%). The second sentence will generate 30% of the utterances. And the 20% remaining will come from the remaining possibilities of all sentences.
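The arithmetic behind this split, together with the constraints on the operator values, can be sketched as below. This is an assumption-laden illustration rather than the generator's real logic: `allocateByProbability` is a hypothetical function, and rounding down is a choice made here because the spec does not say how fractions are handled.

```typescript
interface Allocation {
    perSentence: number[]; // utterances reserved for each sentence with a probability operator
    remainder: number;     // utterances left to be drawn from all sentences
}

// Validates the operator values of one entity definition and splits `total`
// utterances between the weighted sentences and the shared remainder.
function allocateByProbability(probabilities: number[], total: number): Allocation {
    let sum = 0;
    for (const p of probabilities) {
        if (!Number.isInteger(p) || p < 1 || p > 100) {
            throw new RangeError(`probability must be an integer between 1 and 100, got ${p}`);
        }
        sum += p;
    }
    if (sum > 100) {
        throw new RangeError(`probabilities add up to ${sum}, which exceeds 100`);
    }
    // Assumption: fractional utterances are rounded down and fall into the remainder.
    const perSentence = probabilities.map(p => Math.floor((total * p) / 100));
    const assigned = perSentence.reduce((a, b) => a + b, 0);
    return { perSentence, remainder: total - assigned };
}

// Two sentences weighted 50 and 30, with 10 utterances requested in total.
const allocation = allocateByProbability([50, 30], 10);
console.log(allocation); // perSentence: 5 and 3, remainder: 2
```

The remainder corresponds to the 20% that comes from the remaining possibilities of all sentences.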
-NOTE: Be carefull when using probability operator, because if the sentence reaches its max number of unique generated values, it will start producing duplicates and possibly slowing down the generator that may filter duplicates.
+NOTE: Be careful when using probability operator, because if the sentence reaches its max number of unique generated values, it will start producing duplicates and possibly slowing down the generator that may filter duplicates.
### 2.3 - Importing chatito files
-To allow reusing entity declarations. It is possible to import another chatito file using the import keyword. Importing another chatito file, only allows using the slots and aliases defined there, if the imported file defines intents, they will be ignored since intents are generation entry points.
+To allow reusing entity declarations. It is possible to import another chatito file using the import keyword. Importing another chatito file only allows using the slots and aliases defined there, if the imported file defines intents, they will be ignored since intents are generation entry points.
As an example, given two chatito files:
@@ -216,7 +217,7 @@ import ./slot1.chatito
```
The file `main.chatito` will import all alias and slot definitions from `./slot1.chatito`.
-The text next to the import statement should be a relative path from the main file to the imported file.
+The text next to the import statement should be a relative path from the main file to the imported file. Imports can be nested, and the path is always relative to the file that declares the reference.
Note: Chatito will throw an exception if two imports define the same entity.
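The two import rules in this hunk (paths relative to the declaring file, and an error on duplicate entities) could look roughly like the sketch below. Both functions are hypothetical and simplified: paths are assumed to use `/` separators, the directory layout and entity names are invented for illustration, and real code would normally rely on a path library instead of the manual segment handling shown here.

```typescript
// Resolves an import path relative to the file that declares the import.
function resolveImport(declaringFile: string, importPath: string): string {
    // Start from the directory that contains the declaring file.
    const segments = declaringFile.split("/").slice(0, -1);
    for (const part of importPath.split("/")) {
        if (part === "" || part === ".") {
            continue;
        }
        if (part === "..") {
            if (segments.length === 0) {
                throw new Error(`import path escapes the root: ${importPath}`);
            }
            segments.pop();
        } else {
            segments.push(part);
        }
    }
    return segments.join("/");
}

// Records which file defines each entity and fails when two imports define the same one.
function mergeImportedEntities(entitiesByFile: Record<string, string[]>): Map<string, string> {
    const owner = new Map<string, string>();
    for (const [file, entities] of Object.entries(entitiesByFile)) {
        for (const entity of entities) {
            const previous = owner.get(entity);
            if (previous !== undefined) {
                throw new Error(`entity "${entity}" is defined in both ${previous} and ${file}`);
            }
            owner.set(entity, file);
        }
    }
    return owner;
}

// A nested import is resolved against the file that declares it, not against main.chatito.
const firstLevel = resolveImport("project/main.chatito", "./slots/slot1.chatito");
const nestedLevel = resolveImport(firstLevel, "../aliases/common.chatito");
console.log(firstLevel, nestedLevel);
```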