Template syntax for extraction
This is a derivative of the Template Syntax page, aimed at people who are making templates which match natural language structures for extracting information (i.e., using the DataTemple Command Line Tool).
Rules combine a template and a command. Every template that is matched in the input will have its command run. Usually the template is just a sequence of words or wild-cards (which can match many different words), but it can contain other commands too. Usually the command is for some kind of output, using the identifiers for wild-cards that were matched, but it too can contain other commands.
Here are some example rules:
Template | Example Match |
---|---|
where is the murder weapon | Where's the murder weapon? |
I like to * | I like to swim in the ocean. |
%sentence I like to %verb %inall | I like to swim in the ocean. |
All commands, wild-cards, and special modifiers start with one of the special characters *, _, %, @, /, and #. All other words are just words, matched in the template or included in the output. In the template, punctuation and capitalization are ignored; in the output, punctuation is shown to the user exactly as written.
Special | Usage |
---|---|
* | Matches any number of words in the input, or reproduces those words in the output. An optional name after the * (like *foo) names the match for future use. |
_ | Matches a single word in the input, or reproduces it in the output. An optional name after the _ (like _foo) names the match for future use. |
% | Matches a restricted word (like any verb, with %verb) in the input, or reproduces it in the output. |
@ | Applies a functional rule to the input or output. Everything following an @ rule (like @reply Where do you live?), up to the next @ rule, is applied to the rule. |
/ | Delimits multiple arguments to a command. Examples below. |
# | Ends one rule or command so another can be included. Examples below. |
To include the literal symbol in an input or output, use the symbol twice. Example: Did you know 3 ** 5 is 15? or I saw him//her.
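The doubling rule can be illustrated with a small Python sketch. This is not DataTemple's own parser; the names `SPECIALS` and `unescape` are hypothetical, and the sketch assumes that a doubled symbol always means one literal symbol.

```python
SPECIALS = set("*_%@/#")

def unescape(text):
    """Collapse doubled special symbols into single literal characters.

    Returns the resulting text and the positions (in that text) of the
    symbols that were NOT doubled, i.e. the ones that keep their special
    meaning. This is an illustrative assumption, not DataTemple's parser.
    """
    out = []
    special_positions = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in SPECIALS and text[i + 1:i + 2] == ch:
            # Doubled symbol: emit one literal copy and skip both characters.
            out.append(ch)
            i += 2
        else:
            if ch in SPECIALS:
                special_positions.append(len(out))
            out.append(ch)
            i += 1
    return "".join(out), special_positions

print(unescape("Did you know 3 ** 5 is 15?"))
print(unescape("I saw him//her."))
print(unescape("I like to *"))
```

In the first two calls the doubled symbols become plain text and no special positions are reported; in the third, the single * is reported as a wild-card.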
Command | Template Use | Command Use |
---|---|---|
* | Matches any number of words | Produces the first * matched |
_ | Matches a single word | Produces the word matched by _something in the input |
%verb | Matches any verb | Produces the first verb matched |
%verbing | Matches a gerund | Produces the gerund form of the first verb matched |
When you use * or _, you can refer to these matches with either the same symbol, if there was only one, or a symbol followed by an ordinal number: the first wild-card is called *1 or just * in the response, the second is called *2, the third is *3, and so on.
Template | Example Input | Output |
---|---|---|
I like to eat * with * | I like to eat hamburgers with lettuce. | Subject likes *2 on *1. |
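To make the ordinal references concrete, here is a minimal Python sketch of how * matches might be captured and then substituted into an output. It is only an illustration of the numbering rule, not DataTemple's implementation: the functions `words`, `match`, and `fill` are hypothetical, only the * wild-card is handled, and the sketch assumes the template must cover the whole input.

```python
import re

def words(text):
    # Lowercase and drop punctuation, since templates ignore both.
    return re.findall(r"[a-z0-9']+", text.lower())

def match(template, sentence):
    """Return the list of '*' captures, or None if the template does not match."""
    pattern = template.lower().split()
    tokens = words(sentence)

    def walk(p, t):
        if p == len(pattern):
            return [] if t == len(tokens) else None
        if pattern[p] == "*":
            # Try the shortest capture first (possibly empty), then grow it.
            for end in range(t, len(tokens) + 1):
                rest = walk(p + 1, end)
                if rest is not None:
                    return [" ".join(tokens[t:end])] + rest
            return None
        if t < len(tokens) and tokens[t] == pattern[p]:
            return walk(p + 1, t + 1)
        return None

    return walk(0, 0)

def fill(output, captures):
    # Replace *N with the Nth capture; a bare * means the first capture.
    def substitute(m):
        index = int(m.group(1)) if m.group(1) else 1
        return captures[index - 1]
    return re.sub(r"\*(\d*)", substitute, output)

captures = match("I like to eat * with *", "I like to eat hamburgers with lettuce.")
print(captures)
print(fill("Subject likes *2 on *1.", captures))
```

Note that this sketch lowercases the captured words as a side effect of normalizing the input; a real implementation would presumably preserve the original casing.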
Restricted wild-cards, like %verb, can be referred to by the same name (e.g., %verb).
Restricted wild-cards can also have grammar changes, by adding suffixes to the name, as in the following examples:
Template | Example Input | Output | Example Output |
---|---|---|---|
I like to %verb | I like to swim | Likes %verbing! | Likes swimming! |
I like %verbing | I like swimming | Likes to %verb! | Likes to swim! |
Declinations are modified variables, which can be used for matching or producing results. For example, %noun:capital used in a template only matches capitalized nouns, but used in an output it will capitalize the corresponding %noun.
The following are the currently supported declinations:
- :lower - matches or produces lower case words
- :capital - matches or produces words whose first letter is capitalized (ignoring 'of', 'and', 'a', and 'the')
- :s - matches or produces a verb in the present tense
- :ed - matches or produces a verb in the past tense
- :ing - matches or produces a present participle
- :en - matches or produces a past participle
- :x - matches or produces a verb in its infinitive form
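The dual role of a declination (a filter when matching, a transformation when producing) can be sketched in Python for the two case-related entries. The function names are hypothetical, and the reading of "ignoring 'of', 'and', 'a', and 'the'" as "these words are skipped when checking or changing capitalization" is an assumption rather than documented behavior.

```python
MINOR_WORDS = {"of", "and", "a", "the"}

def matches_capital(phrase):
    """True if every word other than the minor words starts with a capital."""
    significant = [w for w in phrase.split() if w.lower() not in MINOR_WORDS]
    return bool(significant) and all(w[0].isupper() for w in significant)

def produce_capital(phrase):
    """Capitalize the first letter of every word other than the minor words."""
    return " ".join(
        w if w.lower() in MINOR_WORDS else w[:1].upper() + w[1:]
        for w in phrase.split()
    )

def matches_lower(phrase):
    """True if the phrase contains no upper-case letters."""
    return phrase == phrase.lower()

def produce_lower(phrase):
    return phrase.lower()

print(matches_capital("Tower of London"))
print(produce_capital("tower of london"))
```

The verb declinations (:s, :ed, :ing, :en, :x) would need a morphological dictionary and are left out of this sketch.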
Note: this section is not yet active in DataTemple.
A variety of commands let you match on the grammar of a sentence, and use the rules of English grammar to keep your responses sensible.
Example: %subject %verb the dog * => The dog was %verbed by %1object.
This matches I walked the dog, but not Where is the dog?
I walked the dog produces The dog was walked by you.
They want the dog produces The dog was wanted by them.
Most part-of-speech commands have both a proper command name and a simple one, as a mnemonic. The proper name is the part of speech. The simple one is a word of that part of speech that would be used at that point in the sentence. So, rather than %object you can use %them, and rather than %possessivepronoun you can use %theirs.
Commands can be used both to produce output, and to save information for later.
The output command is @print. For example: @print Subject eats %noun
The two memory commands are @set and @know.
A @set command saves a value to a name. The syntax for @set is:
@set name value
The value can be as long as you want. Use # to end it before the end of the rule. To refer to the same value later, use %name.
Example: I am * years old => @set yearsold *1
A @know command saves a particular relation between elements. The syntax is:
@know one thing @relation something else
@relation can be one of several commands, defined below.
Example: %person died * => @know %person @HasAttribute dead
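One way to picture the two memory commands is as a name-to-value table for @set and a collection of (subject, relation, object) triples for @know. The Python sketch below is a hypothetical model, not DataTemple's storage; the class `Memory`, its methods, and the sample person name are invented for illustration.

```python
import re

class Memory:
    """Toy model of @set and @know (an assumption, not the real storage)."""

    def __init__(self):
        self.values = {}    # filled by @set: name -> value
        self.facts = set()  # filled by @know: (subject, relation, object)

    def set(self, name, value):
        self.values[name] = value

    def know(self, subject, relation, obj):
        self.facts.add((subject, relation, obj))

    def expand(self, text):
        # Replace %name with a stored value; unknown names are left untouched.
        return re.sub(
            r"%(\w+)",
            lambda m: self.values.get(m.group(1), m.group(0)),
            text,
        )

memory = Memory()
memory.set("yearsold", "42")                    # I am * years old => @set yearsold *1
memory.know("Caesar", "HasAttribute", "dead")   # %person died * => @know %person @HasAttribute dead
print(memory.expand("You are %yearsold years old."))
print(("Caesar", "HasAttribute", "dead") in memory.facts)
```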
Another class of commands lets you specify definitions, typically used before the template. The supported definition commands are:
- @defwc: A word choice definition, constructing a synonym class. For example, @defwc %building building house tower factory would allow you to use %building in a following template (e.g., %noun entered the %building).
- @defvc: A verb choice definition, like @defwc but allowing verbs to be in any conjugation (e.g., go, goes, went, will go, have gone).
- @defpc: A phrase choice definition. For example, @defpc %belphr I believe that / We can be sure that / It is now known that.
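As a rough illustration of what the word and phrase choice definitions express, the following Python sketch parses a definition line into its alternatives and builds a regular expression from them. The functions `parse_definition` and `choice_pattern` are hypothetical, @defvc is omitted because it would need verb conjugation data, and case-insensitive matching is assumed from the rule that templates ignore capitalization.

```python
import re

def parse_definition(line):
    """Split '@defwc %name w1 w2 ...' or '@defpc %name p1 / p2 / ...' into (name, choices)."""
    command, name, body = line.split(None, 2)
    if command == "@defwc":
        choices = body.split()                          # one word per choice
    elif command == "@defpc":
        choices = [p.strip() for p in body.split("/")]  # phrases separated by /
    else:
        raise ValueError(f"unsupported definition command: {command}")
    return name, choices

def choice_pattern(choices):
    # Longest alternatives first, so a longer phrase is preferred over its prefix.
    ordered = sorted(choices, key=len, reverse=True)
    alternation = "|".join(re.escape(c) for c in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

name, choices = parse_definition("@defwc %building building house tower factory")
pattern = choice_pattern(choices)
print(name, choices)
print(bool(pattern.search("She entered the Tower")))
print(bool(pattern.search("She entered the garden")))
```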
Notes:
- words in [brackets] should be replaced by a value of your choosing
- words in <angle brackets> are optional, and you can include them or not depending on what behavior you want
- words in (parentheses) are examples
- words in italics are not currently implemented
Syntax | Description | Example |
---|---|---|
* | Matches any number of words (including none) | I love * (I love to read) => Loves *1!; I love * in * (I love walking in the snow) => Loves *1 |
_ | Matches a single word | I _ to * (I love to read) => _ to * (love to read) |
%adj | Adjective | |
%adv | Adverb | |
%noun | Matches either a noun or a noun phrase | the mad dog |
%is | Part of the verb “to be” | am, are, is, was, were |
%do | Part of the verb “to do” | do, does, did |
%has | Part of the verb “to have” | have, has, had |
%will | Modal verb | shall, should, will, would, can, could, may, might, must, ought to |
%verb | Present or past tense of a verb (excluding “to be” and modals) | |
%verbx | Base form of a verb | |
%verbs | Present form of a verb | |
%verbed | Past form of a verb | |
%verbing | Present participle (or “gerund”) of a verb | |
%verben | Past participle | |
%verball | An entire verb phrase | They %verball matches They ate the cake. |
%in | Preposition | of, by, to, from, with, without, on, off, under, in, into, before, after |
%inall | Prepositional phrase | |
%attime | Timing phrase | on July 3. |
%punct | Any sentence punctuation (., !, ?) | |
%sentence | Specifies that the following template matches a whole sentence | %sentence %noun %verball |
%question | Specifies that the following template matches a whole question | %question Where %is %noun |
%nounphrase [elements] | Matches elements only within a noun phrase | %sentence %pronoun %verb %nounphrase * %noun matches He fed the mad dog. |
%opt [any input] # | Allows an optional element in a template | the %opt %adj # dog |
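The effect of %opt can be pictured as expanding one template into the variants with and without the optional element. The Python sketch below does exactly that; `expand_optional` is a hypothetical helper, nested %opt groups are not handled, and nothing here is claimed about how DataTemple evaluates %opt internally.

```python
def expand_optional(template):
    """Return every template obtained by keeping or dropping each '%opt ... #' group."""
    tokens = template.split()
    variants = [[]]
    i = 0
    while i < len(tokens):
        if tokens[i] == "%opt":
            end = tokens.index("#", i)  # raises ValueError if the group is never closed
            group = tokens[i + 1:end]
            # Each existing variant forks: one copy with the group, one without.
            variants = [v + extra for v in variants for extra in (group, [])]
            i = end + 1
        else:
            variants = [v + [tokens[i]] for v in variants]
            i += 1
    return [" ".join(v) for v in variants]

print(expand_optional("the %opt %adj # dog"))
```

For the example in the table, this yields one variant that requires an adjective and one that does not.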