
Template syntax for extraction

jrising edited this page Jan 14, 2013 · 11 revisions

Introduction

This is a derivative of the Template Syntax page, aimed at people who are making templates which match natural language structures for extracting information (i.e., using the DataTemple Command Line Tool).

Rules combine a template and a command. Every template that is matched in the input will have its command run. Usually the template is just a sequence of words or wild-cards (which can match many different words), but it can contain other commands too. Usually the command is for some kind of output, using the identifiers for wild-cards that were matched, but it too can contain other commands.

Here are some example rules:

Template | Example Match
where is the murder weapon | Where's the murder weapon?
I like to * | I like to swim in the ocean.
%sentence I like to %verb %inall | I like to swim in the ocean.

Basic Syntax

All commands, wild-cards, and special modifiers start with one of the special characters *, _, %, @, /, and #. All other words are just words, matched in the template or included in the output. In the template, punctuation and capitalization are ignored; in the output, punctuation is shown to the user exactly as written.

Symbol | Usage
* | Matches any number of words in the input, or reproduces those words in the output. An optional name after the * (like *foo) names the match for future use.
_ | Matches a single word in the input, or reproduces it in the output. An optional name after the _ (like _foo) names the match for future use.
% | Matches a restricted word (like any verb, with %verb) in the input, or reproduces it in the output.
@ | Applies a functional rule to the input or output. Everything following an @ rule (like @reply Where do you live?), up to the next @ rule, is applied to the rule.
/ | Delimits multiple arguments to a command. Examples below.
# | Ends one rule or command so another can be included. Examples below.

To include the literal symbol in an input or output, use the symbol twice. Example: Did you know 3 ** 5 is 15? or I saw him//her.

Examples of simple commands:

Command | Use in a Template | Use in a Command
* | Matches any number of words | Produces the first * matched
_ | Matches a single word | Produces the word matched by _something in the input
%verb | Matches any verb | Produces the first verb matched
%verbing | Matches a gerund | Produces the gerund form of the first verb matched
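
As an illustration of how * and _ behave in a template, here is a minimal Python sketch. It is not DataTemple's implementation: the function names `normalize` and `match` are hypothetical, the sketch assumes the template must cover the whole input, and it handles only plain words, * and _ (no %, @, / or #, no doubled-symbol escapes, and no expansion of contractions such as "Where's").

```python
import re

def normalize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def match(template, sentence):
    """Return the list of wild-card captures, or None if there is no match.

    Hypothetical helper: only plain words, `*` (any number of words,
    including none) and `_` (exactly one word) are understood.
    """
    pattern = template.split()
    words = normalize(sentence)

    def walk(p, w, captures):
        # p indexes the template tokens, w indexes the input words.
        if p == len(pattern):
            return captures if w == len(words) else None
        token = pattern[p]
        if token.startswith("*"):
            # Try the shortest capture first, then progressively longer ones.
            for end in range(w, len(words) + 1):
                result = walk(p + 1, end, captures + [" ".join(words[w:end])])
                if result is not None:
                    return result
            return None
        if w == len(words):
            return None
        if token.startswith("_"):
            return walk(p + 1, w + 1, captures + [words[w]])
        if normalize(token) == [words[w]]:
            return walk(p + 1, w + 1, captures)
        return None

    return walk(0, 0, [])

print(match("I like to *", "I like to swim in the ocean."))
print(match("I _ to *", "I love to read"))
```

Because punctuation and capitalization are stripped before comparison, "I like to *" matches "I like to swim in the ocean." and captures "swim in the ocean".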

Naming and Grammar Changes

When you use * or _, you can refer to these matches with either the same symbol, if there was only one, or a symbol followed by an ordinal number: the first wild-card is called *1 or just * in the response, the second is called *2, the third is *3, and so on.

Template | Example Input | Output
I like to eat * with * | I like to eat hamburgers with lettuce. | Subject likes *2 on *1.
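
The numbering rule can be sketched in a few lines of Python. This is an illustrative assumption about the behavior described above, not DataTemple code: `fill_output` is a hypothetical name, captures are numbered from 1, a bare * is treated as *1, and doubled symbols (a literal **) are not handled.

```python
import re

def fill_output(output, captures):
    """Replace *N (or a bare *) in `output` with the N-th capture, 1-based."""
    def replace(found):
        digits = found.group(1)
        index = int(digits) if digits else 1  # a bare * means the first match
        return captures[index - 1]
    return re.sub(r"\*(\d*)", replace, output)

# Captures from "I like to eat hamburgers with lettuce."
print(fill_output("Subject likes *2 on *1.", ["hamburgers", "lettuce"]))
```

With the captures from the example input, the output template becomes "Subject likes lettuce on hamburgers."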

Restricted wild-cards, like %verb, can be referred to by the same name (e.g., %verb).

Restricted wild-cards can also have grammar changes, made by adding suffixes to the name, as in these examples:

Template | Example Input | Output | Example Output
I like to %verb | I like to swim | Likes %verbing! | Likes swimming!
I like %verbing | I like swimming | Likes to %verb! | Likes to swim!

Declinations

Declinations are modified variables, which can be used for matching or producing results. For example, %noun:capital used in a template matches only capitalized nouns, but used in an output it will capitalize the corresponding %noun.

The following are the currently supported declinations:

  • :lower - matches or produces lower case words
  • :capital - matches or produces words whose first letter is capitalized (ignoring 'of', 'and', 'a', and 'the')
  • :s - matches or produces a verb in the present tense
  • :ed - matches or produces a verb in the past tense
  • :ing - matches or produces a present participle
  • :en - matches or produces a past participle
  • :x - matches or produces a verb in its infinitive form

Multiple declinations may be used. For example, you might have "%verb:ing:lower".
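
The two case-related declinations can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `capitalize_phrase`, `matches_capital` and `apply_declinations` are hypothetical, chained suffixes are assumed to apply left to right, and the verb declinations (:s, :ed, :ing, :en, :x) are left out because they need a verb lexicon.

```python
# Words that :capital ignores, as listed above.
MINOR_WORDS = {"of", "and", "a", "the"}

def capitalize_phrase(phrase):
    """Produce side of :capital: capitalize each word except the minor ones."""
    return " ".join(
        word if word.lower() in MINOR_WORDS else word[:1].upper() + word[1:]
        for word in phrase.split()
    )

def matches_capital(phrase):
    """Match side of :capital: every non-minor word starts with a capital."""
    return all(
        word.lower() in MINOR_WORDS or word[:1].isupper()
        for word in phrase.split()
    )

# Only the case declinations are covered in this sketch.
DECLINATIONS = {"lower": str.lower, "capital": capitalize_phrase}

def apply_declinations(spec, value):
    """Apply the suffixes of a spec such as '%noun:capital' to `value`."""
    _name, *suffixes = spec.split(":")
    for suffix in suffixes:
        if suffix not in DECLINATIONS:
            raise ValueError("declination not covered by this sketch: " + suffix)
        value = DECLINATIONS[suffix](value)
    return value

print(apply_declinations("%noun:capital", "tower of london"))
```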

Parts of Speech identifiers

Note: this section is not yet active in DataTemple.

A variety of commands let you match on the grammar of a sentence, and use the rules of English grammar to keep your responses sensible.

  Example: %subject %verb the dog *  =>  The dog was %verbed by %1object.
    matches: "I walked the dog", but not "Where is the dog?"
      "I walked the dog" produces "The dog was walked by you."
      "They want the dog" produces "The dog was wanted by them."

Most part-of-speech commands have both a proper command name and a simple one that serves as a mnemonic. The proper name is the part of speech. The simple one is a word of that part of speech that would be used at that point in the sentence. So, rather than %object you can use %them, and rather than %possessivepronoun you can use %theirs.

Commands

Commands can be used both to produce output, and to save information for later.

The output command is @print. For example: @print Subject eats %noun

The two memory commands are @set and @know.

A @set command saves a value to a name. The syntax for @set is:

  @set name value

The value can be as long as you want. Use # to end it before the end of the rule. To refer to the same value later, use %name.

  Example: I am * years old => @set yearsold *1
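
To make the @set behavior concrete, here is a small Python sketch. It is an assumption about the semantics described above rather than DataTemple's code: `run_set` is a hypothetical name, memory is modeled as a plain dictionary, a single # is assumed to end the value, and doubled symbols are not handled.

```python
import re

def run_set(command, captures, memory):
    """Execute '@set name value', storing the value under `name` in `memory`."""
    body = command.split("#", 1)[0]             # '#' ends the value early
    keyword, name, value = body.split(None, 2)
    if keyword != "@set":
        raise ValueError("not a @set command")
    # Substitute *N (or a bare *) with the N-th wild-card capture, 1-based.
    value = re.sub(
        r"\*(\d*)",
        lambda found: captures[int(found.group(1) or 1) - 1],
        value,
    )
    memory[name] = value.strip()
    return memory

memory = {}
# Captures from matching "I am * years old" against "I am 42 years old".
run_set("@set yearsold *1", ["42"], memory)
print(memory)
```

After this runs, a later rule could look up `memory["yearsold"]` where the template syntax would use %yearsold.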

A @know command saves a particular relation between elements. The syntax is:

  @know one thing @relation something else

@relation can be one of several commands, defined below.

  Example: %person died * => @know %person @HasAttribute dead
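
One way to picture @know is as storing a (subject, relation, object) triple. The Python sketch below is a hypothetical model, not DataTemple's storage format: `run_know`, the `bindings` dictionary of matched wild-cards, and the set of tuples used as the fact store are all assumptions made for illustration.

```python
def run_know(command, bindings, facts):
    """Execute '@know one thing @relation something else' as a stored triple."""
    keyword, rest = command.split(None, 1)
    if keyword != "@know":
        raise ValueError("not a @know command")
    left, relation_and_right = rest.split("@", 1)
    relation, right = relation_and_right.split(None, 1)

    def resolve(text):
        # Swap matched wild-cards (e.g. %person) for the words they matched.
        return " ".join(bindings.get(word, word) for word in text.split())

    fact = (resolve(left), relation, resolve(right))
    facts.add(fact)
    return fact

facts = set()
# Bindings from matching "%person died *" against "Caesar died long ago".
print(run_know("@know %person @HasAttribute dead", {"%person": "Caesar"}, facts))
```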

Definitions

Another class of commands lets you specify definitions, typically used before the template. The supported definition commands are:

  • @defwc: A word choice definition, constructing a synonym class. For example, @defwc %building building house tower factory would allow you to use %building in a following template (e.g., %noun entered the %building).
  • @defvc: A verb choice definition, like @defwc but allowing verbs to be in any conjugation (e.g., go, goes, went, will go, have gone).
  • @defpc: A phrase choice definition. For example, @defpc %belphr I believe that / We can be sure that / It is now known that.
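
The difference between word choices and phrase choices can be sketched in Python. This is a hedged illustration only: `parse_definition` is a hypothetical name, @defwc is assumed to split its choices on whitespace and @defpc on /, and @defvc is omitted because matching every conjugation needs a verb lexicon.

```python
def parse_definition(line):
    """Parse a @defwc or @defpc line into (name, list of choices)."""
    keyword, name, body = line.split(None, 2)
    if keyword == "@defwc":
        choices = body.split()                              # one word per choice
    elif keyword == "@defpc":
        choices = [part.strip() for part in body.split("/")]  # '/' separates phrases
    else:
        raise ValueError("definition not covered by this sketch: " + keyword)
    return name, choices

print(parse_definition("@defwc %building building house tower factory"))
print(parse_definition(
    "@defpc %belphr I believe that / We can be sure that / It is now known that"
))
```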

Command Reference

Notes:

  • words in [brackets] should be replaced by a value of your choosing
  • words in <angle brackets> are optional, and you can include them or not depending on what behavior you want
  • words in (parentheses) are examples
  • words in italics are not currently implemented

Pattern Commands:

Syntax | Description | Example
* | Matches any number of words (including none) | I love * (I love to read) => Loves *1!; I love * in * (I love walking in the snow) => Loves *1
_ | Matches a single word | I _ to * (I love to read) => _ to * (love to read)
%adj | Adjective |
%adv | Adverb |
%noun | Matches either a noun or a noun phrase | the mad dog
%is | Part of the verb “to be” | am, are, is, was, were
%do | Part of the verb “to do” | do, does, did
%has | Part of the verb “to have” | have, has, had
%will | Modal verb | shall, should, will, would, can, could, may, might, must, ought to
%verb | Present or past tense of a verb (excluding “to be” and modals) |
%verbx | Base form of a verb |
%verbs | Present form of a verb |
%verbed | Past form of a verb |
%verbing | Present participle (or “gerund”) of a verb |
%verben | Past participle |
%verball | An entire verb phrase | They %verball matches They ate the cake.
%in | Preposition | of, by, to, from, with, without, on, off, under, in, into, before, after
%inall | Prepositional phrase |
%attime | Timing phrase | on July 3.
%punct | Any sentence punctuation (., !, ?) |
%sentence | Specifies that the following template matches a whole sentence | %sentence %noun %verball
%question | Specifies that the following template matches a whole question | %question Where %is %noun
%nounphrase [elements] | Matches elements only within a noun phrase | %sentence %pronoun %verb %nounphrase * %noun matches He fed the mad dog.
%opt [any input] # | Allows an optional element in a template | the %opt %adj # dog
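
One way to think about %opt is that a template with an optional element stands for two templates, one with the element and one without. The Python sketch below illustrates that reading. It is an assumption, not how DataTemple necessarily implements %opt: `expand_optional` is a hypothetical name, and every %opt is assumed to be closed by its own #.

```python
def expand_optional(template):
    """Expand every '%opt ... #' span into its with and without variants."""
    tokens = template.split()
    if "%opt" not in tokens:
        return [" ".join(tokens)]
    start = tokens.index("%opt")
    end = tokens.index("#", start)   # raises ValueError if the '#' is missing
    before = tokens[:start]
    inner = tokens[start + 1:end]
    after = tokens[end + 1:]
    with_element = " ".join(before + inner + after)
    without_element = " ".join(before + after)
    # Recurse so that any later %opt spans are expanded as well.
    return expand_optional(with_element) + expand_optional(without_element)

print(expand_optional("the %opt %adj # dog"))
```

For the table's example this yields the two templates "the %adj dog" and "the dog".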