
Example datasets for statistical learning models

Phil Paradis edited this page Sep 18, 2016 · 1 revision

The dataset must follow one of the two general formats below.

Context-free examples

General format: handling_function_name, example sentence with all commas removed
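A minimal Python sketch of producing one context-free line under this rule. The function name `format_context_free_row` is hypothetical, and the sketch assumes the label is followed by a comma and a single space, as in the example lines that follow.

```python
def format_context_free_row(handling_function_name: str, sentence: str) -> str:
    """Build one dataset line: the label, a comma, then the sentence without commas."""
    # Commas are removed from the sentence so that the only comma left on the
    # line is the one separating the label from the sentence.
    cleaned = sentence.replace(",", "")
    return f"{handling_function_name}, {cleaned}"


print(format_context_free_row("is_handling", "My dog, Rex, is brown"))
```

Removing commas (rather than escaping them) keeps the line trivially splittable, at the cost of losing the punctuation in the training sentence.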

Example:

is_handling, My dog is brown
is_question, What is your dog?
has_handling, I have 1 dog
has_handling, My dog has no puppies
is_handling, My dog is an animal
type_handling, Dogs are animals
type_handling, Animals are not persons
type_handling, Animals are mammals
type_handling, Mammals are entities
type_question, What are animals?
is_question, What is an animal?
is_handling, My name is Phil
is_question, Who is Phil?
has_question, What does Phil have?
ability_function, My dog can jump.
ability_question, What can my dog do?
rating_function, My dog is fantastic!
rating_question, How good is my dog?
describe_statement, My dog is fast.
describe_question, How is my dog?
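Reading such a file back could look like the sketch below: each line is split at its first comma (safe, since the sentences contain no commas), and the examples are counted per label. `parse_context_free` is a hypothetical helper rather than part of an existing tool, and the sample data is a subset of the lines above.

```python
from collections import Counter
from typing import List, Tuple


def parse_context_free(text: str) -> List[Tuple[str, str]]:
    """Split each non-empty line into (handling_function_name, sentence)."""
    examples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Only the first comma separates the label from the sentence.
        label, sentence = line.split(",", 1)
        examples.append((label.strip(), sentence.strip()))
    return examples


data = """is_handling, My dog is brown
is_question, What is your dog?
type_handling, Dogs are animals
type_handling, Animals are mammals"""

examples = parse_context_free(data)
label_counts = Counter(label for label, _ in examples)
print(label_counts)
```

A line without any comma makes the two-name unpacking raise `ValueError`, which is a reasonable way to surface a malformed row.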

With-context examples

General format: handling_function_name, context, example sentence

Note that each row can span multiple lines, and rows are separated by | characters. All commas, double quotes, and pipe characters must therefore be parsed out of the content first.
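A minimal Python sketch of a writer for this format, under several assumptions not stated on this page: "parsed out" is taken to mean removing the characters from the example sentence; the most recent earlier turn gets `distance=1` (inferred from the examples below); each context is rendered on a single line instead of one sentence per line; and commas are kept inside the context because the example contexts keep them (e.g. "Okay, Phil has 3 dogs."). All function names are hypothetical.

```python
from typing import List, Tuple


def sanitize(text: str) -> str:
    """Remove commas, double quotes and pipes, then collapse repeated whitespace."""
    for ch in (",", '"', "|"):
        text = text.replace(ch, "")
    return " ".join(text.split())


def build_context(history: List[Tuple[str, str]]) -> str:
    """Render earlier (user, chatbot) turns, oldest first, as one context block.

    The most recent turn gets distance=1. Double quotes and pipes are removed
    from each turn; commas are kept, as in the example contexts.
    """
    if not history:
        return ""  # e.g. the first question of a conversation
    parts = []
    for i, (user, chatbot) in enumerate(history):
        distance = len(history) - i
        user = user.replace('"', "").replace("|", "")
        chatbot = chatbot.replace('"', "").replace("|", "")
        parts.append(f'sentence=<sentence distance={distance}, user="{user}" chatbot="{chatbot}">')
    return "<context " + " ".join(parts) + " >"


def format_row(handling_function_name: str, context: str, sentence: str) -> str:
    """Label, context and sentence on separate lines; the first two end in a comma."""
    # The context is not sanitized: its commas and quotes are part of the markup.
    return f"{handling_function_name},\n{context},\n{sanitize(sentence)}"


def join_rows(rows: List[str]) -> str:
    """Separate rows with a line holding a single | character."""
    return "\n|\n".join(rows)


history = [("What is your name?", "My name is chatbox!")]
dataset = join_rows([
    format_row("is_question", build_context([]), "What is your name?"),
    format_row("has_statement", build_context(history), "I have 3 dogs."),
])
print(dataset)
```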

Example:

is_question,
,
What is your name?
|
has_statement,
<context sentence=<sentence distance=1, user="What is your name?" chatbot="My name is chatbox!"> >,
I have 3 dogs.
|
has_question,
<context sentence=<sentence distance=1, user="What is your name?" chatbot="My name is chatbox!">
         sentence=<sentence distance=2, user="I have 3 dogs." chatbot="Okay, Phil has 3 dogs."> >,
What does Phil have?
|
rating_function,
<context sentence=<sentence distance=3, user="What is your name?" chatbot="My name is chatbox!">
         sentence=<sentence distance=2, user="I have 3 dogs." chatbot="Okay, Phil has 3 dogs.">
         sentence=<sentence distance=1, user="What does Phil have?" chatbot="Phil has 3 dogs."> >,
My dogs are fantastic!
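One way to read this format back is sketched below in Python. It assumes rows are split on `|`, that the label ends at the first comma of a row and the example sentence starts after the last comma (which holds because the sentence has its commas removed), and that everything in between is the context. The regular expression assumes the exact `<sentence distance=N, user="..." chatbot="...">` layout used in the examples; `parse_context_dataset`, `ContextRow`, and `TURN_RE` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Tuple

# Matches one <sentence ...> entry inside a <context ...> block (assumed layout).
TURN_RE = re.compile(r'<sentence distance=(\d+), user="([^"]*)" chatbot="([^"]*)">')


class ContextRow(NamedTuple):
    label: str
    turns: List[Tuple[int, str, str]]  # (distance, user, chatbot)
    sentence: str


def parse_context_dataset(text: str) -> List[ContextRow]:
    rows = []
    for chunk in text.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Label ends at the first comma; the sentence starts after the last one.
        # The context in between may itself contain commas.
        label, rest = chunk.split(",", 1)
        context, sentence = rest.rsplit(",", 1)
        turns = [(int(d), u, c) for d, u, c in TURN_RE.findall(context)]
        rows.append(ContextRow(label.strip(), turns, sentence.strip()))
    return rows


sample = """is_question,
,
What is your name?
|
has_statement,
<context sentence=<sentence distance=1, user="What is your name?" chatbot="My name is chatbox!"> >,
I have 3 dogs."""

for row in parse_context_dataset(sample):
    print(row)
```

Splitting from both ends is what lets the context keep its internal commas; if a sentence ever retained a comma, the last-comma split would cut it in two, which is why the commas must be removed beforehand.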