notes.text


The tell-phone-number project builds a dialog management system that
transmits structured data between a transmitter and a receiver using human
conversational dialogue.

May, 2016


-------------
2016/05/14

Initial choice: Use Otto or build my own interaction engine?
Otto is great at providing the framework for mapping between utterances and 
LogicalForms.  Also, the DialogRule framework is interesting, and can be
used as a production system.
But Otto has some constraints.  It has a strong sense of whose turn it is, and
is not designed to be truly mixed-initiative in the sense of allowing either party
to speak at any time.  It does not support a timer with a timeout event to
model when one party gets impatient.  Also, I would have to implement most of
the interesting logic in custom Java programs anyway.

Therefore, I will write my own dialog management engine in Python.
I can leverage my cdm rule parser from March.
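
A minimal sketch of how the timeout event could be modeled in Python, under
assumptions that are not from the engine itself: both parties post events to
one shared queue (so either may speak at any time), and an empty queue after
patience_seconds is reported as a timeout event.  The names next_event,
patience_seconds, TIMEOUT, and the (source, payload) event tuples are
hypothetical.

```python
import queue

# Hypothetical marker event for "one party got impatient".
TIMEOUT = ("timeout", None)


def next_event(events, patience_seconds):
    """Return the next (source, payload) event, or TIMEOUT if nobody speaks in time."""
    try:
        # Blocks until any party posts an event or the patience interval expires.
        return events.get(timeout=patience_seconds)
    except queue.Empty:
        return TIMEOUT


# Either party may put an event on the queue at any time (mixed initiative).
events = queue.Queue()
events.put(("partner", "what was that?"))

first = next_event(events, 0.05)   # the partner's utterance
second = next_event(events, 0.05)  # nothing posted in time, so a timeout event
```

Using a single queue keeps turn-taking out of the event loop: the dialog
model decides what a timeout means, not the input mechanism.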

Initial checkin to repo saund/tell-phone-number


-------------
2016/05/27
-A decent level 3 dialog manager is now working for self (computer) telling partner (user)
a telephone number:
dialogue/tell-phone-number/python/dialogAgent.py
dialogue/tell-phone-number/python/ruleProcessing.py
dialogue/tell-phone-number/rules/tell-phone-number-lf-rules.txt

This includes some chunk size adjustment, and turn handling including wait.
The program architecture is not entirely clean.  
The good:
-the architecture of agent, dialog-model, data-model
-the architecture of converting between utterance text and Otto-style 
 DialogActs with intents and LogicalForm arguments
The bad:
-Decisions about responses are hard-coded in special-purpose procedures.
 The goal is to have all responses computed from first principles based on
 dialog state, which includes dialog-model data state, dialog-model parameters,
 and dialog history.
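
A rough sketch of what "computed from dialog state" might look like, assuming
Python 3.  Everything here is hypothetical and far simpler than the real
agent: the DialogState fields, the intent names ("start", "confirm",
"misheard"), and the rule that a misheard chunk shrinks the chunk size by one.

```python
from dataclasses import dataclass, field


@dataclass
class DialogState:
    digits: str                  # data-model state: the number being told
    confirmed: int = 0           # data-model state: digits confirmed so far
    pending: int = 0             # length of the chunk awaiting confirmation
    chunk_size: int = 3          # dialog-model parameter
    history: list = field(default_factory=list)  # dialog history


def next_response(state, partner_intent):
    """Choose the next utterance from the state alone, then update the state."""
    state.history.append(partner_intent)
    if partner_intent == "confirm":
        state.confirmed += state.pending
    elif partner_intent == "misheard":
        # Chunk size adjustment: repeat from the same place in smaller chunks.
        state.chunk_size = max(1, state.chunk_size - 1)
    if state.confirmed >= len(state.digits):
        state.pending = 0
        return "That is the whole number."
    chunk = state.digits[state.confirmed:state.confirmed + state.chunk_size]
    state.pending = len(chunk)
    return " ".join(chunk)


state = DialogState("6505551234")
replies = [next_response(state, intent)
           for intent in ("start", "misheard", "confirm")]
```

The point of the sketch is only that the reply depends on the state fields and
the incoming intent, with no per-situation procedural branches beyond them.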

ruleProcessing.py needs some reworking:
-allow separation of interpretation and generation rules, <-, -> in addition to <->
-needs to support full nesting of UtteranceCategories, to be faithful to
 John's design that separates semantics from syntax.


-looking at Python support for speech recognition (ASR) and speech production (TTS):
Installing the Python SpeechRecognition package:
pip install SpeechRecognition
https://pypi.python.org/pypi/SpeechRecognition/
Also 
pip install PyAudio


-I horsed around with __init__ and __main__ in
Python27/Lib/site-packages/speech_recognition/
to take control of the parameters:

r.dynamic_energy_threshold = False
r.energy_threshold = val  (100 works well with the Logitech mic on jarmela)
r.pause_threshold = 0.3
r.non_speaking_duration = min(r.non_speaking_duration, r.pause_threshold)
With these settings, a speech recognition result is printed out fairly promptly.
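
These parameters are attributes of a Recognizer instance, so the same settings
can probably be applied from calling code instead of by editing the installed
package.  A small sketch; the helper names apply_settings and make_recognizer
are hypothetical, and the default of 100 is only the value that suited the
setup described above.

```python
def apply_settings(recognizer, energy_threshold=100, pause_threshold=0.3):
    """Apply the fixed-threshold settings from these notes to a recognizer object."""
    recognizer.dynamic_energy_threshold = False
    recognizer.energy_threshold = energy_threshold
    recognizer.pause_threshold = pause_threshold
    # Keep non_speaking_duration no larger than pause_threshold.
    recognizer.non_speaking_duration = min(recognizer.non_speaking_duration,
                                           recognizer.pause_threshold)
    return recognizer


def make_recognizer():
    """Build a configured Recognizer (needs the SpeechRecognition package installed)."""
    import speech_recognition as sr  # imported lazily so apply_settings stands alone
    return apply_settings(sr.Recognizer())
```

Calling make_recognizer() should give an object configured like the hand-edited
one, without touching site-packages.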


For TTS:
https://pypi.python.org/pypi/gTTS
pip install gTTS


-------------
2016/06/06

Over the past week:

-working on a document about dialog management for data communication such as 
 telephone numbers
k:/chat/docs/dm-for-data-communication.text
k:/chat/docs/dialogue-management-for-data-communication.pptx

-meanwhile, refactoring to improve the organization of the existing level 3 capability:
dialogManagement.py

-now moving to 
tell-phone-number-lf-rules-2.txt

-worked on ruleProcessing.py to now support directional rules <-, ->, <->
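
A sketch of how a directional rule line might be split.  The actual rule file
syntax is not shown in these notes, so this assumes a hypothetical one-line
form "LogicalForm ARROW utterance text", with -> read as generation only
(LogicalForm to text), <- as interpretation only (text to LogicalForm), and
<-> as both.  The function name parse_rule, the returned dict keys, and the
example rule are likewise illustrative, not taken from ruleProcessing.py.

```python
import re

# Longest arrow first so "<->" is not mistaken for "<-" or "->".
_ARROW = re.compile(r"\s(<->|->|<-)\s")


def parse_rule(line):
    """Split one rule line into its two sides and the directions it supports."""
    match = _ARROW.search(line)
    if match is None:
        raise ValueError("no rule arrow found in: %r" % line)
    arrow = match.group(1)
    return {
        "logical_form": line[:match.start()].strip(),
        "text": line[match.end():].strip(),
        "generate": arrow in ("->", "<->"),    # LogicalForm to utterance text
        "interpret": arrow in ("<-", "<->"),   # utterance text to LogicalForm
    }


both = parse_rule("InformDigit(3) <-> three")
gen_only = parse_rule("InformDigit(3) -> three")
interp_only = parse_rule("InformDigit(3) <- tree")
```

Keeping the direction as two booleans lets the interpreter and the generator
each filter the same rule list.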


-------------
2016/07/06

I think it is stable enough for alpha testing within the group.


Objectives for alpha testing:

1. Bugs.
   Does it crash or produce error printout?

2. Assess naturalness.
   Is this interaction effective, fun, annoying?

3. Assess coverage.
   What utterances and paths does it not know about but should?