-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathnotes.text
131 lines (81 loc) · 3.58 KB
/
notes.text
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
The project, tell-phone-number, is for building a dialog management system to
transmit structured data between a transmitter and a receiver using human
conversational dialogue.
May, 2016
-------------
2016/05/14
Initial choice: Use Otto or build my own interaction engine?
Otto is great at providing the framework for mapping between utterances and
LogicalForms. Also, the DialogRule framework is interesting, and can be
used as a production system.
But Otto has some constraints. It has a strong sense of whose turn it is, and
is not designed to be truly mixed initiative in the sense of allowing either party
to speak at any time. It does not support a timer with a timeout event to
model when one party gets impatient. Also, I would have to implement most of
the interesting logic in custom java programs anyway.
So therefore, I will write my own dialog management engine, in python.
I can leverage my cdm rule parser from March.
Initial checkin to repo saund/tell-phone-number
-------------
2016/05/27
-A decent level 3 dialog manager is now working for self (computer) telling partner (user)
a telephone number:
dialogue/tell-phone-number/python/dialogAgent.py
dialogue/tell-phone-number/python/ruleProcesing.py
dialogue/tell-phone-number/rules/tell-phone-number-lf-rules.txt
This includes some chunk size adjustment, and turn handling including wait.
The program architecture is not entirely clean.
The good:
-the architecture of agent, dialog-model, data-model
-the architecture of converting between utterance text and Otto-style
DialogActs with intents and LogicalForm arguments
The bad:
-Decisions about responses are too special-purpose procedurally coded.
The goal is to have all responses computed from first principles based on
dialog state, which includes dialog-model data state, dialog-model parameters,
and dialog history.
ruleProcessing.py needs some reworking:
-allow separation of interpretation and generation rules, <-, -> in addition to <->
-needs to support full nesting of UtteranceCategories, to be faithful to
John's design that separates semantics from syntax.
-looking at Python support for speech recognition and production ASR and TTS:
Installing python SpeechRecognition from
pip install SpeechRecognition
https://pypi.python.org/pypi/SpeechRecognition/
Also
pip install PyAudio
-I horsed around with Python27/Lib/site-packages/speech_recognition/
__init__ and
__main__
to take control of the parameters,
r.dynamic_energy_threshold = False
r.energy_threshold = val (100 is good for this with the logitech mic on jarmela)
r.pause_threshold = .3
r.non_speaking_duration = min(r.non_speaking_duration, r.pause_threshold)
Z
This is not too too bad about printing out a speech recognition result fairly promptly.
For TTS:
https://pypi.python.org/pypi/gTTS
pip install gTTS
-------------
2016/06/06
over the past week:
-working on a document about dialog management for data communication such as
telephone numbers
k:/chat/docs/dm-for-data-communication.text
k:/chat/docs/dialogue-management-for-data-communication.pptx
-while refactoring to improve organization of the existing level 3 capability
dialogManagement.py
-now moving to
tell-phone-number-lf-rules-2.txt
-worked on ruleProcessing.py to now support directional rules <-, ->, <->
-------------
2016/07/06
I think stable enough for alpha testing within the group.
Objectives for alpha testing:
1. Bugs
Does it crash, produce error printout?.
2. Assess naturalness.
Is this interaction effective, fun, annoying?
3. Assess coverage.
What utterances and paths does it not know about but should?