README.md
+30-30
@@ -1,7 +1,7 @@
1
1
# JUSThink Dialogue and Actions Corpus (Dataset)
2
2
3
3
The information contained in this dataset (JUSThink Dialogue and Actions Corpus) includes dialogue transcripts, event logs, and test responses of children aged 9 through 12, as they participate in a robot-mediated human-human collaborative learning activity named JUSThink, where children in teams of two solve a problem on graphs together.
4
-
The information was collected in a study at two international schools in Switzerland, in October 2019.
4
+
The information was collected in a study at two international schools in Switzerland, in October 2019.
5
5
The JUSThink activity and its study are first described in [[1]](#references), and elaborated with findings concerning the link between children's learning, performance in the activity, and perception of self, the other and the robot in [[2]](#references).
6
6
See the [project website](https://www.epfl.ch/labs/chili/index-html/research/animatas/justhink/) for details.
7
7
@@ -18,81 +18,81 @@ See the [project website](https://www.epfl.ch/labs/chili/index-html/research/ani
18
18
19
19
JUSThink Dialogue and Actions Corpus consists of three parts:
20
20
21
-
1. [transcripts](transcripts): anonymised dialogue transcripts for 10 teams (see [transcripts](#transcript_content))
22
-
2. [logs](logs): anonymised event logs for 39 teams of two children (see [logs](#log_content) for details)
23
-
3. [test responses](test_responses): pre-test and post-test responses for 39 teams, and the key i.e. the correct responses (see [tests](#test_content))
21
+
1. [transcripts](transcripts): anonymised dialogue transcripts for 10 teams of two children (see [1.1. Transcripts](#transcript_content))
22
+
2. [logs](logs): anonymised event logs for 39 teams (see [1.2. Logs](#log_content) for details)
23
+
3. [test responses](test_responses): pre-test and post-test responses for 39 teams, and the key i.e. the correct responses (see [1.3. Test Responses](#test_content))
24
24
25
-
In addition, there is metadata that contains information on the network that the children have worked on:
26
-
It is a JSON file in a node-link format, providing the node labels (e.g. "Mount Luzern"), node ids, x, y position of a node, edges between the nodes, and edge costs ([metadata/network.json](metadata/network.json)).
25
+
In addition, there is metadata that contains information on the network that the children have worked on:
26
+
It is a JSON file in a node-link format, providing the node labels (e.g. "Mount Luzern"), node ids, x, y position of a node, edges between the nodes, and edge costs ([metadata/network.json](metadata/network.json)).
27
27
It can be [read](https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.json_graph.node_link_graph.html) into a [NetworkX](https://networkx.org/) graph.
This part of the dataset contains the anonymised dialogue transcripts for 10 teams (out of 39 teams).
31
+
This part of the dataset contains the anonymised dialogue transcripts for 10 teams of two children (out of 39 teams).
32
32
33
33
It consists of 10 files, with one tab-separated text file per team ([transcripts/justhink19_transcript_<team_no\>.csv](transcripts/)).
34
34
35
35
In particular, the columns are:
36
36
37
37
* *team_no*: The number of the team that the dialogue belongs to
38
-
* *utterance_no*: The number of the utterance, starting from 1
38
+
* *utterance_no*: The number of the utterance, starting from 0
39
39
* *start*: The start timestamp of the utterance (in seconds), from the beginning of the activity
40
40
* *end*: The end timestamp of the utterance (in seconds)
41
-
* *interlocutor*: The person (or the robot) that is speaking (A, B: the participants; R: the robot; I: an experimenter)
41
+
* *interlocutor*: The person (or the robot) that is speaking (*A*, *B*: the participants; *R*: the robot; *I*: an experimenter)
42
42
* *utterance*: The content of the utterance
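
As an illustration of the columns above, the following is a minimal sketch of reading one transcript file with Python's standard `csv` module. It assumes, without confirmation from this README, that each file starts with a header row carrying these column names; the sample rows and the helper name `read_transcript` are hypothetical.

```python
import csv
import io

# Hypothetical sample in the documented column order. A header row with the
# column names is an assumption; check a real transcript file before relying on it.
SAMPLE = (
    "team_no\tutterance_no\tstart\tend\tinterlocutor\tutterance\n"
    "7\t0\t12.5\t13.1\tA\tMount Neuchat- um Mount Interlaken\n"
    "7\t1\t13.4\t14.0\tB\toh, okay.\n"
)


def read_transcript(handle):
    """Return a list of utterance dicts with the numeric fields converted."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["utterance_no"] = int(row["utterance_no"])
        row["start"] = float(row["start"])  # seconds from the start of the activity
        row["end"] = float(row["end"])
        rows.append(row)
    return rows


utterances = read_transcript(io.StringIO(SAMPLE))
# Keep only the children's speech, dropping the robot (R) and the experimenter (I).
child_utterances = [u for u in utterances if u["interlocutor"] in ("A", "B")]
```

For a real file, the same function can be given an open file handle in place of the `io.StringIO` wrapper.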
43
43
44
44
Utterance segmentation is based on [[3]](#references)'s definition of an *Inter Pausal Unit (IPU)*, defined as "a stretch of a single interlocutor's speech bounded by pauses longer than 100 ms".
45
-
We also annotated punctuation markers, such as commas, full stops, exclamation points and question marks, fillers, such as 'uh' and 'um', and the discourse marker 'oh'.
46
-
Transcription included incomplete elements, such as "Mount Neuchat-" in "Mount Neuchat- um Mount Interlaken".
47
-
We standardised variations of pronunciation in the transcriptions, and we do not account for e.g. variations in accent.
48
-
For anonymisation, a person's name is replaced with a pseudonym (in particular Ann for participant A, Bob for participant B, and other pseudonyms if an interlocutor refers to someone else).
45
+
We also annotated punctuation markers, such as commas, full stops, exclamation points and question marks, fillers, such as 'uh' and 'um', and the discourse marker 'oh'.
46
+
Transcription included incomplete elements, such as "Mount Neuchat-" in "Mount Neuchat- um Mount Interlaken".
47
+
We standardised variations of pronunciation in the transcriptions, and we do not account for e.g. variations in accent.
48
+
For anonymisation, a person's name is replaced with a pseudonym (in particular Ann for participant *A*, Bob for participant *B*, and other pseudonyms if an interlocutor refers to someone else).
49
49
The numbers that are explicitly referring to the cost of an edge or a set of edges are written as numerals.
50
50
A graduate student completed two passes on each transcript, and the transcripts were then checked by another native English-speaking graduate student with experience in transcription/annotation tasks.
51
51
52
-
Note that the start and end times are synchronised with the log times, and the robot's introductory line to start the activity ("so, Ann and Bob, let's start building the tracks. ...", available in the logs) is not included in the transcripts. In addition, some of the utterances by the experimenter (I) and the robot (R) are omitted; these are indicated with "..." in the utterance content. All of the utterances by the robot are available in the logs in complete form.
52
+
Note that the start and end times are synchronised with the log times, and the robot's introductory line to start the activity ("so, Ann and Bob, let's start building the tracks. ...", available in the [logs](#log_content)) is not included in the transcripts. In addition, some of the utterances by the experimenter (*I*) and the robot (*R*) are omitted from the transcripts; these are indicated with "..." in the utterance content. All of the utterances by the robot are available in the logs in complete form.
53
53
54
54
55
55
### 1.2. Logs <a name="log_content"></a>
56
-
This part of the dataset contains anonymised event log data for 39 teams.
56
+
This part of the dataset contains anonymised event log data for 39 teams of two children.
57
57
58
58
It consists of 39 files, with one tab-separated text file per team ([logs/justhink19_log_<team_no\>.csv](logs/)).
59
59
60
60
In particular, the columns are:
61
61
62
62
* *team_no*: The number of the team that the event belongs to
63
63
* *attempt_no*: The attempt number that the event belongs to, starting from 1. An attempt is the duration of the team constructing a solution and submitting it together.
64
-
* *turn_no*: The turn number of the event, starting from 1. A turn is the duration where one of the participants is in figurative view, and the other is in abstract view (see [[2]](#references) for a description of the views).
64
+
* *turn_no*: The turn number of the event, starting from 1. A turn is the duration where one of the participants is in the figurative view, and the other is in the abstract view (see [[2]](#references) for a description of the views).
65
65
* *event_no*: The event number of the event, starting from 1
66
-
* *time*: The logging timestamp of the event from the beginning of the activity (in seconds)
67
-
* *subject*: The subject that the event is executed by (A, B: the participants; R: the robot; T: the team)
66
+
* *time*: The logging timestamp of the event, from the beginning of the activity (in seconds)
67
+
* *subject*: The subject that the event is executed by (*A*, *B*: the participants; *R*: the robot; *T*: the team)
68
68
* *verb*: The verb that describes the event (e.g. "presses", "adds", "removes")
69
-
* *object*: The object that is acted on by the subject performing the verb (e.g. "submit (enabled)" for subject: A, verb: "presses")
69
+
* *object*: The object that is acted on by the subject performing the verb (e.g. "submit (enabled)" for subject i.e. participant: *A*, verb i.e. the action: "presses")
70
70
71
-
For example, in a logged event "A presses submit (disabled)", submit refers to the submit button, and "enabled" is the status of the button at the time of the button press by participant A (that it was active/enabled), and hence the help window is displayed afterwards.
72
-
Note that the closing of e.g. a help window is not logged.
73
-
An event "B presses submit (disabled)" is logged, when B tries to submit a solution, by pressing the submit button, while it was not allowed to submit, i.e. the current solution was not connecting all nodes to each other (see [[2]](#references) for the activity details).
71
+
For example, in a logged event "A presses submit (disabled)", submit refers to the submit button, and "enabled" is the status of the button at the time of the button press by participant *A* (that it was active/enabled), and hence the help window is displayed afterwards.
72
+
Note that the closing of e.g. the help window is not logged.
73
+
An event "B presses submit (disabled)" is logged, when *B* tries to submit a solution, by pressing the submit button, while it was not allowed to submit, i.e. the current solution was not connecting all nodes to each other (see [[2]](#references) for the activity details).
74
74
75
75
Regarding the collaborative modification and submission of a solution, e.g. an event "B adds Zurich-Gallen (2-8)" modifies the team's current solution by connecting Zurich to Gallen, where 2 and 8 correspond to the node ids respectively.
76
-
An event "T submits cost=64 (opt_cost=22)" indicates that the team's solution is registered as a solution, where the total cost of the submitted solution is 64, whereas the optimal cost (for a correct solution) is 22.
76
+
An event "T submits cost=64 (opt_cost=22)" indicates that the team's solution is registered as a solution by both of the participants pressing their respective submit buttons; where the total cost of the submitted solution is 64, whereas the optimal cost (for a correct i.e. optimal solution) is 22.
77
77
Note that in a few cases a team's submit event might not reflect the total cost by counting the add (and subtracting remove) events due to an error in the logging; however, the submitted solution's cost (as logged in an event "T submits ...") is always correct, and this is what the robot reacts to (by giving feedback on the solution, see [[2]](#references)).
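
The object strings quoted above can be split into their parts with regular expressions. This is only a sketch based on the two examples given in this README ("cost=64 (opt_cost=22)" and "Zurich-Gallen (2-8)"); the function names are hypothetical, integer costs are assumed, and other object formats in the logs are not covered.

```python
import re

# Object text of a "T submits ..." event, e.g. "cost=64 (opt_cost=22)".
SUBMIT_PATTERN = re.compile(r"cost=(\d+) \(opt_cost=(\d+)\)")
# Object text of an add/remove event, e.g. "Zurich-Gallen (2-8)".
# Assumes node labels contain no hyphen themselves.
EDGE_PATTERN = re.compile(r"(.+)-(.+) \((\d+)-(\d+)\)")


def parse_submission(obj):
    """Return (submitted_cost, optimal_cost), or None if obj is not a submission."""
    match = SUBMIT_PATTERN.fullmatch(obj)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_edge(obj):
    """Return (label_u, label_v, id_u, id_v), or None if obj is not an edge."""
    match = EDGE_PATTERN.fullmatch(obj)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3)), int(match.group(4))


cost, opt_cost = parse_submission("cost=64 (opt_cost=22)")
is_optimal = cost == opt_cost  # False for this example: 64 is above the optimum of 22
```

Because the submitted cost in the "T submits ..." event is stated to be always correct, reading it directly is more reliable than summing the add and remove events.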
78
78
79
-
For anonymisation, the robot's introductory line to start the activity ("so, Ann and Bob, let's start building the tracks. ...") has the participant A's name replaced with Ann (and B with Bob), while within the activity the robot pronounces the names of children.
79
+
For anonymisation, the robot's introductory line to start the activity ("so, Ann and Bob, let's start building the tracks. ...") has the participant *A*'s name replaced with Ann (and *B* with Bob), while within the activity the robot pronounces the names of children.
80
80
81
81
82
82
### 1.3. Test Responses <a name="test_content"></a>
83
-
This part of the dataset contains the responses of each participant in each team to the pre-test and post-test for 39 teams.
84
-
Each test contains 10 multiple-choice (single answer) questions (i.e. items) with 3 options (1 to 3), and assesses a concept on spanning trees (see [[2]](#references)).
83
+
This part of the dataset contains the responses of each participant in each team to the pre-test and post-test for 39 teams.
84
+
Each test contains 10 multiple-choice (single answer) questions (i.e. items) with 3 options (recorded as 0, 1 or 2), and assesses a concept on spanning trees (see [[2]](#references) for details).
85
85
86
-
It consists of 2 files:
86
+
It consists of 2 files:
87
87
88
-
* one comma-separated text file for the pre-test responses ([test_responses/justhink19_pretest.csv](test_responses/justhink19_pretest.csv)), and
88
+
* one comma-separated text file for the pre-test responses ([test_responses/justhink19_pretest.csv](test_responses/justhink19_pretest.csv)), and
89
89
* one comma-separated text file for the post-test responses ([test_responses/justhink19_posttest.csv](test_responses/justhink19_posttest.csv)).
90
90
91
91
In particular, the columns are:
92
92
93
93
* *team_no*: The number of the team, or "key" for the correct responses
94
-
* *q?\_A*: The response of participant A to a particular item (among 10 items indexed from 1 to 10)
95
-
* *q?\_B*: The response of participant B to a particular item
94
+
* *q?\_A*: The response of participant *A* to a particular item (among 10 items indexed from 1 to 10)
95
+
* *q?\_B*: The response of participant *B* to a particular item
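
A test file in this layout could be scored against the key row roughly as follows. This is a sketch under assumptions not stated in this README: that the item columns are named `q1_A`, `q1_B`, ... with the `?` replaced by the item index, and that the "key" row holds the correct response in both the `_A` and `_B` columns. The sample data (two items only) and the name `score_tests` are hypothetical.

```python
import csv
import io

# Hypothetical sample with two items instead of ten, to keep the sketch short.
SAMPLE = (
    "team_no,q1_A,q1_B,q2_A,q2_B\n"
    "key,0,0,2,2\n"
    "7,0,1,2,2\n"
)


def score_tests(handle):
    """Return {(team_no, participant): number of responses matching the key}."""
    rows = list(csv.DictReader(handle))
    # The row whose team_no is "key" carries the correct responses.
    key = next(row for row in rows if row["team_no"] == "key")
    scores = {}
    for row in rows:
        if row["team_no"] == "key":
            continue
        for participant in ("A", "B"):
            columns = [c for c in row if c.endswith("_" + participant)]
            correct = sum(row[c] == key[c] for c in columns)
            scores[(row["team_no"], participant)] = correct
    return scores


scores = score_tests(io.StringIO(SAMPLE))
```

Running the same function on the pre-test and post-test files separately would give per-participant scores that can then be compared.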