Commit 1280efd

Update README
1 parent 4b62aa2 commit 1280efd

1 file changed

+30
-30
lines changed

README.md

+30-30
@@ -1,7 +1,7 @@
11
# JUSThink Dialogue and Actions Corpus (Dataset)
22

33
The information contained in this dataset (JUSThink Dialogue and Actions Corpus) includes dialogue transcripts, event logs, and test responses of children aged 9 through 12, as they participate in a robot-mediated human-human collaborative learning activity named JUSThink, where children in teams of two solve a problem on graphs together.
4-
The information was collected in a study at two international schools in Switzerland, in October 2019.
4+
The information was collected in a study at two international schools in Switzerland, in October 2019.
55
The JUSThink activity and its study are first described in [[1]](#references), and elaborated with findings concerning the link between children's learning, performance in the activity, and perception of self, the other and the robot in [[2]](#references).
66
See the [project website](https://www.epfl.ch/labs/chili/index-html/research/animatas/justhink/) for details.
77

@@ -18,81 +18,81 @@ See the [project website](https://www.epfl.ch/labs/chili/index-html/research/ani
1818

1919
JUSThink Dialogue and Actions Corpus consists of three parts:
2020

21-
1. [transcripts](transcripts): anonymised dialogue transcripts for 10 teams (see [transcripts](#transcript_content))
22-
2. [logs](logs): anonymised event logs for 39 teams of two children (see [logs](#log_content) for details)
23-
3. [test responses](test_responses): pre-test and post-test responses for 39 teams, and the key i.e. the correct responses (see [tests](#test_content))
21+
1. [transcripts](transcripts): anonymised dialogue transcripts for 10 teams of two children (see [1.1. Transcripts](#transcript_content))
22+
2. [logs](logs): anonymised event logs for 39 teams (see [1.2. Logs](#log_content) for details)
23+
3. [test responses](test_responses): pre-test and post-test responses for 39 teams, and the key i.e. the correct responses (see [1.3. Test Responses](#test_content))
2424

25-
In addition, there is metadata that contains information on the network that the children have worked on:
26-
It is a JSON file in a node-link format, providing the node labels (e.g. "Mount Luzern"), node ids, x, y position of a node, edges between the nodes, and edge costs ([metadata/network.json](metadata/network.json)).
25+
In addition, there is metadata that contains information on the network that the children have worked on:
26+
It is a JSON file in a node-link format, providing the node labels (e.g. "Mount Luzern"), node ids, x, y position of a node, edges between the nodes, and edge costs ([metadata/network.json](metadata/network.json)).
2727
It can be [read](https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.json_graph.node_link_graph.html) into a [NetworkX](https://networkx.org/) graph.
2828

2929

3030
### 1.1. Transcripts <a name="transcript_content"></a>
31-
This part of the dataset contains the anonymised dialogue transcripts for 10 teams (out of 39 teams).
31+
This part of the dataset contains the anonymised dialogue transcripts for 10 teams of two children (out of 39 teams).
3232

3333
It consists of 10 files, with one tab-separated text file per team ([transcripts/justhink19_transcript_<team_no\>.csv](transcripts/)).
3434

3535
In particular, the columns are:
3636

3737
* *team_no*: The number of the team that the dialogue belongs to
38-
* *utterance_no*: The number of the utterance, starting from 1
38+
* *utterance_no*: The number of the utterance, starting from 0
3939
* *start*: The start timestamp of the utterance (in seconds), from the beginning of the activity
4040
* *end*: The end timestamp of the utterance (in seconds)
41-
* *interlocutor*: The person (or the robot) that is speaking (A, B: the participants; R: the robot; I: an experimenter)
41+
* *interlocutor*: The person (or the robot) that is speaking (*A*, *B*: the participants; *R*: the robot; *I*: an experimenter)
4242
* *utterance*: The content of the utterance
4343

4444
Utterance segmentation is based on [[3]](#references)'s definition of an *Inter Pausal Unit (IPU)*, defined as "a stretch of a single interlocutor's speech bounded by pauses longer than 100 ms".
45-
We also annotated punctuation markers, such as commas, full stops, exclamation points and question marks, fillers, such as 'uh' and 'um', and the discourse marker 'oh'.
46-
Transcription included incomplete elements, such as "Mount Neuchat-" in "Mount Neuchat- um Mount Interlaken".
47-
We standardised variations of pronunciation in the transcriptions, and we do not account for e.g. variations in accent.
48-
For anonymisation, a person's name is replaced with a pseudonym (in particular Ann for participant A, Bob for participant B, and other pseudonyms if an interlocutor refers to someone else).
45+
We also annotated punctuation markers, such as commas, full stops, exclamation points and question marks, fillers, such as 'uh' and 'um', and the discourse marker 'oh'.
46+
Transcription included incomplete elements, such as "Mount Neuchat-" in "Mount Neuchat- um Mount Interlaken".
47+
We standardised variations of pronunciation in the transcriptions, and we do not account for e.g. variations in accent.
48+
For anonymisation, a person's name is replaced with a pseudonym (in particular Ann for participant *A*, Bob for participant *B*, and other pseudonyms if an interlocutor refers to someone else).
4949
The numbers that are explicitly referring to the cost of an edge or a set of edges are written as numerals.
5050
A graduate student completed two passes on each transcript, which was then checked by another native English-speaking graduate student with experience in transcription/annotation tasks.
5151

52-
Note that the start and end times are synchronised with the log times, and the robot's introductory line to start the activity ("so, Ann and Bob, let's start building the tracks. ...", available in the logs) is not included in the transcripts. In addition, some of the utterances by the experimenter (I) and the robot (R) are omitted; these are indicated with "..." in the utterance content. All of the utterances by the robot are available in the logs in complete form.
52+
Note that the start and end times are synchronised with the log times, and the robot's introductory line to start the activity ("so, Ann and Bob, let's start building the tracks. ...", available in the [logs](#log_content)) is not included in the transcripts. In addition, some of the utterances by the experimenter (*I*) and the robot (*R*) are omitted from the transcripts; these are indicated with "..." in the utterance content. All of the utterances by the robot are available in the logs in complete form.
5353

5454

5555
### 1.2. Logs <a name="log_content"></a>
56-
This part of the dataset contains anonymised event log data for 39 teams.
56+
This part of the dataset contains anonymised event log data for 39 teams of two children.
5757

5858
It consists of 39 files, with one tab-separated text file per team ([logs/justhink19_log_<team_no\>.csv](logs/)).
5959

6060
In particular, the columns are:
6161

6262
* *team_no*: The number of the team that the event belongs to
6363
* *attempt_no*: The attempt number that the event belongs to, starting from 1. An attempt is the duration of the team constructing a solution and submitting it together.
64-
* *turn_no*: The turn number of the event, starting from 1. A turn is the duration where one of the participants is in figurative view, and the other is in abstract view (see [[2]](#references) for a description of the views).
64+
* *turn_no*: The turn number of the event, starting from 1. A turn is the duration where one of the participants is in the figurative view, and the other is in the abstract view (see [[2]](#references) for a description of the views).
6565
* *event_no*: The event number of the event, starting from 1
66-
* *time*: The logging timestamp of the event from the beginning of the activity (in seconds)
67-
* *subject*: The subject that the event is executed by (A, B: the participants; R: the robot; T: the team)
66+
* *time*: The logging timestamp of the event, from the beginning of the activity (in seconds)
67+
* *subject*: The subject that the event is executed by (*A*, *B*: the participants; *R*: the robot; *T*: the team)
6868
* *verb*: The verb that describes the event (e.g. "presses", "adds", "removes")
69-
* *object*: The object that is acted on by the subject performing the verb (e.g. "submit (enabled)" for subject: A, verb: "presses")
69+
* *object*: The object that is acted on by the subject performing the verb (e.g. "submit (enabled)" for subject i.e. participant: *A*, verb i.e. the action: "presses")
7070

71-
For example, in a logged event "A presses submit (disabled)", submit refers to the submit button, and "enabled" is the status of the button at the time of the button press by participant A (that it was active/enabled), and hence the help window is displayed afterwards.
72-
Note that the closing of e.g. a help window is not logged.
73-
An event "B presses submit (disabled)" is logged, when B tries to submit a solution, by pressing the submit button, while it was not allowed to submit, i.e. the current solution was not connecting all nodes to each other (see [[2]](#references) for the activity details).
71+
For example, in a logged event "A presses submit (disabled)", submit refers to the submit button, and "enabled" is the status of the button at the time of the button press by participant *A* (that it was active/enabled), and hence the help window is displayed afterwards.
72+
Note that the closing of e.g. the help window is not logged.
73+
An event "B presses submit (disabled)" is logged, when *B* tries to submit a solution, by pressing the submit button, while it was not allowed to submit, i.e. the current solution was not connecting all nodes to each other (see [[2]](#references) for the activity details).
7474

7575
Regarding the collaborative modification and submission of a solution, e.g. an event "B adds Zurich-Gallen (2-8)" modifies the team's current solution by connecting Zurich to Gallen, where 2 and 8 correspond to the node ids respectively.
76-
An event "T submits cost=64 (opt_cost=22)" indicates that the team's solution is registered as a solution, where the total cost of the submitted solution is 64, whereas the optimal cost (for a correct solution) is 22.
76+
An event "T submits cost=64 (opt_cost=22)" indicates that the team's solution is registered as a solution by both of the participants pressing their respective submit buttons; where the total cost of the submitted solution is 64, whereas the optimal cost (for a correct i.e. optimal solution) is 22.
7777
Note that in a few cases a team's submit event might not reflect the total cost by counting the add (and subtracting remove) events due to an error in the logging; however, the submitted solution's cost (as logged in an event "T submits ...") is always correct, and this is what the robot reacts to (by giving feedback on the solution, see [[2]](#references)).
7878

79-
For anonymisation, the robot's introductory line to start the activity ("so, Ann and Bob, let's start building the tracks. ...") has the participant A's name replaced with Ann (and B with Bob), while within the activity the robot pronounces the names of children.
79+
For anonymisation, the robot's introductory line to start the activity ("so, Ann and Bob, let's start building the tracks. ...") has the participant *A*'s name replaced with Ann (and *B* with Bob), while within the activity the robot pronounces the names of children.
8080

8181

8282
### 1.3. Test Responses <a name="test_content"></a>
83-
This part of the dataset contains the responses of each participant in each team to the pre-test and post-test for 39 teams.
84-
Each test contains 10 multiple-choice (single answer) questions (i.e. items) with 3 options (1 to 3), and assesses a concept on spanning trees (see [[2]](#references)).
83+
This part of the dataset contains the responses of each participant in each team to the pre-test and post-test for 39 teams.
84+
Each test contains 10 multiple-choice (single answer) questions (i.e. items) with 3 options (recorded as 0, 1 or 2), and assesses a concept on spanning trees (see [[2]](#references) for details).
8585

86-
It consists of 2 files:
86+
It consists of 2 files:
8787

88-
* one comma-separated text file for the pre-test responses ([test_responses/justhink19_pretest.csv](test_responses/justhink19_pretest.csv)), and
88+
* one comma-separated text file for the pre-test responses ([test_responses/justhink19_pretest.csv](test_responses/justhink19_pretest.csv)), and
8989
* one comma-separated text file for the post-test responses ([test_responses/justhink19_posttest.csv](test_responses/justhink19_posttest.csv)).
9090

9191
In particular, the columns are:
9292

9393
* *team_no*: The number of the team, or "key" for the correct responses
94-
* *q?\_A*: The response of participant A to a particular item (among 10 items indexed from 1 to 10)
95-
* *q?\_B*: The response of participant B to a particular item
94+
* *q?\_A*: The response of participant *A* to a particular item (among 10 items indexed from 1 to 10)
95+
* *q?\_B*: The response of participant *B* to a particular item
9696

9797

9898
## Acknowledgements
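
The log format described in section 1.2 of this README lends itself to a small parsing sketch. The example below is a minimal illustration in Python, assuming that the *object* column holds strings shaped exactly like the two examples quoted in the README ("Zurich-Gallen (2-8)" and "cost=64 (opt_cost=22)"). The function names `parse_edge` and `parse_submit` are hypothetical and are not part of the dataset or its tooling.

```python
import re

# Assumed shape of the object of an add/remove event, e.g. "Zurich-Gallen (2-8)".
EDGE_RE = re.compile(r"([^-]+)-([^-]+) \((\d+)-(\d+)\)")
# Assumed shape of the object of a "T submits" event, e.g. "cost=64 (opt_cost=22)".
SUBMIT_RE = re.compile(r"cost=(\d+) \(opt_cost=(\d+)\)")


def parse_edge(obj):
    """Split an edge object into node labels and node ids (hypothetical helper)."""
    match = EDGE_RE.fullmatch(obj)
    if match is None:
        return None
    u_name, v_name, u_id, v_id = match.groups()
    return {"u_name": u_name, "v_name": v_name, "u": int(u_id), "v": int(v_id)}


def parse_submit(obj):
    """Extract the submitted and optimal costs (hypothetical helper)."""
    match = SUBMIT_RE.fullmatch(obj)
    if match is None:
        return None
    cost, opt_cost = int(match.group(1)), int(match.group(2))
    # A submission is optimal when its cost equals the optimal cost.
    return {"cost": cost, "opt_cost": opt_cost, "is_optimal": cost == opt_cost}


edge = parse_edge("Zurich-Gallen (2-8)")
submission = parse_submit("cost=64 (opt_cost=22)")
```

Since the README notes that summing add and remove events may not always reproduce the submitted cost, reading the cost from the "T submits" event, as above, is the safer choice.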

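
The test-response layout described in section 1.3 (a "key" row plus one row per team, with *q?\_A* and *q?\_B* columns) suggests a simple way to score each participant. The sketch below uses a made-up two-item sample rather than the real files, and it assumes that the columns are named `q1_A`, `q2_A`, and so on, and that the "key" row carries the correct response in both the `_A` and `_B` columns. The function name `score_tests` is hypothetical.

```python
import csv
import io

# Made-up sample in the assumed layout (the real tests have 10 items per participant).
SAMPLE = """team_no,q1_A,q2_A,q1_B,q2_B
key,0,2,0,2
7,0,1,0,2
"""


def score_tests(csv_text, participants=("A", "B")):
    """Count correct responses per (team, participant) against the key row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # The row whose team_no is "key" holds the correct responses.
    key = next(row for row in rows if row["team_no"] == "key")
    scores = {}
    for row in rows:
        if row["team_no"] == "key":
            continue
        for participant in participants:
            suffix = "_" + participant
            columns = [c for c in row if c != "team_no" and c.endswith(suffix)]
            scores[(row["team_no"], participant)] = sum(
                row[c] == key[c] for c in columns
            )
    return scores


scores = score_tests(SAMPLE)
```

For the real data, the same function could be given the text of `test_responses/justhink19_pretest.csv` or `test_responses/justhink19_posttest.csv`, provided the assumptions about column names and the key row hold.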