BeRP Transcription Guidelines:
Pauses which are longer than 1 sec get the dot.
Use only one, even if the pause is ten seconds long.
.
Filled pauses: We only use these four:
[er]
[mm]
[uh]
[um]
Other noises:
[beep]
[laughter]
[lip_smack]
[loud_breath]
[noise]
[tap]
[unintelligible] (this is in case they say something you can't decipher)
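A transcript can be checked against the two tag lists above automatically. The following is only a sketch, assuming every tag is written in square brackets exactly as listed; the names ALLOWED_TAGS, TAG_PATTERN and unknown_tags are made up for illustration and are not part of any BeRP tool.

```python
import re

# Tag inventory copied from the two lists above.
FILLED_PAUSES = {"er", "mm", "uh", "um"}
OTHER_NOISES = {
    "beep", "laughter", "lip_smack", "loud_breath",
    "noise", "tap", "unintelligible",
}
ALLOWED_TAGS = FILLED_PAUSES | OTHER_NOISES

# Hypothetical pattern: anything written between square brackets.
TAG_PATTERN = re.compile(r"\[([^\]]*)\]")


def unknown_tags(line):
    """Return the bracketed tags in `line` that are not in the inventory."""
    return [tag for tag in TAG_PATTERN.findall(line) if tag not in ALLOWED_TAGS]
```

For example, unknown_tags("i want [uh] some food") returns an empty list, while a line containing [cough] would report "cough".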
Everything is lower case.
i
chuck
Restaurant names:
Use spaces in them when the name consists of multiple
English words, or when only one word in the name is foreign:
meal ticket
won thai cuisine
gertie's chesapeake bay cafe
bette's ocean view diner
Names which are multiple foreign words
should not have spaces in them; use dashes instead of spaces:
a-la-carte
au-coquelet
Exceptions to this rule are names where people refer to
the restaurant by individual parts of the name (such as below,
where people called it 'eiffel' or 'tour' as well):
la tour eiffel
Street/city names which have two or more English words: use spaces:
shattuck avenue
martin luther king
Street/city names which have two or more foreign words: Use __:
san__pablo
san__francisco
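The two separators for all-foreign names (dash for restaurants, double underscore for streets and cities) can be sketched as a small helper. This is an illustration only: the function name and the "restaurant"/"place" labels are hypothetical, and the transcriber still has to decide whether a name counts as foreign and whether the exception above applies.

```python
# Separators taken from the rules above; the `kind` labels are hypothetical.
SEPARATORS = {
    "restaurant": "-",   # a-la-carte, au-coquelet
    "place": "__",       # san__pablo, san__francisco
}


def join_foreign_name(words, kind):
    """Join the words of an all-foreign name with the separator for `kind`."""
    return SEPARATORS[kind].join(word.lower() for word in words)
```

join_foreign_name(["au", "coquelet"], "restaurant") gives au-coquelet, and join_foreign_name(["san", "pablo"], "place") gives san__pablo.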
Words which consist of letters of the alphabet:
Add two underlines "__" between the letters.
(This is different from the MADCOW standard, which would
say "a m" instead.)
a__m
p__m
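The letter rule amounts to joining the individual letters with two underscores. A minimal sketch, assuming the input is the run of letters as the speaker pronounced them (spell_out is a made-up name):

```python
def spell_out(letters):
    """Join individually pronounced letters with two underscores."""
    return "__".join(letters.lower())
```

spell_out("am") gives a__m and spell_out("pm") gives p__m.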
Genitives, other apostrophe-s, other apostrophes: DO include the apostrophe:
augusta's
chuck's
that's
christopher's-cafe
i'd
i'll
i'm
i've
how's
o'clock
The name of this place:
i__c__s__i (If spelled out)
icksee (if pronounced as a word)
Contractions: Use your judgement, but DO use the obvious ones:
wanna
gotta
hafta
The word "OK": don't use "ok", use:
okay
This is a word:
oh
Fragments:
words which are cut off, either because the speaker stopped
in the middle of a word, or because the microphone turned
on or off in the middle of a word, are marked with a dash "-":
i wa- wanna eat
-o you wanna eat?
Put the dash at the very beginning or end of a word.
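Because the dash sits at the very edge of a fragment, fragments can be told apart from dashed restaurant names, where the dashes are inside the word. A rough sketch, assuming tokens are separated by whitespace (find_fragments is a hypothetical name):

```python
def find_fragments(line):
    """Return the tokens that start or end with a dash (cut-off words)."""
    return [
        token for token in line.split()
        # A lone "-" is not a word fragment, so require more than one character.
        if len(token) > 1 and (token.startswith("-") or token.endswith("-"))
    ]
```

For "i wa- wanna eat" this returns only wa-; a name such as a-la-carte is not reported.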
Mispronunciations: if they mispronounce a word (not just
a dialectal variation, but a true mistake), for example
if they pronounce 'understand' as "unshtam" or something weird
like that, spell the word with its standard orthography,
but put asterisks on either side:
i don't *understand*
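The asterisk marks can be read back out of a transcript, either to list the mispronounced words or to recover the plain text. This is a sketch under the assumption that a marked word contains no spaces or asterisks of its own; the names below are made up for illustration.

```python
import re

# Assumption: a marked word has no internal whitespace or asterisks.
MISPRONOUNCED = re.compile(r"\*([^*\s]+)\*")


def mispronounced_words(line):
    """Return the words marked as mispronounced, without the asterisks."""
    return MISPRONOUNCED.findall(line)


def strip_marks(line):
    """Return the line with the asterisks removed and the words kept."""
    return MISPRONOUNCED.sub(r"\1", line)
```

For "i don't *understand*", mispronounced_words returns ["understand"] and strip_marks returns "i don't understand".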
Angle brackets: used for verbal repairs, to mean that
the "correct" parse of the sentence should ignore these words.
<i> i want to eat now.
if there is more than one word in the repair, put
angle brackets around EACH word:
<i> <want> <to> [uh] i wanna eat some chinese food
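Since every word of a repair gets its own angle brackets, the "correct" parse can be recovered by dropping each bracketed token. A minimal sketch, assuming whitespace-separated tokens (drop_repairs is a hypothetical name):

```python
import re

# One repaired word: angle brackets around a token with no spaces inside.
REPAIR = re.compile(r"<[^<>\s]+>")


def drop_repairs(line):
    """Remove every <word> token and keep the remaining words in order."""
    return " ".join(
        token for token in line.split() if not REPAIR.fullmatch(token)
    )
```

Applied to the second example above, this leaves "[uh] i wanna eat some chinese food". Filled pauses and other noises are kept, since only angle-bracket tokens are dropped.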
Parentheses: Used for word-fragments, to indicate that the
part of the word in parentheses is the transcriber's
BEST GUESS as to what the speaker meant to say.
(this usually comes up when the speaker started
before the mike was turned on). Thus in the following,
we THINK the speaker meant to say 'this', but the only
thing we actually have phonetically in the file is the 's'.
(thi)-s is for breakfast
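A token written this way can be split back into the guessed word and the part that was actually heard. The sketch below covers only the shape shown in the example, a guessed beginning in parentheses followed by a dash and the audible ending; the other direction is not described here and is not handled. The names are hypothetical.

```python
import re

# Only the form shown above: "(guessed)-heard", e.g. "(thi)-s".
GUESSED_PREFIX = re.compile(r"\(([^()\s]+)\)-(\S+)")


def split_guess(token):
    """Return (full_word, heard_part), or None if the token has another shape."""
    match = GUESSED_PREFIX.fullmatch(token)
    if match is None:
        return None
    guessed, heard = match.groups()
    return guessed + heard, heard
```

split_guess("(thi)-s") returns ("this", "s"): the transcriber's guess is 'this', and only the 's' is in the audio.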
Squiggly brackets: I forget what these mean, I'll remember at some point.