\section{New data policies should be collectively negotiated}
MOOCs generate vast amounts of data: data about the content of the
course (videos, quizzes, exercises, slides, \ldots), data about the
learners (clickstream data, answers to questions, discussion forums,
\ldots), and data about the professor and his or her pedagogical team.
This section is concerned with the last two, i.e.\ user data. As with
any user data collected by online platforms, MOOC data raises serious
concerns about possible breaches of users' privacy; these are
addressed in the first subsection below. Nonetheless, these datasets
also constitute great opportunities. At the individual level, the data
is necessary for adapting learning activities to individual needs. At
the collective level, the data supports extracting knowledge about the
effectiveness of MOOC components, in order to improve the MOOC in
question or to acquire general pedagogical knowledge. These
opportunities are emphasized in the second subsection.
\subsection{Data collection transparency and learner privacy}
Data privacy is a complex issue that cannot be fully developed here,
but we must point out the current opacity of data practices on MOOC
platforms. To know what is being recorded, users must read the ``terms
of service'' agreed between the MOOC platform and the learner, or
between the MOOC platform and the universities providing courses.
These are complex documents that few users have the stamina to read
and that do not always clearly state what is being recorded.
There are many reasons learners might care about the retention of
their data. For example, a learner might be concerned if a potential
employer could access that learner's performance data in a MOOC.
Another concern is that a record of failure may lower a teacher's
expectations of a learner, who may consequently adapt to these lower
expectations by working less hard (the ``Rosenthal effect'').
We therefore propose a general principle: a learner's data belongs to
the learner. According to this principle, learners should be able to
easily access and visualize any data recorded about them, such as
interaction traces. Learners should be able to share their data with
others as well as analyze it themselves, with access to the same
analysis tools that MOOC platforms or instructors use. And learners
should be allowed to delete any subset of their data: they will lose
the advantages of having it analyzed, but that would be a personal
choice.
One open question concerns aggregation: when a learner deletes his or
her data, what should happen to statistics previously computed over
datasets that included it? This issue is not specific to MOOCs, but it
deserves further investigation in the specific context of MOOCs.
This proposal can be compared with the medical field, where a
patient's digital healthcare record contains highly sensitive data for
which any breach of confidentiality may have medical, social, or
economic consequences for the patient, such as stigmatisation or an
impact on health insurance. In most countries, patients' data are
owned by the patients themselves, with legal instruments providing
guarantees of security, reliability, and confidentiality. Patients may
elect to share their data with medical personnel and, with informed
prior consent, may choose to share their data for scientific purposes,
such as participation in a clinical study. But informed consent in
these cases is much more transparent than clicking the ``I Accept''
link at the end of a long list of conditions often written in
impenetrable legal terms. And even patients who refuse to participate
in a clinical study retain their right to the highest quality of care.
MOOC learners should enjoy a similar benefit: they
should be the unique owners of their data, and should be able to
opt out of sharing their data without sacrificing the quality
of their educational experience.
Finally, MOOCs can
produce an unprecedented degree of transparency in teaching: the log
files also contain data about the teachers or the
teaching team, for instance, the average response time for forum
postings. While this data constitutes a potential source of
analytics that might contribute to quality management,
labor unions may object to the direct use of such
measurements of productivity and work quality. An alternative might be
to monitor for problems related to resources for which the teacher is
responsible, for example, ensuring that learner questions are always
answered in a timely way and
alerting the teacher when this is not the case. The teacher would then
be evaluated on the effectiveness of the learner experience rather
than on direct metrics such as the number of hours spent online.
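As a minimal sketch of such monitoring, one could flag unanswered
forum questions for the teaching team; the 48-hour target, the data
layout, and all names below are illustrative assumptions, not features
of any existing platform.
\begin{verbatim}
# Sketch: flag forum questions that have waited too long for a reply,
# so the teaching team is alerted instead of being scored directly.
from datetime import datetime, timedelta

MAX_WAIT = timedelta(hours=48)  # assumed service-level target

def overdue_questions(posts, now):
    """Return questions with no reply after MAX_WAIT."""
    return [p for p in posts
            if p["replies"] == 0 and now - p["posted_at"] > MAX_WAIT]

posts = [{"id": 17, "posted_at": datetime(2024, 5, 1, 9), "replies": 0}]
for p in overdue_questions(posts, now=datetime(2024, 5, 4, 9)):
    print(f"Alert: question {p['id']} has been unanswered for 48+ hours")
\end{verbatim}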
\subsection{Sharing MOOC data across institutions and platforms}
Currently, each university receives the data associated with its own
MOOCs. However, the research community would benefit greatly if MOOC
data were made available, in some form, to all. Many scientific
domains have benefited from becoming ``data driven'' once large
datasets were made available. We therefore recommend active
exploration of technological and legal frameworks under which MOOC
data can be shared to support research.
One option would be for the terms of service to allow providing data
to non-profit educational institutions for research purposes. Such an
agreement would need to be added to the contracts that universities
have with MOOC providers. Some MOOC providers' terms of service
already include a clause stipulating that learner data may be made
available to researchers. There is still potential for abuse, however:
consider that most whales today are nominally killed for research
purposes. Fortunately, most universities have internal review boards,
such as Committees for the Protection of Human Subjects, which could
monitor for responsible data use.
Sharing data requires de-identification, a hard problem in computer
science in both principle and practice, as Netflix and AOL discovered
to their dismay when researchers were able to recover individual
identities from supposedly anonymised datasets that those companies
had voluntarily provided for research. While it is easy to remove
names from log files, forum postings may include information that
enables third parties to identify a learner directly or indirectly
(e.g.\ ``a female chemist from Lausanne''). While learners should be
responsible for keeping confidential data out of public forums, users
often cannot foresee all the ways in which their ``public''
information may lead to privacy leaks.
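A toy example illustrates how such quasi-identifiers defeat naive
anonymisation; the data and field names are invented for illustration.
\begin{verbatim}
# Toy linkage attack: an "anonymised" dataset still carries
# quasi-identifiers that can be joined with public information.
anonymised = [
    {"pseudonym": "u1", "gender": "F", "profession": "chemist",
     "city": "Lausanne", "grade": 0.42},
    {"pseudonym": "u2", "gender": "M", "profession": "teacher",
     "city": "Lyon", "grade": 0.91},
]
public_profile = {"gender": "F", "profession": "chemist",
                  "city": "Lausanne"}  # e.g. from a social network

matches = [r for r in anonymised
           if all(r[k] == v for k, v in public_profile.items())]
# A unique match de-anonymises the pseudonym -- and hence the grade.
\end{verbatim}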
Moreover, de-identification is not always possible or even desirable.
This manifesto argues elsewhere that MOOCs should deliver ECTS credits
and facilitate student mobility in Europe; such services require
sharing some data \emph{without} de-identification. Some research and
pedagogy also require learner identification. For instance, in a
teacher training MOOC, a pre-service teacher would hardly be able to
describe an example of classroom conflict without providing any
confidential information. From an analytics point of view, it is
important that anonymisation keeps all data from the same learner
associated with the same ID. This, in turn, implies that a user
identity recovered from a forum posting would also de-anonymise that
learner's test results later on. We recommend both new research and
new institutional policy exploration regarding the trade-offs between
privacy concerns and the research and teaching benefits of data
sharing.
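To make the consistent-ID requirement concrete, here is a minimal
sketch of keyed pseudonymisation, assuming the platform keeps a secret
key; the names are illustrative, not part of any existing MOOC
platform.
\begin{verbatim}
# Sketch: map every record from the same learner to the same
# pseudonymous ID, keeping longitudinal analytics possible while
# making the mapping non-invertible without the secret key.
import hashlib
import hmac

SECRET_KEY = b"held by the platform, never shared"  # assumption

def pseudonymise(learner_id: str) -> str:
    """Return a stable, non-reversible pseudonym for a learner."""
    digest = hmac.new(SECRET_KEY, learner_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

events = [
    {"learner": "alice@example.org", "event": "quiz_answer"},
    {"learner": "alice@example.org", "event": "forum_post"},
]
shared = [{**e, "learner": pseudonymise(e["learner"])} for e in events]
# Both events now carry the same pseudonym; destroying SECRET_KEY
# severs the link back to the real identity.
\end{verbatim}
Note that this handles only direct identifiers; it does nothing about
indirect identification through free-text forum content, the harder
problem raised above.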
One condition for sharing data across MOOCs and across platforms is
that the context-specific meaning of the collected data must not be
lost in the transfer. For example, MOOCs contain many different types
of quizzes: some attention-enhancing quizzes simply check the
understanding of the last video segment and can be answered in a few
seconds, while other quizzes may propose several solutions to a
complex problem that requires several hours of work. It would not make
sense to compare the success rates of two such quizzes or to compute
an average response rate across them. Sharing MOOC data will only be
useful if the data is accompanied by a semantic description of what
was collected, that is, of what each piece of data meant in its
original context. Such a description would also be necessary to
address the ethical issues already raised.
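As one possible illustration, a shared record could bundle each raw
event with machine-readable context describing what the event meant;
every field name below is hypothetical.
\begin{verbatim}
# Hypothetical quiz-answer record that carries its own semantic
# context, so that shared data is not misinterpreted across platforms.
quiz_event = {
    "event": "quiz_answer",
    "learner": "3f9c2a7d1b6e4c08",       # pseudonymous ID
    "correct": True,
    "time_spent_s": 14,
    "context": {
        "quiz_kind": "attention_check",  # vs. "complex_problem"
        "expected_effort": "seconds",    # vs. "hours"
        "follows": "video segment 3.2",
        "grading": "automatic, single attempt",
    },
}
# Success rates are only comparable between events that share the
# same quiz_kind and expected_effort.
\end{verbatim}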
Currently, universities negotiate one-by-one with MOOC platforms,
which puts them in a rather weak position, despite the fact that they
provide content that has been developed over many years with their own
(often public) funding. Universities and learners would be better
served if universities moved towards collective negotiation, perhaps
even at a national level for countries that have specific privacy
laws. We therefore recommend that associations of universities collect
data privacy and data ownership concerns across their member
institutions.