Jordan: Are licenses something that you need a lawyer for? Is there a default system for data/code protection built into GitHub?
Sean: What is the role of personal privacy in licensing and protecting your project data? If we're taking data from someone who republished the data publicly, do we need to contact the republisher? The original publisher? What if we can't? Are there types of projects for which you wouldn't need to respect licenses? (I'm thinking of how IRBs often aren't required for findings that aren't "for generalizable knowledge".)
Joey: It seems like even if you collect data yourself, there can still be a reasonable argument as to why it is not yours (e.g., tweets aren't yours, webpages aren't yours, etc.). At what point are we allowed to claim our data as our own property?
Lindsey: What's the usage policy on code available on places like Stack Overflow or Reddit (for example, a Python script that filters out certain words from a corpus)? Is it fair game since it's public? How should we cite it? How should we approach data published in other countries/languages? Are there any special precautions we should take, like doing research on data policies in those countries, or should we just translate the license (if necessary) and use it at that surface level?
Natasha: Is simply creating a text file saying "do not steal" sufficient as a license? Are there levels of licensing protection, and if so what are they and when is it appropriate to use them?
Juan: What license do you recommend for projects such as ours (e.g., code for analyzing linguistic data)? If we are using a licensed data set, what are some best practices in terms of transparency and open science? I know that we cannot publish the whole data set, but can we post a sample (with due attribution)? If so, how big can the sample be?
Kiara: Are the repercussions different for ignoring a license altogether versus using data for something other than what the license allows? Also, is there a limit to how far a prosecutor can go depending on how the license is violated? For example, if someone completely ignored a license, or if the license allowed for academic use but then someone continued to use the data for something other than academics? I also think it would depend on who the licensor is: Disney and a college student would probably pursue violations differently, and if for some reason they both violated licensing, I feel like the repercussions would be different for both as well.
Anthony: Thank you both for joining us today! My question is about ethics and data, going beyond what is legally permissible under copyright law to what is ethically, or maybe morally, sound. So specifically, I'm wondering about whether it's really ethical for us, as researchers, to be using people's data from Twitter, Reddit, Tumblr, etc. (publicly scrapable sources) for purposes that may result in profit for us (either directly through compensation for our work or indirectly through hiring on the grounds of the merit of our work) without then somehow compensating the people we're getting our data from. I know this is a mouthful and there probably isn't an answer, but it seems important to think about. I know that participants aren't always compensated for their participation in a study, but it seems a little morally ambiguous to me to be taking people's data without their express consent and using it for personal gain, even if credit is given.