image licensing #1711

Draft
wants to merge 6 commits into
base: develop


lukavdplas
Contributor

@lukavdplas lukavdplas commented Nov 20, 2024

This aims to address a few copyright-related issues:

  • I-analyzer reuses several images with a CC BY or CC BY-SA licence without providing attribution, so our reuse is not covered by the licence.
  • Sharing this entire repository under an MIT licence also violates those licences.
  • Most images have no clear attribution or source, so even if they are in the public domain, this is a pain in the ass to verify.

Solution:

  • Add a *.license file next to each image in the repository and mention this in the readme.
  • Mention attribution and licence in the corpus documentation, so it's also visible for users.
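
A minimal sketch of how the `*.license` convention could be checked automatically, for instance in a test or pre-commit step. It assumes that the licence file for `photo.jpg` is named `photo.jpg.license` and that images are identified by the extensions listed below. Both are assumptions for illustration, not something this PR defines, and `images_missing_license` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to what the repository actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".svg", ".gif"}


def images_missing_license(root):
    """Return image paths under `root` that have no sibling `<name>.license` file."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Assumed naming convention: "photo.jpg" -> "photo.jpg.license"
            license_file = path.with_name(path.name + ".license")
            if not license_file.is_file():
                missing.append(path)
    return missing
```

Calling `images_missing_license(".")` from the repository root would list every image that still needs a licence file, which could help keep the untraced images above visible until they are resolved or replaced.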

I've added licence info for all corpora where I could find the source of the image. Here is an overview of ones I could not trace:

  • Dutch Annual Reports: can't find a source for this - it is (or in some cases, was) available on some stock photo platforms, but I could not find any page that claimed to be the original author.
  • ECCO: This image is often used around the ECCO corpus, for example in this description from TCP. I assume that it's cropped and edited from an illustration that is in the public domain, and it's likely that no one would claim copyright over the editing. But I can't find an actual source for it.
  • DIOPTRA-L: Clearly designed for this project, but no licence or credit is included. I expect that we have specific permission to use it without attribution but there is no general licence.
  • Jewish Inscriptions / Jewish Migration: Reverse image search turns up nothing. May be the same situation as DIOPTRA-L, though as with the ECCO corpus, this may be considered public domain.
  • People & Parliament: France: Can't find a source for this. Reverse image search is unproductive because there are many similar pictures of the same room.
  • People & Parliament: Ireland: This image was originally intended as temporary due to copyright. Replaced the image with one that has a CC licence.
  • Troonredes: Can't find the source for this one. It's probably less effort to just find a suitable replacement.
  • U-Blad: No source provided. This image is UU-specific, so again, I expect we have permission but there is no licence. Note that it is a scan of copyrighted works, too.

@lukavdplas lukavdplas marked this pull request as draft November 20, 2024 16:08
@BeritJanssen
Contributor

With ECCO, the image was supplied with the source data, as far as I remember. As for Dutch Annual Reports: I must've picked that from an image search for images in the public domain, but if that's not something we can reproduce, it's probably easier to go with a new image. Perhaps of the Amsterdam Zuidas, so it's more recognizably Dutch? DIOPTRA-L: I suspect that this image was produced by Alex back then; I'll check with Haidee if she knows. The image from JewishMigration / Inscriptions is from before my time. Again, I don't think anyone would be outraged if we picked a different (comparable) image here. For France, I think we can just go for any other image of the same room, too.

@jgonggrijp
Contributor

Most image formats have built-in facilities for metadata. I know that in at least two cases (SVG and JPEG), this includes copyright and licensing information. You could use this instead of a separate *.license file.
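
For illustration, a hedged sketch of how licence information embedded in an SVG could be read back. It assumes the file carries a Creative Commons RDF block inside `<metadata>` with a `cc:license` element, which is what editors such as Inkscape write; the example document and the helper name `svg_license_url` are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the Creative Commons RDF vocabulary in SVG metadata.
CC_NS = "http://creativecommons.org/ns#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical SVG with embedded licence metadata (not a file from this repository).
EXAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:cc="http://creativecommons.org/ns#">
  <metadata>
    <rdf:RDF>
      <cc:Work rdf:about="">
        <cc:license rdf:resource="https://creativecommons.org/licenses/by-sa/4.0/"/>
      </cc:Work>
    </rdf:RDF>
  </metadata>
</svg>"""


def svg_license_url(svg_text):
    """Return the licence URL from an SVG's cc:license element, or None if absent."""
    root = ET.fromstring(svg_text)
    for element in root.iter(f"{{{CC_NS}}}license"):
        url = element.get(f"{{{RDF_NS}}}resource")
        if url:
            return url
    return None
```

JPEG files store comparable information in EXIF or XMP fields instead, so reading those would need a different approach than the XML parsing shown here.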

@lukavdplas
Contributor Author

lukavdplas commented Dec 6, 2024

> Most image formats have built-in facilities for metadata. I know that in at least two cases (SVG and JPEG), this includes copyright and licensing information. You could use this instead of a separate *.license file.

Image metadata has its uses, but putting copyright/licence data exclusively in the metadata makes it kind of invisible in code repositories; GitHub won't display it, nor will most code editors. The fact that such metadata even exists is obscure enough that you apparently felt you needed to explain this. (I guess I would too.) If you thought some of us might not know about this, would you expect that everyone viewing the repository does?

The Creative Commons wiki says the same thing:

> While it is great to include attribution in metadata fields (such as EXIF for images), in most cases this is not the only place to include attribution information. This is because many users are likely not aware of, and will never see, attribution information included in metadata.

Many of these files have CC-BY and CC-BY-SA licences, so CC's recommended practices for attribution are relevant here. Aside from any question of whether file metadata might be sufficient attribution to satisfy licence terms, I think it's also good professional courtesy to make sure attribution is clearly visible.

@jgonggrijp
Contributor

I didn't mean to suggest that you should include the copyright and license information only in the metadata and nowhere else. You should definitely also mention elsewhere that some images have separate licenses, for example in the README (and in the corpus documentation, as you suggested). This will alert the people to whom it matters to look for those metadata. The metadata are just an alternative to the .license files.

@lukavdplas
Contributor Author

Okay, looking through CC's licence terms and documentation, it does seem to be legally permissible to do it like that, as long as you don't make any edits to the images that would need to be described. (No standard way to do that in metadata, as far as I know.)

But even if this is technically and legally possible, why are you suggesting it in the first place? What is the problem with this format?

@jgonggrijp
Contributor

More files, more clutter. It's just a suggestion though; you can ignore it if you don't like it.

@lukavdplas
Contributor Author

lukavdplas commented Dec 12, 2024

Great, thanks for looking into this! I'll add them :)
