Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add api endpoint to duplicate document #570

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

sampaccoud
Copy link
Member

@sampaccoud sampaccoud commented Jan 21, 2025

Purpose

We want users to be able to duplicate a document.

resolves #335 and #567

Proposal

  • Refactor the way access control is done on attachment files to play well with document duplication
  • Add a "duplicate" action to the document API endpoint.
  • Extract attachment keys from content when a document is saved, and add them to the list of readable attachments on the current document but only if they were already readable on another document by the editing user...

While doing this work I was brought to refactor the way the media-auth and collaboration-auth actions were factorizing code and add missing tests for these actions.

@sampaccoud sampaccoud self-assigned this Jan 21, 2025
Dockerfile Show resolved Hide resolved
src/backend/core/api/viewsets.py Outdated Show resolved Hide resolved
src/backend/core/api/viewsets.py Outdated Show resolved Hide resolved
src/backend/core/api/viewsets.py Outdated Show resolved Hide resolved
src/backend/core/api/viewsets.py Outdated Show resolved Hide resolved
Documents content is stored in the Ydoc format. We need a util
to extract it as xml/text.
These 2 actions had factorized code but a few iterations lead to
spaghetti code where factorized code includes "if" clauses.

Refactor abstractions so that code factorization really works.
Tests were forgotten. While writing the tests, I fixed
a few edge cases like the possibility to connect to the
collaboration server for an anonymous user.
These methods were involved in a bug that was fixed without first
evidencing the error in a test:
#556

Fixes #567
We took this opportunity to refactor the way access is controlled on
media attachments. We now add the media key to a list on the document
instance each time a media is uploaded to a document. This list is
passed along when a document is duplicated, allowing us to grant
access to readers on the new document, even if they don't have or
lost access to the original document.

We also propose an option to reproduce the same access rights on the
duplicate document as what was in place on the original document.
This can be requested by passing the "with_accesses=true" option in
the query string.

The tricky point is that we need to extract attachment keys from the
existing documents and set them on the new "attachments" field that is
now used to track access rights on media files.
We can't prevent document editors from copy/pasting content to from one
document to another. The problem is that copying content, will copy the
urls pointing to attachments but if we don't do anything, the reader of
the document to which the content is being pasted, may not be allowed to
access the attachment files from the original document.

Using the work from the previous commit, we can grant access to the readers
of the target document by extracting the attachment keys from the content and
adding themto the target document's "attachments" field. Before doing this,
we check that the current user can indeed access the attachment files extracted
from the content and that they are allowed to edit the current document.
The idea behind wrapping choices in `lazy` function was to allow
overriding the list of languages in tests with `override_settings`.
This was causin makemigrations to keep on including the field in
migrations when it is not needed. Since we finally don't override
the LANGUAGES setting in tests, we can remove it to fix the problem.
@sampaccoud sampaccoud force-pushed the add-api-endpoint-to-duplicate-document branch from a138321 to 95241cd Compare January 28, 2025 07:44
@sampaccoud sampaccoud requested a review from lunika January 28, 2025 07:44
@sampaccoud
Copy link
Member Author

@lunika done implementing your feedbacks 🙏

Copy link
Collaborator

@AntoLC AntoLC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have existing data the last migration does not succeed.

I got a bit confused about the with_accesses as a query string instead of from the payload.
Except that works well! I used it frontend side, I challenged it a bit, works like excepted.

Comment on lines +12 to +22
def populate_attachments_on_all_documents(apps, schema_editor):
"""Populate "attachments" field on all existing documents in the database."""
Document = apps.get_model("core", "Document")

for document in Document.objects.all():
document.attachments = extract_attachments(document.content)
document.save(update_fields=['attachments'])


class Migration(migrations.Migration):

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I try to apply the migration I get AttributeError: 'Document' object has no attribute 'content'.

It could be interesting to test the migration, there is a nice lib to do it django-test-migrations.
If you are interested, here some examples:
7e05800
5711bf5#diff-b94dbfcff3e13c1f70f5c42f5c071dde9fb1a4f884f80fe18a6ce522d5019476

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exact, I have the same error ! The first time I applied the migration I didn't have documents saved !

document to allow cross-access.

Optionally duplicates accesses if `with_accesses` is set to true
in the payload.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is now, with_accesses is not set from the payload but from the query string.
Why not using the payload btw ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason. Do you think it's better in the payload? I think you're right...

@@ -395,9 +444,26 @@ class Document(BaseModel):
blank=True,
null=True,
)
duplicated_from = models.ForeignKey(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this column have a purpose in the system or is it just to get some information ?

Copy link
Member Author

@sampaccoud sampaccoud Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the moment just to keep the information as it seems legit. It is not being used for display or anything at the moment.

from core import enums


def base64_yjs_to_xml(base64_string):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see it used ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicate docs
3 participants