Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group submission access logs #5113

Merged
merged 15 commits into from
Oct 2, 2024
Merged

Conversation

rgraber
Copy link
Contributor

@rgraber rgraber commented Sep 17, 2024

Checklist

  1. If you've added code that should be tested, add tests
  2. If you've changed APIs, update (or create!) the documentation
  3. Ensure the tests pass
  4. Make sure that your code lints and that you've followed our coding style
  5. Write a title and, if necessary, a description of your work suitable for publishing in our release notes
  6. Mention any related issues in this repository (as #ISSUE) and in other repositories (as kobotoolbox/other#ISSUE)
  7. Open an issue in the docs if there are UI/UX changes

📣Description

Updates the access logs endpoints to return submissions grouped together by hour to avoid floods of individual submission logs. Also fixes a bug where submissions to the v1 endpoint were not being counted as submissions.

Notes

Adds new queries to the AccessLogManager for returning access logs with submissions grouped together. The submissions are grouped by truncating the date created to the hour and combining it with the user_uid and using it as a grouping key. For non submissions, the group key is just the submission id, which is unique, and therefore none of them will be grouped together. This method results in a new group of submissions for every user every hour. For example, if user1 makes 3 submissions at 11:59 for and 3 submissions at 12:01, it will be reported as 2 different submission groups. 3 normal logins between 11:00 and 11:15 will still be reported as 3 separate events.

This requires changes to the serializer. It will now only return the minimum required fields (leaving out log_type, action, model_type, and app_name, all of which are the same for every access log).

Also fixes a bug where submissions to the v1 endpoint (XFormSubmissionApi) were not being recorded as such. To do this, it was necessary to replace the default token and basic authentication classes with the wrapped versions from kpi.authentication that have the call to create a new audit log.

Testing

Use mockobo to submit multiple submissions at once. Check api/v2/access-logs and api/v2/access-logs/all to make sure the submissions are grouped together in the response.
You can also create multiple submissions in the UI. If you create them within the same hour of the day, they should be grouped together.

Related issues

Replaces #5094

Copy link

@rgraber rgraber changed the base branch from beta-refactored to refactor-access-logs-as-subclass September 17, 2024 20:24
@rgraber rgraber changed the title Task 869 group at fetch Group submission access logs Sep 18, 2024
@rgraber rgraber force-pushed the refactor-access-logs-as-subclass branch 2 times, most recently from 5cc2fbc to 039b7ee Compare September 19, 2024 13:55
@rgraber rgraber added Back end API Changes related to API endpoints labels Sep 19, 2024
@rgraber rgraber marked this pull request as ready for review September 19, 2024 15:47
@rgraber
Copy link
Contributor Author

rgraber commented Sep 19, 2024

BTW the formatting should pass once #5119 is merged

Base automatically changed from refactor-access-logs-as-subclass to beta-refactored September 19, 2024 16:07
@rgraber rgraber closed this Sep 19, 2024
@rgraber rgraber reopened this Sep 19, 2024
@rgraber rgraber changed the base branch from beta-refactored to main September 19, 2024 19:58
@rgraber rgraber changed the base branch from main to beta-refactored September 19, 2024 19:58
# for submissions, the group key is hour created + user_uid
# this enables us to group submissions by user by hour
When(
metadata__auth_type=ACCESS_LOG_SUBMISSION_AUTH_TYPE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we may need an index metadata__auth_type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not doing any filtering by auth type with this query. It ends up in postgres as

SELECT "auth_user"."username",
kpi-1  |        "audit_log_auditlog"."object_id",
kpi-1  |        "audit_log_auditlog"."user_uid",
kpi-1  |        CASE WHEN ("audit_log_auditlog"."metadata" -> 'auth_type') = '"submission"'::jsonb THEN CONCAT(((DATE_TRUNC('hour', "audit_log_auditlog"."date_created" AT TIME ZONE 'UTC'))::varchar)::text, ("audit_log_auditlog"."user_uid")::text)
kpi-1  |             ELSE ("audit_log_auditlog"."id")::varchar
kpi-1  |              END AS "group_key",
kpi-1  |        COUNT("audit_log_auditlog"."id") AS "count",
kpi-1  |        CASE WHEN ("audit_log_auditlog"."metadata" -> 'auth_type') = '"submission"'::jsonb THEN '{"auth_type": "submission-group"}'::jsonb
kpi-1  |             ELSE "audit_log_auditlog"."metadata"
kpi-1  |              END AS "metadata",
kpi-1  |        MIN("audit_log_auditlog"."date_created") AS "date_created"
kpi-1  |   FROM "audit_log_auditlog"
kpi-1  |   LEFT OUTER JOIN "auth_user"
kpi-1  |     ON ("audit_log_auditlog"."user_id" = "auth_user"."id")
kpi-1  |  WHERE ("audit_log_auditlog"."log_type" = 'access' AND "audit_log_auditlog"."user_uid" = 'ubVyXzUy7S9VdR2PgKkbdR')
kpi-1  |  GROUP BY "auth_user"."username",
kpi-1  |           "audit_log_auditlog"."object_id",
kpi-1  |           "audit_log_auditlog"."user_uid",
kpi-1  |           4,
kpi-1  |           6
kpi-1  |  ORDER BY 7 DESC
kpi-1  |  LIMIT 100

I don't think an index will do anything for us.

from rest_framework import serializers


class UsernameHyperlinkField(serializers.HyperlinkedRelatedField):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename the class (and the file) to make it obvious it is expecting a dict.

Otherwise, we are using RelativePrefixHyperlinkedRelatedField everywhere else.
What about changing the its get_url method to be more permissive with obj. E.g.:

class RelativePrefixHyperlinkedRelatedField(HyperlinkedRelatedField):     
     ...
     def get_url(self, obj, view_name, request, format):
        if isinstance(obj, str):
            value = obj
        else:
            value = getattr(obj, self.lookup_field, None)

        return self.reverse(
            view_name, kwargs={self.lookup_url_kwarg: value}, request=request, format=format
        )

In the serializer, use RelativePrefixHyperlinkedRelatedField instead of UsernameHyperlinkField

    user = RelativePrefixHyperlinkedRelatedField(
        view_name='user-kpi-detail',
        source='user__username',
        queryset=get_user_model().objects.all(),
        style={'base_template': 'input.html'}  # Render as a simple text box
    )

Last option, keep UsernameHyperlinkField as-is (no rename), but still adapt get_url to support User object as described ⬆️

Serializers could call user = UsernameHyperlinkField(...) all time.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of updating RelativePrefix... to allow it to take strings. Updating

Comment on lines +181 to +182
AccessLog.objects.create(user=user1, date_created=today)
AccessLog.objects.create(user=user1, date_created=yesterday)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about switching both to be sure they are created in the different order we are expecting at the end?

'kpi.authentication.OPOAuth2Authentication.authenticate',
'kobo.apps.openrosa.libs.authentication.BasicAuthentication.authenticate',
)
def test_any_auth_for_submissions(self, authetication_method):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo? authentication_method?

'kobo.apps.openrosa.apps.api.viewsets.xform_submission_api.XFormSubmissionApi.create', # noqa
return_value=HttpResponse(status=200),
):
# try both v1 and v2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenRosa and v1 API ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Isn't v1 just the openrosa endpoint?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgraber nope, they are two different things.

OpenRosa API implements three endpoints:

  • /formList
  • /manifest
  • /submission

That's what Enketo and Collect are using to POST data. They are expecting these routes.

/api/v1/submissions is a DRF endpoint that users could use to POST data via API.
IIRC, the former was calling a legacy django view before. It was two different part of code (which achieved the same thing). But now, they do call the same code

/<username>/submission	kobo.apps.openrosa.apps.api.viewsets.xform_submission_api.XFormSubmissionApi	submissions
/api/v1/submissions	kobo.apps.openrosa.apps.api.viewsets.xform_submission_api.XFormSubmissionApi	submissions-list
/submission	kobo.apps.openrosa.apps.api.viewsets.xform_submission_api.XFormSubmissionApi	submissions

Having that said, none of them are v2. You cannot POST to v2.

return_value=HttpResponse(status=200),
):
# check v1 and v2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenRosa and v1 API ;-)

@noliveleger noliveleger self-assigned this Sep 26, 2024
@noliveleger noliveleger merged commit f1e5c20 into beta-refactored Oct 2, 2024
6 of 8 checks passed
@noliveleger noliveleger deleted the TASK-869-group-at-fetch branch October 2, 2024 13:17
@noliveleger noliveleger restored the TASK-869-group-at-fetch branch October 7, 2024 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Changes related to API endpoints Back end
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants