Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group submission access logs #5094

Closed
wants to merge 18 commits into from

Conversation

rgraber
Copy link
Contributor

@rgraber rgraber commented Sep 4, 2024

Checklist

  1. If you've added code that should be tested, add tests
  2. If you've changed APIs, update (or create!) the documentation
  3. Ensure the tests pass
  4. Make sure that your code lints and that you've followed our coding style
  5. Write a title and, if necessary, a description of your work suitable for publishing in our release notes
  6. Mention any related issues in this repository (as #ISSUE) and in other repositories (as kobotoolbox/other#ISSUE)
  7. Open an issue in the docs if there are UI/UX changes

Description

Group submission access logs together to make it easier to view and understand large collections of access logs.

Notes

Model changes:

  1. Adds a submission_group field to the AuditLog model, used for grouping submission logs together.
  2. Adds 2 new proxy models based on AuditLogs: SubmissionAccessLog and SubmissionGroup. These proxy models are in charge of creating instances with the correct auth_type.
  3. The SubmissionGroup proxy model is also responsible for returning only those access logs with the 'submission-group' auth type when queried. This was not necessary for the SubmissionAccessLog so was left off.
  4. The SubmissionAccessLog also has a method for adding itself to an existing SubmissionGroup or creating a new one

Signal changes:
Adds 2 new signals.

  1. SubmissionAccessLog: On saving a new SubmissionAccessLog, we look for the most recent submission group for the same user if it exists. If that submission group contains a log whose creation date is within a specified number of minutes of the creation time of the calling log, we add the calling log to the existing group. Otherwise, create a new one.
  2. SubmissionGroups: On saving a new SubmissionGroup, we assign it to itself (ie set submission_group=self). This makes the grouping query easier.

Api changes:

  1. Replaces the queryset for the AccessLog endpoints with one that uses fields and annotate to group access logs by the submission_group, which should only be set on SubmissionAccessLogs and SubmissionGroups. If there is no submission_group, it groups by the id instead, so all other logs are effectively grouped with themselves.
  • In order to improve query performance, this also removes fields that will be the same on all AccessLogs (model_type,app_name,action,and log_type). Removing these fields means the grouping clause doesn't have to check their equality each time.
  1. Adds a new serializer to deal with the new query, which returns dicts instead of AuditLogs.
  2. Sets the default search field on the /all endpoint to be the username since that's the only really searchable field the requirements call for (besides date, which doesn't make sense for a text search)

Replaces #5080
Depends on #5092

Copy link

@rgraber rgraber mentioned this pull request Sep 4, 2024
7 tasks
@rgraber rgraber added API Changes related to API endpoints Back end labels Sep 4, 2024
@rgraber rgraber mentioned this pull request Sep 5, 2024
7 tasks
@@ -1748,6 +1748,7 @@ def dj_stripe_request_callback_method():
'application/x-zip-compressed'
]

ACCESS_LOG_SUBMISSION_GROUP_TIME_LIMIT_MINUTES=60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would add a Constance settings for that. Superuser may want to customize this without redeploying a new release.

Comment on lines 222 to 240
class SubmissionGroupManager(AccessLogManager):
def get_queryset(self):
return (
super()
.get_queryset()
.filter(metadata__auth_type=ACCESS_LOG_SUBMISSION_GROUP_AUTH_TYPE)
)

def create(self, **kwargs):
metadata = kwargs.pop('metadata', {})
metadata['auth_type'] = ACCESS_LOG_SUBMISSION_GROUP_AUTH_TYPE
kwargs['metadata'] = metadata
return super().create(**kwargs)


class SubmissionGroup(AccessLog):
objects = SubmissionGroupManager()

class Meta:
proxy = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move them after SubmissionAccessLogManager .
G comes after A :-P

If it is because of the typing annotations in SubmissionAccessLog, let's wrap it in single quotes.

@@ -47,3 +47,35 @@ def get_date_created(self, audit_log):

def get_username(self, audit_log):
return audit_log.user.username


class UsernameHyperlinkField(serializers.HyperlinkedRelatedField):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about moving this in its own file in kpi/fields?

user = UsernameHyperlinkField(source='user__username')
date_created = serializers.SerializerMethodField()
username = serializers.CharField(source='user__username')
object_id = serializers.IntegerField()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it has already fixed in other PR, but just in case, as per our discussion, we should get rid of object_id.

Comment on lines 30 to 34
latest_group = (
SubmissionGroup.objects.filter(user_uid=instance.user_uid)
.order_by('-date_created')
.first()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this part is called each time a submission comes in.
I'll try to optimize as much as possible to avoid any bottlenecks. What do you thing about something like that?

1) Retrieve only fields we need
We only need pk and user_uid from latest_group

latest_group = (
        SubmissionGroup.objects.only('pk', 'user_uid').filter(user_uid=instance.user_uid)
        .order_by('-date_created')
        .first()
    )

I would rather use .values() instead (to avoid loading the object, just retrieve the values) but it would involve to refactor for add_to_existing_submission_group and latest_entry to use only submission_group_id below.

It won't be way faster, but at least, IMO, a little bit lighter on memory footprint.

):
if self.user_uid != submission_group.user_uid:
logging.error(
f'Cannot add submission log with user {self.user_uid} to group with user {submission_group.user_uid}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way above 88 :-P

log.add_to_existing_submission_group(group)
log.save()
# make sure we reported an error
patched_error.assert_called_once()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this happen in a real scenario? Should not we raise an exception instead of just logging the error and continue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should never happen in a real scenario. It's for future-proofing in case other code needs to use this method. I wasn't sure not adding to a group would be a big enough problem to raise an exception but I'm pretty ambivalent about it

Comment on lines 185 to 186
group_query = SubmissionGroup.objects.filter(user=user)
self.assertEqual(group_query.count(), 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should assert it equals 0 before submission access log is created?

new_submission = SubmissionAccessLog.objects.create(user=user)
self.assertEqual(group_query.count(), 2)
new_group = group_query.order_by('-date_created').first()
self.assertEqual(new_submission.submission_group.id, new_group.id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if group_query.count() == 2, I think it could be great to validate new_group.id != old_group.id just in case.

Comment on lines 100 to 128

> Example
>
> curl -X GET https://[kpi-url]/access-logs/all

> Response 200

> {
> "count": 10,
> "next": null,
> "previous": null,
> "results": [
> {
> "app_label": "kobo_auth",
> "model_name": "User",
> "object_id": 1,
> "user": "http://localhost/api/v2/users/admin/",
> "user_uid": "uBMZxx9tVfepvTRp3e9Twj",
> "username": "admin",
> "action": "AUTH",
> "metadata": {
> "source": "Firefox (Ubuntu)",
> "auth_type": "Digest",
> "ip_address": "172.18.0.6"
> },
> "date_created": "2024-08-19T16:48:58Z",
> "log_type": "access"
> },
> ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accident from moving things around

@noliveleger
Copy link
Contributor

While I was testing the logs with mockobo, (which uses /api/v1/submissions with auth-token). I have noticed that it only logs one event even if I called the endpoints 40 times in less than a minute.

@rgraber rgraber force-pushed the TASK-869-group-submissions-better branch from 67579ff to 55b66ea Compare September 17, 2024 12:47
@rgraber rgraber marked this pull request as draft September 17, 2024 17:52
@rgraber
Copy link
Contributor Author

rgraber commented Sep 17, 2024

Converting back to draft while I work on a different implementation

@rgraber rgraber mentioned this pull request Sep 18, 2024
7 tasks
@rgraber rgraber force-pushed the refactor-access-logs-as-subclass branch 2 times, most recently from 5cc2fbc to 039b7ee Compare September 19, 2024 13:55
Base automatically changed from refactor-access-logs-as-subclass to beta-refactored September 19, 2024 16:07
@rgraber rgraber closed this Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Changes related to API endpoints Back end
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants