
mohamedelabbas1996 (Contributor) commented Jul 22, 2025

Summary

This PR fixes the issue of incorrectly grouped sessions (events) by refactoring and improving the image grouping logic when syncing a deployment.

List of Changes

  • Refactored group_images_into_events to support a new use_existing flag:

    • If use_existing=True:

      • Only new images (images not assigned to an event) are grouped.
      • Groups are merged into existing events based on time proximity and overlap.
    • If use_existing=False:

      • All images in the deployment are re-grouped.
      • If a new group exactly matches an existing event (same start and end time), it reuses the event.
      • Otherwise, new events are created.
  • Removed the group_by field from the Event model.

  • Added two admin actions:

    • Fix Sessions: allows admins to fix incorrectly grouped sessions by regrouping their images.
    • Remove from Event: allows admins to manually remove selected source images from selected events.
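The timestamp-based split that both modes rely on can be sketched as follows. This is a minimal illustration, not the PR's code: captures are reduced to plain datetime values, group_timestamps_by_gap is a hypothetical name, and the 2-hour gap in the example is an arbitrary value (the PR text does not state the default max_time_gap).

```python
from datetime import datetime, timedelta
from typing import List


def group_timestamps_by_gap(
    timestamps: List[datetime], max_time_gap: timedelta
) -> List[List[datetime]]:
    """Split capture times into groups, starting a new group whenever two
    consecutive captures are more than max_time_gap apart.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    groups: List[List[datetime]] = []
    for ts in sorted(timestamps):
        # First capture, or a gap larger than the threshold: open a new group.
        if not groups or ts - groups[-1][-1] > max_time_gap:
            groups.append([ts])
        else:
            groups[-1].append(ts)
    return groups


# Example: two afternoon captures and two evening captures, 2-hour gap assumed.
base = datetime(2024, 9, 2, 15, 0)
captures = [
    base,
    base + timedelta(minutes=5),
    base + timedelta(hours=5),
    base + timedelta(hours=5, minutes=10),
]
print(len(group_timestamps_by_gap(captures, timedelta(hours=2))))  # 2
```

Sorting first makes the result independent of the order in which images arrive from storage.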

Related Issues

Closes #237

Detailed Description

Previously, session grouping relied on the group_by field, which reused an existing event whenever a group had the same start date. This caused problems when images taken on the same day but far apart in time were merged into a single session: timestamp-based grouping correctly split them into multiple groups, but because all of those groups shared the same start date, group_by assigned them all to the same event.
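A toy reconstruction of the failure mode, assuming group_by behaved like a start-date key as described above (this is not the removed code). The sample times mirror the three sessions on 2024-09-02 reported for deployment #220 later in this thread: timestamp grouping yields three groups, but keying by start date folds them into one event.

```python
from datetime import date, datetime
from typing import Dict, List


def assign_by_start_date(groups: List[List[datetime]]) -> Dict[date, List[datetime]]:
    """Illustrative old behaviour: every group whose first capture falls on
    the same calendar date is attached to the same event."""
    events: Dict[date, List[datetime]] = {}
    for group in groups:
        key = group[0].date()  # only the start *date* is compared, not the time
        events.setdefault(key, []).extend(group)
    return events


groups = [
    [datetime(2024, 9, 2, 15, 11, 49), datetime(2024, 9, 2, 15, 18)],
    [datetime(2024, 9, 2, 17, 44, 50), datetime(2024, 9, 2, 17, 52)],
    [datetime(2024, 9, 2, 20, 30, 30), datetime(2024, 9, 3, 3, 30)],
]
events = assign_by_start_date(groups)
print(len(groups), "groups ->", len(events), "event")  # 3 groups -> 1 event
```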
This PR fixes the issue by improving the group_images_into_events function. It introduces a use_existing flag to control the behavior: when use_existing=False, all deployment images are regrouped; when use_existing=True, only new images (those not yet assigned to an event) are processed. Images are grouped by timestamp using a max_time_gap threshold. Each group is then either merged into an existing event (when use_existing=True and the event overlaps the group or is close enough in time), assigned to an existing event whose start and end times exactly match the group, or assigned to a new event. The group_by field is removed from the Event model, and admin actions are added to help fix incorrectly grouped sessions. Additionally, cached event fields on related models (e.g. Occurrence) are updated accordingly.
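The per-group decision described above can be sketched like this. Assumptions: Event is a stand-in dataclass rather than the Django model, resolve_event and ranges_overlap_or_close are hypothetical names (the PR's own overlap/proximity helper lives in ami/utils/dates.py and may differ), and exact start/end matches are reused before use_existing is consulted, as suggested in review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Event:
    """Stand-in for the Event model: just a time range."""
    start: datetime
    end: datetime


def ranges_overlap_or_close(a_start, a_end, b_start, b_end, max_gap: timedelta) -> bool:
    # True if the two ranges overlap, or the gap between them is <= max_gap.
    return a_start <= b_end + max_gap and b_start <= a_end + max_gap


def resolve_event(
    group_start: datetime,
    group_end: datetime,
    events: List[Event],
    use_existing: bool,
    max_gap: timedelta,
) -> Tuple[Event, str]:
    """Return (event, action) where action is 'reused', 'merged' or 'created'."""
    # 1. An event with exactly the same start and end is reused unconditionally.
    for event in events:
        if event.start == group_start and event.end == group_end:
            return event, "reused"
    # 2. With use_existing=True, extend an overlapping or nearby event.
    if use_existing:
        for event in events:
            if ranges_overlap_or_close(event.start, event.end, group_start, group_end, max_gap):
                event.start = min(event.start, group_start)
                event.end = max(event.end, group_end)
                return event, "merged"
    # 3. Otherwise create a new event.
    new_event = Event(group_start, group_end)
    events.append(new_event)
    return new_event, "created"
```

Checking exact matches first keeps a full regroup (use_existing=False) from recreating events that were already correct.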

Screenshots

N/A

Deployment Notes

N/A

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

netlify bot commented Jul 22, 2025

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit 8ec6671
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/68b636730aa86f0008052616

@mohamedelabbas1996 mohamedelabbas1996 self-assigned this Jul 22, 2025
# Get only newly added images (images without an event)
image_qs = image_qs.filter(event__isnull=True)

images = list(image_qs.order_by("timestamp"))
Collaborator


You likely don't need to evaluate the queryset yet with list(image_qs). You can check whether any images were found with image_qs.exists(), which is efficient for large datasets.

event = None
if use_existing:
    # Look for overlap or proximity
    for existing_event in existing_events:
Collaborator


If you are looping over a queryset, you can iterate it directly with for existing_event in events_qs instead of building a list first. (Plain iteration still fetches and caches the whole result; events_qs.iterator() is what avoids holding it all in memory.) Sometimes you need to convert to a list so you can index it like events[3], but often you never need to.

email = os.environ.get("DJANGO_SUPERUSER_EMAIL", "Unknown")
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
logger.info(f"Test user credentials: {email} / {password}")
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
Collaborator


Was this intentional?

return created


def create_captures_in_range(
Collaborator


This looks helpful, thanks!


if event:
    if use_existing:
        # Adjust times if necessary (merge)
Collaborator


Is this check necessary? I think you just checked whether an existing event has exactly the same start and end time. Perhaps you meant to do an OR query, i.e. whether an existing event has either the same start or the same end time as the group?

If there is an existing event with exactly the same start AND end time (for same deployment), then I don't think we should check for use_existing. Just re-use those without question.

@mihow mihow requested a review from Copilot July 24, 2025 00:55

This comment was marked as outdated.

mihow (Collaborator) commented Jul 24, 2025

This is looking good!! One of our oldest issues :)

Will you also add a function for fixing existing events? It can be just a Django admin function in the Session list view (allow selecting multiple sessions). We need something to fix the sessions like this:

https://antenna.insectai.org/projects/18/sessions/2579
https://antenna.insectai.org/projects/18/sessions/5284

One approach could be to set the images in the selected sessions so that event=None, e.g. "Remove images from session". Then run the normal group_images_into_events function for the whole deployment. But open to other ideas.

I still like the idea of a function that can scan all sessions in the deployment (or project) and "detect" if there are images that shouldn't be there (based on the gap setting). Then we can alert the user that it needs to be regrouped.
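The detection idea could start from something like the helper below: scan one session's capture times and report consecutive pairs that are further apart than the gap threshold, since any hit means a regroup would split that session. find_oversized_gaps is a hypothetical helper, and fetching the timestamps from the database is left out.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_oversized_gaps(
    timestamps: List[datetime], max_time_gap: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return (previous, next) capture pairs in one session that are more than
    max_time_gap apart. An empty list means the session looks consistent."""
    ordered = sorted(timestamps)
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if curr - prev > max_time_gap
    ]


session = [
    datetime(2024, 9, 2, 15, 11, 49),
    datetime(2024, 9, 2, 15, 18),
    datetime(2024, 9, 2, 20, 30, 30),
]
print(find_oversized_gaps(session, timedelta(hours=2)))  # one suspicious gap
```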

Will you make a follow-up ticket for making the max gap setting a Project setting (for #893)?

mihow (Collaborator) commented Jul 24, 2025

I tested on the sessions in the Mothra deployment in project 18 and was able to repair the existing sessions:

before:
(screenshot)

after:
(screenshot)
(screenshot)

You can see it broke the two long sessions into smaller chunks.

mihow (Collaborator) commented Jul 24, 2025

I pushed a change with my suggested action for removing source images from existing events. I also noticed that Occurrences have a cached field that keeps track of the event as well, so this needs to be updated in our grouping methods. There are other ways to keep occurrences in sync, but they would likely be per-occurrence or per-image updates, whereas this can update many records at once.

@mohamedelabbas1996 mohamedelabbas1996 changed the title [Draft] Fix incorrect session grouping Fix incorrect session grouping Jul 25, 2025
@mohamedelabbas1996 mohamedelabbas1996 marked this pull request as ready for review July 25, 2025 09:11
queryset.dissociate_related_objects()
self.message_user(request, f"Dissociated {queryset.count()} events from captures and occurrences.")

@admin.action(description="Fix sessions by regrouping images")
Collaborator


I like having a dedicated fix_sessions action. However you can use the function above (queryset.dissociate_related_objects()) to remove images and occurrences from the Event. That's what I designed it for. Something like:

deployments = {event.deployment for event in queryset}  # collect affected deployments
queryset.dissociate_related_objects()
for deployment in deployments:
    group_images_into_events(deployment)

I think use_existing=True works in this case

mihow (Collaborator) left a comment


@mohamedelabbas1996 I just brought this up to date with main, rebased the migration, and fixed one test. I noticed there are type errors for event.end, since the end time can be None if the event is ongoing. We should handle these None cases, or, if that adds too much complexity, we can start requiring an end time on the model and set it to the last capture's timestamp (and perhaps use another method to detect ongoing events, e.g. checking whether the end timestamp is within the time gap of the current real-world time).
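A sketch of the two fallbacks proposed here, using plain datetimes. effective_end and looks_ongoing are hypothetical helper names, and treating "last capture within one max_time_gap of now" as ongoing is the heuristic floated in the comment, not settled behavior.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def effective_end(
    end: Optional[datetime], capture_times: Sequence[datetime]
) -> Optional[datetime]:
    """Use the stored end time if set; otherwise fall back to the last
    capture's timestamp (None if the event has no captures at all)."""
    if end is not None:
        return end
    return max(capture_times) if capture_times else None


def looks_ongoing(end: datetime, now: datetime, max_time_gap: timedelta) -> bool:
    """Heuristic: an event whose last capture is within one gap of 'now'
    may still receive images, so treat it as ongoing."""
    return now - end <= max_time_gap
```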

(screenshot)

mihow (Collaborator) commented Aug 29, 2025

@mohamedelabbas1996

Can you confirm how much this will affect existing sessions when we deploy? And after we re-sync a deployment? I think it's pretty safe if this only affects existing sessions on-demand.

When new data is synced for a deployment, will a split happen automatically if an existing session is incorrect? (e.g. 12 hour session with 2 short test sessions within it). Or is it only appends, prepends and new sessions?

I'm just trying to gauge how much can change when we deploy this. Thanks for refreshing my memory!

I would like to do an audit of all sessions in the live DB that are over 9 hours, then we can see what will happen to those.

mohamedelabbas1996 (Contributor, Author) commented Aug 29, 2025


Since group_images_into_events is called with use_existing=True by default during syncs, existing sessions are mostly unaffected. Only newly added images are considered: they are either grouped into a new event or merged into an overlapping or nearby existing event. In some cases this can also merge two existing sessions, if the new images bridge the gap between them, but existing sessions are never split. If there are no new images, the function returns immediately and makes no changes. Incorrectly grouped sessions will not be fixed automatically; that requires the fix_sessions action, which explicitly runs the same function with use_existing=False. That mode disregards existing assignments and regroups all images in the deployment, so sessions can both split and merge according to max_time_gap.
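The bridging case can be illustrated with plain (start, end) tuples. merge_into_sessions is a hypothetical helper, not the PR's code: it repeatedly absorbs any existing session that overlaps or lies within max_gap of the growing range, so sessions can merge but are never split.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Session = Tuple[datetime, datetime]


def merge_into_sessions(
    existing: List[Session], new_group: Session, max_gap: timedelta
) -> List[Session]:
    """Add new_group to the session list, merging it with every session it
    overlaps or comes within max_gap of (possibly bridging several)."""
    start, end = new_group
    remaining = list(existing)
    changed = True
    while changed:  # repeat because each merge widens the range
        changed = False
        for s, e in list(remaining):
            # Overlapping, or separated by no more than max_gap.
            if s <= end + max_gap and start <= e + max_gap:
                start, end = min(start, s), max(end, e)
                remaining.remove((s, e))
                changed = True
    return sorted(remaining + [(start, end)])
```

For example, with a 1-hour gap, sessions 20:00-22:00 and 23:30-02:00 stay separate until a new group at 22:30-23:00 arrives, after which they collapse into a single 20:00-02:00 session.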

@mihow mihow requested a review from Copilot September 1, 2025 18:23
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR fixes incorrect session grouping by refactoring the image grouping logic and removing the problematic group_by field from the Event model. The changes improve how images are organized into monitoring sessions based on timestamp proximity rather than simple date-based grouping.

  • Replaces date-based grouping with time-range-based merging logic using a use_existing flag
  • Removes the group_by field from Event model and adds new unique constraint on deployment/start/end
  • Adds admin actions for fixing sessions and managing event associations

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Summary per file:

  • ami/utils/dates.py: Adds utility function for checking time range overlap and proximity
  • ami/tests/fixtures/storage.py: Adds support for custom beginning timestamp in test data generation
  • ami/tests/fixtures/main.py: Adds new test helper functions and fixes duplicate logging statements
  • ami/tasks.py: Updates regroup task to use new grouping logic without returning events
  • ami/main/tests.py: Adds comprehensive test coverage for new grouping behavior scenarios
  • ami/main/models.py: Major refactoring of Event model and grouping logic with new manager/queryset
  • ami/main/migrations/0071_remove_event_unique_event_and_more.py: Database migration removing group_by field and updating constraints
  • ami/main/admin.py: Adds new admin actions for session management


Comment on lines +470 to +471
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
logger.info(f"Test user credentials: {email} / {password}")

Copilot AI Sep 1, 2025


These lines are duplicated from lines 468-469. Remove the duplicate logging statements.

Suggested change
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
logger.info(f"Test user credentials: {email} / {password}")


audit_event_lengths(deployment)

audit_event_lengths(deployment)

Copilot AI Sep 1, 2025


The audit_event_lengths(deployment) call is duplicated (also on line 1268). Remove one of these duplicate calls.

Suggested change
audit_event_lengths(deployment)

This is useful when the event is being deleted or dissociated from its captures.
It does not delete the event itself, but removes its associations with source images and occurrences.
This was created to reassociate source imag es and occurrences with a new event

Copilot AI Sep 1, 2025


There's an extra space in 'imag es'. It should be 'images'.

Suggested change
This was created to reassociate source imag es and occurrences with a new event
This was created to reassociate source images and occurrences with a new event

"id",
"path",
)
search_fields = ("id", "path", "event__start__date")

Copilot AI Sep 1, 2025


The search field 'event__start__date' is incorrect for Django lookups. It should be 'event__start__date' for exact date matching or use a different approach like 'event__start' for datetime searching.

Suggested change
search_fields = ("id", "path", "event__start__date")
search_fields = ("id", "path")

mihow (Collaborator) commented Sep 2, 2025

Hi @rhine3!

We have this fix for events/sessions that are incorrectly grouped (sessions that span multiple days, or a single session that contains multiple short sessions). I am testing the fix on a recent db snapshot and seeing how it will affect existing deployments & sessions. Below is the output for one deployment in your project. It seems that most changes are about splitting an event into multiple events - one main 7hr event and then some random 1 minute test sessions where the camera was turned on briefly. This will not be run automatically when we deploy it, but it will be run next time your deployment images are synced. Would this change throw off your existing analysis, or do you welcome it? Thanks in advance for your feedback.

Summary for a single deployment (#220 LEPS-033_Box1)

INFO 2025-09-01 19:57:36,790 models 1 136098806425408 

Finished grouping 7636 images into 5 events for deployment '#220 LEPS-033_Box1'
  Events: 3 -> 5
  Captures: 7636 -> 7636

  BEFORE Statistics:
  Count: 3
  Duration: avg=8.8h, std=3.1h, range=7.0h-12.3h
  Captures: avg=2545, std=47, range=2518-2600

  AFTER Statistics:
  Count: 5
  Duration: avg=4.2h, std=3.8h, range=0.1h-7.0h
  Captures: avg=1527, std=1357, range=38-2518
  New events (3):
    - Event 7037: 2024-09-02 15:11:49 to 2024-09-02 15:18:00 (0.1 hours, 38 captures)
    - Event 7038: 2024-09-02 17:44:50 to 2024-09-02 17:52:00 (0.1 hours, 44 captures)
    - Event 7039: 2024-09-02 20:30:30 to 2024-09-03 03:30:00 (7.0 hours, 2518 captures)
  Modified events: None
  Unchanged events: 2
  Deleted events (0 captures): 1
    - Event 4400: 2024-09-02 15:11:49 to 2024-09-03 03:30:00 (2600 captures) [DELETED]

  Day-to-day analysis:
    Days with multiple events AFTER regrouping: 1
      2024-09-02: 3 events
        - 15:11 to 15:18 (0.1h, 38 captures)
        - 17:44 to 17:52 (0.1h, 44 captures)
        - 20:30 to 03:30 (7.0h, 2518 captures)

  Multi-day event analysis:
    (Multi-day = spanning MORE than 2 calendar days)
    (Normal overnight monitoring spans exactly 2 days)

All deployments in project 84

Summary:
  Total events: 158 -> 215
  Total captures: 384507 -> 389334
  Events created: +57
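The multi-day rule stated in the log (more than two calendar days; a normal overnight session spans exactly two) reduces to a date difference. A minimal sketch with hypothetical helper names, checked against Event 7039 from the output above:

```python
from datetime import datetime


def calendar_days_spanned(start: datetime, end: datetime) -> int:
    """Number of calendar dates touched by [start, end], inclusive."""
    return (end.date() - start.date()).days + 1


def is_multi_day(start: datetime, end: datetime) -> bool:
    # Normal overnight monitoring spans exactly 2 calendar days;
    # "multi-day" means spanning more than 2.
    return calendar_days_spanned(start, end) > 2


# Event 7039: 2024-09-02 20:30:30 to 2024-09-03 03:30:00
print(calendar_days_spanned(datetime(2024, 9, 2, 20, 30, 30), datetime(2024, 9, 3, 3, 30)))  # 2
```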

rhine3 (Collaborator) commented Sep 16, 2025

Realized I replied to you directly but never posted it here - yes, I HUGELY welcome this change! I do want to exclude the 1min camera turn-on events from my analysis.


Development

Successfully merging this pull request may close these issues.

Some sessions are incorrectly grouped

4 participants