
add analytics module, historical data logging, and session statistics #29

Open
arpit-k16 wants to merge 3 commits into prattyan:main from arpit-k16:feature/analytics-and-historical-data

Conversation


arpit-k16 commented Feb 9, 2026

Changes

  • analytics.py: summary stats (mean, std, min, max, percentiles), congestion metrics, session summary, CSV export helpers
  • Historical logging: traffic snapshots every 5s to data/traffic_session_*.csv; optional POST to Django API when TRAFFIC_API_URL is set
  • Dashboard: new "Session statistics" card (mean, min, max, σ, congestion ratio, elapsed time)
  • Django API: TrafficSnapshot model, CRUD + GET /api/traffic-snapshots/summary/?hours=24 for aggregated analytics
  • Docs: README Data & analytics section, main.py run instructions, project structure; requirements.txt cleaned; contributing.md updated

Summary by CodeRabbit

  • New Features

    • Session analytics UI showing vehicle breakdown, congestion metrics, elapsed time, and data export hint.
    • Periodic persistent session logging to CSV and optional posting to a configurable analytics API.
    • New read-only API resource for traffic snapshots with a summary endpoint for aggregated historical queries.
    • Admin/listing improvements for easier exploration of snapshot records.
  • Documentation

    • Updated docs and contributing guide to reflect project layout and new analytics/data features.

…istics

- Add analytics.py: summary stats, congestion metrics, session summary, CSV export
- Log traffic snapshots to data/traffic_session_*.csv every 5s for analysts
- Optional POST to Django API (TRAFFIC_API_URL) when backend is running
- Add Session Statistics card to dashboard (mean, min, max, std, congestion)
- Django: TrafficSnapshot model, CRUD + GET .../summary/?hours=24
- README: document data/analytics, fix app.py -> main.py, project structure
- requirements.txt: clean minimal deps; contributing.md: main.py and analytics

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

Adds an analytics module and session CSV logging to the Dash app, optional posting to a new Django API, and extends the Django API with a TrafficSnapshot model, serializer, viewset (with summary action), admin registrations, and a migration.

Changes

  • Documentation (README.md, contributing.md): Updated project layout and instructions: renamed the project directory, replaced app.py with main.py, and documented analytics, CSV session logging, and optional Django API configuration.
  • Analytics module (analytics.py): New utilities compute_summary_stats, compute_congestion_stats, session_summary, and export_session_csv for session analytics and CSV export.
  • Main application (main.py): Conditional analytics import, in-memory traffic_snapshots, _snapshot_row() and _log_snapshot_to_csv() (CSV write plus optional POST to the API), periodic snapshot logging, a session-stats UI card, and extended dashboard update outputs.
  • Django API models and migration (traffic_manage/api/models.py, traffic_manage/api/migrations/0002_trafficsnapshot.py): Adds the TrafficSnapshot model (timestamp, vehicle counts by type, congestion_pct, decision) and a migration creating the table with a timestamp index and ordering.
  • Django API views, serializers, URLs, and admin (traffic_manage/api/views.py, serializers.py, urls.py, admin.py): Adds TrafficSnapshotSerializer, TrafficSnapshotViewSet (read-only plus a summary action with an optional hours filter and aggregations), router registration at traffic-snapshots, and admin registrations/listing fields.
  • Project root (analytics.py, main.py, data/*): New analytics entrypoint and runtime data/ CSV logs for persisted sessions; main.py now coordinates analytics and optional API posting.

Sequence Diagram

sequenceDiagram
    participant Dashboard as Dashboard/Main App
    participant Analytics as Analytics Module
    participant CSV as CSV File System
    participant API as Django API

    Dashboard->>Analytics: compute_summary_stats(traffic_history)
    Analytics-->>Dashboard: stats
    Dashboard->>Analytics: compute_congestion_stats(traffic_history)
    Analytics-->>Dashboard: congestion metrics
    Dashboard->>Analytics: session_summary(traffic_history, vehicle_types, start_time)
    Analytics-->>Dashboard: session summary
    Dashboard->>CSV: append snapshot row to /data/traffic_session_YYYYMMDD.csv
    CSV-->>Dashboard: file path / error
    Dashboard->>API: POST /traffic-snapshots/ (optional snapshot payload)
    API-->>Dashboard: 200 or error (ignored)
    Dashboard->>Dashboard: render session-stats UI

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Added REST API #22: Modified the traffic_manage/api REST API scaffolding; this PR extends that API by adding the TrafficSnapshot model, serializer, viewset and routes.

Suggested reviewers

  • AbhayNath001

Poem

🐰 hop, hop — I count and I see,
CSV trails and stats for thee,
snapshots saved by day and night,
Django hums and dashboards light,
carrots for coders — analytics delight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 77.78%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately summarizes the main changes: adding an analytics module, historical data logging, and session statistics features across the codebase.



coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@analytics.py`:
- Line 94: Replace the dynamic import in the elapsed-time calculation: add a top-level `import time` with the other stdlib imports and change the line that sets elapsed_sec from `__import__("time").time()` to `time.time()` (i.e., `elapsed_sec = max(0, time.time() - start_time)`).
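A minimal sketch of the suggested change; the helper name `elapsed_seconds` is illustrative (the original code computes `elapsed_sec` inline):

```python
import time  # top-level stdlib import instead of __import__("time")


def elapsed_seconds(start_time: float) -> float:
    """Elapsed session time since start_time, clamped at zero."""
    return max(0, time.time() - start_time)
```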

In `@main.py`:
- Around line 73-78: Replace the inline dynamic import with a top-level import: add `import csv` at the top of main.py and change the inline `__import__("csv").DictWriter` usage to `csv.DictWriter` where the writer is created. Keep the existing behavior unchanged: write the header only when the file does not yet exist, then `w.writerow(row)`.
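A self-contained sketch of the suggested pattern; the name `log_snapshot_to_csv` and its signature are illustrative, not the exact code in main.py:

```python
import csv
import os


def log_snapshot_to_csv(csv_path: str, row: dict) -> None:
    """Append one snapshot row, writing the header only when the file is new."""
    file_exists = os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        w = csv.DictWriter(f, fieldnames=list(row.keys()))
        if not file_exists:
            w.writeheader()
        w.writerow(row)
```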
- Around line 82-96: The unconditional POST to TRAFFIC_API_URL blocks the callback whenever the API is down. Make the HTTP call opt-in and log failures: before building the payload and Request, check an explicit flag such as `os.environ.get("TRAFFIC_API_ENABLED") == "1"` (or require TRAFFIC_API_URL to be explicitly set rather than defaulting to localhost) and skip the HTTP block when it is not enabled. Replace the bare `except: pass` with a logging call (e.g., `logger.exception` or `logger.error` with exception info) instead of silently swallowing errors.
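An illustrative sketch of the opt-in flow, assuming the `TRAFFIC_API_ENABLED` flag suggested above; `post_snapshot` is a hypothetical name for logic that in main.py lives inside `_log_snapshot_to_csv`:

```python
import json
import logging
import os
import urllib.request

logger = logging.getLogger(__name__)


def post_snapshot(row: dict) -> None:
    """POST a snapshot only when explicitly enabled; log failures loudly."""
    if os.environ.get("TRAFFIC_API_ENABLED") != "1":
        return  # opt-in: skip the HTTP call entirely when not enabled
    api_url = os.environ.get("TRAFFIC_API_URL")
    if not api_url:
        return
    payload = json.dumps(row).encode("utf-8")
    req = urllib.request.Request(
        api_url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        urllib.request.urlopen(req, timeout=1)
    except Exception:
        # Log instead of silently passing, so a down API is visible.
        logger.exception("Failed to POST traffic snapshot to %s", api_url)
```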

In `@README.md`:
- Around line 98-112: The fenced directory-tree code block in README.md is missing a language specifier (MD040); change the opening fence of the traffic-manage tree block from ``` to ```text so markdownlint stops flagging it and the block carries explicit language metadata.

In `@traffic_manage/api/migrations/0002_trafficsnapshot.py`:
- Line 17: Update the migration's timestamp field to match the model's behavior: replace `auto_now_add=True` with `default=timezone.now` (keeping `db_index=True`) so client-supplied timestamps from the Dash app (main.py) are not ignored. Also add `from django.utils import timezone` to the migration imports, and leave the field editable as in the model (do not set `editable=False`).
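A hedged sketch of the corrected declaration (requires Django; the commented tuple is an excerpt of the migration's fields list, not a standalone module):

```python
# traffic_manage/api/migrations/0002_trafficsnapshot.py (excerpt, illustrative)
from django.utils import timezone

# Inside migrations.CreateModel(..., fields=[...]):
#   ("timestamp", models.DateTimeField(default=timezone.now, db_index=True)),
#
# default=timezone.now fills in the creation time when a client omits the
# field, but, unlike auto_now_add=True, still honors timestamps POSTed by
# the Dash app.
```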

In `@traffic_manage/api/views.py`:
- Around line 37-40: TrafficSnapshotViewSet inherits from ModelViewSet, which exposes full CRUD. Either switch to ReadOnlyModelViewSet plus an explicit mixins.CreateModelMixin if only creation is needed, or keep the broader base but add appropriate permission_classes (e.g., IsAuthenticated or a custom permission) so write operations are restricted to permitted callers.
- Around line 49-53: The code calls `request.query_params.get("hours", type=int)`, but Django's QueryDict.get() has no `type=` keyword (that is a Flask/Werkzeug feature), so this raises TypeError. Fetch the raw value with `request.query_params.get("hours")`, convert it to int inside a try/except ValueError, and only compute `since` with timezone.now() and timedelta and apply `qs = qs.filter(timestamp__gte=since)` when the parsed integer is > 0; malformed or absent values should be ignored rather than raising.
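A sketch of the defensive parsing described above, as a standalone helper (`parse_hours` is an illustrative name, not code from the PR):

```python
from typing import Optional


def parse_hours(raw) -> Optional[int]:
    """Parse the optional ?hours= query parameter defensively.

    Django's QueryDict.get() has no `type=` keyword (that is a
    Flask/Werkzeug feature), so conversion is done manually; malformed
    or non-positive values are ignored by returning None.
    """
    if raw is None:
        return None
    try:
        hours = int(raw)
    except (TypeError, ValueError):
        return None
    return hours if hours > 0 else None
```

In the view, the time filter would then apply only when `parse_hours` returns a value.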
🧹 Nitpick comments (6)
analytics.py (1)

105-126: export_session_csv derives fieldnames from only the first row.

If rows have inconsistent keys, DictWriter with the default extrasaction='raise' raises ValueError for any later row containing extra keys, while rows missing some of the first row's keys simply produce empty cells. Consider collecting the union of all keys, or documenting the assumption that all dicts share the same schema.
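A sketch of the union-of-keys approach, writing to an in-memory buffer for illustration; `export_rows_csv` is a hypothetical stand-in for `export_session_csv`:

```python
import csv
import io


def export_rows_csv(rows: list) -> str:
    """Serialize dicts to CSV using the union of all keys, in first-seen order.

    restval="" fills cells for rows missing a key, so heterogeneous rows
    neither raise ValueError nor drop data.
    """
    keys = list(dict.fromkeys(k for row in rows for k in row))
    buf = io.StringIO()
    w = csv.DictWriter(buf, fieldnames=keys, restval="")
    w.writeheader()
    w.writerows(rows)
    return buf.getvalue()
```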

main.py (2)

86-87: Redundant timestamp manipulation.

Line 86 filters out "timestamp" from row, then line 87 immediately adds it back. The payload ends up identical to row. Either simplify to payload = dict(row) or remove these two lines.

♻️ Proposed fix
-        payload = {k: v for k, v in row.items() if k != "timestamp"}
-        payload["timestamp"] = row["timestamp"]
+        payload = dict(row)

43-45: Remove the unused traffic_snapshots variable.

The list is allocated on line 44 but is never appended to or read anywhere in the codebase. Remove it unless it's intended for future use.

traffic_manage/api/models.py (1)

27-42: Consider adding a __str__ method and validating congestion_pct range.

Two minor suggestions:

  1. All other models lack __str__ too, but since TrafficSnapshot will appear in Django admin (per the admin registration), a __str__ would improve usability.
  2. congestion_pct has no constraint preventing values outside 0–100, which could lead to misleading analytics.
Proposed improvements
     decision = models.CharField(max_length=64, blank=True)
 
+    def __str__(self):
+        return f"Snapshot {self.timestamp:%Y-%m-%d %H:%M:%S} ({self.vehicle_count} vehicles)"
+
     class Meta:
         ordering = ["-timestamp"]

For validation, consider using MinValueValidator/MaxValueValidator or a PositiveIntegerField with a check constraint on congestion_pct.

traffic_manage/api/views.py (2)

48-48: Use self.get_queryset() instead of TrafficSnapshot.objects.all() to respect any future filtering or overrides.

Suggested change
-        qs = TrafficSnapshot.objects.all()
+        qs = self.get_queryset()

62-68: Rounding logic is repetitive — consider a helper or dict comprehension.

Minor nit; the three nearly identical if/round blocks could be condensed.

Example
-        if agg.get("mean_vehicle_count") is not None:
-            agg["mean_vehicle_count"] = round(agg["mean_vehicle_count"], 2)
-        if agg.get("std_vehicle_count") is not None:
-            agg["std_vehicle_count"] = round(agg["std_vehicle_count"], 2)
-        if agg.get("mean_congestion") is not None:
-            agg["mean_congestion"] = round(agg["mean_congestion"], 2)
+        for key in ("mean_vehicle_count", "std_vehicle_count", "mean_congestion"):
+            if agg.get(key) is not None:
+                agg[key] = round(agg[key], 2)

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@main.py`:
- Around line 759-761: The call to session_summary passes the live shared deque traffic_history instead of the already-snapshotted history_list, causing a race with the video thread. In the ANALYTICS_AVAILABLE block, replace the traffic_history argument with history_list (the thread-local snapshot); if session_summary's parameter handling assumes a deque, adapt its definition to accept the snapshot format so no concurrent access to the live deque occurs.
🧹 Nitpick comments (4)
analytics.py (2)

75-77: Magic number 15 in congestion ratio denominator.

The hard-coded 15 on line 77 silently caps the congestion ratio for low-traffic scenarios. Consider extracting it as a parameter (like threshold_high) or a module-level constant with a comment explaining the rationale, so future maintainers understand the cap.


122-126: Fieldnames derived solely from the first row may cause issues with heterogeneous dicts.

If any subsequent row contains keys not present in rows[0], DictWriter will raise ValueError by default. If rows could have varying keys, consider collecting the union of all keys:

♻️ Suggested hardening
-    keys = list(rows[0].keys())
+    keys = list(dict.fromkeys(k for row in rows for k in row))
main.py (2)

753-756: Side-effectful I/O inside a Dash callback.

_log_snapshot_to_csv performs file writes and potentially a blocking HTTP POST (up to 1 s) on the Dash callback thread, which delays the UI response. Consider offloading this to a background thread or a dcc.Interval-driven non-blocking mechanism to keep the dashboard responsive.


44-46: traffic_snapshots list is declared but never used.

This list is initialized on line 45 but is never appended to or read anywhere in this file. If it's intended scaffolding for future use, consider adding a # TODO comment; otherwise remove it to avoid confusion.

…docs

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@traffic_manage/api/views.py`:
- Around line 37-43: TrafficSnapshotViewSet is read-only, but main.py POSTs to /api/traffic-snapshots/, so those requests fail with 405. Allow creates by adding mixins.CreateModelMixin alongside the read-only base (or by switching to viewsets.ModelViewSet); keep serializer_class and queryset unchanged.
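A hedged sketch of the mixin-based class declaration (requires a Django/DRF environment; the TrafficSnapshot and TrafficSnapshotSerializer imports are assumed from the existing module):

```python
# traffic_manage/api/views.py (illustrative; not runnable outside Django)
from rest_framework import mixins, viewsets


class TrafficSnapshotViewSet(mixins.CreateModelMixin,
                             viewsets.ReadOnlyModelViewSet):
    """List/retrieve snapshots, plus POST so main.py's logger can create them."""
    queryset = TrafficSnapshot.objects.all()
    serializer_class = TrafficSnapshotSerializer
```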
🧹 Nitpick comments (5)
traffic_manage/api/views.py (1)

45-76: Summary action looks correct after prior fixes.

The hours query param parsing with try/except is now safe. Aggregation and rounding logic handles None values properly (e.g., empty querysets or SQLite StdDev returning None).

One minor note: the from datetime import timedelta on line 58 is an inline import — consider moving it to the top-level imports for consistency.

main.py (4)

44-46: traffic_snapshots list is declared but never used.

The traffic_snapshots list on line 45 is populated nowhere in the visible code. If it was intended for in-memory snapshot accumulation, it's dead code currently. Either wire it up or remove it to avoid confusion.


49-60: _snapshot_row reads the global vehicle_types directly — consider passing it as a parameter.

This function implicitly depends on the global vehicle_types dict, which is written by the video thread. Making it an explicit parameter improves testability and makes the data flow clearer.


86-97: No URL scheme validation on TRAFFIC_API_URL.

urllib.request.urlopen (line 97) will follow any scheme (file:, ftp:, etc.) provided via the environment variable. While this is admin-controlled, a simple scheme check would harden it.

🛡️ Proposed fix
     api_url = os.environ.get("TRAFFIC_API_URL")
-    if api_url:
+    if api_url and api_url.startswith(("http://", "https://")):

757-759: Side-effect (CSV I/O + network POST) inside a Dash callback.

_log_snapshot_to_csv performs file I/O and potentially a blocking HTTP POST (0.5s timeout) on every 5th callback invocation. This runs on the Dash server's callback thread and can delay dashboard updates. Consider offloading this to a background thread or queue for better responsiveness.

