Skip to content

Conversation

@talboren
Copy link
Member

@talboren talboren commented Mar 22, 2025

close #4165

Improves the performance of mapping by moving the logic to the DB side.
Since rows is just a simple "table" (e.g., a CSV file), we can flatten the JSON column with DB functions (i.e., JSON_TABLE, or

WITH flattened AS (
                SELECT
                    value AS row_data
                FROM mappingrule, json_each(mappingrule.rows)
                WHERE mappingrule.id = :rule_id
            )
            SELECT row_data FROM flattened

in the case of SQLite, and run the respective query in the DB.

In cases where we have a lot of rows (>50K), it's much more performant then loading everything in memory on every event..

TBD:

  • Write tests for MySQL
  • Write tests for SQlite
  • Write tests for PG

Useful: https://docs.sqlalchemy.org/en/14/orm/loading_columns.html#deferred-column-loading

@vercel
Copy link

vercel bot commented Mar 22, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
keep ⬜️ Ignored (Inspect) Visit Preview Apr 7, 2025 3:21pm

@codecov
Copy link

codecov bot commented Mar 22, 2025

Codecov Report

Attention: Patch coverage is 7.72059% with 251 lines in your changes missing coverage. Please review.

Project coverage is 30.59%. Comparing base (b1d0c31) to head (b8dedc0).
Report is 39 commits behind head on main.

Files with missing lines Patch % Lines
keep/api/bl/mapping_rule_matcher.py 8.29% 177 Missing ⚠️
keep/api/bl/enrichments_bl.py 5.26% 72 Missing ⚠️
keep/api/core/db.py 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4166      +/-   ##
==========================================
- Coverage   30.95%   30.59%   -0.36%     
==========================================
  Files          92       93       +1     
  Lines       10257    10613     +356     
==========================================
+ Hits         3175     3247      +72     
- Misses       7082     7366     +284     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@talboren
Copy link
Member Author

@skynetigor i'm not sure if cel_to_sql is the right place to put it, but it kinda reminded me with what you're doing with formatting the query to work on the DB. That's why I put it over there.
Would love your review btw, because I think you have a lot of context about formatting queries ;)

@VladimirFilonov
Copy link
Contributor

@talboren crazy question - don't we want to convert CSV with table on upload, but not on runtime?

@skynetigor
Copy link
Contributor

@talboren Great idea about utilizing DB for this 👍
To help keep the folder structure clean and organized, I believe this file aligns more with the mapping rule business logic rather than being a generic utility.
Since CEL to SQL is a generic mechanism that can be reused for querying or faceting data across the DB, I’d suggest placing mapping_rule_matcher closer to the bl folder instead.

@talboren talboren force-pushed the chore/mapping-perf branch 3 times, most recently from 71d4156 to d533568 Compare March 25, 2025 20:04
@talboren talboren changed the title feat: improve performance of mapping [draft] feat: improve performance of mapping Mar 27, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Mar 27, 2025

Hey there and thank you for opening this pull request! 👋🏼

We require pull request titles to follow the Conventional Commits specification and it looks like your proposed title needs to be adjusted.

Details:

No release type found in pull request title "[draft] feat: improve performance of mapping". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - feat: A new feature
 - fix: A bug fix
 - docs: Documentation only changes
 - style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
 - refactor: A code change that neither fixes a bug nor adds a feature
 - perf: A code change that improves performance
 - test: Adding missing tests or correcting existing tests
 - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
 - ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
 - chore: Other changes that don't modify src or test files
 - revert: Reverts a previous commit

@talboren
Copy link
Member Author

Still working on it! Will try to finalize next week.

@talboren talboren closed this Apr 14, 2025
@talboren talboren deleted the chore/mapping-perf branch May 6, 2025 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[🐛 Bug]: Mapping performance improvements

4 participants