Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(sql): use org and course_key filters to use primary key indexes #957

Merged
merged 4 commits into from
Sep 30, 2024

Conversation

Ian2012
Copy link
Contributor

@Ian2012 Ian2012 commented Sep 30, 2024

Description

Based on a performance check, we found that using the course_key filter instead of course_name better uses ClickHouse primary key indexes, reducing the number of rows processed and improving memory usage and execution time. This PR updates most of the datasets to make use of the primary key indexes.

It also deletes unused .sql template files.

Benchmarks

Numbers speak for themselves.

Before:

1. learner-groups - Problem Attempts
Superset time: 3.416288 (s).
Query duration: 1.818 (s).
Result rows: 160
Memory Usage (MB): 212.7149143218994
Row count (superset) 160
Filters: [{'column': 'emission_time'}, {'column': 'course_name'}, {'column': 'org'}]
SQL:

6. learner-groups - Subsection Summary
Superset time: 1.602989 (s).
Query duration: 0.845 (s).
Result rows: 30
Memory Usage (MB): 351.30071449279785
Row count (superset) 30
Filters: [{'column': 'visited_on'}, {'column': 'course_name'}, {'column': 'org'}]
SQL:

After:

13. learner-groups - Problem Attempts
Superset time: 1.983724 (s).
Query duration: 0.31 (s).
Result rows: 160
Memory Usage (MB): 166.58758544921875
Row count (superset) 160
Filters: [{'column': 'emission_time'}, {'column': 'course_name'}, {'column': 'org'}]
SQL:

16. course-dashboard - Subsection Summary
Superset time: 0.742814 (s).
Query duration: 0.247 (s).
Result rows: 30
Memory Usage (MB): 113.08792209625244
Row count (superset) 30
Filters: [{'column': 'visited_on'}, {'column': 'course_name'}, {'column': 'org'}]
SQL:

before.txt
after.txt

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Sep 30, 2024
@openedx-webhooks
Copy link

Thanks for the pull request, @Ian2012!

What's next?

Please work through the following steps to get your changes ready for engineering review:

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.

🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads

🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

🔘 Let us know that your PR is ready for review:

Who will review my changes?

This repository is currently maintained by @bmtcril. Tag them in a comment and let them know that your changes are ready for review.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@Ian2012 Ian2012 force-pushed the cag/course-key-indexes branch 2 times, most recently from 73556b6 to d1dd27d Compare September 30, 2024 17:46
@Ian2012 Ian2012 force-pushed the cag/metrics-improves branch from b9394bf to 20b4f53 Compare September 30, 2024 17:46
@Ian2012 Ian2012 merged commit 0a00530 into main Sep 30, 2024
9 checks passed
@Ian2012 Ian2012 deleted the cag/course-key-indexes branch September 30, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants