2u/course optimizer #35887

Merged
merged 50 commits into from
Feb 6, 2025

Conversation

rayzhou-bit
Contributor

@rayzhou-bit rayzhou-bit commented Nov 20, 2024

Description

This PR creates the backend for the Course Optimizer Link Checker, which scans a published course for broken links. The implementation mirrors what is currently in place for export. Two APIs are created here:

link_check POST

  • Queues a task to start the link check process through the celery queue.
  • Results of the broken links scan are stored as a list of lists with the following format: [block_id, broken_link, *is_locked]
  • A locked link is a link to a locked Studio asset. When the Celery task validates the asset link, the request returns a 403 because Celery does not have permission to access locked assets.
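
As a rough illustration of the result format above, here is a minimal Python sketch. The function name `to_result_row` and the treatment of 2xx responses as healthy are assumptions made for illustration, not the code in this PR; only the `[block_id, broken_link, is_locked]` shape and the "403 means locked" rule come from the description.

```python
from typing import Optional


def to_result_row(block_id: str, link: str, status_code: Optional[int]) -> Optional[list]:
    """Return a [block_id, link, is_locked] row, or None if the link looks healthy.

    Hypothetical convention: status_code is None when the request failed outright.
    """
    if status_code is not None and 200 <= status_code < 300:
        return None  # assumed healthy; nothing to record
    if status_code == 403:
        # Locked Studio asset: the worker is not authorized to fetch it
        return [block_id, link, True]
    return [block_id, link, False]
```

With this convention, a 404 yields `[block_id, link, False]` and a 403 yields `[block_id, link, True]`.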

link_check_status GET

  • Returns the status of the link check process.
  • Returns the results of link_check if the process succeeded.
  • The result Data Transfer Object (DTO) contains the broken links along with relevant ancestor data for the block each link is found in.
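
To make the DTO step more concrete, the sketch below groups raw scan rows by block. The helper name `group_links_by_block` is hypothetical, and the ancestor lookup (section, subsection, unit) is deliberately left out; the `brokenLinks` and `lockedLinks` keys follow the sample response shown later in this description.

```python
from collections import defaultdict


def group_links_by_block(rows):
    """Group [block_id, link, is_locked] rows into per-block link lists."""
    grouped = defaultdict(lambda: {"brokenLinks": [], "lockedLinks": []})
    for block_id, link, is_locked in rows:
        # Locked assets are reported separately so authors can check them by hand
        key = "lockedLinks" if is_locked else "brokenLinks"
        grouped[block_id][key].append(link)
    return dict(grouped)
```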

Technical considerations:

  • The results of the link check scan are currently saved as a UserTaskArtifact file. While this is the simplest to implement, arguments can be made for saving this data in tables instead.
  • Benefits of using a UserTaskArtifact file: easy implementation, as this mimics the current export functionality.
  • Benefits of using a database table: a good foundation for accessing thinner slices of broken-link data. While not needed for the functionality currently being developed, it could be useful for future updates. For example, authors could be notified of the broken links in a quiz a couple of days before learners take the quiz. It would also be easier to analyze the data, for example to find the average number of broken links per course.
  • The Celery task cannot validate locked Studio assets. For now, these are recorded in the scan results and presented to the user as locked links that they must check themselves.

Supporting information

https://2u-internal.atlassian.net/browse/TNL-11782

Testing instructions with frontend PR

  1. Make sure you have the frontend code in frontend-app-authoring: Feat course optimizer page frontend-app-authoring#1533
  2. In devstack, run make dev.up.large-and-slow.
  3. In frontend-app-authoring, run npm start.
  4. Enable the waffle flag contentstore.enable_course_optimizer.
  5. Navigate to the Course Optimizer page by going to the Tools dropdown menu and selecting the Optimize Course option.
  6. Click Start Scanning to run a scan of your course. Any broken links will be displayed in the Broken Links Scan section when the scan completes.
  7. Navigate directly to the blocks with the broken links through the links.
  8. Add new broken links to your course and publish it. Scan again to see the new entries in the Broken Links Scan section.

Testing instructions without frontend PR

The following example is for demo course course-v1:edX+DemoX+Demo_Course.

  1. Find and copy the curl for an export call in your local environment.
curl 'http://localhost:18010/export/course-v1:edX+DemoX+Demo_Course' \
  -X 'POST' \
  -H 'Accept: application/json, text/javascript, */*; q=0.01' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 0' \
  ...
  2. Replace export with link_check.
  3. Make this call in the terminal.
  4. This should return the following if successful.
{
  "LinkCheckStatus": 1
}
  5. Access http://localhost:18010/link_check_status/course-v1:edX+DemoX+Demo_Course in your browser. You should see the results of the link check scan.
{
  "LinkCheckStatus": "Succeeded",
  "LinkCheckCreatedAt": "2025-01-14T19:36:53.178488Z",
  "LinkCheckOutput": {
    "sections": [
      {
        "id": "d8a6192ade314473a78242dfeedfbf5b",
        "displayName": "Introduction",
        "subsections": [
          {
            "id": "edx_introduction",
            "displayName": "Demo Course Overview",
            "units": [
              {
                "id": "vertical_0270f6de40fc",
                "displayName": "Introduction: Video and Sequences",
                "blocks": [
                  {
                    "id": "030e35c4756a4ddc8d40b95fbbfff4d4",
                    "displayName": "Blank HTML Page",
                    "url": "/course/course-v1:edX+DemoX+Demo_Course/editor/html/block-v1:edX+DemoX+Demo_Course+type@html+block@030e35c4756a4ddc8d40b95fbbfff4d4",
                    "brokenLinks": [
                      "/definitely.does.notwork",
                      "/block-v1:edX+DemoX+Demo_Course+type@vertical+block@2152d4a4aadc4cb0af5256394a3d1fc7",
                      "https://testing123.whatever",
                      "google.com",
                      "/container/block-v1:edX+DemoX+Demo_Course+type@vertical+block@2152d4a4aadc4cb0af5256394a3d1fc7",
                      "block-v1:edX+DemoX+Demo_Course+type@vertical+block@2152d4a4aadc4cb0af5256394a3d1fc7"
                    ],
                    "lockedLinks": [
                      "/assets/courseware/v1/506da5d6f866e8f0be44c5df8b6e6b2a/asset-v1:edX+DemoX+Demo_Course+type@asset+block/getting-started_x250.png",
                      "/assets/courseware/v1/506da5d6f866e8f0be44c5df8b6e6b2a/asset-v1:edX+DemoX+Demo_Course+type@asset+block/getting-started_x250.png"
                    ...
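
For anyone inspecting this response by hand, the following sketch flattens the nested `LinkCheckOutput` into (block display name, broken link) pairs. The function name `iter_broken_links` is hypothetical; the key names (`sections`, `subsections`, `units`, `blocks`, `brokenLinks`, `displayName`) are taken from the sample response above.

```python
def iter_broken_links(link_check_output):
    """Yield (block display name, broken link) pairs from a LinkCheckOutput dict."""
    for section in link_check_output.get("sections", []):
        for subsection in section.get("subsections", []):
            for unit in subsection.get("units", []):
                for block in unit.get("blocks", []):
                    for link in block.get("brokenLinks", []):
                        yield block.get("displayName"), link
```

Locked links could be listed the same way by reading `lockedLinks` instead of `brokenLinks`.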

Other information

@rayzhou-bit rayzhou-bit marked this pull request as draft November 20, 2024 00:23
@rayzhou-bit
Contributor Author

@bszabo I updated a lot of the organization in tasks.py. I agree with you on the iffy code practices (using max / min / integer for status), but this is currently how UserTaskStatus is used and I feel it's better to follow it for now.

@rayzhou-bit rayzhou-bit requested a review from bszabo November 21, 2024 18:35
@bszabo
Contributor

bszabo commented Nov 21, 2024

Thanks for the editorial changes, Ray. I'm stepping away from this review with the expectation that Jesper will give it a lookover from a functional perspective. If it's possible to attend to the funky status definition before moving on to new things, I would strongly recommend that, even if it ends up being in a different PR.

@bszabo
Contributor

bszabo commented Nov 21, 2024

If you take a step back and look at this search for broken links as a first installment towards course optimization, you can see that course optimization will entail a sequence of activities, each intended to potentially improve a course. Viewed that way, the natural questions to ask will be "which activity is currently being worked on?" and "what is the status for activity X?". For the latter question, the natural answers will be not started, in progress, succeeded, or failed with error message Y.

It seems to me that it would make sense to organize even this first installment somewhat along those lines. The solution you borrowed from import/export conflates concepts in a way I don't think is good.
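
One way to picture that organization is sketched below. The names `ActivityStatus` and `ActivityState` are hypothetical and do not exist in this PR or in UserTaskStatus; the four states are the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActivityStatus(Enum):
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class ActivityState:
    """Status of a single optimization activity, e.g. the link check."""
    name: str
    status: ActivityStatus = ActivityStatus.NOT_STARTED
    error_message: Optional[str] = None

    def fail(self, message: str) -> None:
        # A failure always carries the error message that explains it
        self.status = ActivityStatus.FAILED
        self.error_message = message
```

Keeping the error message next to an explicit status avoids encoding meaning in integer comparisons.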

@rayzhou-bit rayzhou-bit marked this pull request as ready for review February 3, 2025 21:58
Member

@jesperhodge jesperhodge left a comment

Approved with comments, but they're all nice-to-haves only

broken_or_locked_urls.extend(retry_results)

try:
    task_instance.status.increment_completed_steps()
Member

To me it's not clear at first glance what the code in the try block does - I need to go in there and parse it line by line.
To improve readability, it would be great to extract the entire contents of the try block into a named function, so I can see what it does at a glance.
For example, you could do something like this:

try:
    artifact = _persist_broken_links(broken_or_locked_urls, task_instance)
except:
    ...

Basically that would make it so I don't actually have to read the code in that function most of the time.

@rayzhou-bit rayzhou-bit merged commit 02fc9c9 into master Feb 6, 2025
50 checks passed
@rayzhou-bit rayzhou-bit deleted the 2u/course-optimizer branch February 6, 2025 17:47
@edx-pipeline-bot
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

jciasenza pushed a commit to jciasenza/edx-platform that referenced this pull request Feb 25, 2025
tonybusa pushed a commit to tonybusa/edx-platform that referenced this pull request Apr 23, 2025
5 participants