2u/course optimizer #35887

rayzhou-bit · 2024-11-20T00:22:28Z

Description

This PR creates the backend for the Course Optimizer Link Checker, which scans a published course and checks for broken links. This functionality imitates what is currently in place for export. Two APIs are created here:

link_check POST

Queues a task to start the link check process through the celery queue.
Results of the broken links scan are stored as a list of lists with the following format: [block_id, broken_link, *is_locked]
A locked link is a locked Studio asset. When the celery task validates the asset link, the request returns a 403 because celery does not have permission to access locked assets.
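
As a rough illustration of that result format, the sketch below splits raw rows into broken and locked links. The function name `split_results` and the sample rows are hypothetical; only the `[block_id, broken_link, *is_locked]` shape comes from this PR, and the sketch assumes the optional third element is truthy when the asset is locked.

```python
def split_results(results):
    """Split [block_id, broken_link, *is_locked] rows into broken and locked lists."""
    broken, locked = [], []
    for block_id, link, *rest in results:
        # Assumption: the optional third element is truthy for a locked Studio asset.
        is_locked = bool(rest and rest[0])
        (locked if is_locked else broken).append((block_id, link))
    return broken, locked


# Hypothetical sample rows, not real scan output.
rows = [
    ["block-1", "https://example.invalid/page"],
    ["block-1", "/assets/locked.png", True],
    ["block-2", "/missing", False],
]
broken, locked = split_results(rows)
```

Rows without the third element are treated as plain broken links, which matches the optional `*is_locked` in the format.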

link_check_status GET

Returns the status of the link check process.
Returns results of link_check if process is successful.
Result Data Transfer Object returns broken links along with relevant ancestor data for the block they are found in.

Technical considerations:

The results of the link check scan are currently saved as a UserTaskArtifact file. While this is the simplest to implement, arguments can be made for saving this data in database tables instead.
Benefits for using UserTaskArtifact file: Easy implementation as this mimics the current export functionality.
Benefits for using a database table: Good foundation for accessing thinner slices of broken link data. While not needed for the functionality currently being developed, it could be useful for future updates. For example, authors could be notified of the broken links in a quiz a couple of days before learners take the quiz. It would also be easier to analyze the data, such as finding the average number of broken links per course.
Celery task cannot validate locked studio assets. For now, this will be recorded in scan results and presented to the user as a locked link that they must check themselves.
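
A minimal sketch of how a response status could be mapped to a scan category, under stated assumptions. The function `classify_link` and the rule that any non-2xx status other than 403 counts as broken are hypothetical; the only behavior taken from this PR is that requests for locked Studio assets return 403. The "locked" label is only meaningful for Studio asset URLs.

```python
from http import HTTPStatus


def classify_link(status_code):
    """Return 'ok', 'locked', or 'broken' for an HTTP status code (assumed policy)."""
    # A 403 on a Studio asset is recorded as locked rather than broken.
    if status_code == HTTPStatus.FORBIDDEN:
        return "locked"
    # Assumption: any 2xx response means the link works.
    if status_code is not None and 200 <= status_code < 300:
        return "ok"
    # Everything else, including no response at all (None), is treated as broken.
    return "broken"
```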

Supporting information

https://2u-internal.atlassian.net/browse/TNL-11782

Testing instructions with frontend PR

Make sure you have the frontend code in frontend-app-authoring: Feat course optimizer page frontend-app-authoring#1533
In devstack, run make dev.up.large-and-slow.
In frontend-app-authoring, run npm start.
Enable the waffle flag contentstore.enable_course_optimizer.
Navigate to the Course Optimizer page by going to the Tools dropdown menu and selecting the Optimize Course option.
Click Start Scanning to run a scan of your course. Any broken links will be displayed in the Broken Links Scan section when the scan completes.
Navigate directly to the blocks with the broken links through the links.
Update and publish your course with broken links. Scan again to see these new entries in the Broken Links Scan section.

Testing instructions without frontend PR

The following example is for demo course course-v1:edX+DemoX+Demo_Course.

Find and copy the curl command for an export call in your local environment.

curl 'http://localhost:18010/export/course-v1:edX+DemoX+Demo_Course' \
  -X 'POST' \
  -H 'Accept: application/json, text/javascript, */*; q=0.01' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 0' \
  ...

Replace export with link_check.
Make this call in the terminal.
This should return the following if successful.

{
  "LinkCheckStatus": 1
}

Access http://localhost:18010/link_check_status/course-v1:edX+DemoX+Demo_Course in your browser. You should see the results of the link check scan.

{
  "LinkCheckStatus": "Succeeded",
  "LinkCheckCreatedAt": "2025-01-14T19:36:53.178488Z",
  "LinkCheckOutput": {
    "sections": [
      {
        "id": "d8a6192ade314473a78242dfeedfbf5b",
        "displayName": "Introduction",
        "subsections": [
          {
            "id": "edx_introduction",
            "displayName": "Demo Course Overview",
            "units": [
              {
                "id": "vertical_0270f6de40fc",
                "displayName": "Introduction: Video and Sequences",
                "blocks": [
                  {
                    "id": "030e35c4756a4ddc8d40b95fbbfff4d4",
                    "displayName": "Blank HTML Page",
                    "url": "/course/course-v1:edX+DemoX+Demo_Course/editor/html/block-v1:edX+DemoX+Demo_Course+type@html+block@030e35c4756a4ddc8d40b95fbbfff4d4",
                    "brokenLinks": [
                      "/definitely.does.notwork",
                      "/block-v1:edX+DemoX+Demo_Course+type@vertical+block@2152d4a4aadc4cb0af5256394a3d1fc7",
                      "https://testing123.whatever",
                      "google.com",
                      "/container/block-v1:edX+DemoX+Demo_Course+type@vertical+block@2152d4a4aadc4cb0af5256394a3d1fc7",
                      "block-v1:edX+DemoX+Demo_Course+type@vertical+block@2152d4a4aadc4cb0af5256394a3d1fc7"
                    ],
                    "lockedLinks": [
                      "/assets/courseware/v1/506da5d6f866e8f0be44c5df8b6e6b2a/asset-v1:edX+DemoX+Demo_Course+type@asset+block/getting-started_x250.png",
                      "/assets/courseware/v1/506da5d6f866e8f0be44c5df8b6e6b2a/asset-v1:edX+DemoX+Demo_Course+type@asset+block/getting-started_x250.png"
                    ...
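
For anyone consuming this response in a script, the nested structure can be flattened with a small helper. This is only a sketch: `iter_block_links` is a hypothetical name, and the key names are taken from the sample response above.

```python
def iter_block_links(link_check_output):
    """Yield (display name, url, broken links, locked links) for each block."""
    for section in link_check_output.get("sections", []):
        for subsection in section.get("subsections", []):
            for unit in subsection.get("units", []):
                for block in unit.get("blocks", []):
                    yield (
                        block.get("displayName"),
                        block.get("url"),
                        block.get("brokenLinks", []),
                        block.get("lockedLinks", []),
                    )


# Hypothetical, trimmed-down payload with the same shape as LinkCheckOutput.
sample_output = {
    "sections": [{
        "subsections": [{
            "units": [{
                "blocks": [{
                    "displayName": "Blank HTML Page",
                    "url": "/course/example/editor/html/example-block",
                    "brokenLinks": ["/definitely.does.notwork"],
                    "lockedLinks": [],
                }],
            }],
        }],
    }],
}
flattened = list(iter_block_links(sample_output))
```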

Other information

Waffle flag: contentstore.enable_course_optimizer
This PR is to be paired with a frontend PR in frontend-app-authoring: Feat course optimizer page frontend-app-authoring#1533

rayzhou-bit · 2024-11-21T18:35:50Z

@bszabo I updated a lot of the organization in tasks.py. I agree with you on the iffy code practices (using max / min / integer for status), but this is currently how UserTaskStatus is used and I feel it's better to follow it for now.

bszabo · 2024-11-21T18:55:50Z

Thanks for the editorial changes, Ray. I'm stepping away from this review with the expectation that Jesper will give it a lookover from a functional perspective. If it's possible to attend to the funky status definition before moving on to new things, I would strongly recommend that, even if it ends up being in a different PR.

bszabo · 2024-11-21T19:13:36Z

If you take a step back and look at this search for broken links as a first installment towards course optimization, you can see that course optimization will entail a sequence of activities being carried out, each intended to potentially improve a course. Viewed that way, the natural questions to ask will be "which activity is currently being worked on?" and "what is the status for activity X?". For the latter question, the natural answers will be not started, in progress, succeeded, or failed with error message Y.

It seems to me that it would make sense to organize even this first installment somewhat along those lines. The solution you borrowed from import/export is conflating concepts in a way I don't think is good.
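
Purely as a sketch of that separation, and not what the PR implements: the names `ActivityStatus` and `ActivityState` below are hypothetical, and the four states are the ones listed in the comment above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActivityStatus(Enum):
    """The four answers to "what is the status for activity X?"."""
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class ActivityState:
    """Status of one optimization activity, kept separate from step counters."""
    activity: str
    status: ActivityStatus = ActivityStatus.NOT_STARTED
    error_message: Optional[str] = None

    def fail(self, message):
        # A failure always carries the error message that explains it.
        self.status = ActivityStatus.FAILED
        self.error_message = message


state = ActivityState("link_check")
state.status = ActivityStatus.IN_PROGRESS
state.fail("scan timed out")
```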

* feat: TNL-11812 new tests * feat: TNL-11812 remove skipped tests and TODOs --------- Co-authored-by: Bernard Szabo <bszabo@edx.org>

jesperhodge

Approved with comments, but they're all nice-to-haves only

jesperhodge · 2025-02-05T18:57:55Z

cms/djangoapps/contentstore/tasks.py

+        broken_or_locked_urls.extend(retry_results)
+
+    try:
+        task_instance.status.increment_completed_steps()


To me it's not clear at first glance what the code in the try block does - I need to go in there and parse it line by line.
To improve readability, it would be great to just extract all the contents of the try block into a named function, so I can see what it does at a glance.
For example, you could do something like this:

try:
    artifact = _persist_broken_links(broken_or_locked_urls, task_instance)
except:
    ...

Basically that would make it so I don't actually have to read the code in that function most of the time.

edx-pipeline-bot · 2025-02-06T18:57:30Z

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

edx-pipeline-bot · 2025-02-06T19:15:08Z

2U Release Notice: This PR has been deployed to the edX production environment.

rayzhou-bit marked this pull request as draft November 20, 2024 00:23