kwongtn/CourseExtractor


CourseExtractor


This project is a proof of concept. However, if you found this tool useful, why not buy me a coffee via PayPal or via Grab?

How to use

Prerequisites

  1. Google Chrome (or any other browser that can export HAR (HTTP Archive) files, referred to as "HAL files" in this README)
  2. NodeJS
  3. cURL (For video downloads)
  4. (Good to have) The save as file extension, if you are mass downloading: you can paste a link directly into the extension without needing to open a new tab.

Notes

We need to be more careful when using the new method, and use it only when absolutely necessary.
Because the content is so easy to get, it got me thinking that PluralSight may have done this to benefit those who are less fortunate, by giving them an alternative way to view the courses.
If we abuse this, PluralSight might patch this bug (or non-bug) for good.
Therefore, I urge everyone to use the new method wisely.

Brief (Old Method)

Getting course information

  1. Log in to your PluralSight account and navigate to your desired course.
  2. Open Google Chrome's Developer Tools. (Pressing F12 is a good way to do so.)
  3. Navigate to the "network" tab and:
    • Check preserve log and disable cache.
    • Clear the current captured data.
  4. On the course page (the one with the description), refresh the page. You should see requests appearing in the Developer Tools network tab. You have now captured the data needed for the course information output.
  5. You may now export the HAL file and close the Developer Tool window.
  6. Run the program with the following command:
    node ./main.js path_to_HAL_file
    
  7. The outputs should be in the ./output directory.
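To get a feel for what the exported file contains before running the tool, the sketch below summarises the requests in a HAR (HTTP Archive) export, which this README calls a HAL file. It is not taken from `main.js`; the function name `summariseEntries` and the sample URLs are made up for illustration, and only the `log.entries[].request` / `response` layout comes from the HAR format itself.

```javascript
// Sketch: summarise the requests captured in a HAR-style export.
// Field names follow the HAR layout (log.entries[].request / response).
function summariseEntries(har) {
  const entries = (har && har.log && har.log.entries) || [];
  return entries.map((entry) => ({
    method: entry.request.method,
    url: entry.request.url,
    status: entry.response.status,
  }));
}

// Tiny inline sample standing in for a real export (hypothetical URLs).
const sampleHar = {
  log: {
    entries: [
      {
        request: { method: "GET", url: "https://example.com/course" },
        response: { status: 200 },
      },
      {
        request: { method: "POST", url: "https://example.com/api" },
        response: { status: 204 },
      },
    ],
  },
};

const summary = summariseEntries(sampleHar);
console.log(summary);
```

For a real export, `JSON.parse(require("fs").readFileSync(path_to_HAL_file, "utf8"))` would supply the `har` argument.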

Getting videos (Old method, or for non-public videos)

* You may need to be a little quick on this.

  1. Continuing from previous section (Getting Course Information), click on the first video in the course. A new tab should open.
  2. Open Google Chrome's Developer Tools in the new tab and navigate to the network tab.
  3. In the filter box, type in viewclip. There should be 1 result.
  4. Clear the log and refresh the page.
  5. Once the viewclip request completes, click on the next video, and so on, until the viewclip requests for all videos have been loaded.
  6. You may now export the HAL file and close the Developer Tool window.
  7. Run the program with the following command:
    node ./main.js path_to_HAL_file
    
    or, if you also want to download the videos, you can run the following:
    node ./main.js --videoDownload path_to_HAL_file
    
  8. An output of all the video URLs will be in the ./output/URLs.json file, and if you specified the --videoDownload parameter, videos will be downloaded alongside the subtitle files.
  9. Copy the links and paste into any downloader (or browser window) to download the videos. Do note that you would need to manually rename the files.
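The viewclip step above amounts to picking specific responses out of the export. The following is a minimal sketch of that filtering, assuming the viewclip responses are JSON; it is not the tool's actual code. The helper names and the `urls` field in the sample data are hypothetical, since the real response shape is not documented here.

```javascript
// Sketch: collect the parsed bodies of all "viewclip" responses in a HAR-style export.

// HAR stores response bodies as text, optionally base64-encoded.
function decodeBody(content) {
  if (!content || typeof content.text !== "string") return null;
  return content.encoding === "base64"
    ? Buffer.from(content.text, "base64").toString("utf8")
    : content.text;
}

function extractViewclipBodies(har) {
  const entries = (har && har.log && har.log.entries) || [];
  const bodies = [];
  for (const entry of entries) {
    if (!entry.request.url.includes("viewclip")) continue;
    const raw = decodeBody(entry.response.content);
    if (raw === null) continue;
    try {
      bodies.push(JSON.parse(raw));
    } catch (err) {
      // Skip bodies that are not valid JSON.
    }
  }
  return bodies;
}

// Inline sample data; URLs and the "urls" field are hypothetical.
const sampleViewclipHar = {
  log: {
    entries: [
      {
        request: { url: "https://example.com/video/clips/viewclip" },
        response: {
          content: { text: JSON.stringify({ urls: ["https://example.com/a.mp4"] }) },
        },
      },
      {
        request: { url: "https://example.com/viewclip" },
        response: {
          content: {
            encoding: "base64",
            text: Buffer.from('{"urls":[]}').toString("base64"),
          },
        },
      },
      {
        request: { url: "https://example.com/other" },
        response: { content: { text: "{}" } },
      },
    ],
  },
};

const clipBodies = extractViewclipBodies(sampleViewclipHar);
console.log(clipBodies.length);
```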

Brief (New Method)

  1. Log in to your PluralSight account and navigate to your desired course.
  2. Open Google Chrome's Developer Tools. (Pressing F12 is a good way to do so.)
  3. Navigate to the "network" tab and:
    • Check preserve log and disable cache.
    • Clear the current captured data.
  4. On the course page (the one with the description), refresh the page. You should see requests appearing in the Developer Tools network tab. You have now captured the data needed for the course information output.
  5. You may now export the HAL file and close the Developer Tool window.
  6. To generate video URLs and course information, run the program with the following command:
    node ./main.js --new path_to_HAL_file
    
  7. Or, if you want to download the videos together, run the program with the following command:
    node ./main.js --new --videoDownload path_to_HAL_file
    
  8. The outputs should be in the ./output directory.

Parameters

Usage: node ./main.js [params] path_to_HAL_file

        --help                  Displays this help message.
        --license               Outputs the license of this project. (GNU General Public License)
        --new                   [TESTERS REQUIRED] Uses the new "CourseInfo-Only" methodology to facilitate for full course downloads. Not recommended for large courses.
        --noSubs                Disables output of subtitles.
        --noInfo                Disables output of course information.
        --noBB                  Disables output of BB code.
        --videoDownload         Downloads video using filenames as specified in './output/videoList.json'
        --noSizeCheck           Disables array size checking for video download. Filenames will be taken sequentially.
        --noURL                 Disables output of video URLs.
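To illustrate how the flags and the file path combine on the command line, here is a rough sketch of separating `--flag` arguments from the path. It is an assumption about how such parsing could work, not the parser used in `main.js`; `parseArgs` and `KNOWN_FLAGS` are hypothetical names, and only the flag list is taken from the help text above.

```javascript
// Sketch: separate "--flag" arguments from the file path.
const KNOWN_FLAGS = new Set([
  "--help", "--license", "--new", "--noSubs", "--noInfo",
  "--noBB", "--videoDownload", "--noSizeCheck", "--noURL",
]);

function parseArgs(argv) {
  const flags = new Set();
  const unknown = [];
  let filePath = null;
  for (const arg of argv) {
    if (KNOWN_FLAGS.has(arg)) {
      flags.add(arg);
    } else if (arg.startsWith("--")) {
      unknown.push(arg); // unrecognised flag, reported separately
    } else {
      filePath = arg; // the last non-flag argument is treated as the path
    }
  }
  return { flags, unknown, filePath };
}

// In a real script the input would be process.argv.slice(2).
const parsed = parseArgs(["--new", "--videoDownload", "./course.har"]);
console.log([...parsed.flags], parsed.filePath);
```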

FAQ

  1. Why this project?
    Just for fun, and also to test my NodeJS and programming skills. Aaand also because PluralSight's courses are so good that I want to download them for viewing later on.

  2. Are you sure this project is just for fun?
    Since you're asking, no. Not really. It's also a way to raise awareness of client-server security and of how easy it is (given the right time, skills and tools) to scrape a website, even when it is behind a paywall.
    So apparently I've been caught testing this script out, and according to the customer support representative:

    Some things to check to make sure this doesn't happen again are: make sure you don't click rapidly through videos; make sure you're not signed into multiple devices; and make sure only one person is using this account.
    

    So for anyone trying to use this script, please be vigilant. I currently estimate that they rate limit video link requests to 50+ per set time period. So if a course has many videos (you can check when outputting transcripts), do separate the video requests into multiple sessions.

  3. Why NodeJS?
    Because of its asynchronous nature, and because I hate Python. C++ would be much quicker as it is compiled, but it does not have native JSON support, so I kinda gave it up.

  4. What did I learn from this project?
    Well, to start with, the HAL file format and how much information there is in there. Then NodeJS and Promises.
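The advice in FAQ item 2 about separating video requests into multiple sessions could be sketched as below. This is only an illustration: the batch size of 40 is an assumption chosen to stay under the estimated limit of roughly 50 requests, and `splitIntoBatches` and the clip names are hypothetical.

```javascript
// Sketch: split a list of clips into batches small enough for one session each.
function splitIntoBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 120 placeholder clip names standing in for a large course.
const clips = Array.from({ length: 120 }, (_, i) => `clip-${i + 1}`);
const batches = splitIntoBatches(clips, 40);
console.log(batches.map((batch) => batch.length)); // [ 40, 40, 40 ]
```

Each batch would then be captured in its own session, with a pause between sessions.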

Issues

  1. API Problem: As this repo uses an external service to do conversions for srt files, we are limited as follows:

    • 30 API calls per minute,
    • 50 API calls per 5 minutes, and
    • 100 API calls per hour.

    I will be putting in a direct converter, so that conversions can be done without having to rely on an external API. Update: coding of the in-house converter is complete.
  2. It is currently not possible to put courseInfo and transcript together with videoLinks. It will be investigated later on.
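For context on what an in-house srt converter involves, here is a minimal sketch that turns a list of timed cues into SRT text. It is not the converter shipped in this repo; the input shape (`start` and `end` in seconds, plus `text`) and the function names are assumptions made for illustration. Only the SRT layout itself (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```javascript
// Sketch: convert timed cues into SRT-formatted text.

// Format a time in seconds as HH:MM:SS,mmm.
function toSrtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Cues are assumed to look like { start: 0, end: 1.5, text: "Hello" }.
function cuesToSrt(cues) {
  return cues
    .map((cue, i) =>
      `${i + 1}\n${toSrtTimestamp(cue.start)} --> ${toSrtTimestamp(cue.end)}\n${cue.text}\n`)
    .join("\n");
}

const srt = cuesToSrt([
  { start: 0, end: 1.5, text: "Hello" },
  { start: 1.5, end: 3, text: "World" },
]);
console.log(srt);
```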

License

Copyright © 2020 kwongtn

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

About

Initially developed to download Pluralsight Courses.
