Data Science for Public Good summer program 2021
- In what ways can repositories be efficiently classified into “types” (e.g., operating systems, network services, database management, development tools, blockchain, etc.)? What information (e.g., tags, repo stats, or READMEs) is most helpful for classifying existing repositories?
- How does GitHub activity change based on the type of software being developed? Which types of software have the most contributors? Which types of software require more commits, additions, or deletions?
- How do different types of software affect collaboration tendencies? How do these tendencies change across the academic, business, and government sectors?
This won't take longer than 5 minutes.
We are collecting access tokens to speed up the process of scraping GitHub repositories. A single access token can scrape only about 5,000 repositories per hour, and our goal is to scrape about 10 million repositories. Having more access tokens would help us tremendously.
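As a rough illustration of why additional tokens matter, the sketch below works through the arithmetic using only the two figures stated above (about 5,000 repositories per token per hour, and a goal of about 10 million repositories). The names `hours_needed` and `assign_tokens` are hypothetical and are not taken from the project's scraper. The sketch also assumes that tokens are used in parallel and that every request succeeds, which a real scraping run would not guarantee.

```python
import math
from itertools import cycle

REPOS_TO_SCRAPE = 10_000_000   # goal stated above
REPOS_PER_TOKEN_HOUR = 5_000   # per-token hourly limit stated above


def hours_needed(num_tokens: int) -> int:
    """Whole hours needed to reach the goal with num_tokens used in parallel."""
    if num_tokens < 1:
        raise ValueError("need at least one token")
    return math.ceil(REPOS_TO_SCRAPE / (REPOS_PER_TOKEN_HOUR * num_tokens))


def assign_tokens(repo_names, tokens):
    """Pair each repository with a token in round-robin order (illustrative only)."""
    return list(zip(repo_names, cycle(tokens)))


if __name__ == "__main__":
    for n in (1, 10, 50):
        hours = hours_needed(n)
        print(f"{n:>2} token(s): {hours} hours (~{hours / 24:.1f} days)")
```

Under these assumptions, a single token would need 2,000 hours (roughly 83 days), while 50 tokens would bring that down to 40 hours.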
For steps 1-5, please refer to this document for detailed instructions on creating a personal access token.
For steps 6 and 7, please refer to the following image.
Please send the team a private message containing your:
- access token
- username
Afterwards, make sure you delete the message (not the access token). For example, if you messaged us on Teams, you should delete that message.
We appreciate your help!
National Center for Science and Engineering Statistics (NCSES)
- Carol Robbins, Senior Economist
- Ledia Guci, Science Resources Analyst