
Data sourcing and data analysis for course enrollment at Pasadena City College


perryzjc/pcc-course-enrollment-data-explorer



PCC Course Enrollment Data Explorer

Discover interesting facts in PCC course enrollment data and analyze them.


flowchart LR
subgraph pcc-course-enrollment-data-explorer
    DATA(((Features)))
    DATA --> Data-Sourcing
    DATA --> Data-Analysis
    subgraph Data-Sourcing
        S(fa:fa-camera-retro)
            style S fill:#98B476
        direction TB
        PCC[(PCC-COURSE-WEB)]
        A[requester]
        B[cleaner]
        C[storer]
        L[[log]]
        S --every 30 mins--> A
        A -.request data.-> PCC -.return html data.-> A
        A --pass html data--> B
        B --clean info--> C
        C .-> L
        C -.store cleaned data to local.-> S

    end
    subgraph Data-Analysis
        ANA(fa:fa-spinner)
            style ANA fill:#4E95B4
        direction TB
        ANA --> Info-Extraction
        ANA --> E["Visualization of data trend"]
        Info-Extraction --> Interpretation

        subgraph Info-Extraction
            CO[Popular courses]
            CO --> IRA[Growth rate of enrollment]
            IRA --> FC[Fastest closed class]
            NL[Courses that students do not like] --> DR[Course drop info]
            UP[Unpopular courses] --> EL[Enrollment is low and fluctuates very little]
            MO[More]
        end

        subgraph Interpretation
            Q1["Why did it happen?"]
            DM[What these data mean] --> Student-Point-of-View
            DM[What these data mean] --> School-Point-of-View
            subgraph Student-Point-of-View
                CH[Courses helpful to students]
                Q21["Chance to enroll in a closed class"]
                M2[More]
            end
            subgraph School-Point-of-View
                BA[Optimize the budget of the course plan] --> CC[Appropriate course capacity]
                CC --> HD[Add courses that are in high demand]
                CC --> LD[Courses that may require reduced capacity]
            end
        end
    end
end
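The Info-Extraction branch of the diagram (growth rate of enrollment, fastest closed class) can be illustrated with a small sketch. It assumes each snapshot is a capture time paired with a dictionary mapping a course ID to an (enrolled, capacity) pair; this format, the course IDs, and the function names are hypothetical, and the project's actual CSV columns may differ. Growth rate is measured here as the average enrollment change per hour, and a course counts as closed once enrollment reaches capacity.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot format: (capture time, {course_id: (enrolled, capacity)})
Snapshot = Tuple[datetime, Dict[str, Tuple[int, int]]]


def growth_rate(snapshots: List[Snapshot], course: str) -> float:
    """Average change in enrollment per hour between the first and last snapshot."""
    (t_first, first), (t_last, last) = snapshots[0], snapshots[-1]
    hours = (t_last - t_first).total_seconds() / 3600
    return (last[course][0] - first[course][0]) / hours


def time_to_close(snapshots: List[Snapshot], course: str) -> Optional[timedelta]:
    """Time from the first snapshot until the course first reached capacity."""
    t_first = snapshots[0][0]
    for t, data in snapshots:
        enrolled, capacity = data[course]
        if enrolled >= capacity:
            return t - t_first
    return None  # the course never filled during the observed window


def fastest_closed(snapshots: List[Snapshot]) -> List[str]:
    """Courses that filled up, ordered from fastest to slowest."""
    times = {c: time_to_close(snapshots, c) for c in snapshots[0][1]}
    closed = [c for c, dt in times.items() if dt is not None]
    return sorted(closed, key=lambda c: times[c])
```

`growth_rate` needs at least two snapshots taken at different times; with snapshots every 30 minutes, a course that fills between two captures is only timed to the nearest 30 minutes.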
flowchart LR
subgraph GPA Analysis
    HELP(((How to help GPA?)))
    HELP --> A[Find Popular Courses]
    HELP --> ENROLL[How to enroll a closed course?]
    HELP --> F[Disliked Courses]
    subgraph Popular Courses
        A --> B[Courses Close Fast]
        A --> C[Enrollment Increment Rate]
        A --> D[Visualization of Data Trend]
    A --> GA[Which professors usually give add codes]
        B --> E[Ratemyprofessor.com Truthfulness]
        C --> E
        D --> E
        GA --> E
    end
    subgraph Disliked Courses
        F --> G[Course Drop Rate]
    end
    subgraph Enroll Closed Course
        ENROLL --> H[Traditional Way]
        ENROLL --> LATER["New Way: Facts Revealed by Data Science"]
        subgraph Traditional Way
            H --> I[Registration Priority]
            H --> J[Waitlist]
            H --> K[Add Code]
        end
        subgraph New Way
            LATER --> L[Student can be Dropped for many Reasons]
            LATER --> P[New Courses added after Semester Start]
            LATER --> O[Course Seat Monitor]
            subgraph Course Seat Monitor
                O --> Q[Sending Email Alerts]
            end
            L --> M[International student tuition not paid in time]
            L --> N[Missed course check-in time]
        end
    end
end
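The Course Seat Monitor and email alert in the diagram could work roughly as follows: compare two consecutive snapshots and report every course that was full in the earlier one but has open seats in the later one. This is a sketch under assumptions: the (enrolled, capacity) snapshot format, the function names, and the message wording are hypothetical, and the alert is only composed here, not sent.

```python
from email.message import EmailMessage
from typing import Dict, List, Tuple

# Hypothetical snapshot format: {course_id: (enrolled, capacity)}
Seats = Dict[str, Tuple[int, int]]


def newly_opened(previous: Seats, current: Seats) -> List[str]:
    """Courses that were full in `previous` but have open seats in `current`."""
    opened = []
    for course, (enrolled, capacity) in current.items():
        if course not in previous:
            continue  # no earlier data to compare against
        prev_enrolled, prev_capacity = previous[course]
        if prev_enrolled >= prev_capacity and enrolled < capacity:
            opened.append(course)
    return sorted(opened)


def build_alert(course: str, open_seats: int, sender: str, recipient: str) -> EmailMessage:
    """Compose (but do not send) an email alert about an opened seat."""
    msg = EmailMessage()
    msg["Subject"] = f"Seat available: {course}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{course} now has {open_seats} open seat(s).")
    return msg
```

To actually deliver the alert, the message could be passed to `smtplib.SMTP(host).send_message(msg)` using whichever mail server is available.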

This project explores course enrollment data from Pasadena City College (PCC) to uncover interesting facts and to analyze and interpret them. For example, a closed course may lose a large number of students at a certain time because some students are dropped for not paying tuition on time; other students can use this window to enroll in the otherwise closed course. This project aims to support this conjecture with a large dataset and to discover other patterns in the data.
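One way to test the tuition-drop conjecture is to scan each course's enrollment history for sudden decreases and total them by date; a spike on particular dates would be consistent with a batch of students being dropped at once. The sketch below assumes each course's history is a time-ordered list of (timestamp, enrolled) pairs; the threshold of 5 students and the function names are illustrative assumptions, not values taken from this project.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical history format: time-ordered (timestamp, enrolled) pairs for one course
History = List[Tuple[datetime, int]]


def drop_events(history: History, min_drop: int = 5) -> List[Tuple[datetime, int]]:
    """Snapshots where enrollment fell by at least `min_drop` since the previous one."""
    events = []
    for (_, before), (t, after) in zip(history, history[1:]):
        if before - after >= min_drop:
            events.append((t, before - after))
    return events


def drops_by_date(histories: Dict[str, History], min_drop: int = 5) -> Counter:
    """Total size of large drops across all courses, keyed by calendar date."""
    totals: Counter = Counter()
    for history in histories.values():
        for t, size in drop_events(history, min_drop):
            totals[t.date()] += size
    return totals
```

A threshold filters out the small day-to-day churn of individual students adding and dropping, so that only batch-like decreases are counted.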

PCC Data Science Website PCC Course Schedule Website

Features

  • Data Sourcing: Automatically captures and stores up-to-date information from the PCC Course Web every 30 minutes. The data is then cleaned and extracted for further analysis.
  • Data Analysis: Provides a range of capabilities for analyzing the data, including:
    • Info Extraction: Extracts and organizes information such as course status counts and course drop rates.
    • Interpretation: Provides context for the data by answering questions such as "Why did this happen?" and "How can this data benefit the user?" One example is identifying opportunities to enroll in a closed class.
    • Visualization of Data Trends: Clear visualizations of data trends are planned (TODO).
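A minimal sketch of the Data Sourcing loop (requester, cleaner, storer, repeated every 30 minutes), using only the Python standard library. It assumes the course schedule page presents course data as HTML table rows; the page URL is not given here, so it is left as a parameter, and the parser class, function names, and `store` callback are hypothetical rather than this project's actual code.

```python
import time
import urllib.request
from html.parser import HTMLParser
from typing import Callable, List, Optional

INTERVAL_SECONDS = 30 * 60  # fetch every 30 minutes


def request_html(url: str) -> str:
    """Requester: download the raw HTML of the course schedule page."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


class TableRowParser(HTMLParser):
    """Cleaner: collect the stripped text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None


def clean_html(html: str) -> List[List[str]]:
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def run_forever(url: str, store: Callable[[List[List[str]]], None]) -> None:
    """Scheduler: request, clean, store, then sleep until the next round."""
    while True:
        store(clean_html(request_html(url)))
        time.sleep(INTERVAL_SECONDS)
```

`run_forever` blocks indefinitely; an external scheduler such as cron calling a single request/clean/store round would be an equally reasonable design.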


Quick Start

This guide provides a quick way to get started with this project.

Requirements

  • conda must be installed on the machine where the project will run.

Setup Instructions

  1. Clone this repository:

    git clone git@github.com:perryzjc/pcc-course-enrollment-data-explorer.git

  2. Change into the cloned directory, pcc-course-enrollment-data-explorer:

    cd pcc-course-enrollment-data-explorer

  3. To create a suitable environment for this project, I recommend using conda with the provided environment.yml file, which lists all required dependencies:

    conda env create -f environment.yml

Run Instructions

  1. Once the environment is created, activate it by running:

    conda activate <environment_name>

    Replace <environment_name> with the actual name of the environment. You can find it in the environment.yml file, in the output of conda env create -f environment.yml, or by running conda info --envs.

  2. Run the script you are interested in, for example:

    python3 main.py

Usage Examples

(pcc-course-enrollment-data-explorer) perryzjc@MBP pcc-course-enrollment-data-explorer % python3 frontend.py

Expected results:

1. Course data are fetched every 30 minutes and stored in a location based on the current time

(assuming the current time is 2023-01-09-23-03-08)

pcc-course-enrollment-data-explorer
  - data_analysis
  - data_sourcing
  - output
    - data_analysis
    - data_source
      - 2023
        - 01
          - 09
            - 2023-01-09-23-03-08.csv
    - log
      - log.txt
      - README.md
  - tests
  - environment.yml
  - main.py
  - ...
2. log.txt is updated. For example, the following line is appended to log.txt:

successfully store all course data as a csv file at time: 2023-01-09 23:03:08
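The storage layout and log entry in the expected results can be reproduced with a short sketch. The directory layout and the log message wording are taken from the example output above; the function names and the `output_dir` parameter (standing in for the project's output folder) are hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import List


def snapshot_path(output_dir: Path, now: datetime) -> Path:
    """Build output/data_source/YYYY/MM/DD/YYYY-MM-DD-HH-MM-SS.csv for a capture time."""
    day_dir = output_dir / "data_source" / f"{now:%Y}" / f"{now:%m}" / f"{now:%d}"
    return day_dir / f"{now:%Y-%m-%d-%H-%M-%S}.csv"


def log_line(now: datetime) -> str:
    """Log message in the same wording as the example log.txt entry."""
    return f"successfully store all course data as a csv file at time: {now:%Y-%m-%d %H:%M:%S}"


def store_snapshot(rows: List[List[str]], output_dir: Path, now: datetime) -> Path:
    """Write cleaned rows to a timestamped CSV and append one line to log/log.txt."""
    path = snapshot_path(output_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    log_file = output_dir / "log" / "log.txt"
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as log:
        log.write(log_line(now) + "\n")
    return path
```

Nesting snapshots by year, month, and day keeps each directory small even though a new file is written every 30 minutes.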

Changelog

See our CHANGELOG.md for a history of our changes.

Frequently Asked Questions (FAQ)

  1. How to install conda and use it?

Contributing

Interested in contributing to this project? Please see our: CONTRIBUTING.md

  1. Create a GitHub issue describing the changes you want to make (e.g. issue-1)
  2. Fork this repo
  3. Make your modifications in your fork
  4. Open a pull request in this repo with the code from your fork, and tag the repo owner or the largest contributor as a reviewer

Working on your first pull request? See guide: How to Contribute to an Open Source Project on GitHub

For guidance on how to interact with our team, please see our code of conduct located at: CODE_OF_CONDUCT.md

License

See our: LICENSE

Support

@perryzjc
