EMSE_DevInt

By Cassandra, Nimmi, Yiming, and Jory.

This research was conducted as part of SENG 480A @ UVic (EMSE).

The included PDF presents the motivation, methodology, results, and conclusions of our work.

Dependencies

Install the following packages, which the included Python modules and Jupyter notebooks require:

pip install stackapi scikit-learn numpy nltk pandas seaborn wordcloud pyLDAvis

Alternatively, try

pip install -r requirements.txt

(Rough) Procedural Overview

  1. Use StackAPI to grab Stack Overflow (SO) data.

    a. Grab as many questions and answers as the API allows each day, repeating over a couple of days.

    b. Collate JSONs into single data file.

    c. Remove duplicates.

    d. Format into input file for LDA.

  2. Run LDA (latent Dirichlet allocation) on the formatted data to extract topics.

    • LDA does not label topics, so each topic must be labeled manually.
  3. Compute additional statistics on questions, answers, and users.
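Steps 1b to 1d could look roughly like the following minimal sketch. It assumes each daily dump is a JSON file holding one StackAPI `fetch()` result (a dict with an `items` list of question objects carrying `question_id`, `title`, and `body`), and that the LDA tool accepts one whitespace-separated document per line. The stopword list and the function names are hypothetical, and the exact input format JGibbLabeledLDA expects should be checked against its own documentation.

```python
import html
import json
import re

# Hypothetical minimal stopword list; NLTK's stopwords corpus is a fuller option.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "on",
             "for", "it", "i", "this", "that", "with", "how", "do", "can"}

TAG_RE = re.compile(r"<[^>]+>")              # crude HTML tag stripper
TOKEN_RE = re.compile(r"[a-z][a-z0-9+#]*")   # keeps tokens such as c++ and c#


def collate(paths):
    """Merge question items from several JSON dumps, keeping the first copy of each question_id."""
    unique = {}
    for path in sorted(paths):
        with open(path, encoding="utf-8") as f:
            dump = json.load(f)
        for item in dump.get("items", []):
            unique.setdefault(item["question_id"], item)  # later duplicates are ignored
    return list(unique.values())


def tokenize(item):
    """Strip HTML from title and body, lowercase, and drop stopwords."""
    text = item.get("title", "") + " " + item.get("body", "")
    text = html.unescape(TAG_RE.sub(" ", text)).lower()
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


def write_lda_input(items, out_path):
    """Write one space-separated document per line, skipping questions with no tokens left."""
    with open(out_path, "w", encoding="utf-8") as f:
        for item in items:
            tokens = tokenize(item)
            if tokens:
                f.write(" ".join(tokens) + "\n")
```

For example, `write_lda_input(collate(glob.glob("dumps/*.json")), "lda_input.txt")` would merge every dump in a (hypothetical) `dumps/` directory. Keying on `question_id` means a question fetched on several days is kept only once.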

Usage

Ad-Hoc Python Scripts
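As one example of an ad-hoc script for step 3 of the overview, the sketch below computes a few descriptive statistics over collated questions. The field names (`answer_count`, `is_answered`, `owner.user_id`, `tags`) are those of the Stack Exchange API question object; the `summarize` function and the particular statistics are illustrative choices, not the ones reported in the PDF.

```python
from collections import Counter
from statistics import mean


def summarize(questions, top_n=3):
    """Compute simple descriptive statistics over a list of question dicts."""
    if not questions:
        return {}
    # Deleted users appear as an owner without a user_id, so they are skipped.
    askers = Counter(
        q["owner"]["user_id"]
        for q in questions
        if "user_id" in q.get("owner", {})
    )
    tags = Counter(tag for q in questions for tag in q.get("tags", []))
    return {
        "questions": len(questions),
        "answered_ratio": sum(bool(q.get("is_answered")) for q in questions) / len(questions),
        "mean_answers": mean(q.get("answer_count", 0) for q in questions),
        "unique_askers": len(askers),
        "top_askers": askers.most_common(top_n),
        "top_tags": tags.most_common(top_n),
    }
```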

Grabbing Data
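A minimal sketch of step 1a, assuming the `stackapi` package from the dependency list. `fetch_questions`, `save_daily_dump`, the `dumps/` directory, and the file naming scheme are hypothetical; `page_size`, `max_pages`, and `fetch()` belong to StackAPI, and `withbody` is a built-in Stack Exchange API filter that adds the question body. Answers could be fetched the same way from the `answers` endpoint. Running this once per day and keeping each day's file separately is one way to stay within the API's daily request quota.

```python
import datetime
import json
from pathlib import Path


def fetch_questions(tag, max_pages=10):
    """Fetch recent questions for one tag via StackAPI (needs network access and stackapi)."""
    from stackapi import StackAPI  # imported lazily so save_daily_dump works offline

    site = StackAPI("stackoverflow")
    site.page_size = 100        # the API's maximum page size
    site.max_pages = max_pages  # each page costs one request against the daily quota
    return site.fetch("questions", tagged=tag, filter="withbody")


def save_daily_dump(result, out_dir="dumps", day=None):
    """Write one fetch result to a date-stamped JSON file and return its path."""
    day = day or datetime.date.today()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"questions_{day.isoformat()}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(result, f)
    return path
```

For instance, `save_daily_dump(fetch_questions("some-tag"))`, where `"some-tag"` is a placeholder for whatever tag the study targets.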

Resources

StackAPI

JGibbLabeledLDA

Refactored JGibbLabeledLDA

Preprocess

LDA
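To illustrate what the LDA step computes, here is a toy collapsed Gibbs sampler for plain LDA. It is not JGibbLabeledLDA's code and ignores labels entirely; it only sketches the kind of Gibbs sampling update that JGibbLDA-style tools use. The names `lda_gibbs` and `top_words` and the defaults for `alpha`, `beta`, and `iters` are illustrative choices. `top_words` reflects the point in step 2 that LDA only yields word distributions, so naming each topic is still manual.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists.

    Returns (vocab, nkw, ndk): the vocabulary, topic-word counts, and doc-topic counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    v = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * k for _ in docs]      # tokens in doc d assigned to topic t
    nkw = [[0] * v for _ in range(k)]  # tokens of word w assigned to topic t
    nk = [0] * k                       # total tokens assigned to topic t
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):     # random initial assignment
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][wid[w]] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], wid[w]
                # Remove the token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                # p(z = j | rest) is proportional to (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    return vocab, nkw, ndk


def top_words(vocab, nkw, n=5):
    """Most frequent words per topic; a human still has to name each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]
```

Pure Python is far too slow for a full Stack Overflow dump; there, JGibbLabeledLDA or scikit-learn's `LatentDirichletAllocation` (which uses variational Bayes rather than Gibbs sampling) would be the practical choice.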