Sangram-More/clickstreamwiki.github.io

This repository hosts a data mining project for the CSCI 5502 Data Mining course at CU Boulder.

Exploring Wikipedia Clickstream Data Analysis

Description

Wikipedia regularly releases clickstream statistics, which record the aggregated paths users take between its pages. Traditional statistical approaches can yield interesting insights from the links between articles, but they often miss important details when working with such large datasets.

To glean insights from these links, this study employs network analysis. By modeling the clickstream data as a graph, we can study the network's structure, identify the most significant nodes, find themes or subjects in the data using community detection, and investigate patterns of unusual Wikipedia browsing behavior using shell decomposition.
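As a rough illustration of the graph model described above, the following sketch parses rows in the clickstream layout (tab-separated `prev`, `curr`, `type`, `n`), keeps only article-to-article links, and computes a k-shell (k-core) index per page with the standard peeling procedure. It is not the project's implementation: the function names, the sample rows, and the use of incoming click totals as a stand-in for node significance are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows (made up for illustration) in the layout
# prev <TAB> curr <TAB> type <TAB> n.
SAMPLE_TSV = (
    "Graph_theory\tNetwork_science\tlink\t120\n"
    "Network_science\tCommunity_structure\tlink\t80\n"
    "Graph_theory\tCommunity_structure\tlink\t45\n"
    "Community_structure\tGraph_theory\tlink\t30\n"
    "other-search\tGraph_theory\texternal\t900\n"
    "Network_science\tDegeneracy_(graph_theory)\tlink\t15\n"
)


def load_edges(handle):
    """Return {(prev, curr): clicks}, keeping only rows whose type is 'link'."""
    edges = defaultdict(int)
    # QUOTE_NONE because page titles may contain quote characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip malformed or empty lines
        prev, curr, link_type, n = row
        if link_type == "link":
            edges[(prev, curr)] += int(n)
    return dict(edges)


def in_strength(edges):
    """Total incoming clicks per page (an assumed, simple significance proxy)."""
    totals = defaultdict(int)
    for (_, curr), clicks in edges.items():
        totals[curr] += clicks
    return dict(totals)


def core_numbers(edges):
    """k-shell index per node, treating edges as undirected and unweighted."""
    neighbors = defaultdict(set)
    for prev, curr in edges:
        if prev != curr:  # ignore self-loops
            neighbors[prev].add(curr)
            neighbors[curr].add(prev)
    degree = {node: len(adj) for node, adj in neighbors.items()}
    core = {}
    remaining = set(neighbors)
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for other in neighbors[node]:
            if other in remaining:
                degree[other] -= 1
    return core


edges = load_edges(io.StringIO(SAMPLE_TSV))
print(in_strength(edges))
print(core_numbers(edges))
```

The linear scan for the minimum-degree node keeps the sketch short but is quadratic in the number of pages; on a full monthly dump a bucket-based peeling or a graph library would be the more practical choice.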

Results

Initial findings, along with visualizations and the model implementation, will be shared at the upcoming milestones.

Data used: https://dumps.wikimedia.org/other/clickstream/

Contact

For questions, suggestions, or collaboration inquiries, open an issue in this repository via GitHub Issues.

The dataset is too large to upload to GitHub. The dataset used in Milestone 3 is available here: https://drive.google.com/drive/folders/1CbydT5X0wt6yvxGH-RVj5A8p6i8NdEjX?usp=sharing

Link to website: https://sites.google.com/colorado.edu/clickstream/models-implemented
