This is my third-year project, also called a "dissertation", completed for credit at the University of Warwick. The project and its associated code are designed to identify communities of discussion within a given subreddit. They capture sets of posts, comments and their engagement data through the Python Reddit API Wrapper (PRAW) and compile them into graph-theory networks, on which community detection algorithms can then be run to separate clusters of communities. These communities are then analysed using data analytics methods to determine their subject matter, users, keywords, engagement metrics (custom-built), measures of spread and other key metrics. The goal is to produce a novel and useful social analytics tool that can accurately capture the internal divisions of subject matter within a subreddit, a subject known as intra-subreddit dynamics (IRSD), as opposed to the more ubiquitous capture of relationships between subreddits, known as inter-subreddit dynamics (IESD).
_An example screenshot of one of the post graphs developed for this project._
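To make the pipeline described above concrete, here is a minimal sketch, not the project's actual code: PRAW fetches a sample of posts and comments, networkx assembles them into a graph, and a standard community detection algorithm separates the clusters. The credentials, subreddit name and edge rule (linking each commenter to the post's author) are all illustrative assumptions.

```python
# Minimal sketch of the general pipeline (illustrative, not the project's code).
import praw
import networkx as nx
from networkx.algorithms import community

# Hypothetical credentials; PRAW requires a registered Reddit application.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="irsd-demo by u/your_username",
)

G = nx.Graph()
for submission in reddit.subreddit("AskScience").hot(limit=25):  # subreddit is illustrative
    submission.comments.replace_more(limit=0)  # flatten "load more comments" stubs
    for comment in submission.comments.list():
        if comment.author and submission.author:
            # Edge rule is an assumption: connect each commenter to the poster.
            G.add_edge(str(comment.author), str(submission.author))

# One possible detection method: greedy modularity maximisation.
communities = community.greedy_modularity_communities(G)
for i, c in enumerate(communities):
    print(f"Community {i}: {len(c)} users")
```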
I chose Reddit for a variety of reasons, the most prominent being my own substantial personal experience with the site and my interest in exploring the modularity within the subreddit structure. The second most important reason is that, unlike most social media sites, which build communities around individuals, Reddit builds individuals around communities. This makes it much more amenable to a graph-theory approach: there is a limited amount of information to gather around one person, but a practically infinite amount to gather around a subject matter or field; one is temporary, the other permanent. In addition, Reddit has an easy-to-use API wrapper, PRAW, which is free for academic purposes and synergises well with Python, a language I am extensively familiar with and a key language for data science, which was one of the core aims of this project.
In addition to the above reasons, I wanted to complete as much work as possible, both to satisfy my project supervisors and to free up time for project documentation, as well as for my own personal fulfilment. I knew that to develop a good project I would need to actually enjoy the experience, even at the cost of computational efficiency, which is why I chose a language I was well versed in and enjoyed. While this resulted in memory overhead issues beyond roughly 67,000 posts, I do not regret the choice.
SQL was chosen because, almost universally across my Computer Science education, it is the only language I have studied for implementing data storage, and it is also simple and easy to work with.
The programming languages used were exclusively Python 3.9 and SQL (specifically SQLite3). SQL does not appear in the repository's "Languages" tab because it is all embedded within the Python code in the form of prepared statements.
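As an illustration of what this embedding looks like, the sketch below runs SQL from Python through sqlite3's parameterised (prepared) statements. The database filename, table and columns are hypothetical, not the project's actual schema.

```python
# Illustrative example of SQL embedded in Python as prepared statements.
import sqlite3

conn = sqlite3.connect("reddit_data.db")  # hypothetical database file
cur = conn.cursor()

cur.execute(
    "CREATE TABLE IF NOT EXISTS posts (id TEXT PRIMARY KEY, title TEXT, score INTEGER)"
)
# The "?" placeholders are bound safely by the sqlite3 driver.
cur.execute(
    "INSERT OR REPLACE INTO posts (id, title, score) VALUES (?, ?, ?)",
    ("abc123", "Example post", 42),
)
conn.commit()

for row in cur.execute("SELECT title, score FROM posts WHERE score > ?", (10,)):
    print(row)
conn.close()
```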
_A selection of some of the Python modules used to work on this project._
Used for matrix computation and data manipulation (alongside Pandas).
Used for storing program data that was retrieved from the SQLite3 databases.
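A brief sketch of how this NumPy/Pandas pairing can work in practice, assuming a hypothetical posts table: rows are read from the SQLite3 database into a DataFrame, then converted to a NumPy matrix for computation.

```python
# Sketch: load stored rows with Pandas, then compute on them with NumPy.
import sqlite3
import numpy as np
import pandas as pd

conn = sqlite3.connect("reddit_data.db")  # hypothetical database file
posts = pd.read_sql_query("SELECT id, score, num_comments FROM posts", conn)
conn.close()

# Convert the numeric columns to a matrix, e.g. to normalise engagement
# figures across all captured posts (column names are assumptions).
matrix = posts[["score", "num_comments"]].to_numpy(dtype=float)
normalised = matrix / np.max(matrix, axis=0)
print(normalised[:5])
```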
I chose this because it is free, lightweight and easy to configure; I was looking for minimal overhead in the project to maximise development speed.
The file structure of the project can be visualised as follows:

There are also original test files that I have decided to leave in the repository in case of any future work, although this is unlikely.
