Compare channels (e.g. video count, video views, video tags, links in descriptions) and analyze video comments (time series, sentiment, word counts, etc.).
The script has a browser-based user interface and runs on Flask. It downloads data directly from YouTube using the Data API v3. The data is extracted and transformed into a pandas DataFrame. Matplotlib, Seaborn and WordCloud are used to save plots as images.
- Compare channel KPIs such as video count, views, duration, clickable and unclickable link counts, and video tags, and show visualizations.
- Analyse video comments with time series and sentiments and show visualizations.
- You need a YouTube Data API key to make this work. I do not publish mine here. Get your own for free at !!!
- On Windows 10, open the command line and type
setx YOUTUBE_API_KEY "REPLACE_THIS_TEXT_WITH_YOUR_YOUTUBE_DATA_API_KEY"
- Clone this repository
pip install -r requirements.txt
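As a minimal sketch of how a script could pick up the key stored with `setx`, the snippet below reads the `YOUTUBE_API_KEY` environment variable and fails with a readable message if it is missing. The function name `load_api_key` is hypothetical and not taken from this repository.

```python
import os


def load_api_key(env_var: str = "YOUTUBE_API_KEY") -> str:
    """Return the API key stored in the environment (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(
            f"Environment variable {env_var} is not set. "
            "Set it and open a new terminal before starting the app."
        )
    return key
```

Note that `setx` only affects terminals opened after the command was run, so the app has to be started from a new command line window.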
The following example plots can be found in the repository in the folder /example_plots.
The browser interface lets users compare up to 3 channels. Theoretically, more channels can be compared by modifying the URL. Please check the 'To Dos' for the expected limits of this workaround.
The number of publicly available videos is counted and visualized.
In YouTube's video descriptions, links need to start with https:// to be clickable. This condition is checked, and clickable and unclickable links are counted.
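The link check could look roughly like the sketch below. It is an illustration, not the repository's code: the function name `count_links` and the regular expression are assumptions, and a "link" is taken to be any token starting with `http://`, `https://` or `www.`. Following the rule stated above, only links starting with `https://` count as clickable.

```python
import re

# Assumption: anything starting with http://, https:// or www. is treated as a link.
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def count_links(description: str) -> tuple[int, int]:
    """Return (clickable, unclickable) link counts for one video description."""
    clickable = 0
    unclickable = 0
    for match in LINK_PATTERN.finditer(description):
        # Rule from this README: a link is clickable only if it starts with https://
        if match.group(0).lower().startswith("https://"):
            clickable += 1
        else:
            unclickable += 1
    return clickable, unclickable
```

For example, `count_links("Shop: https://example.com and www.example.org")` counts one clickable and one unclickable link.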
Video tags are not visible to users by default. There are browser plug-ins that can make them visible.
For each channel in the comparison a single histogram is plotted, saved and displayed.
A table shows the top 5 viewed videos for each channel.
No image of the table is served at this point. Sorry.
This plot shows how the number of comments and their sentiments evolved over time.
The plot shows comments with their sentiment and, on a logarithmically scaled y-axis, the number of likes for these comments.
The word cloud visualizes the most used words across all comments and sub-comments. English stopwords are removed by default.
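The word counting behind such a cloud can be sketched with the standard library alone. This is an illustration under assumptions, not the tool's implementation: `top_words` is a hypothetical helper and `STOPWORDS` is a tiny sample set, whereas a real run would use a full English stopword list.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); a real list is much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "this", "to", "of", "it", "in"}


def top_words(comments, stopwords=STOPWORDS, n=10):
    """Count words over all comments, skipping stopwords, and return the n most common."""
    counts = Counter()
    for text in comments:
        # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)
```

Passing a different `stopwords` set is all that is needed to handle comments in another language.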
In 1m27s the video walks you through the process of comparing channels.
In 1m24s the video walks you through the process of analysing comments.
Working as an analyst and consultant gave me repeated tasks that took a lot of time. I wanted to automate tasks that are frequently required in the industry, especially content intelligence tasks in video production companies.
- Some plots show floats instead of integers where only whole numbers make sense (e.g. counts). This needs to be changed.
- The plots get stored but not deleted afterwards. Deletion should be scheduled, or another way needs to be found to keep the files from taking up too much disk space.
- The filenames for plots used in channel comparisons currently consist of the channel IDs. Technically this works fine as long as only three channels are compared. The more channels are compared, the longer the filenames get, which could exceed the filename length limits of some operating systems.
- So far I have used a free API with a limit of 10,000 queries per day. This can easily be exceeded if channels have a lot of videos or videos have a lot of comments. Channels with up to 4,000 videos and videos with up to 1,000 comments worked fine. Solutions would be to get a paid API, or at least to estimate the query cost in advance and predict whether the query can run to completion.
- The word clouds currently exclude only English stopwords. To make the tool applicable to other languages, an option should be added to exclude stopwords from those languages as well.
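The quota estimate mentioned in the To Dos could start from a rough calculation like the one below. All constants except the daily limit are assumptions to be checked against the current YouTube Data API quota documentation: one quota unit per list request, 50 videos per page, and 100 comment threads per page. The function names are hypothetical.

```python
import math

DAILY_QUOTA = 10_000      # daily limit stated in this README
COST_PER_LIST_CALL = 1    # assumption: each list request costs one unit
VIDEOS_PER_PAGE = 50      # assumption: maximum videos per page
COMMENTS_PER_PAGE = 100   # assumption: maximum comment threads per page


def estimate_channel_cost(video_count: int) -> int:
    """Estimate units for listing a channel's video IDs and then fetching their details."""
    pages = math.ceil(video_count / VIDEOS_PER_PAGE)
    # Two passes: one to list the video IDs, one to fetch details in batches.
    return 2 * pages * COST_PER_LIST_CALL


def estimate_comment_cost(comment_count: int) -> int:
    """Estimate units for paging through the top-level comments of one video."""
    return math.ceil(comment_count / COMMENTS_PER_PAGE) * COST_PER_LIST_CALL


def fits_quota(cost: int, used: int = 0, daily_quota: int = DAILY_QUOTA) -> bool:
    """Return True if the request still fits into today's remaining quota."""
    return used + cost <= daily_quota
```

Under these assumptions a channel with 4,000 videos costs about 160 units and a video with 1,000 comments about 10 units. Fetching replies to comments may need additional requests that this sketch does not count.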