The interactive notebooks of the Twitter Account Classification Using Account Metadata: Organization vs. Individual paper


tweetpie/twitter-account-classification


Twitter Account Classification Using Account Metadata:
Organization vs. Individual

Abstract

Organizations establish a presence on social media to gain followers and reach a wide audience. Social media tasks and applications, such as social media graph construction, sentiment analysis, and bot detection, need to identify the account type of each entity. Some applications focus on personal accounts, whereas others need only non-personal accounts. This paper addresses the account classification problem using a minimal amount of data, namely only the account’s profile metadata. The proposed approach classifies accounts as either organization or individual in a language-independent manner, without collecting the tweets of these accounts. The model uses a Long Short-Term Memory (LSTM) network to process the textual properties and a fully connected neural network to process the numerical features. We apply our solution to a collection of Twitter accounts, as Twitter is one of the most widely used social networks. Our classifier, which is based solely on account metadata, achieves an average accuracy of 97.4% under 7-fold cross-validation. The experiments show that account metadata is a suitable resource for highly accurate estimation of account types.
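The 7-fold cross-validation protocol mentioned in the abstract can be sketched in plain Python as follows. This is only an illustration of the evaluation scheme, not the authors' code: `k_fold_indices`, `cross_validated_accuracy`, and the `evaluate_fold` callback are hypothetical names, and the paper may well have used a stratified or library-provided splitter instead.

```python
import random


def k_fold_indices(n_samples, k=7, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out, test_idx in enumerate(folds):
        train_idx = [
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for idx in fold
        ]
        yield train_idx, test_idx


def cross_validated_accuracy(n_samples, evaluate_fold, k=7, seed=0):
    """Average the per-fold accuracy returned by `evaluate_fold`.

    `evaluate_fold(train_idx, test_idx)` is a hypothetical callback that
    trains a model on the training indices and returns its accuracy
    (a float in [0, 1]) on the held-out indices.
    """
    scores = [
        evaluate_fold(train_idx, test_idx)
        for train_idx, test_idx in k_fold_indices(n_samples, k, seed)
    ]
    return sum(scores) / len(scores)
```

Each account is held out exactly once, so the reported figure is the mean accuracy over seven disjoint test folds.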

Datasets

You can find all datasets mentioned in the paper here. Each model is applied to different versions of the datasets; for example, the Humanizr model is applied to the balanced, unbalanced, and full versions of the Humanizr dataset. The balanced version of a dataset contains an equal number of individual and organization accounts. This balance is achieved by undersampling the majority class.
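Undersampling the majority class can be sketched as below. This is a minimal example under stated assumptions: accounts are represented as dictionaries with a `label` key, and the function name `undersample`, the key name, and the fixed seed are all hypothetical choices for illustration rather than the procedure used to build the published datasets.

```python
import random
from collections import defaultdict


def undersample(accounts, label_key="label", seed=0):
    """Return a class-balanced subset by randomly dropping majority-class items."""
    by_label = defaultdict(list)
    for account in accounts:
        by_label[account[label_key]].append(account)
    if not by_label:
        return []

    # Every class is reduced to the size of the smallest class.
    minority_size = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, minority_size))
    rng.shuffle(balanced)  # avoid grouping the output by class
    return balanced


# Toy data (not taken from the real datasets): 9 individuals, 3 organizations.
toy_accounts = (
    [{"id": i, "label": "individual"} for i in range(9)]
    + [{"id": 100 + i, "label": "organization"} for i in range(3)]
)
balanced_accounts = undersample(toy_accounts)
print(len(balanced_accounts))  # 6: three accounts per class
```

Undersampling discards data from the larger class, which is why the unbalanced and full versions are kept alongside the balanced one.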

| Dataset Name | Number of Original Accounts | Number of Collected Accounts | Collection Percent |
| --- | --- | --- | --- |
| Humanizr | 18,922 | 17,790 | 94% |
| Demographer | 227,277 | 214,236 | 94% |
Table: Humanizr and Demographer dataset statistics.
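The collection percentages in the table follow directly from the account counts. The short check below recomputes them; the function name `collection_percent` is a hypothetical helper, and the counts are copied from the table.

```python
# (original accounts, collected accounts), copied from the table above.
DATASETS = {
    "Humanizr": (18_922, 17_790),
    "Demographer": (227_277, 214_236),
}


def collection_percent(original, collected):
    """Share of the originally listed accounts that could be collected."""
    return 100.0 * collected / original


for name, (original, collected) in DATASETS.items():
    print(f"{name}: {collection_percent(original, collected):.1f}%")
# Humanizr: 94.0%
# Demographer: 94.3%
```

Both values round to the 94% reported in the table.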

The collected Humanizr dataset consists of 17,790 user accounts, of which 16,012 are labeled as individuals and 1,778 as organizations. The collected Demographer dataset contains 214,236 accounts, of which 185,224 are labeled as individuals and 29,012 as organizations. Since the Demographer dataset includes the Humanizr dataset, the figures are based on the Demographer dataset.
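To make the class imbalance concrete, the sketch below derives a few quantities from the label counts stated above: the organization share, the accuracy of a trivial classifier that always predicts the majority class, and the size a balanced version would have after undersampling. The helper name `summarize` and the dictionary layout are assumptions for illustration; only the four counts come from the text.

```python
# Label counts as stated in the text above.
LABEL_COUNTS = {
    "Humanizr": {"individual": 16_012, "organization": 1_778},
    "Demographer": {"individual": 185_224, "organization": 29_012},
}


def summarize(counts):
    """Derive imbalance statistics from a {label: count} mapping."""
    total = sum(counts.values())
    return {
        "total": total,
        "organization_share": counts["organization"] / total,
        # Accuracy obtained by always predicting the most frequent label.
        "majority_baseline": max(counts.values()) / total,
        # Undersampling keeps min(count) accounts from each class.
        "balanced_size": min(counts.values()) * len(counts),
    }


for name, counts in LABEL_COUNTS.items():
    stats = summarize(counts)
    print(
        f"{name}: total={stats['total']}, "
        f"organizations={stats['organization_share']:.1%}, "
        f"majority baseline={stats['majority_baseline']:.1%}, "
        f"balanced size={stats['balanced_size']}"
    )
# Humanizr: total=17790, organizations=10.0%, majority baseline=90.0%, balanced size=3556
# Demographer: total=214236, organizations=13.5%, majority baseline=86.5%, balanced size=58024
```

The majority baseline gives a reference point when reading accuracy figures on the unbalanced versions of the datasets.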

Notebooks

Interactive Data Analysis

This notebook contains interactive plots to visualize Followers-Following, Followers-Tweets, Followers-Likes, Tweets-Likes and Tweets-Media relationships.

Interactive Demo

This notebook provides an interactive service that predicts the label of a given Twitter account.
