Dataset and R script used to produce an analysis of the relationship between online terrorist activity and real-world outcomes for global jihadist movement (GJM) groups. The dataset contains estimates of GJM groups' online activity from 1998 to 2019, membership estimates for that period, yearly aggregate counts of attacks by GJM groups for that period drawn from ACLED and START, a list of the GJM groups on which data were collected, and a sheet of values imported into R for analysis. The raw ACLED and START data cannot be reproduced in full under their terms of service; to obtain those values, request the datasets directly from ACLED and START.
The R script contains the various models and code used for the graphics and tables found in the report. All values that appear in the report can be reproduced with the code in the script.
The dataset provided is licensed under CC-BY-SA-4.0. The R script provided is licensed under an MIT License. Original data from the Armed Conflict Location & Event Data Project (ACLED) is bound by its own terms of service. Original data from the National Consortium for the Study of Terrorism and Responses to Terrorism's Global Terrorism Database is bound by its own terms of service as well.
Many studies have examined ISIS’ usage of Twitter and other Internet platforms to cultivate their “online caliphate.” Despite all of this attention on the online sphere, relatively few studies have examined how online propaganda influences real-world behavior. Of those studies that exist, few are on a generalizable scale. The following aims to provide early findings of the relationship between the broader Global Jihadist Movement’s online and offline activity, particularly regarding membership and real-world violence. This mixed-methods study collected data on online activity, membership, and group-perpetrated violence for almost seventy groups for the period of 1998 to 2018. Online activity was compared separately to membership and group-perpetrated violence. Then, multiple regression models quantified the overall impact on real-world events. The results suggest online activity exerts a moderately strong, statistically significant effect on violence and a moderate effect on membership. A correlation appeared to be present between all three variables, which suggests a reinforcing cycle of attacks producing online propaganda, online propaganda producing new members, and new members carrying out yet more attacks. This may be the driver of the upward trend found in all three data sets.
This project was conducted in late 2020 as an entry for the Dr. Jonathan Fine Essay Contest put on by the International Counter-Terrorism Review. The competition's topic was "technology and terrorism." In early 2022, the paper was uploaded to the Social Science Research Network, where it was added to several elibraries focusing on the study of terrorism. The full paper can be found on SSRN here: https://dx.doi.org/10.2139/ssrn.4035908.
Online Activity is the measure of the total or estimated total of online content put out each year by al-Qaeda, IS, and their direct affiliates during the collection period (1998-2019). The platforms included will vary from year to year, as technologies have evolved dramatically over the collection period. Examples include message boards, chat rooms, websites, social media sites, videos and video sites, how-to manuals, and online magazines. These data will be collected from secondary sources: prior research on since-deleted social media accounts, archived lists of suspected jihadist accounts, social media postings preserved for research purposes, reports estimating activity on particular platforms, transparency reports from the platforms themselves noting the number of accounts taken down in a particular period, government reports, and sworn testimony. Based on previous work on terrorist recruitment, it is hypothesized that the first half of the collection period will have limited online activity. These years are intentionally included to serve as a control period during which recruitment and planning largely took place offline. In each year where online activity is found, the collected data will first be broken down by the platform where activity is present. For example, activity on Instagram, Twitter, and Facebook might all be present in 2016; the data for 2016 will therefore list the estimates and their sources for each platform. For data collection purposes, each estimate for a particular year and platform will be listed according to the source of the value. In instances where two sources contradict, the values will first be weighed according to the credibility of the source and the supporting evidence provided. If the sources are of equal credibility, the recorded value to be used for analysis will be the median of these values and will be annotated in a separate column as a derived value.
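The reconciliation rule for contradicting sources can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the column names, the numeric credibility score, and the example values are hypothetical and are not taken from the dataset or the accompanying script.

```r
# Hypothetical per-source estimates for one year (all values are made up).
estimates <- data.frame(
  year = c(2016, 2016, 2016),
  platform = c("Twitter", "Twitter", "Facebook"),
  source = c("Report A", "Report B", "Report C"),
  value = c(1000, 1400, 300),
  credibility = c(2, 2, 1),  # assumed numeric credibility score
  stringsAsFactors = FALSE
)

# Keep only the most credible sources for a year/platform group. If several
# equally credible sources disagree, record their median and flag it as derived.
reconcile <- function(df) {
  top <- df[df$credibility == max(df$credibility), ]
  data.frame(
    year = df$year[1],
    platform = df$platform[1],
    value = median(top$value),
    derived = length(unique(top$value)) > 1,
    stringsAsFactors = FALSE
  )
}

groups <- split(estimates, list(estimates$year, estimates$platform), drop = TRUE)
recorded <- do.call(rbind, lapply(groups, reconcile))
print(recorded)
```

In this sketch the two equally credible Twitter estimates are replaced by their median and marked as derived, while the single Facebook estimate is recorded unchanged.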
These recorded values for each platform will then be aggregated to derive a value of estimated total online activity for a particular year. It is important to note that this will very much be an estimate, as activity on individual platforms is constantly in flux during certain periods, owing both to rapid account creation and deletion and to the more recent trend of activity "going dark" on encrypted platforms, which are much harder to observe. Additionally, platform-specific activity will be recorded when information is available. This will primarily be used for the online impact scale (if one is made). Accounts, however, will be the focus of this research, as accounts are platform agnostic. Likes, retweets, mentions, reactions, etc. are all platform specific and may skew activity toward one platform simply because it has more ways to interact with content. Such interaction data is certainly valuable and can be used to determine the weight of a particular platform. For measuring overall activity across platforms, however, I believe accounts, number of websites, number of online publications, and very specific videos known for their prior radicalizing quality, such as Anwar al-Awlaki videos on YouTube, are the best option: each item recorded has the potential to be viewed and interacted with by potential recruits, but is not tied to a platform-specific metric of that interaction.
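As a minimal sketch of the yearly aggregation step, the per-platform recorded values can be summed by year with base R's aggregate(). The data frame, its column names, and its numbers are placeholders for illustration only.

```r
# Hypothetical recorded values per platform and year (made-up numbers).
recorded <- data.frame(
  year = c(2015, 2015, 2016, 2016, 2016),
  platform = c("Twitter", "Websites", "Twitter", "Facebook", "Instagram"),
  accounts = c(500, 40, 900, 250, 100),
  stringsAsFactors = FALSE
)

# Sum across platforms to get one estimated total per year.
yearly_online <- aggregate(accounts ~ year, data = recorded, FUN = sum)
print(yearly_online)
```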
For the purposes of this research, membership is used as a metric to assess real-world direct connection to either al-Qaeda, IS, or their direct affiliates during each year of the collection period (1998-2019). This category and the associated data, to the best of my ability given the uncertainty surrounding actual membership numbers, refer to informed estimates of those materially supporting the groups in question, those who have openly claimed allegiance to the groups either before or after their violent attacks, and those who have mobilized to violence or have otherwise become operationally involved in the continuation of the groups or their ideology. This category does not refer to individuals who may have radicalized but have otherwise not engaged with the group beyond simply consuming propaganda or liking online content. That said, where appropriate, those who have created online accounts specifically to amplify a group's message are treated as actively participating. Even if they have not been formally recruited as propagandists for the group in question, they are making themselves operationally involved in a capacity similar to that of many of a group's formal propagandists. The data in this category are drawn from estimates given by experts in either the literature or other secondary source material, government reports, and expert or legal testimony either in court documents or in hearings before governing bodies. For data collection purposes, each estimate for a particular year will be listed according to the source of the estimate. In instances where two sources contradict, the estimates will first be weighed according to the credibility of the source and the supporting evidence provided. If the sources are of equal credibility, the recorded value to be used for analysis will be the median of these estimates and will be annotated in a separate column as a derived value.
Real World Activity refers to data drawn from sources that collate terrorist attacks, battles, and strategic developments either confirmed or strongly suspected to involve al-Qaeda, the Islamic State, or their direct affiliates. To gather this data, two datasets will be used: 1) The Global Terrorism Database (GTD) will be used to acquire data on attacks credited or suspected to be perpetrated by al-Qaeda, ISIS, or affiliated groups; 2) The Armed Conflict Location & Event Data Project (ACLED) will be used to acquire data on battles between affiliated groups, attacks by affiliated groups against military targets that may not otherwise be included in the GTD, and strategic developments involving the affiliated groups, such as major declarations, alliances, and recruitment drives, as classified using ACLED's criteria for a strategic development. This latter dataset will focus on the following regions, where affiliated group activity beyond terrorist attacks is most likely to be found due to major territorial control: South Asia (particularly Afghanistan and Pakistan, where al-Qaeda and IS in Khorasan Province have historically been most active), the Middle East (particularly Iraq and Syria), and Africa (where IS has been expanding its activity). The total activity will be aggregated by year throughout the collection period (1998-2019). For more detailed reference, each year will be broken down into the three categories (attacks, battles, and strategic developments).
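The yearly totals and the three-category breakdown could be tabulated along the following lines. This is a sketch that assumes event records have already been combined into one table with a year and a category label; the column names and rows are hypothetical and do not reflect the actual field names in GTD or ACLED exports.

```r
# Hypothetical combined event records (one row per event, made-up data).
events <- data.frame(
  year = c(2014, 2014, 2014, 2015, 2015),
  category = c("attack", "battle", "attack", "strategic_development", "attack"),
  stringsAsFactors = FALSE
)

# Cross-tabulate events by year and category.
by_category <- table(events$year, events$category)

# Build a yearly summary with one column per category plus the total.
yearly_events <- data.frame(
  year = as.integer(rownames(by_category)),
  attacks = as.vector(by_category[, "attack"]),
  battles = as.vector(by_category[, "battle"]),
  strategic_developments = as.vector(by_category[, "strategic_development"])
)
yearly_events$total <- yearly_events$attacks + yearly_events$battles +
  yearly_events$strategic_developments
print(yearly_events)
```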
For some data sources, estimates are given over multiple years without specifying values for individual years. Based upon the context of these data, one of four data recording methods is employed. In instances where a value sums to a total over a number of years but no indication is given of an increasing or decreasing trend, the value is simply divided by the number of years and the quotient is recorded for each year. For data suspected of being static, either because they are unchanging or because growth equals decay, the same value is listed across the years in question. For any values where movement is indicated by prior reporting but not specified in the particular multi-year estimate, the recorded data will be either linearly ascending or linearly descending according to the direction of the trend. To do this, a simple linear sequence will be generated in R: seq(from = 0, to = x, length.out = y), where x is the reported maximum and y is the number of years listed plus one. The extra element allows zero to be removed from the sequence, as there are no instances where zero membership or zero online activity is assumed. The output of the function is recorded counting up or down to denote linear increase or decrease, respectively. Values increasing to the reported value are denoted with a greater-than sign, while values decreasing from the reported value are denoted with a less-than sign.
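The recording methods described above can be sketched in R as follows. The helper names (spread_evenly, hold_static, ramp_linear) are hypothetical and not taken from the accompanying script; reversing the sequence for a decreasing trend is an assumption about how "counting down" is implemented.

```r
# Even split: a multi-year total with no indicated trend.
spread_evenly <- function(total, n_years) {
  rep(total / n_years, n_years)
}

# Static: the same value repeated for each year.
hold_static <- function(value, n_years) {
  rep(value, n_years)
}

# Linear trend: build a sequence from 0 to the reported maximum with one
# extra element, then drop the leading zero so no year is recorded as zero.
ramp_linear <- function(reported_max, n_years, increasing = TRUE) {
  steps <- seq(from = 0, to = reported_max, length.out = n_years + 1)[-1]
  if (increasing) steps else rev(steps)
}

print(spread_evenly(900, 3))
print(hold_static(900, 3))
print(ramp_linear(900, 3))
print(ramp_linear(900, 3, increasing = FALSE))
```

For a reported maximum of 900 over three years, the increasing ramp records 300, 600, 900 and the decreasing ramp records 900, 600, 300.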
To create a better normalized level of activity, an online activity scale may be created. In addition to looking at the raw potential for content generation online, this metric will use the "interactivity" measures collected for each platform. Combined with other factors, such as the relative importance of a platform to the overall propaganda strategy in a given time period as established in the literature, these measures will produce a weight value for each platform that indicates its potential value to jihadists and its potential to amplify their message. This weight will then be applied to the accounts to assess more accurately the potential for online reach and influence. A function may then be created to normalize the values, which will denote not only the raw activity for a particular year but also the value of that activity to jihadists hoping to recruit or incite violence. Based upon the literature and resources reviewed during data collection, the following "phases" will be used to apply the primary weight to particular platforms of importance: Website phase (1998 to 2003-04) -> Unaffiliated Forum phase (2003-04 to 2006) -> Official Forum Phase (2006 to 2013) -> "Golden Age" of Twitter (2014 to 2015) -> The Rise of Telegram (2016 to 2017) -> "Going Dark" on Telegram and other Encrypted Apps (2017 to 2018) -> Decentralization (2019 to present).
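One possible shape for such a scale is sketched below. Everything in it is an assumption made for illustration: the phase list overlaps at its boundary years, so this sketch arbitrarily assigns 2004 and 2005 to the Unaffiliated Forum phase, 2006 to the Official Forum phase, and 2017 to the "Going Dark" phase; the weights are placeholders; and min-max scaling is just one candidate normalization, not a method stated in the text.

```r
# Map a year to a phase label. Break points are assumed (see lead-in).
phase_for_year <- function(year) {
  cut(
    year,
    breaks = c(1997, 2003, 2005, 2013, 2015, 2016, 2018, Inf),
    labels = c("Website", "Unaffiliated Forum", "Official Forum",
               "Golden Age of Twitter", "Rise of Telegram",
               "Going Dark", "Decentralization")
  )
}

# Min-max normalization to the range [0, 1]; returns zeros if all values match.
normalize <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) return(rep(0, length(x)))
  (x - min(x)) / rng
}

# Hypothetical yearly account totals with placeholder weights.
activity <- data.frame(
  year = c(2002, 2010, 2014, 2019),
  accounts = c(50, 400, 2000, 800),
  weight = c(1.0, 1.5, 2.0, 1.2)
)
activity$phase <- phase_for_year(activity$year)
activity$weighted <- activity$accounts * activity$weight
activity$normalized <- normalize(activity$weighted)
print(activity)
```

In practice the weight would be derived per platform from the interactivity measures and the platform's importance in the relevant phase, rather than set by hand per year as in this sketch.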