This repository presents a crawler that queries the Twitter backend to retrieve tweets geolocated at the subnational level. The generated dataset can contain more than 4 times as many tweets as a dataset of geotagged tweets only.


maelteir/Subnational-Level-Geolocated-Tweets-Crawler


Subnational-Level Geolocated Tweets Crawler

This repository presents a crawler that queries the Twitter backend to retrieve tweets geolocated at the subnational level. The retrieved tweets include not only geotagged tweets but also tweets whose user location belongs to the target subnational region. Generating such a dataset is essential for building a surveillance system that provides national-level and subnational-level insights during crises and epidemics.

When applied to the Kingdom of Saudi Arabia during the COVID-19 epidemic, the crawler produced a dataset of 262,178 unique geolocated tweets, compared to only 61,711 unique geotagged tweets, i.e., 4.25 times as many. Additionally, the dataset successfully predicted two COVID-19 outbreaks, in June 2021 and January 2022. The Pearson correlation coefficient between WHO weekly reported cases and weekly returned tweets, with a one-week lag, is r = 0.733 (p < 0.001) for Arabic tweets and r = 0.814 (p < 0.001) when English tweets are included, indicating a very strong correlation at the national level. At the subnational level, the most populous provinces show strong correlations (r = 0.64 to 0.74; p < 0.003).
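The lagged correlation described above can be illustrated with a minimal sketch. It is not taken from this repository: the function names `pearson` and `lagged_pearson` are hypothetical, the weekly counts are made-up numbers, and it assumes that "one-week lag" means tweets in week t are compared with cases reported in week t + 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(weekly_tweets, weekly_cases, lag=1):
    """Correlate tweets in week t with cases in week t + lag (assumed direction)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(weekly_tweets, weekly_cases)
    # Drop the last `lag` tweet weeks and the first `lag` case weeks so
    # that each tweet count is paired with the case count `lag` weeks later.
    return pearson(weekly_tweets[:-lag], weekly_cases[lag:])


# Illustrative (made-up) weekly counts where cases follow tweets by one week.
tweets = [10, 20, 15, 30, 25, 40]
cases = [0] + [2 * t + 5 for t in tweets[:-1]]
r = lagged_pearson(tweets, cases, lag=1)

# Ratio of geolocated to geotagged tweets reported above.
ratio = round(262178 / 61711, 2)
```

Computing the p-value would additionally require a t-distribution, for example via a statistics library; that step is omitted here.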

Similar datasets can be generated for different regions and events by:

  • Changing the files in the Circles directory to represent the regions/subregions of interest.
  • Changing the files in the Keywords directory to include the keywords that represent the target event.
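As a rough illustration of how circles and keywords could define a region and an event, the sketch below assigns a point to a region and checks a tweet against a keyword list. Everything here is an assumption for illustration only: the format of the files in the Circles and Keywords directories is not described in this README, so the `(latitude, longitude, radius_km)` tuples, the sample coordinates and radius, the sample keywords, and the helper names `haversine_km`, `region_for`, and `matches_keywords` are all hypothetical and do not reflect the crawler's actual implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def region_for(lat, lon, circles_by_region):
    """Return the first region with a circle containing the point, else None."""
    for region, circles in circles_by_region.items():
        for c_lat, c_lon, radius_km in circles:
            if haversine_km(lat, lon, c_lat, c_lon) <= radius_km:
                return region
    return None


def matches_keywords(text, keywords):
    """Case-insensitive substring match against any keyword."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


# Hypothetical region definition: one circle of assumed radius 50 km
# centred near Riyadh. Real regions would likely need several circles.
circles_by_region = {"Riyadh": [(24.7136, 46.6753, 50.0)]}
keywords = ["covid", "كورونا"]  # hypothetical event keywords

inside = region_for(24.75, 46.70, circles_by_region)       # point near Riyadh
outside = region_for(21.4858, 39.1925, circles_by_region)  # point near Jeddah
hit = matches_keywords("New COVID cases reported", keywords)
```

Under these assumptions, targeting a different region or event amounts to swapping the circle list and the keyword list, which mirrors the two changes listed above.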

The dataset generated using this crawler can be accessed at KSAGeoCOV.

Publications

 @article{elteir2025cost,
  title={Cost-effective time-efficient subnational-level surveillance using Twitter: Kingdom of Saudi Arabia case study},
  author={Elteir, Marwa K},
  journal={Discover Applied Sciences},
  volume={7},
  number={1},
  pages={60},
  year={2025},
  publisher={Springer}
}
