A web-based application which crawls profiles on Twitter/X for all of their tweets, all tweets related to them, including their attachments, statistics and data of their authors. Main data is stored in an SQLite database and all media are downloaded. Then it'll be able to reconstruct a Twitter profile in front-end.
- PHP 8.3 with extensions { curl, sqlite3 } enabled.
- A Twitter/X account, logged in via Google Chrome desktop.
-
Clone this project in a PHP server.
-
Sniff Twitter headers:
- Open Twitter in your browser and log in to the account by which you want to crawl Twitter.
- Open Chrome, go to Developer tools -> Network section and filter all requests by Fetch/XHR. Then make Twitter do something like a search or open someone's profile.
- Find the related request, right click on it, select Copy -> Copy as fetch (Node.js).
- Open a file called headers.json in an editor and paste the contents there.
- Now you have to carefully trim the headers after
"headers":
, select the opening curly bracket until where it ends, it'll usually be 22 lines (make sure to include the curly brackets themselves). - Now here's your headers file. Put it in the main directory of the project, beside the PHP files.
-
Run
frontend/install.php
to download the required front-end assets. -
Open the index page in a browser like it's a typical website. You'll see an empty table.
-
Enter the username of the profile you want to export its contents, then click on Add... It'll immediately start crawling that profile and download a bunch of latest tweets. You can download more later.
-
Now return to the index page and see all contents of your target profile.
-
manager.php : the first page with a table of target profiles you saw during the setup. You'll be able to add multiple profiles and delete them. Its data is stored in targets.json via the module config.php.
-
viewer.php : reads databases and shows their contents, also has a UI for using crawler.php.
-
crawler.php : crawls Twitter using the module API.php, parses responses, stores data inside databases via the module Database.php and downloads related media.
Expected GET parameters to be run via a web server:
t=
target Twitter ID number (required)search=
URL-encoded search querysect=
section number (defaults to 2)max_entries=
maximum entries allowed to be retrieved (entries not tweets; entries can be follow suggestions as well). Set to 0 in order to turn it off. (defaults to 0)use_cache=
whether it should store JSON responses in /cache/ and use them again (typically for debugging). Only values1
and0
(defaults to 0)update_only=
whether it should abandon crawling if it finds already parsed tweets. Only values1
and0
are valid as yes and no (defaults to 0(no))download_media=
whether it should download profile pictures, banners and tweet attachments or not. Only values1
and0
are valid as yes and no (defaults to 1(yes))delay=
delay (in seconds) between each API request in order not to be detected as a bot. (defaults to 10)sse=
whether or not it must send Server-Sent Events. Only values1
and0
(defaults to 0)
To run via command line (especially as a cron job):
~$ php crawler.php <*TARGET_TWITTER_ID_NUMBER> <*UPDATE_ONLY[0,1]> {SEARCH QUERY}
-
printer.php : creates a TXT file out of main tweets of a profile, usually in order to be analysed by an AI like ChatGPT.
Expected GET parameters to be run via a web server:
t=
target Twitter ID number (required)
To run via command line:
~$ php printer.php <*TARGET_TWITTER_ID_NUMBER>
-
cleaner.php : removes old profile pictures and banners.
Expected GET parameters to be run via a web server:
t=
target Twitter ID number (required)
To run via command line:
~$ php cleaner.php <*TARGET_TWITTER_ID_NUMBER>
-
organiser.php : sorts all records of a database by their IDs. Use it carefully!
To run via command line:
~$ php organiser.php <*TARGET_TWITTER_ID_NUMBER>
-
config.php : controls targets.json containing a list of target profiles.
-
API.php : connects to the Twitter API and gets JSON responses (but doesn't parse them).
-
Database.php : controls SQLite databases containing all data from Twitter profiles.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2024 Mahdi Parastesh <fulcrum1378@gmail.com>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.