parler_text_archive_parser

Converts the HTML files of the partial Parler text archive into a CSV file suitable for loading into a SQL database

Installation

Clone the repository
Inside the repository directory, run pip install -r requirements.txt to install the dependencies
Set the file paths in parler_file_path_template.json, then rename this file to parler_file_path.txt
Run python parler_parse_combo.py
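The steps above amount to: read input and output paths from a config file, parse every HTML file, and write one combined CSV. The sketch below illustrates that flow only. It assumes a JSON config with the hypothetical keys "html_dir" and "output_csv", and it uses a crude text collector in place of the repository's real per-post parser (parse_parler_html_fun.py). It is not the code in parler_parse_combo.py.

```python
import csv
import json
from html.parser import HTMLParser
from pathlib import Path


class TextCollector(HTMLParser):
    """Collects every non-blank text node of an HTML document, in order."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # stripped text fragments in document order

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def parse_html_file(path):
    """Hypothetical stand-in for the real per-file parser: one row per file."""
    collector = TextCollector()
    collector.feed(path.read_text(encoding="utf-8", errors="replace"))
    collector.close()  # flush any text still buffered by the parser
    return {"file": path.name, "text": " ".join(collector.chunks)}


def combine(config_path):
    """Parse every .html file under the configured directory into one CSV.

    Assumes a JSON config with the hypothetical keys "html_dir" and
    "output_csv"; returns the number of rows written.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    html_dir = Path(config["html_dir"])
    output_csv = Path(config["output_csv"])

    rows_written = 0
    with output_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "text"])
        writer.writeheader()
        # sorted() keeps the output order stable across runs and platforms
        for path in sorted(html_dir.rglob("*.html")):
            writer.writerow(parse_html_file(path))
            rows_written += 1
    return rows_written
```

A real parser would emit one row per post, with fields such as 'media', 'video_id' and 'Estimated Date', rather than one text blob per file.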

Description

The code in this repository parses the HTML files in the partial text archive of the website parler.com, as collected by the hacktivist organization DDoSecrets Collective, and produces a single unified CSV for the entire dataset. The raw data can be accessed here: https://ddosecrets.com/wiki/Parler. The CREATE TABLE and COPY commands for loading the resulting CSV into Postgres are available in the Postgres directory of this repo.
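For orientation, the sketch below derives loading statements from the CSV header row. It is not the SQL shipped in the Postgres directory: every column is typed TEXT, and the names postgres_load_sql, quote_ident and the table name used in any call are hypothetical.

```python
import csv
from pathlib import Path


def quote_ident(name: str) -> str:
    """Quote a name as a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def postgres_load_sql(csv_path: Path, table: str) -> str:
    """Build CREATE TABLE and COPY statements from the CSV's header row.

    Every column is typed TEXT, a simplification rather than the schema
    in the repository's Postgres directory.
    """
    with csv_path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle))
    columns = ",\n    ".join(f"{quote_ident(col)} TEXT" for col in header)
    # Escape single quotes so the path is a valid SQL string literal.
    path_literal = str(csv_path.resolve()).replace("'", "''")
    return (
        f"CREATE TABLE {quote_ident(table)} (\n    {columns}\n);\n"
        f"COPY {quote_ident(table)} FROM '{path_literal}' "
        "WITH (FORMAT csv, HEADER true);"
    )
```

Server-side COPY ... FROM 'file' reads the file on the database server and needs superuser or pg_read_server_files privileges; from a client machine, psql's \copy with the same options is the usual alternative.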

Notes:

The HTML files do not contain a calendar date for each post's publication. Instead, the date is recorded as a relative label such as '1 day ago', '3 weeks ago' or '6 months ago'. This parser uses that label to estimate the initial publication date, assuming the data was collected on January 6, 2021. For example, if an HTML file contains the date field '1 day ago', the CSV will contain an 'Estimated Date' value of '01/05/2021'. Because labels such as '6 months ago' are coarse, estimates for older posts are approximate. To change the starting date from which the Estimated Date is calculated, edit the start date variable defined in parse_parler_estimated_date_fun.py.
The 'media' field in the CSV contains any URL linked in the HTML file. If that URL points to an image or video hosted on parler.com, the unique ID of the file is stored in the CSV column 'video_id'. This ID can be matched to the files in the Parler video and image archive, also available from the DDoSecrets Collective: https://ddosecrets.com/wiki/Parler
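The relative-date conversion described in the first note can be sketched as follows. The unit lengths are assumptions (a month is taken as 30 days, a year as 365 days, and units shorter than a day map to the collection date itself), so results may differ from parse_parler_estimated_date_fun.py; estimate_date, UNIT_DAYS and RELATIVE_LABEL are hypothetical names.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Collection date assumed by the README ('1 day ago' -> 01/05/2021).
COLLECTION_DATE = date(2021, 1, 6)

# Assumed days per unit. Months and years are approximations; units shorter
# than a day are mapped to the collection date itself.
UNIT_DAYS = {
    "second": 0, "minute": 0, "hour": 0,
    "day": 1, "week": 7, "month": 30, "year": 365,
}

# Matches labels such as "1 day ago", "3 weeks ago" or "6 months ago".
RELATIVE_LABEL = re.compile(
    r"^\s*(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago\s*$",
    re.IGNORECASE,
)


def estimate_date(label: str, collected: date = COLLECTION_DATE) -> Optional[str]:
    """Return an MM/DD/YYYY estimate, or None if the label is not recognised."""
    match = RELATIVE_LABEL.match(label)
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2).lower()
    estimated = collected - timedelta(days=amount * UNIT_DAYS[unit])
    return estimated.strftime("%m/%d/%Y")
```

Passing a different date as the second argument plays the same role as editing the start date variable in the repository.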

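The media-ID step from the second note might look like this, assuming the ID is the last path segment of a parler.com URL with its file extension removed. The real URL layout in the archive is not documented here, so treat extract_media_id and the example URL below as hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def extract_media_id(url: str) -> Optional[str]:
    """Return the assumed media ID for a parler.com URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Only parler.com or one of its subdomains counts as archive media.
    if host != "parler.com" and not host.endswith(".parler.com"):
        return None
    name = PurePosixPath(parsed.path).name
    if not name:
        return None
    # Assumption: the ID is the file name without its extension.
    return PurePosixPath(name).stem or None
```

For example, extract_media_id("https://video.parler.com/ab/cd/XyZ123.mp4") returns "XyZ123", while a link to any other domain returns None.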
Repository files:
Postgres
README.md
parler_file_path_template.json
parler_parse_combo.py
parse_parler_estimated_date_fun.py
parse_parler_extract_video_id.py
parse_parler_html_fun.py
requirements.txt

maahutch/parler_text_archive_parser

Folders and files

Latest commit

History

Repository files navigation

parler_text_archive_parser

Installation

Description

Notes:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages