Constitutional Evolution: A Text Analysis Framework

A Python-based text analysis tool that visualizes linguistic patterns and thematic similarities across 17 historical constitutional documents from 15 countries, spanning 1787 to 1997.

Overview

This project applies NLP techniques to compare constitutional texts, revealing how political philosophies, governance structures, and rights frameworks evolved across different nations and time periods. Through custom parsers and visualization methods, it transforms dense legal documents into interpretable insights about constitutional design patterns.

Features

Multi-format text processing: Custom parsers for PDF and JSON formats with configurable stopword filtering
Sankey flow diagrams: Visualize most frequent terms by document to identify dominant themes
Topic modeling: LDA-based clustering to discover latent thematic patterns across constitutions
Document similarity mapping: TF-IDF + UMAP dimensionality reduction to plot constitutional texts in 2D semantic space

Technical Implementation

Text Processing Pipeline:

Extracts and normalizes text from PDF/JSON sources
Filters stopwords, punctuation, numbers, and Roman numerals
Generates word frequency distributions and document statistics
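
The filtering steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's parser code: the names clean_tokens and word_frequencies, the short stopword set, and the Roman-numeral pattern are all assumptions made for the example (the project itself reportedly filters with a configurable stopword list).

```python
import re
from collections import Counter

# Illustrative subset only (assumption); a real run would use a full stopword list.
STOPWORDS = {"the", "of", "and", "to", "in", "a", "shall", "be", "by", "or"}

# Well-formed lowercase Roman numerals such as "iv" or "xii".
# Caveat: a few ordinary words ("mix", "i") also match and would be dropped.
ROMAN_RE = re.compile(
    r"^(?=[mdclxvi])m*(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$"
)

def clean_tokens(text):
    """Lowercase the text, keep alphabetic runs, drop stopwords and Roman numerals."""
    # Keeping only [a-z]+ runs removes punctuation and digits in one pass.
    # Assumption: input is plain ASCII English; accented letters would split words.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and not ROMAN_RE.match(t)]

def word_frequencies(text):
    """Return a Counter of cleaned tokens for one document."""
    return Counter(clean_tokens(text))

if __name__ == "__main__":
    sample = "Article IV, Section 2: The Congress shall have 3 powers."
    print(clean_tokens(sample))
```

Matching Roman numerals by pattern rather than by a fixed list catches article and section labels of any size, at the cost of the few false positives noted in the comment.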

Analysis Methods:

TF-IDF vectorization for term importance weighting
Latent Dirichlet Allocation for topic extraction
UMAP for high-dimensional similarity visualization
Sankey diagrams for term flow analysis
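
To make the TF-IDF step concrete, here is a small pure-Python sketch of the weighting and of the cosine similarity that a 2D projection would later try to preserve. It uses the smoothed IDF formula and L2 normalization that scikit-learn's TfidfVectorizer applies by default, but it is a stand-in for illustration, not the project's code. The function names and the toy token lists are assumptions, and the LDA and UMAP steps are not covered here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs maps a document name to its token list.

    Returns a dict of name -> {term: weight}, with each vector L2-normalized.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    vectors = {}
    for name, tokens in docs.items():
        raw = {t: count * idf[t] for t, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors[name] = {t: w / norm for t, w in raw.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors (a dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

if __name__ == "__main__":
    # Toy tokens invented for the example, not taken from the dataset.
    toy = {
        "doc_a": ["rights", "people", "congress", "rights"],
        "doc_b": ["rights", "people", "parliament"],
        "doc_c": ["party", "workers", "state"],
    }
    vecs = tfidf_vectors(toy)
    print(cosine(vecs["doc_a"], vecs["doc_b"]), cosine(vecs["doc_a"], vecs["doc_c"]))
```

Documents that share no terms score exactly zero, while shared but common terms contribute less than shared rare ones, which is the behavior the similarity map relies on.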

Dataset

Analyzes 17 constitutional documents:

USA (1787), France (1791), Mexico (1917), Russia (1918, 1993)
Germany (1919, 1949), Japan (1947), India (1950)
North Korea (1972), Spain (1978), Iran (1979), China (1982)
South Korea (1987), Brazil (1988), South Africa (1996), Poland (1997)

Key Files

great_textpectations.py: Core analysis framework with visualization methods
textpectations_parsers.py: Custom parsers for PDF and JSON text extraction
main.py: Driver script that loads documents and generates all visualizations

Output Visualizations

Sankey Diagram (sankey_diagram.html): Interactive flow chart showing top-k words per document
Topic Distribution (topic_distribution.png): Bar plots showing LDA topic proportions across documents
Similarity Scatterplot (similarity_scatterplot.png): 2D projection of document similarity in semantic space
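
As a rough idea of the data behind the Sankey diagram, the sketch below turns per-document token lists into (document, word, count) links for the top-k words. The function name sankey_links and the toy input are hypothetical; how the project's wordcount_sankey method builds and renders its links is not shown in this README.

```python
from collections import Counter

def sankey_links(docs, k=5):
    """Build (source, target, value) links from each document to its k most frequent words.

    docs maps a document name to its list of (already cleaned) tokens.
    """
    links = []
    for name, tokens in docs.items():
        for word, count in Counter(tokens).most_common(k):
            links.append((name, word, count))
    return links

if __name__ == "__main__":
    # Toy tokens invented for the example.
    toy = {
        "Doc A": ["state", "state", "law"],
        "Doc B": ["law", "people", "law"],
    }
    for link in sankey_links(toy, k=1):
        print(link)
```

Words shared by several documents appear as a single target node with multiple incoming flows, which is what makes common themes visible in the diagram.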

Technologies

Languages: Python
Libraries: NLTK, scikit-learn, UMAP, Matplotlib, Pandas, pypdf
Techniques: TF-IDF, LDA topic modeling, dimensionality reduction, text preprocessing

Running the Analysis

from great_textpectations import Textpectations
import textpectations_parsers as tp

# Initialize framework
tt = Textpectations()

# Load documents with custom parser
tt.load_text('pdfs/usa_1787.pdf', 'USA (1787)', parser=tp.pdf_parser)

# Generate visualizations
tt.similarity_scatterplot()
tt.wordcount_sankey()
tt.topic_bar_plots()

Insights

The framework reveals:

Linguistic clustering by political system (e.g., socialist vs. democratic constitutions)
Temporal evolution in constitutional language and priorities
Thematic patterns around rights, governance structures, and state powers
Cross-cultural influences through shared terminology and concepts

crbridget/constitutional-evolution-nlp