An open-source implementation of a Ground News-like news aggregation service that desensationalizes public news through AI-powered clustering and fact extraction.
Infact demonstrates an intelligent news processing pipeline that takes sensationalized articles and transforms them into factual, neutral reporting. The system clusters similar stories, extracts key facts, removes editorial bias, and generates clean, informative articles.
Note
Coded to life with help from Claude 4.1 Opus
- Smart Article Clustering: Groups related news stories using advanced NLP embeddings
- Fact vs Opinion Separation: Automatically distinguishes between factual content and editorial musings
- Bias Reduction: Removes sensationalized language while preserving core information
- Duplicate Detection: Merges similar facts to eliminate redundancy
- Topic Modeling: Automatically names story clusters using LDA topic extraction
- Interactive Demo: Real-time processing pipeline with visual analytics
![]() |
![]() |
![]() |
![]() |
- NLP Engine: spaCy for text preprocessing and named entity recognition
- Embeddings: Sentence Transformers (all-mpnet-base-v2) for semantic similarity
- Clustering: Scikit-learn KMeans with TF-IDF feature enhancement
- Topic Modeling: Gensim LDA for automatic cluster naming
- Sentiment Analysis: TextBlob for opinion detection
- Similarity Matching: FuzzyWuzzy for duplicate detection
- LLM Integration: Google Gemini 2.5 Flash for article generation
- Orchestration: LangChain for prompt management
- GPU Acceleration: CUDA support for faster embeddings
- Frontend: Streamlit for interactive demo
- Charts: Plotly for cluster visualization and analytics
- Word Clouds: Visual topic representation
- Real-time Processing: Live pipeline monitoring
The pipeline processes news articles through six distinct stages:
- Preprocessing: Text cleaning, tokenization, and normalization
- Clustering: Semantic grouping of related articles
- Topic Extraction: Automatic naming using LDA topic modeling
- Fact Extraction: Separation of facts from opinions using NER and sentiment analysis
- Deduplication: Merging similar facts to reduce redundancy
- Article Generation: Creating neutral, factual articles using LLM
- Batch Processing: Embeddings generated in configurable batches (default: 32)
- Memory Management: Text truncation for large articles (1M char limit)
- GPU Acceleration: Automatic CUDA detection and utilization
- Caching: Model loading cached to prevent reinitialization
- Parallel Processing: Independent operations run simultaneously
- Modular Design: Each pipeline stage is independently scalable
- Configurable Clustering: Dynamic cluster count based on article volume
- Resource Limits: Built-in safeguards for memory and processing time
- Streaming Ready: Architecture supports real-time article ingestion
The project includes extremely clickbait sample articles to test the desensationalization process:
[
{"title":"You Wonât Believe What Governor Silverstone Is Hiding!","content":"BREAKING: Progressive Beacon Daily has uncovered SHOCKING evidence that Governor Silverstoneâs secret offshore accounts teem with illicit payoffs from corporate lobbyists! Documents obtained by our insider reveal hidden transactions totaling MILLIONS funneled through shell companies. Critics say this could spell the end of his political career. If you care about TRANSPARENCY, you NEED to read this exposĂŠ before itâs buried forever!"},
{"title":"Is Silverstone the Most Corrupt Governor Ever?","content":"In a STUNNING revelation, Centrist Times reports Governor Silverstoneâs office allegedly processed suspicious wire transfers linked to big-energy giants. Sources claim these funds influenced critical environmental votes. Lawmakers are demanding a full inquiryâcould this be the biggest scandal in state history? Our exclusive analysis breaks down every transaction and political ramification. You wonât believe how deep this rabbit hole goes!"},
{"title":"Expose: Silverstoneâs Shady Deals Threaten Our Values","content":"Conservative Watchdog News warns that Governor Silverstoneâs dereliction of duty isnât just immoralâitâs PATRIOTIC BETRAYAL! Leaked financial ledgers allegedly show collaboration with radical green groups aiming to dismantle traditional industries. Experts fear these payoffs will cost thousands of jobs and undermine national security. Lawmakers are mobilizing to strip him of office. Donât miss this fiery breakdown of treasonous politics!"},
{"title":"Incredible Breakthrough: Scientists Harness Sunlight Like Never Before!","content":"Progressive Beacon Daily celebrates a MIND-BLOWING invention: researchers at Meridian Institute have developed solar panels that convert 90% of sunlight into energy! This could CRUSH the fossil fuel industry and save the planet. Testing shows devices working under low-light conditionsâEVERY home can go green. Environmentalists call it the TECHNOLOGY of the century. Find out how this revolution could slash your bills to zero!"},
{"title":"Solar Miracle Poised to Rewire Energy Market","content":"Centrist Times reports that a team at Meridian Institute unveiled ultra-efficient solar cells boosting energy conversion rates by 50%. Investors are already lining up to fund mass production. Officials say this could stabilize electricity prices and reduce carbon emissions dramatically. Our experts break down what this means for everyday consumers and the global energy landscape. Could this be the energy shift weâve all waited for?"},
{"title":"New Solar Tech Sparks Fears of Industrial Collapse","content":"Conservative Watchdog News ALERT: Meridian Instituteâs latest solar innovation threatens to DESTROY American manufacturing! Reports indicate the technology could decimate traditional energy sectors, costing millions of jobs. Critics argue the government will FORCE companies to adopt this UNTESTED system, undermining economic stability. Industry leaders are mobilizing to resistâread our fiery take on how radical science is on track to wreck livelihoods."},
{"title":"Hollywoodâs Biggest Star Files for DivorceâMUST-SEE Details!","content":"Progressive Beacon Daily exposes the intimate details behind A-list actor Jordan Calibreâs shocking divorce filing from indie director Riley West. Sources say Calibre cited âirreconcilable creative differences,â but rumors of infidelity swirl like wildfire! Friends claim West discovered damning text messages. Our exclusive interviews delve into every tearful confrontation and trust-shattering betrayal, plus what it means for Calibreâs upcoming blockbuster release."},
{"title":"Celebrity Split Shocks Fans Worldwide","content":"Centrist Times reveals actor Jordan Calibre has petitioned for divorce from Riley West after a decade-long marriage. While the pair released a joint statement emphasizing mutual respect, insiders hint at deep artistic disagreements and financial disputes. We break down the timeline of their relationship, the terms of their prenuptial agreement, and what this could mean for their sprawling media empire."},
{"title":"Star Divorce: Hollywoodâs Moral Decay Exposed","content":"Conservative Watchdog News decries Jordan Calibreâs divorce from Riley West as yet another symbol of Hollywoodâs crumbling moral fabric. Sources allege Westâs radical ideology clashed with Calibreâs family values, prompting this public split. Experts warn this trend undermines societal cohesion. Our explosive report uncovers behind-the-scenes drama, lavish alimony demands, and the culture-war stakes at play in Tinseltownâs latest breakup."},
{"title":"SKY ALERT: Mysterious Comet Heads Straight for Earth!","content":"Progressive Beacon Daily warns: NASA scientists have detected Comet Talora hurtling toward Earth at BREAKNECK speed! Groundbreaking telescopes estimate a collision chance of 2%. While experts urge calm, conspiracy theorists speculate involvement of secret government satellites. Will we see a celestial spectacleâor total annihilation? Our live updates and expert interviews guide you through every astronomical twist before itâs too late!"},
{"title":"Comet Talora: Real Risk or Media Circus?","content":"Centrist Times highlights recent NASA data on Comet Talora, currently 70 million km away and tracking a near-Earth trajectory. Officials place the impact probability at less than 1%, forecasting a dazzling sky show rather than disaster. We clarify scientific jargon, weigh expert assessments, and outline safe viewing protocols. Learn what the public should REALLY know amid the swirling cosmic hype."},
{"title":"Armageddon Incoming? Comet Talora Doom Predictions!","content":"Conservative Watchdog News screams ALERT: Comet Talora might be Godâs final judgment on a morally bankrupt world! Prepper communities stockpile supplies as the celestial object grows ominously bright. Though NASA insists thereâs âno cause for alarm,â regional pastors call for national prayer days. Could this be the sign weâve ignored for too long? Discover how this cosmic visitor might expose societyâs spiritual failings!"},
{"title":"Hospital Crisis: ERs Drowning in PatientsâMUST READ!","content":"Progressive Beacon Daily uncovers a nationwide EMERGENCY as public hospitals report 200% ER capacity surges amid unprecedented flu and COVID-variant outbreaks. Frontline nurses sound the alarm on staff shortages and dwindling medical supplies. Patients wait HOURS for care. Health advocates demand major funding overhauls to save lives. Our exclusive testimonies reveal heartbreaking stories behind overcrowded wards and the real human cost you wonât believe!"},
{"title":"ER Overload: What You Need to Know","content":"Centrist Times examines the current strain on emergency departments across the country, attributing it to overlapping flu, COVID-19, and RSV seasons. Hospitals report bed shortages and extended wait times. Officials propose federal grants and rapid staffing incentives to alleviate pressure. We analyze policy options, compare regional responses, and provide practical tips for seeking timely medical attention during the crisis."},
{"title":"Hospitals on Brink: Government Failures EXPOSED","content":"Conservative Watchdog News BLASTS federal mandates for causing ER meltdowns, with hospitals forced to treat unlawful migrants and non-citizens, leaving locals to suffer. Staff report shutdown threats if they refuse care. Citizens face life-or-death delays while bureaucrats bicker. This is a TAXPAYER SCANDAL! Our fiery investigation names the officials responsible and outlines the radical reforms needed to save American healthcare."},
{"title":"Schoolâs New AI Pronoun Rules Spark Controversy!","content":"Progressive Beacon Daily reveals TechForward Academyâs radical introduction of AI-driven pronoun enforcementâa digital system that auto-corrects speech for inclusivity! Students are seeing real-time alerts and mandatory sensitivity training. Advocates hail it as the FUTURE of respect; critics decry Orwellian overreach. Our in-depth look explores student reactions, privacy concerns, and the impact on classroom culture you canât afford to miss."},
{"title":"AI Pronoun Tool: Balanced Perspectives","content":"Centrist Times reports that TechForward Academy is piloting an AI pronoun-assistant to promote inclusivity. The system flags misgendering and offers immediate guidance. Supporters argue it fosters empathy, while detractors question data security and free speech implications. We present viewpoints from educators, legal experts, and parent groups, plus a side-by-side analysis of the programâs benefits and potential pitfalls."},
{"title":"Schoolâs AI Pronoun Police: Free Speech Under Siege?","content":"Conservative Watchdog News warns TechForward Academyâs AI pronoun enforcement is the latest step toward THOUGHT CONTROL in schools! Students risk penalties for âunintentionalâ speech errors, and faculty face dismissal if they push back. This techno-authoritarian nightmare could spread nationwide, stifling dissent and bending education to radical ideology. Read our blistering critique on how AI is weaponized against fundamental liberties!"}
]These exaggerated examples demonstrate the system's ability to extract factual content from sensationalized reporting.
Download the Jupyter notebook (or copy paste content into a new Google Colab notebook cell). Hit run. If you're using Colab, you'll need to configure GOOGLE_API_KEY and NGROK_AUTH_TOKEN secrets (left panel, key icon).
The system provides comprehensive analytics:
- Processing Speed: < 1 minute for 20 articles
- Cluster Accuracy: Visualized through PCA projections
- Fact Extraction Rate: Typically 5-10 facts per article
- Deduplication Efficiency: 20-40% reduction in redundant content
- Bias Reduction: Measured through sentiment analysis
n_clusters = min(max(3, len(texts) // 20), 15) # Dynamic cluster sizing
threshold = 0.7 # Similarity threshold for deduplication
batch_size = 32 # Embedding batch sizesentence_model = 'all-mpnet-base-v2' # Embedding model
llm_model = 'gemini-2.5-flash' # Generation model
max_text_length = 1000000 # Processing limitWe welcome contributions! Areas for improvement:
- Additional news source integrations
- Enhanced bias detection algorithms
- Real-time processing capabilities
- Multi-language support
- Advanced fact-checking integration
This project is open source and available under the MIT License.
- News Organizations: Reduce editorial bias in reporting
- Research: Study media bias and sensationalization patterns
- Education: Demonstrate AI applications in journalism
- Fact-Checking: Automated extraction of verifiable claims
- Content Moderation: Identify opinion vs factual content
Infact: Making news more factual, one article at a time.



