Skip to content

hloe-ahn/Direct-Link-Google-News-Scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 

Repository files navigation

Direct Link Google News Scraper

This tool pulls fresh Google News articles from the last 24 hours and gives you their actual direct-source URLs. No proxies, no headaches—just clean, recent news data tied to the keywords you care about. If you need to track breaking stories or monitor topics in real time, this scraper keeps things simple and fast.

Bitbash Banner

Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for Direct Link Google News Scraper you've just found your team — Let's Chat. 👆👆

Introduction

The scraper fetches newly published news articles across the web based on keyword searches, using Google News as the discovery engine. It’s ideal for researchers, journalists, analysts, and anyone collecting timely updates without manually checking multiple sites.

What It Focuses On

  • Retrieves articles from the past 24 hours using real publication timestamps.
  • Pulls true article URLs instead of Google redirect links.
  • Supports multiple keywords in a single run.
  • Works without proxy rotation or complicated setup.
  • Returns structured results ready for databases, dashboards, or alerts.

Features

Feature Description
24-Hour News Filtering Extracts articles published within the last 24 hours using actual source timestamps.
Direct Article URLs Captures real article links instead of Google redirect wrappers.
Keyword-Based Search Accepts multiple keywords and returns matched stories per term.
Source Metadata Collects publisher name, domain, and associated metadata.
Lightweight Operation Runs efficiently without proxies or heavy browser automation.
Configurable Limits Supports item caps per keyword for flexible output sizes.

What Data This Scraper Extracts

Field Name Field Description
keyword Search term used to discover the article.
title Title of the news article.
link Direct article link from the source website.
source Name of the publishing outlet.
published Actual publication timestamp in ISO format.
domain Domain extracted from the article URL.
image URL of the article’s preview image if available.

Example Output

[
  {
    "keyword": "climate change",
    "title": "WASH considerations in key national climate change policies",
    "link": "https://www.graphic.com.gh/news/general-news/ghana-news-wash-considerations-in-key-national-climate-change-policies.html",
    "source": "Graphic Online",
    "published": "2024-03-25T06:50:08Z",
    "domain": "www.graphic.com.gh",
    "image": "https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSVaL97MoMZsWxSwZF9d0e-j0AgCxbyA8FVeEajWEGoT5U-cdq2PZXb0gQ0ojaAR0rhKvlNu_V0H7T9f9EAVY4"
  }
]

Directory Structure Tree

Direct Link Google News Scraper/
├── src/
│   ├── main.js
│   ├── scraper/
│   │   ├── google_news_parser.js
│   │   ├── article_extractor.js
│   │   └── request_handler.js
│   ├── utils/
│   │   ├── timestamp_filter.js
│   │   └── normalize_domain.js
│   └── config/
│       └── settings.example.json
├── data/
│   ├── sample_keywords.json
│   └── sample_output.json
├── package.json
└── README.md

Use Cases

  • Journalists track breaking stories tied to their beats so they can quickly review newly published content.
  • Research teams monitor topical keywords to gather up-to-date insights from diverse publishers.
  • Developers feed real-time news into dashboards, alerting tools, and search portals.
  • Marketing teams watch brand or industry keywords to react promptly to emerging coverage.
  • Analysts scrape daily keywords for sentiment, trend detection, and competitive research.

FAQs

Does it require proxies?
No. It’s designed to run without proxy setup, making it lightweight and easy to deploy.

How are articles filtered by date?
The scraper uses the original article's published timestamp, not Google’s indexing time.

Can I limit how many results I get?
Yes, you can set a maxItems value per keyword.

Does it return direct article links?
Absolutely—no redirect URLs, only real publisher links.


Performance Benchmarks and Results

Primary Metric:
Delivers keyword-based news results in seconds, even across multiple search terms.

Reliability Metric:
Maintains a consistent fetch rate above 98% accuracy for publication timestamps.

Efficiency Metric:
Runs with low overhead thanks to streamlined requests and minimal parsing overhead.

Quality Metric:
Returns clean, enriched article metadata including source, domain, and preview images for strong dataset usability.


Book a Call Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★