Search Engine Ranking Comparator

Introduction

SECmp is a search analysis system designed to collect, normalize, and compare search engine rankings, using Google Search results as the relevance baseline.

The system issues real-world queries to multiple search engines, extracts the top organic results, and evaluates ranking similarity against Google using percent overlap and Spearman rank correlation. SECmp focuses on ranking behavior, relevance divergence, and signal consistency across heterogeneous search platforms.

This project reflects core problems in information retrieval, ranking systems, and signal evaluation, and mirrors workflows used in search quality analysis and automated sourcing systems.

Workflow

SECmp follows a two-stage pipeline:

1. Search Result Collection (Task 1)

Query Ingestion
- Load a predefined query set (e.g. 100 real-world queries).
Multi-Engine Search
- Issue queries to supported search engines:
  - Google (baseline, pre-collected)
  - Bing
  - Yahoo!
  - DuckDuckGo
  - Ask
- Apply randomized delays to avoid throttling.
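
A minimal sketch of the randomized delay between requests. The function name `polite_sleep` and the 2 to 6 second bounds are illustrative assumptions, not values taken from the project.

```python
import random
import time


def polite_sleep(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Pause for a random interval and return the delay that was used.

    The bounds are assumptions; tune them to the engine being queried.
    The `sleep` argument exists so the pause can be stubbed out in tests.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_sleep()` once after each query spreads requests unevenly over time, which is the behavior the step above describes.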
Result Extraction
- Use Selenium for dynamic engines (Bing, Yahoo!).
- Use Requests + BeautifulSoup for static or semi-static engines.
- Decode redirect and tracking URLs to obtain canonical destination links.
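
Redirect decoding can be sketched as follows, assuming the engine wraps the destination in a query parameter of its own redirect URL. The helper name `unwrap_redirect` and the parameter names in `param_names` are assumptions for illustration; the wrappers each engine actually uses should be checked against live result pages.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_redirect(url, param_names=("uddg", "u", "url")):
    """Return the destination embedded in a redirect URL, or the URL unchanged.

    `param_names` lists query parameters that are assumed to carry the
    destination link; they are illustrative, not confirmed by the project.
    """
    query = parse_qs(urlparse(url).query)  # parse_qs percent-decodes values
    for name in param_names:
        values = query.get(name)
        # Only accept values that look like absolute web URLs.
        if values and values[0].startswith(("http://", "https://")):
            return values[0]
    return url
```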
Result Normalization
- Clean and normalize URLs to remove:
  - Scheme differences (http / https)
  - Trailing slashes
  - Tracking and redirect wrappers
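
The normalization rules above could look like the sketch below. Lowercasing the host and treating `utm_` parameters as tracking noise are assumptions added for illustration; the project's actual rules may differ.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumption: query parameters with these prefixes are tracking noise.
TRACKING_PREFIXES = ("utm_",)


def normalize_url(url):
    """Return a comparison key: no scheme, no trailing slash, no tracking params."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()        # assumption: hosts compare case-insensitively
    path = parts.path.rstrip("/")      # drop trailing slashes
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    query = "?" + urlencode(kept) if kept else ""
    return f"{host}{path}{query}"      # scheme is intentionally omitted
```

With this key, `http://Example.com/a/` and `https://example.com/a` compare as equal.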
Structured Storage
- Persist top-10 results per query into engine-specific JSON files.
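
A sketch of the storage step, assuming results are held in a dictionary that maps each query to its ranked URL list. The function name `save_results` and the timestamp format are assumptions; only the `<engine>_Results.json` naming and the `output/task1/<timestamp>/` layout come from this README.

```python
import json
from datetime import datetime
from pathlib import Path


def save_results(engine, results, root="output/task1", stamp=None):
    """Write {query: [url, ...]} for one engine and return the file path."""
    # Assumption: the timestamp directory uses a YYYYMMDD-HHMMSS name.
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    out_dir = Path(root) / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{engine}_Results.json"
    trimmed = {query: urls[:10] for query, urls in results.items()}  # keep top 10
    with path.open("w", encoding="utf-8") as fh:
        json.dump(trimmed, fh, indent=2, ensure_ascii=False)
    return path
```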

2. Ranking Comparison & Evaluation (Task 2)

Baseline Alignment
- Load Google Search results as the reference ranking.
Overlap Analysis
- Compute the number and percentage of overlapping URLs between each engine and Google.
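
Overlap can be computed roughly as below, assuming both lists are already normalized. Using the number of Google results as the percentage denominator is an assumption; the function name `overlap` is illustrative.

```python
def overlap(google_urls, engine_urls):
    """Return (overlap_count, percent_overlap) for one query."""
    shared = set(google_urls) & set(engine_urls)
    count = len(shared)
    # Assumption: the percentage is taken relative to the Google list length.
    percent = 100.0 * count / len(google_urls) if google_urls else 0.0
    return count, percent
```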
Ranking Correlation
- Compute the Spearman rank correlation coefficient for overlapping results.
- Apply re-ranking logic to handle partial overlaps correctly.
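
One plausible reading of the re-ranking step is sketched below: the overlapping URLs are re-ranked 1..n within each list, and the standard formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) is applied to those ranks. The handling of zero or one overlapping result is an assumption, as is the function name; the project's actual logic may differ.

```python
def spearman_on_overlap(google_urls, engine_urls):
    """Spearman's rho over the URLs that both rankings share."""
    google_urls = list(dict.fromkeys(google_urls))   # drop duplicates, keep order
    engine_urls = list(dict.fromkeys(engine_urls))
    engine_set = set(engine_urls)
    shared = [u for u in google_urls if u in engine_set]  # in Google order
    n = len(shared)
    if n == 0:
        return 0.0  # assumption: no overlap counts as zero correlation
    if n == 1:
        # Assumption: a single match scores 1 only if its position is identical.
        url = shared[0]
        return 1.0 if google_urls.index(url) == engine_urls.index(url) else 0.0
    shared_set = set(shared)
    # Re-rank: position among the shared URLs only, starting at 1.
    engine_rank = {
        u: rank
        for rank, u in enumerate((u for u in engine_urls if u in shared_set), start=1)
    }
    d_squared = sum(
        (google_rank - engine_rank[u]) ** 2
        for google_rank, u in enumerate(shared, start=1)
    )
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))
```

Re-ranking keeps the coefficient within [-1, 1] even when the shared URLs sit at scattered positions in the original top-10 lists.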
Reporting
- Generate a summary CSV containing:
  - Overlap count
  - Percent overlap
  - Spearman correlation
  - Per-query results and overall averages
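
The report could be written along these lines. The column headers, the `Query N` labels, and the function name `write_report` are assumptions for illustration; only the reported quantities come from the list above.

```python
import csv


def write_report(rows, path):
    """Write per-query metrics plus an averages row.

    `rows` is a list of (overlap_count, percent_overlap, spearman) tuples,
    one per query, in query order.
    """
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # Assumption: header names are illustrative.
        writer.writerow(["Query", "Overlap", "Percent Overlap", "Spearman"])
        for index, (count, percent, rho) in enumerate(rows, start=1):
            writer.writerow([f"Query {index}", count, percent, rho])
        if rows:
            n = len(rows)
            averages = [sum(column) / n for column in zip(*rows)]
            writer.writerow(["Averages", *averages])
```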

Output

SECmp generates structured outputs for both stages:

Task 1 — Search Results

output/
└── task1/
    └── <timestamp>/
        ├── Bing_Results.json
        ├── Yahoo!_Results.json
        ├── DuckDuckGo_Results.json
        └── Ask_Results.json

Each JSON file maps:

Query → Top-10 ranked URLs

Task 2 — Ranking Evaluation

output/
└── task2/
    └── evaluation.csv

The CSV includes:

- Query index
- Number of overlapping results
- Percent overlap with Google
- Spearman rank correlation coefficient
- Aggregate averages

Tech Stack

- Language: Python
- Networking: requests
- HTML Parsing: BeautifulSoup (beautifulsoup4)
- Browser Automation: Selenium (Chrome WebDriver)
- Ranking Metrics: Spearman rank correlation
- Data Formats: JSON, CSV

How to Run

1. Prerequisites

- Python 3.8+
- Google Chrome
- ChromeDriver (compatible with your Chrome version)

Install dependencies:

pip install -r requirements.txt

2. Prepare Query & Baseline Data

Place the query file under:
```
./assets/100QueriesSet*.txt
```
Place the Google baseline results under:
```
./assets/results/Google_Result*.json
```
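
Because both paths contain a wildcard, the scripts presumably resolve them by pattern. A sketch of that lookup, assuming the query file holds one query per line; the helper names `find_single` and `load_queries` are hypothetical.

```python
from pathlib import Path


def find_single(pattern, base="."):
    """Return the first file (sorted by name) matching a glob pattern under base."""
    matches = sorted(Path(base).glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern!r} under {base!r}")
    return matches[0]


def load_queries(path):
    """Read queries, assuming one per line and skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]
```

For example, `load_queries(find_single("assets/100QueriesSet*.txt"))` would return the query list when run from the repository root.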

3. Run Search Result Collection (Task 1)

python secmp_task1.py

This will:

- Query each supported search engine
- Collect top-10 results per query
- Store results as JSON files

4. Run Ranking Comparison (Task 2)

python secmp_task2.py

This will:

- Compare each engine against Google
- Compute overlap and Spearman correlation
- Generate a summary CSV report

Notes

- Google results are treated as the relevance baseline.
- Randomized delays are enforced to reduce blocking risk.
- URL normalization is critical to ensure fair comparison.
- The system evaluates ranking behavior, not content quality.
