Efficient crawling & data extraction from web pages using concurrency in multiple programming languages.

siddhant-vij/Concurrent-Web-Crawler

Concurrent Web Crawler

This project showcases the development and performance of web crawlers built with concurrent programming techniques in Java, C++, Go, and Python. It aims to demonstrate how concurrency improves crawling efficiency: fetching many pages at once, instead of waiting on each request in turn, allows faster data retrieval and processing.

By implementing the crawler in multiple programming languages, the project provides insights into the concurrency models of each language and their practical application in web scraping tasks.


Table of Contents

  1. Features
  2. Installation and Usage
  3. Contributing
  4. License

Features

  • Concurrent fetching of web pages to maximize data retrieval speed.
  • Configurable depth and domain restrictions for targeted crawling.
  • Efficient URL management to avoid processing duplicates.
  • Performance analysis comparing concurrent crawlers against sequential ones.
  • Implementations in Java, C++, Go, and Python to highlight language-specific concurrency strategies.
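
As a rough illustration of how the first three features fit together, here is a minimal Python sketch. It is not taken from the repository's code: the names `crawl`, `fake_fetch`, and `FAKE_SITE` are hypothetical, and the "web" is an in-memory dictionary so the example runs without network access. Each depth level is fetched concurrently with a thread pool, links outside the seed's domain are skipped, and a `seen` set prevents a URL from being queued twice.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urldefrag, urljoin, urlparse
import time

# Hypothetical in-memory "web": page URL -> links found on that page.
FAKE_SITE = {
    "https://example.com/": ["/a", "/b", "https://other.org/x"],
    "https://example.com/a": ["/b", "/c"],
    "https://example.com/b": ["/", "/c#top"],
    "https://example.com/c": ["/d"],
    "https://example.com/d": [],
}


def fake_fetch(url, delay=0.05):
    """Stand-in for an HTTP GET plus link extraction; sleeps to mimic latency."""
    time.sleep(delay)
    return FAKE_SITE.get(url, [])


def crawl(seed, fetch_links, max_depth=2, max_workers=8):
    """Breadth-first crawl that fetches each depth level concurrently.

    Returns the pages that were fetched, in crawl order.
    """
    domain = urlparse(seed).netloc
    seen = {seed}        # every URL already queued, to avoid duplicates
    frontier = [seed]    # pages to fetch at the current depth
    crawled = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(max_depth + 1):
            if not frontier:
                break
            # Fetch all pages of this level in parallel; map() keeps input order.
            link_lists = list(pool.map(fetch_links, frontier))
            crawled.extend(frontier)
            next_frontier = []
            for page, links in zip(frontier, link_lists):
                for href in links:
                    # Resolve relative links and drop any #fragment.
                    url = urldefrag(urljoin(page, href)).url
                    if urlparse(url).netloc == domain and url not in seen:
                        seen.add(url)
                        next_frontier.append(url)
            frontier = next_frontier
    return crawled


if __name__ == "__main__":
    for page in crawl("https://example.com/", fake_fetch, max_depth=2):
        print(page)
```

Passing `max_workers=1` to the same function gives a sequential baseline, which is one simple way to time a concurrent run against a sequential one. A real crawler would replace `fake_fetch` with an HTTP client and an HTML link extractor.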

Installation and Usage

  1. Clone the Repository:
    git clone https://github.com/siddhant-vij/Concurrent-Web-Crawler.git
  2. Navigate to Language Directory:
    cd Concurrent-Web-Crawler/[language]
  3. Install Dependencies: If the chosen implementation has external dependencies, install them using the standard tooling for that language.
  4. Build and Run the Application: Follow the standard build and run procedure for that language.

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch:
    git checkout -b feature/AmazingFeature
  3. Commit your Changes:
    git commit -m 'Add some AmazingFeature'
  4. Push to the Branch:
    git push origin feature/AmazingFeature
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.