Thingston Crawler

A web crawler built on the Guzzle PHP HTTP client, with concurrency support for faster operation. It supports downloading any content type and includes a link profiler and response observers.

Requirements

Thingston Crawler requires:

Installation

Add Thingston Crawler to any PHP project using Composer:

composer require thingston/crawler

Getting Started

Create a new Crawler instance and call its start method with any public URI:

use Thingston\Crawler;

$crawler = new Crawler();
$crawler->start('https://www.wikipedia.org/');

To process the results of a crawl, you may add as many Observers as you need. An Observer is a concrete class that implements Thingston\Crawler\Observer\ObserverInterface.
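
This README does not show the methods declared by ObserverInterface, so the following is only a self-contained sketch of the observer idea, not the library's API. The names `PageObserver`, `onResponse`, `LinkCountObserver`, and `notifyAll` are all hypothetical and exist only for illustration; check the interface source before writing a real Observer.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for an observer contract; NOT Thingston's ObserverInterface.
interface PageObserver
{
    // Called once per fetched page with its URI, HTTP status, and body.
    public function onResponse(string $uri, int $status, string $body): void;
}

// Example observer: counts the anchor tags with an href found in each page body.
final class LinkCountObserver implements PageObserver
{
    /** @var array<string, int> link count keyed by URI */
    public array $counts = [];

    public function onResponse(string $uri, int $status, string $body): void
    {
        // Ignore responses outside the 2xx range.
        if ($status < 200 || $status >= 300) {
            return;
        }
        $this->counts[$uri] = preg_match_all('/<a\s[^>]*href=/i', $body);
    }
}

// Hypothetical dispatcher: hands one response to every registered observer.
function notifyAll(array $observers, string $uri, int $status, string $body): void
{
    foreach ($observers as $observer) {
        $observer->onResponse($uri, $status, $body);
    }
}

$observer = new LinkCountObserver();
notifyAll([$observer], 'https://example.org/', 200, '<a href="/a">A</a> <a href="/b">B</a>');
notifyAll([$observer], 'https://example.org/missing', 404, '<a href="/">Home</a>');

print_r($observer->counts);
```

The sketch keeps result handling out of the fetching code: each observer decides for itself what to do with a response, so several independent observers can be attached to the same crawl.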

Reporting Issues

If you find an issue with this code, please open a ticket on GitHub Issues at https://github.com/thingston/crawler/issues.

Contributors

Open Source is made of contributions. If you want to contribute to Thingston, please follow these steps:

  1. Fork the latest version into your own repository.
  2. Write your changes or additions and commit them.
  3. Follow the PSR-2 coding style standard.
  4. Make sure you have unit tests with full coverage of your changes.
  5. Go to GitHub Pull Requests at https://github.com/thingston/crawler/pulls and create a new request.

Thank you!

Changes and Versioning

All relevant changes to this code are recorded in a separate log file.

Version numbers follow recommendations from Semantic Versioning.

License

Thingston code is maintained under The MIT License.