forked from thingston/crawler

Web crawler based on PHP [Guzzle HTTP Client](http://docs.guzzlephp.org/) with concurrency support for faster operation. Includes support for any content-type download, link profiler and response observers.


Thingston Crawler

Web crawler based on PHP Guzzle HTTP Client with concurrency support for faster operation. Includes support for any content-type download, link profiler and response observers.

Requirements

Thingston Crawler requires:

Installation

Add Thingston Crawler to any PHP project using Composer:

composer require thingston/crawler

Getting Started

Create a new Crawler instance and call its start method with any public URI:

use Thingston\Crawler;

$crawler = new Crawler();
$crawler->start('https://www.wikipedia.org/');

To process the results of a crawl, add as many Observers as you need. An Observer is a concrete class that implements Thingston\Crawler\Observer\ObserverInterface.
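This README does not list the methods declared by ObserverInterface, so the following is only a minimal, self-contained sketch of the observer idea and not the library's actual API. The interface `ExampleObserver`, its `onResponse` method, the `TitleCollector` class and the `notifyObservers` helper are all hypothetical names introduced for illustration; check the ObserverInterface source for the real method signatures before writing your own Observer.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's observer contract. The real
// ObserverInterface may declare different methods and arguments.
interface ExampleObserver
{
    public function onResponse(string $uri, int $status, string $body): void;
}

// Example observer: records the <title> of every page fetched with status 200.
final class TitleCollector implements ExampleObserver
{
    /** @var array<string, string> Map of URI => page title */
    public array $titles = [];

    public function onResponse(string $uri, int $status, string $body): void
    {
        if ($status !== 200) {
            return; // skip responses that were not successful
        }

        if (preg_match('~<title>(.*?)</title>~is', $body, $matches) === 1) {
            $this->titles[$uri] = trim($matches[1]);
        }
    }
}

// Hypothetical dispatcher: passes one response to every registered observer,
// roughly what a crawler would do after each completed request.
function notifyObservers(array $observers, string $uri, int $status, string $body): void
{
    foreach ($observers as $observer) {
        $observer->onResponse($uri, $status, $body);
    }
}

// Simulate two crawled responses without touching the network.
$collector = new TitleCollector();
notifyObservers([$collector], 'https://example.org/', 200, '<html><title> Example </title></html>');
notifyObservers([$collector], 'https://example.org/missing', 404, '<title>Not Found</title>');
```

Keeping each Observer focused on one concern (collecting titles, storing files, logging failures) lets you combine several of them on the same crawl without changing the crawler itself.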

Reporting Issues

If you find an issue with this code, please open a ticket in GitHub Issues at https://github.com/thingston/crawler/issues.

Contributors

Open Source is made of contributions. If you want to contribute to Thingston, please follow these steps:

  1. Fork latest version into your own repository.
  2. Write your changes or additions and commit them.
  3. Follow the PSR-2 coding style standard.
  4. Make sure you have unit tests with full coverage of your changes.
  5. Go to GitHub Pull Requests at https://github.com/thingston/crawler/pulls and create a new request.

Thank you!

Changes and Versioning

All relevant changes to this code are logged in a separate log file.

Version numbers follow recommendations from Semantic Versioning.

License

Thingston code is maintained under The MIT License.
