This library provides utility functions to ease robots.txt manipulation. If you want to check whether URLs respect a robots.txt policy, with an optional cache, then it's your lucky day ;)
Install the package with Composer:
composer require hugsbrugs/php-robots-txt
In your PHP code, load the library:
require_once __DIR__ . '/../vendor/autoload.php';
use Hug\Robots\Robots;
Returns whether a page is accessible according to the robots.txt policy. Optionally pass a user agent to also check it against the user-agent-specific rules.
Robots::is_allowed($url, $user_agent = null);
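For example, a minimal usage sketch could look like this; the URL and user agent values are placeholders, and `is_allowed()` is assumed to return a boolean:

```php
<?php

// Minimal usage sketch. The URL and user agent values are placeholders,
// and is_allowed() is assumed to return a boolean.
require_once __DIR__ . '/vendor/autoload.php';

use Hug\Robots\Robots;

$url = 'https://example.com/some/page';
$user_agent = 'MyCrawler/1.0';

// Check against the generic (*) rules only
if (Robots::is_allowed($url)) {
    echo "Crawling allowed by robots.txt\n";
}

// Also check the rules targeting this specific user agent
if (Robots::is_allowed($url, $user_agent)) {
    echo "Crawling allowed for MyCrawler/1.0\n";
}
```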
With this simple method, a call to the remote robots.txt is fired on each request. To enable a cache, define the following constants:
define('HUG_ROBOTS_CACHE_PATH', '/path/to/robots-cache/');
define('HUG_ROBOTS_CACHE_DURATION', 7*86400);
The cache duration is in seconds (86400 = 1 day). Don't forget to make the cache path writable by the web server user. Cached robots.txt files are gzcompressed to save disk space.
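As a sketch, a cached setup could look like this; the path and duration are illustrative, and the constants are assumed to be read when the library is first used:

```php
<?php

// Illustrative sketch: define the cache constants before using the library.
// The path and duration values are examples, adjust them to your environment.
define('HUG_ROBOTS_CACHE_PATH', __DIR__ . '/robots-cache/');
define('HUG_ROBOTS_CACHE_DURATION', 7 * 86400); // keep cached robots.txt for 7 days

require_once __DIR__ . '/vendor/autoload.php';

use Hug\Robots\Robots;

// Subsequent calls for the same host should hit the cache instead of
// downloading robots.txt again, until the cache entry expires.
$allowed = Robots::is_allowed('https://example.com/page', 'MyCrawler/1.0');
var_dump($allowed);
```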
You should not need the following methods unless you want to play with the code and tweak it:
Robots::download_robots($url, $user_agent);
Robots::get_robots($url, $user_agent);
Robots::is_cache_obsolete($file);
Robots::empty_cache();
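For instance, a small maintenance script (run from cron, say) could clear the cache so fresh robots.txt files are fetched on the next checks; this is only a sketch based on the method names listed above:

```php
<?php

// Hypothetical maintenance script: clears the robots.txt cache so that
// fresh copies are downloaded on the next is_allowed() calls.
define('HUG_ROBOTS_CACHE_PATH', '/path/to/robots-cache/');
define('HUG_ROBOTS_CACHE_DURATION', 7 * 86400);

require_once __DIR__ . '/vendor/autoload.php';

use Hug\Robots\Robots;

Robots::empty_cache();
echo "robots.txt cache emptied\n";
```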
phpunit --bootstrap vendor/autoload.php tests
Hugo Maugey. Visit my website ;)