This repository has been archived by the owner on Sep 12, 2021. It is now read-only.

A Laravel Wrapper for Chrome Headless. Get the DOM of any webpage.


helloiamlukas/laravel-chrome


A Chrome Headless wrapper for Laravel


Get the DOM of any webpage by using headless Chrome.

💡 This is a Laravel wrapper of helloiamlukas/chrome-php.

Requirements

This package requires Puppeteer, the Node library for controlling headless Chrome.

To install it on Ubuntu 16.04, you can run:

sudo apt-get update
curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -
sudo apt-get install -y nodejs gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
sudo npm install --global --unsafe-perm puppeteer
sudo chmod -R o+rx /usr/lib/node_modules/puppeteer/.local-chromium

Installation

You can install this package via Composer by running:

composer require helloiamlukas/laravel-chrome

After that, the package will automatically register itself.

To publish the configuration file, you need to run:

php artisan vendor:publish --provider="ChromeHeadless\ChromeHeadlessServiceProvider"

This will create a config file at config/chrome.php.

Configuration

The configuration can be found at config/chrome.php.

Custom Chrome Path

You can specify a custom path to your Chrome installation.

/*
|--------------------------------------------------------------------------
| Chrome Path
|--------------------------------------------------------------------------
|
| Manually set the path where Google Chrome is installed.
|
*/
'exec_path' => '/path/to/chrome',

Custom User Agent

You can specify a custom user agent. By default, the standard headless Chrome user agent is used.

/*
|--------------------------------------------------------------------------
| User Agent
|--------------------------------------------------------------------------
|
| Change the user agent that will be used by Google Chrome.
|
*/
'user_agent' => 'nice-user-agent',

Timeout

You can specify a timeout after which the process will be killed. The timeout should be given in seconds.

/*
|--------------------------------------------------------------------------
| Timeout
|--------------------------------------------------------------------------
|
| Specify a timeout in seconds.
| (null = no timeout)
|
*/
'timeout' => 10,

If the process runs out of time, a Symfony\Component\Process\Exception\ProcessTimedOutException is thrown.
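The timeout exception can be caught like any other. The sketch below wraps a fetch in a hypothetical helper, fetchHtmlOrNull(), which is not part of this package: it returns null when the Chrome process is killed for exceeding the configured timeout and lets every other exception propagate. Treat it as one possible pattern, not the package's recommended API.

```php
<?php

use Symfony\Component\Process\Exception\ProcessTimedOutException;

/**
 * Hypothetical helper (not provided by the package): runs a page fetch and
 * returns null instead of throwing when the process exceeds the timeout.
 * Any other exception still propagates to the caller.
 */
function fetchHtmlOrNull(callable $fetch): ?string
{
    try {
        return $fetch();
    } catch (ProcessTimedOutException $e) {
        // The process was killed after the configured 'timeout' seconds.
        // Decide how to recover here: log it, retry, or skip this URL.
        return null;
    }
}
```

In application code you would pass the actual request as a closure, for example fetchHtmlOrNull(function () { return ChromeHeadless::url('https://example.com')->getHtml(); }). Catching only ProcessTimedOutException keeps unrelated failures, such as a missing Chrome binary, visible instead of silently turning them into null.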

Viewport

You can specify a custom viewport to use for your requests. By default, the headless Chrome standard of 800x600 px is used.

/*
|--------------------------------------------------------------------------
| Viewport
|--------------------------------------------------------------------------
|
| Specify a viewport.
|
*/
'viewport' => [
    'width' => 1920,
    'height' => 1080,
],

Blacklist

You can specify a list of regular expressions matching files that should not be loaded when you request a website. Each expression is checked against the URL of the file.

/*
|--------------------------------------------------------------------------
| Blacklist
|--------------------------------------------------------------------------
|
| Specify a list of files that should not be loaded.
|
*/
'blacklist' => [
    'www.google-analytics.com',
    'analytics.js',
],
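Because the entries are regular expressions rather than plain strings, it can help to preview which URLs a list would block before putting it in config/chrome.php. The sketch below is only an approximation: the real matching happens inside the package, while this hypothetical isBlacklisted() helper assumes each entry is an unanchored pattern that may match anywhere in the URL and uses PHP's PCRE engine, whose syntax may differ slightly from the engine the package uses.

```php
<?php

/**
 * Hypothetical helper (not provided by the package): returns true if any
 * blacklist pattern matches somewhere in the given URL.
 * Assumption: entries are unanchored regular expressions, tested with PCRE.
 */
function isBlacklisted(string $url, array $blacklist): bool
{
    foreach ($blacklist as $pattern) {
        // Wrap the raw pattern in "~" delimiters, escaping any literal "~".
        $regex = '~' . str_replace('~', '\~', $pattern) . '~';
        if (preg_match($regex, $url) === 1) {
            return true;
        }
    }

    return false;
}

$blacklist = [
    'www.google-analytics.com',
    'analytics.js',
];

var_dump(isBlacklisted('https://www.google-analytics.com/analytics.js', $blacklist)); // bool(true)
var_dump(isBlacklisted('https://example.com/app.js', $blacklist));                    // bool(false)
```

Note that an unescaped . matches any character, so 'analytics.js' also matches a URL containing analytics-js. Escape the dot ('analytics\.js') if you only want to block the literal file name.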

Custom Headers

You can specify custom headers that will be sent with the request.

/*
|--------------------------------------------------------------------------
| Additional Request Headers
|--------------------------------------------------------------------------
|
| Specify additional headers.
|
*/
'headers' => [
    'DNT' => 1, // DO NOT TRACK
],

Usage

Here is a quick example of how to use this package:

use ChromeHeadless\ChromeHeadless;

$html = ChromeHeadless::url('https://example.com')->getHtml();

Instead of getting the DOM as a string, you can also use the getDOMCrawler() method, which returns a Symfony\Component\DomCrawler\Crawler instance.

use ChromeHeadless\ChromeHeadless;

$dom = ChromeHeadless::url('https://example.com')->getDOMCrawler();
    
$title = $dom->filter('title')->text();

This makes it easy to filter the DOM for specific elements. See the Symfony DomCrawler documentation for the full API.

Testing

You can run the tests with:

composer test