Automate full website screen shots and PDF generation with multiple view port support
- Crawls specified host and generates a sitemap.xml on the fly
- Generates entire website screen shots based on sitemap.xml
- Define multiple view ports
- Automated PDF generation
- Includes crawled meta data in generated PDF
- Reports on broken website links (HTTP 404 responses)
- Supports HTTP basic authentication
- Supports Microsoft Online 3-step authentication
- Supports Salesforce Visualforce 3-step authentication
- Supports site maps with HTTP, HTTPS, and FTP protocol URLs
- Follows HTTP 301 redirects
- Custom JavaScript inject file - injects into page prior to screen shooting
- Trigger page events by passing querystring values to custom inject.js file
Siteshooter is installed with npm, so Node.js must be available on your development machine. Install Siteshooter globally:
$ npm install siteshooter --global
If siteshooter is installed, make sure you have the latest version by running:
$ npm update siteshooter --global
- You may need to run these commands with elevated privileges (e.g. sudo); you will be prompted to do so if needed.
- Installing with the --global flag makes the siteshooter command available on your machine's command line at any path.
- Read more about the --global flag in the npm documentation.
$ siteshooter --init
View the full siteshooter.yml example
Inside siteshooter.yml, add additional options.
- All Simple Web Crawler options can be added to sitecrawler_options and will pass through to the crawler process.
- Generated screenshot image files are optimized using the imagemin and imagemin-pngquant modules, which reduces the overall size of generated PDFs. To adjust the image quality, update the image_quality option under screenshot_options in your siteshooter.yml file.
domain:
  name: https://www.devopsgroup.io
  auth:
    user:
    pwd:
pdf_options:
  excludeMeta: true
screenshot_options:
  delay: 2000
  image_quality: '60-80'
  transparent_background: false
sitecrawler_options:
  exclude:
    - "pdf"
  stripQuerystring: false
  ignoreInvalidSSL: true
viewports:
  - viewport: desktop-large
    width: 1600
    height: 1200
  - viewport: tablet-landscape
    width: 1024
    height: 768
  - viewport: iPhone5
    width: 320
    height: 568
  - viewport: iPhone6
    width: 375
    height: 667
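A malformed viewports entry is easy to introduce when editing the file by hand. The following is a minimal sketch of a pre-flight check, assuming the viewports list has already been parsed from siteshooter.yml into plain objects. The validateViewports helper is hypothetical and is not part of Siteshooter.

```javascript
// Hypothetical helper, not part of Siteshooter: checks a viewports list
// shaped like the one in siteshooter.yml (already parsed into objects).
function validateViewports(viewports) {
    var errors = [];
    var seen = Object.create(null); // viewport names encountered so far

    viewports.forEach(function (entry, index) {
        var label = entry.viewport || 'viewports[' + index + ']';

        if (!entry.viewport) {
            errors.push(label + ': missing viewport name');
        } else {
            if (seen[entry.viewport]) {
                errors.push(label + ': duplicate viewport name');
            }
            seen[entry.viewport] = true;
        }

        ['width', 'height'].forEach(function (key) {
            var value = entry[key];
            // Reject non-numbers, NaN, zero, negatives and fractions.
            if (typeof value !== 'number' || value <= 0 || value % 1 !== 0) {
                errors.push(label + ': ' + key + ' must be a positive integer');
            }
        });
    });

    return errors;
}

var viewports = [
    { viewport: 'desktop-large', width: 1600, height: 1200 },
    { viewport: 'iPhone5', width: 320, height: 568 }
];

console.log(validateViewports(viewports)); // prints []
```

An empty array means every entry has a unique name and positive integer dimensions.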
$ siteshooter --help
Usage: siteshooter [options]

OPTIONS
_______________________________________________________________________________________
-c --config        Show configuration
-C --cwd           Set working directory, which will load a siteshooter.yml file in the specified path
-e --debug         Output exceptions
-h --help          Print this help
-i --init          Create siteshooter.yml template file in working directory
-p --pdf           Generate PDFs, by defined view ports, based on screen shots created via Siteshooter
-q --quiet         Only return final output
-s --screenshots   Generate screen shots, by view ports, based on sitemap.xml file
-S --sitemap       Crawl domain name specified in siteshooter.yml file and generate a local sitemap.xml file
-v --version       Print version number
-V --verbose       Verbose output
-w --website       Report on website information based on Siteshooter crawled results
When running the siteshooter command without any options, the following options will run in order by default:
--sitemap
--screenshots
--pdf
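The default behaviour can be pictured as a fallback over a fixed task list. The sketch below is an illustration only and is not Siteshooter's actual implementation: resolveTasks and DEFAULT_TASKS are hypothetical names, only the long flag forms are handled, and running explicitly requested tasks in the default order is an assumption.

```javascript
// Hypothetical sketch: not Siteshooter's actual implementation.
// The order matches the default run order listed above.
var DEFAULT_TASKS = ['sitemap', 'screenshots', 'pdf'];

function resolveTasks(flags) {
    // Keep only the tasks that were requested explicitly (long flags only).
    var requested = DEFAULT_TASKS.filter(function (task) {
        return flags.indexOf('--' + task) !== -1;
    });

    // No task flags at all: fall back to the full default sequence.
    return requested.length > 0 ? requested : DEFAULT_TASKS.slice();
}

console.log(resolveTasks([]));
console.log(resolveTasks(['--pdf', '--sitemap']));
```

Under these assumptions the first call yields all three tasks and the second yields only sitemap and pdf, in that order.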
To manipulate the DOM prior to the screen shot process, add an inject.js file in the same working directory as the siteshooter.yml file.
Example: inject.js file
/**
 * @file: inject.js
 * @description: used to inject custom JavaScript into a web page prior to a screen shot.
 */
console.log('JavaScript injected into page.');

if ( typeof(jQuery) !== "undefined" ) {
    jQuery(document).ready(function() {
        console.log('jQuery loaded.');
    });
}
When using the optional inject.js file, events can be triggered based on the pevent querystring parameter.
Add a URL with the pevent querystring parameter to the generated sitemap.xml:
<url>
  <loc>https://www.devopsgroup.io?pevent=open-privacy-overlay</loc>
  <changefreq>weekly</changefreq>
</url>
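When several sitemap entries need a pevent value, building the URLs programmatically avoids hand-editing mistakes. This is a minimal sketch that runs in Node.js and uses the built-in URL class; the withPageEvent helper is hypothetical and not part of Siteshooter.

```javascript
// Hypothetical helper for preparing sitemap entries; runs in Node.js.
function withPageEvent(pageUrl, eventName) {
    var url = new URL(pageUrl);

    // Adds pevent, or overwrites it if the URL already carries one.
    url.searchParams.set('pevent', eventName);

    return url.toString();
}

console.log(withPageEvent('https://www.devopsgroup.io', 'open-privacy-overlay'));
// prints https://www.devopsgroup.io/?pevent=open-privacy-overlay
```

The URL class normalizes the empty path to /, which is why the output gains a slash before the querystring. If a URL already carries other parameters, the & separator has to be written as &amp; inside sitemap.xml.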
Example: Event detection & triggering
/**
 * @file: inject.js
 * @description: used to inject custom JavaScript into a web page prior to a screen shot.
 */

// Returns the decoded value of a querystring parameter, or undefined if it is absent.
function getQueryVariable(variable) {
    var query = window.location.search.substring(1);
    var vars = query.split('&');
    for (var i = 0; i < vars.length; i++) {
        var pair = vars[i].split('=');
        if (decodeURIComponent(pair[0]) === variable) {
            // A parameter without a value (e.g. ?pevent) yields an empty string.
            return decodeURIComponent(pair[1] || '');
        }
    }
}

if ( typeof(jQuery) !== "undefined" ) {
    jQuery(document).ready(function() {
        // Strip the leading slash so the home page becomes an empty string.
        var pageName = window.location.pathname.replace('/', ''),
            pageEvent = getQueryVariable('pevent');

        console.log('document ready.');
        console.log('userAgent', navigator.userAgent);
        console.log('Page: ', pageName);
        console.log('Event: ', pageEvent);

        switch (pageName) {
            // home
            case '':
                switch (pageEvent) {
                    case 'open-privacy-overlay':
                        jQuery('a[data-target~="#modal-privacy"]').trigger('click');
                        break;
                }
                break;
        }
    });
}
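As the number of pages and events grows, the nested switch in the example above can become hard to read. One possible alternative is a lookup table keyed by page name and then by pevent value. The sketch below is hypothetical: clickTargets and findClickTarget are illustrative names, and the table only mirrors the single case from the example.

```javascript
// Hypothetical alternative to the nested switch: a lookup table keyed by
// page name, then by pevent value, mapping to the selector to click.
var clickTargets = {
    // home
    '': {
        'open-privacy-overlay': 'a[data-target~="#modal-privacy"]'
    }
};

var hasOwn = Object.prototype.hasOwnProperty;

// Returns the selector registered for the page/event pair, or null.
function findClickTarget(pageName, pageEvent) {
    var forPage = hasOwn.call(clickTargets, pageName) ? clickTargets[pageName] : {};
    return hasOwn.call(forPage, pageEvent) ? forPage[pageEvent] : null;
}

console.log(findClickTarget('', 'open-privacy-overlay'));
// prints a[data-target~="#modal-privacy"]
```

Inside the jQuery ready callback, the result could then be used by checking that it is not null and calling jQuery(selector).trigger('click') on it.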
Tests are written with Mocha and can be run with npm test.
If you're having issues with Siteshooter, submit a GitHub Issue.
- Make sure you have a siteshooter.yml file in your working directory and that the YAML file is well formatted.
- Experiencing font-loading issues? Try increasing the delay setting in your siteshooter.yml file:
  screenshot_options:
    delay: 2000
- Trying to take a screenshot of a page with a video? Unfortunately, PhantomJS does not support videos. As such, here's one approach to showing a video's poster image.
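/**
 * @file: inject.js
 * @description: used to display a video's poster image
 */
if ( typeof(jQuery) !== "undefined" ) {
    // Handle each video separately so every one gets its own poster image.
    jQuery('video').each(function() {
        var video = jQuery(this),
            poster = video.attr('poster');

        // Only add an image when the video actually defines a poster.
        if (poster) {
            video.parent().prepend(jQuery('<img/>').attr('src', poster));
        }
        video.remove();
    });
}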
- SimpleCrawler TypeError: The header content contains invalid characters
  - Try setting the acceptCookies option to false:
    sitecrawler_options:
      acceptCookies: false
Take a moment to read our Code of Conduct.
We are always looking for quality contributions! Please check the CONTRIBUTING.md for contribution guidelines.