This package is the Node.js client for Headless-Render-API.com (formerly named prerender.cloud, 2016-2022).
Use it for pre-rendering (server-side rendering), taking screenshots of webpages, or converting webpages to PDFs.
npm install prerendercloud --save
// simplest possible example usage of this lib
const prerendercloud = require("prerendercloud");
// if you are pre-rendering a JavaScript single-page app
// served from express (or middleware compatible http server),
// use as middleware in your existing server - or try our all-in-one
// server https://github.com/sanfrancesco/prerendercloud-server
app.use(prerendercloud);
// or take a screenshot of a URL
const fs = require("fs");
prerendercloud
.screenshot("http://example.com")
.then((pngBuffer) =>
fs.writeFileSync("out.png", pngBuffer, { encoding: null })
);
// or create a PDF from a URL
prerendercloud
.pdf("http://example.com")
.then((pdfBuffer) =>
fs.writeFileSync("out.pdf", pdfBuffer, { encoding: null })
);
The pre-render/server-side rendering functionality of this package (as opposed to mere screenshots/pdfs) is meant to be included in an existing web server where 404s are rendered as index.html
- For an all-in-one single-page app web server plus server-side rendering see: https://github.com/sanfrancesco/prerendercloud-server
- Install
- Screenshots
- PDFs
- Scrape
- Prerendering or Server-side rendering with Express/Connect/Node http
npm install prerendercloud --save
Get a token after signing up at https://headless-render-api.com - it's necessary to move off of the rate-limited free tier
PRERENDER_TOKEN=mySecretToken node index.js
const prerendercloud = require("prerendercloud");
prerendercloud.set("prerenderToken", "mySecretToken");
DEBUG=prerendercloud node index.js
Capture a screenshot:
const fs = require("fs");
const prerendercloud = require("prerendercloud");
prerendercloud
.screenshot("http://example.com")
.then((pngBuffer) =>
fs.writeFileSync("out.png", pngBuffer, { encoding: null })
);
Capture specific element with padding:
Note: viewportQuerySelector causes viewportWidth/viewportHeight to be ignored
prerendercloud
.screenshot("http://example.com", {
viewportQuerySelector: "#open-graph-div",
viewportQuerySelectorPadding: 10,
})
.then((pngBuffer) =>
fs.writeFileSync("out.png", pngBuffer, { encoding: null })
);
Customize dimensions and format:
prerendercloud
.screenshot("http://example.com", {
deviceWidth: 800,
deviceHeight: 600,
viewportWidth: 640,
viewportHeight: 480,
viewportX: 0,
viewportY: 0,
format: "jpeg", // png, webp, jpeg
})
.then((jpgBuffer) =>
fs.writeFileSync("out.jpg", jpgBuffer, { encoding: null })
);
Set emulated media and viewport scale:
prerendercloud
.screenshot("http://example.com", {
// (screen, print, braille, embossed, handheld, projection, speech, tty, tv)
emulatedMedia: "print",
viewportScale: 2,
})
.then((pngBuffer) =>
fs.writeFileSync("out.png", pngBuffer, { encoding: null })
);
Generate a PDF:
const fs = require("fs");
const prerendercloud = require("prerendercloud");
prerendercloud
.pdf("http://example.com")
.then((pdfBuffer) =>
fs.writeFileSync("out.pdf", pdfBuffer, { encoding: null })
);
Disable page breaks and set emulated media:
prerendercloud
.pdf("http://example.com", {
// Note: using noPageBreaks forces the following
// - pageRanges: "1",
// - preferCssPageSize: true
noPageBreaks: true,
printBackground: true,
// emulatedMedia options: (screen, print, braille, embossed, handheld, projection, speech, tty, tv)
emulatedMedia: "screen",
})
.then((pdfBuffer) =>
fs.writeFileSync("out.pdf", pdfBuffer, { encoding: null })
);
Configure PDF options:
const prerendercloud = require("prerendercloud");
prerendercloud
.pdf("http://example.com", {
pageRanges: "1-3",
scale: 1.5,
preferCssPageSize: true,
printBackground: true,
landscape: true,
marginTop: 1.75,
marginRight: 0.5,
marginBottom: 1.75,
marginLeft: 0.5,
paperWidth: 8.5,
paperHeight: 11,
})
.then((pdfBuffer) =>
fs.writeFileSync("out.pdf", pdfBuffer, { encoding: null })
);
Scrape a webpage/URL. This endpoint is optimized for scraping and built on our pre-rendering engine, so it correctly handles JavaScript apps.
Scrape just the HTML:
const { body } = await prerendercloud.scrape("https://example.com");
console.log(body);
fs.writeFileSync("body.html", body);
Or scrape HTML and take a screenshot, and parse important meta tags (title, h1, open graph, etc.):
const {
body, // Buffer
meta: {
title,
h1,
description,
ogImage,
ogTitle,
ogType,
ogDescription,
twitterCard,
},
links, // Array
screenshot, // Buffer
statusCode, // number
headers, // object of headers
} = await prerendercloud.scrape("https://example.com", {
withMetadata: true,
withScreenshot: true,
followRedirects: false,
});
if (statusCode === 301 || statusCode === 302) {
// get the redirect location from headers.location
// if instead you'd rather follow redirects, set followRedirects: true
} else {
console.log(body.toString());
console.log({
meta: {
title,
h1,
description,
ogImage,
ogTitle,
ogDescription,
twitterCard,
},
links,
});
fs.writeFileSync("body.html", body);
fs.writeFileSync("screenshot.png", screenshot);
// links is an array of all the links on the page
// meta is an object that looks like: { title, h1, description, ogImage, ogTitle, ogType, ogDescription, twitterCard }
// screenshot and body are Buffers (so they can be saved to file)
// call body.toString() for stringified HTML
}
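The redirect branch above can be filled in with a small helper. This is a minimal sketch: `redirectLocation` is a hypothetical name (not part of this package), and it assumes `headers` uses lower-case keys, as the `headers.location` comment above suggests.

```javascript
// Hypothetical helper: returns the redirect target for a scrape result,
// or null when the response is not a 301/302 (or has no location header).
function redirectLocation({ statusCode, headers = {} }) {
  const isRedirect = statusCode === 301 || statusCode === 302;
  return isRedirect ? headers.location || null : null;
}

// Usage with a scrape result shaped like the one above (followRedirects: false):
// const result = await prerendercloud.scrape(url, { followRedirects: false });
// const target = redirectLocation(result);
// if (target) { /* scrape `target` next, or record the redirect */ }
```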
The prerendercloud middleware should be loaded first, before your other middleware, so it can forward the request to service.headless-render-api.com.
// the free, rate limited tier
// and using https://expressjs.com/
const prerendercloud = require("prerendercloud");
expressApp.use(prerendercloud);
The default behavior forwards all traffic through Headless-Render-API.com.
We don't recommend the botsOnly setting. Prefer the default of pre-rendering for all user-agents, both for the performance boost and to avoid potential Google cloaking penalties. There may still be situations where you shouldn't or can't, for example when your site/app has JavaScript errors when trying to repaint the DOM after it's already been pre-rendered, but you still want bots (twitter, slack, facebook etc.) to read the meta and open graph tags.
Note: this will add or append 'User-Agent' to the Vary header, which is another reason not to recommend this feature (it significantly reduces HTTP cacheability).
const prerendercloud = require("prerendercloud");
prerendercloud.set("botsOnly", true);
You can also append your own agents to our botsOnly list by using an array:
const prerendercloud = require("prerendercloud");
prerendercloud.set("botsOnly", ["altavista", "dogpile", "excite", "askjeeves"]);
Note: this will NOT add or append 'User-Agent' to the Vary header. You should probably set the Vary header yourself, if using this feature.
const prerendercloud = require("prerendercloud");
prerendercloud.set("whitelistUserAgents", [
"twitterbot",
"slackbot",
"facebookexternalhit",
]);
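Because whitelistUserAgents leaves the Vary header alone, one way to set it yourself is a small Express/Connect-style middleware. The following is a minimal sketch: `appendVary` and `varyOnUserAgent` are hypothetical names, not part of this package, and whether a header set this way survives into the prerendered response depends on how the middleware writes its response, so verify it in your own setup.

```javascript
// Hypothetical helper: add a field to a Vary header value without duplicating it.
function appendVary(existing, field) {
  if (!existing) return field;
  const fields = String(existing)
    .split(",")
    .map((f) => f.trim())
    .filter(Boolean);
  if (fields.includes("*")) return "*"; // "*" already varies on everything
  const present = fields.some((f) => f.toLowerCase() === field.toLowerCase());
  return present ? fields.join(", ") : [...fields, field].join(", ");
}

// Express/Connect-style middleware using only Node's res.getHeader/res.setHeader.
function varyOnUserAgent(req, res, next) {
  res.setHeader("Vary", appendVary(res.getHeader("Vary"), "User-Agent"));
  next();
}

// app.use(varyOnUserAgent); // assumption: registered before app.use(prerendercloud)
```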
Useful for your own caching layer (in conjunction with afterRender), or analytics, or dependency injection for testing. Is only called when a remote call to service.headless-render-api.com is about to be made.
const prerendercloud = require("prerendercloud");
prerendercloud.set("beforeRender", (req, done) => {
// call done exactly once; the calls below are alternatives, not a sequence
// call it with a string to short-circuit the remote prerender codepath
// (useful when implementing your own cache)
done(null, "hello world"); // returns status 200, content-type text/html
// or call it with an object to short-circuit the remote prerender codepath
// (useful when implementing your own cache)
done(null, { status: 202, body: "hello" }); // returns status 202, content-type text/html
done(null, { status: 301, headers: { location: "/new-path" } }); // redirect to /new-path
// or call it with nothing/empty/null/undefined to follow the remote prerender path
// (useful for analytics)
done();
done("");
done(null);
done(undefined);
});
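As a concrete illustration of the caching use case, beforeRender and afterRender can share an in-memory Map. This is a minimal sketch, not the package's own cache: `pageCache` is a hypothetical name, keying by `req.url` is an assumption, and the body is stored as a string so it can be handed back to `done` in the string form shown above.

```javascript
// In-memory cache keyed by request URL (an assumption; pick a key that
// matches how your app distinguishes pages).
const pageCache = new Map();

// beforeRender: serve from cache when possible, otherwise fall through.
function beforeRender(req, done) {
  const cached = pageCache.get(req.url);
  if (cached) return done(null, cached); // short-circuits the remote call
  done(); // nothing passed: continue to the remote prerender
}

// afterRender: remember successful responses for next time.
function afterRender(err, req, res) {
  if (!err && res && res.statusCode === 200) {
    pageCache.set(req.url, String(res.body));
  }
}

// const prerendercloud = require("prerendercloud");
// prerendercloud.set("beforeRender", beforeRender);
// prerendercloud.set("afterRender", afterRender);
```

An unbounded Map grows forever, so a real cache would also need an eviction or expiry policy.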
Prevent paths from being prerendered. Takes a function that returns an array. It is executed before the shouldPrerender option.
The primary use case is for CDN edge node clients (CloudFront Lambda@Edge) because they don't have the ability to quickly read the origin (AWS S3) filesystem, so they have to hard-code paths that shouldn't be prerendered.
Paths you may not want prerendered are non-SPA, large pages, or pages with JavaScript that can't rehydrate prerendered DOMs.
A trailing * works as a wildcard. It only works at the end of a path.
const prerendercloud = require("prerendercloud");
prerendercloud.set("blacklistPaths", (req) => [
"/google-domain-verification",
"/google-domain-verification.html",
"/google-domain-verification/",
"/image-gallery/*",
]);
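To make the trailing-wildcard rule concrete, here is a minimal sketch of how such a list could be evaluated. `isBlacklisted` is a hypothetical helper, and exact matching for entries without a wildcard is an assumption (it would explain why the example above lists the `.html` and trailing-slash variants separately); the package's internal matching may differ.

```javascript
// Returns true when `path` matches any blacklist entry. A trailing "*" is
// treated as a prefix wildcard; "*" anywhere else has no special meaning.
function isBlacklisted(path, blacklist) {
  return blacklist.some((entry) =>
    entry.endsWith("*")
      ? path.startsWith(entry.slice(0, -1))
      : path === entry
  );
}

const blacklist = ["/google-domain-verification", "/image-gallery/*"];
console.log(isBlacklisted("/image-gallery/cats/1", blacklist)); // true
console.log(isBlacklisted("/google-domain-verification/", blacklist)); // false
```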
Limit which URLs can trigger a pre-render request to the server.
Takes a function that returns an array of strings or regexes. It is executed before the shouldPrerender option. Passing an empty array or string will do nothing (noop).
Using this option will prevent bots/scrapers from hitting random URLs and increasing your billing. Recommended for Node.js server and Lambda@Edge (can be used with or without the blacklist; the blacklist takes precedence).
Even better if used with whitelistQueryParams and/or removeTrailingSlash.
const prerendercloud = require("prerendercloud");
prerendercloud.set("whitelistPaths", (req) => [
"/docs",
"/docs/",
/\/users\/\d{1,6}\/profile$/, // without the ending $, this is equivalent to startsWith
/\/users\/\d{1,6}\/profile\/?$/, // note the optional ending slash (\/?) and $
"/google-domain-verification.html",
"/google-domain-verification/",
]);
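The mixed string/regex list can be read as follows. This is a sketch of the described semantics only: `isWhitelisted` is a hypothetical helper, and treating strings as exact matches is an assumption.

```javascript
// Returns true when `path` may be prerendered according to the whitelist.
// An empty array or empty string is a no-op, so everything is allowed.
function isWhitelisted(path, whitelist) {
  if (!whitelist || whitelist.length === 0) return true;
  return whitelist.some((entry) =>
    entry instanceof RegExp ? entry.test(path) : entry === path
  );
}

const whitelist = ["/docs", "/docs/", /\/users\/\d{1,6}\/profile\/?$/];
console.log(isWhitelisted("/users/42/profile/", whitelist)); // true
console.log(isWhitelisted("/users/42/profile/edit", whitelist)); // false
```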
This is executed after the beforeRender but if present, replaces userAgent detection (it would override botsOnly).
const prerendercloud = require("prerendercloud");
prerendercloud.set("shouldPrerender", (req) => {
return req.headers["user-agent"] === "googlebot" && someStateOnMyServer();
// return bool
});
Runs in addition to the default user-agent check. Useful if you have your own conditions.
// time delay
const waitUntil = Date.now() + 10000; // timestamp in ms, 10 seconds from now
prerendercloud.set("shouldPrerenderAdditionalCheck", (req) => {
  return Date.now() > waitUntil;
});
// enable flag
let isEnabled = false;
prerendercloud.set("shouldPrerenderAdditionalCheck", (req) => {
return isEnabled;
});
The servers behind service.headless-render-api.com will cache for 5 minutes as a best practice. Adding the Prerender-Disable-Cache HTTP header via this config option disables that cache entirely. Disabling the service.headless-render-api.com cache is only recommended if you have your own cache either in this middleware or your client; otherwise all of your requests are going to be slow.
const prerendercloud = require("prerendercloud");
prerendercloud.set("disableServerCache", true);
app.use(prerendercloud);
This middleware has a built-in LRU (drops least recently used) caching layer. It can be configured to let cache entries expire automatically, or you can manually remove entire domains from the cache. You probably want to use this if you disabled the server cache.
const prerendercloud = require("prerendercloud");
prerendercloud.set("enableMiddlewareCache", true);
// optionally set max bytes (defaults to 500MB)
prerendercloud.set("middlewareCacheMaxBytes", 1000000000); // 1GB
// optionally set max age (defaults to forever - implying you should manually clear it)
prerendercloud.set("middlewareCacheMaxAge", 1000 * 60 * 60); // 1 hour
app.use(prerendercloud);
// delete every page on the example.org domain
prerendercloud.cache.clear("http://example.org");
// delete every page on every domain
prerendercloud.cache.reset();
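For readers unfamiliar with the term, the following sketch shows what a byte-bounded LRU does. It is an illustration only, not the middleware's implementation: `TinyLru` is a hypothetical class, and its `clear(origin)` merely mimics the idea of dropping every page under one domain.

```javascript
// Minimal byte-bounded LRU built on Map's insertion order (oldest key first).
class TinyLru {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.bytes = 0;
    this.entries = new Map();
  }
  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.entries.has(key)) this.delete(key);
    this.entries.set(key, value);
    this.bytes += Buffer.byteLength(value);
    // evict least recently used entries until the cache fits again
    for (const oldest of this.entries.keys()) {
      if (this.bytes <= this.maxBytes) break;
      this.delete(oldest);
    }
  }
  delete(key) {
    if (!this.entries.has(key)) return;
    this.bytes -= Buffer.byteLength(this.entries.get(key));
    this.entries.delete(key);
  }
  // drop every entry whose key starts with the given origin
  clear(origin) {
    for (const key of [...this.entries.keys()]) {
      if (key.startsWith(origin)) this.delete(key);
    }
  }
}
```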
These options map to the HTTP header options listed here: https://headless-render-api.com/docs/api
This option disables an enabled-by-default 5-minute cache.
The servers behind service.headless-render-api.com will cache for 5 minutes as a best practice. Adding the Prerender-Disable-Cache HTTP header via this config option disables that cache entirely. Disabling the service.headless-render-api.com cache is only recommended if you have your own cache either in this middleware or your client; otherwise all of your requests are going to be slow.
const prerendercloud = require("prerendercloud");
prerendercloud.set("disableServerCache", true);
app.use(prerendercloud);
This option configures the duration of Headless-Render-API.com's server cache.
The servers behind service.headless-render-api.com cache for 5 minutes by default as a best practice. Use this option to change that duration (in seconds):
const prerendercloud = require("prerendercloud");
// max value: 2592000 (1 month)
prerendercloud.set("serverCacheDurationSeconds", (req) => 300);
app.use(prerendercloud);
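Because the option takes a function of the request, the duration can vary per path. A minimal sketch under stated assumptions: `cacheSecondsFor` and the "/docs" policy are hypothetical, and the clamp simply enforces the documented maximum on the client side.

```javascript
const MAX_SECONDS = 2592000; // the documented maximum (1 month)

// Hypothetical policy: cache docs pages for a day, everything else for 5 minutes.
function cacheSecondsFor(req) {
  const wanted = req.url.startsWith("/docs") ? 60 * 60 * 24 : 300;
  return Math.min(wanted, MAX_SECONDS);
}

// const prerendercloud = require("prerendercloud");
// prerendercloud.set("serverCacheDurationSeconds", cacheSecondsFor);
```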
This option tells the server to only prerender the <title> and <meta> tags in the <head> section. The returned HTML payload will otherwise be unmodified.
Example use case 1: your single-page app does not rehydrate the body/div cleanly but you still want open graph (link previews) to work.
Example use case 2: you don't care about the benefits of server-side rendering but still want open graph (link previews) to work.
const prerendercloud = require("prerendercloud");
prerendercloud.set(
  "metaOnly",
  (req) => req.url === "/long-page-unsuitable-for-full-prerender"
);
app.use(prerendercloud);
This option tells the server to follow a redirect.
By default, if your origin server returns 301/302, Headless-Render-API.com will just return that outright - which is appropriate for the common use case of proxying traffic since it informs a bot that a URL has changed.
const prerendercloud = require("prerendercloud");
prerendercloud.set("followRedirects", (req) => true);
app.use(prerendercloud);
You can disable this if you're using CORS. Read more https://headless-render-api.com/docs and https://github.com/sanfrancesco/prerendercloud-ajaxmonkeypatch
const prerendercloud = require("prerendercloud");
prerendercloud.set("disableAjaxBypass", true);
app.use(prerendercloud);
This prevents screen flicker/repaint/flashing, but increases initial page load size (because it embeds the AJAX responses into your HTML). You can disable this if you manage your own "initial state". Read more: https://headless-render-api.com/docs and https://github.com/sanfrancesco/prerendercloud-ajaxmonkeypatch
const prerendercloud = require("prerendercloud");
prerendercloud.set("disableAjaxPreload", true);
app.use(prerendercloud);
Removes a JavaScript monkeypatch from the prerendered page that is intended to prevent duplicate meta/title/script/style tags. Some libs/frameworks detect existing meta/title/style and don't need this, but in our experience this is still a worthwhile default. Read more https://github.com/sanfrancesco/prerendercloud-ajaxmonkeypatch#head-dedupe
const prerendercloud = require("prerendercloud");
prerendercloud.set("disableHeadDedupe", true);
app.use(prerendercloud);
The only valid values (right now) are: ['Prerendercloud-Is-Mobile-Viewer'], and anything starting with prerendercloud-. This feature is meant for forwarding headers from the original request to your site through to your origin (by default, all headers are dropped).
prerendercloud.set("originHeaderWhitelist", [
"Prerendercloud-Is-Mobile-Viewer",
]);
This removes all script tags except for application/ld+json. Removing script tags prevents any JS from executing at all, so your app will no longer be isomorphic. Useful when Headless-Render-API.com is used as a scraper/crawler or in constrained environments (Lambda@Edge).
const prerendercloud = require("prerendercloud");
prerendercloud.set("removeScriptTags", true);
This is the opposite of what is often referred to as "strict mode routing". When this is enabled, the server will normalize URLs by removing a trailing slash.
e.g.: example.com/docs/ -> example.com/docs
The use case for this option is to achieve a higher cache hit rate (so if users/bots are hitting /docs/ and /docs, they'll both be cached on Headless-Render-API.com servers as the same entity).
SEO best practices:
- 301 redirect trailing slash URLs to non trailing slash before this middleware is called (and then don't bother with removeTrailingSlash because it should never happen)
- or use link rel canonical in conjunction with this
const prerendercloud = require("prerendercloud");
prerendercloud.set("removeTrailingSlash", true);
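The first SEO practice above (301 redirecting before this middleware runs) can be sketched as a tiny Express/Connect-style middleware. `stripTrailingSlash` and `redirectTrailingSlash` are hypothetical names, and leaving the root path "/" untouched is an assumption of this sketch.

```javascript
// Remove one trailing slash from the path part of a URL, keeping any query string.
function stripTrailingSlash(url) {
  const queryStart = url.indexOf("?");
  const path = queryStart === -1 ? url : url.slice(0, queryStart);
  const query = queryStart === -1 ? "" : url.slice(queryStart);
  const normalized =
    path.length > 1 && path.endsWith("/") ? path.slice(0, -1) : path;
  return normalized + query;
}

// 301 to the slash-less URL; pass through when nothing changes.
function redirectTrailingSlash(req, res, next) {
  const target = stripTrailingSlash(req.url);
  if (target === req.url) return next();
  res.writeHead(301, { Location: target });
  res.end();
}

// app.use(redirectTrailingSlash); // register before app.use(prerendercloud)
```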
Headless-Render-API.com will wait for all in-flight XHR/websockets requests to finish before rendering, but when critical XHR/websockets requests are sent after the page load event, Headless-Render-API.com may not wait long enough to see that it needs to wait for them. Common example use cases are sites hosted on IPFS, or sites that make an initial XHR request that returns endpoints that require additional XHR requests.
const prerendercloud = require("prerendercloud");
prerendercloud.set("waitExtraLong", true);
When a function is passed that returns true, Headless-Render-API.com will return the prerendered HTML together with meta and links.
const prerendercloud = require("prerendercloud");
prerendercloud.set("withMetadata", (req) => true);
To make use of the meta and links, read res.meta or res.links from either afterRender or afterRenderBlocking.
When a function is passed that returns true, Headless-Render-API.com will return both the prerendered HTML and a JPEG screenshot.
const prerendercloud = require("prerendercloud");
prerendercloud.set("withScreenshot", (req) => true);
To make use of the screenshot, read res.screenshot from either afterRender or afterRenderBlocking.
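Sets the device width used for the render. Takes a function of the request; in this example only URLs matching shareable-cards get a custom width, and null is returned otherwise.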
const prerendercloud = require("prerendercloud");
prerendercloud.set("deviceWidth", (req) =>
req.url.match(/shareable\-cards/) ? 800 : null
);
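Sets the device height used for the render. Takes a function of the request; in this example only URLs matching shareable-cards get a custom height, and null is returned otherwise.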
const prerendercloud = require("prerendercloud");
prerendercloud.set("deviceHeight", (req) =>
req.url.match(/shareable\-cards/) ? 600 : null
);
Force the middleware to hit your origin with a certain host. This is useful for environments like Lambda@Edge+CloudFront where you can't infer the actual host.
const prerendercloud = require("prerendercloud");
prerendercloud.set("host", "example.com");
Force the middleware to hit your origin with a certain protocol (usually https). This is useful when you're using Cloudflare or any other https proxy that hits your origin at http but you also have a redirect to https.
const prerendercloud = require("prerendercloud");
prerendercloud.set("protocol", "https");
Whitelist query string parameters on each request.
The use case for this option is to achieve a higher cache hit rate (so if users/bots are hitting docs?source=other or /docs or docs?source=another&foo=bar, they'll all be cached on Headless-Render-API.com servers as the same entity).
- null (the default): preserve all query params
- []: an empty whitelist means drop all query params
- ['page', 'x', 'y']: only accept page, x, and y params (drop everything else)
const prerendercloud = require("prerendercloud");
// e.g., the default: example.com/docs?source=other&page=2 -> example.com/docs?source=other&page=2
prerendercloud.set("whitelistQueryParams", (req) => null);
// e.g., if you whitelist only `page` query param: example.com/docs?source=other&page=2 -> example.com/docs?page=2
prerendercloud.set("whitelistQueryParams", (req) =>
req.path.startsWith("/docs") ? ["page"] : []
);
// e.g., if your whitelist is empty array: example.com/docs?source=other&page=2 -> example.com/docs
prerendercloud.set("whitelistQueryParams", (req) => []);
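The three cases in the comments above can be reproduced with Node's built-in URL and URLSearchParams. This is a sketch of the described normalization, not the package's code; `filterQueryParams` is a hypothetical helper.

```javascript
// Apply a query-param whitelist to an absolute URL.
// null keeps everything, [] drops everything, otherwise only listed names survive.
function filterQueryParams(url, whitelist) {
  if (whitelist == null) return url;
  const parsed = new URL(url);
  const kept = new URLSearchParams();
  for (const [name, value] of parsed.searchParams) {
    if (whitelist.includes(name)) kept.append(name, value);
  }
  parsed.search = kept.toString();
  return parsed.toString();
}

const original = "https://example.com/docs?source=other&page=2";
console.log(filterQueryParams(original, null)); // https://example.com/docs?source=other&page=2
console.log(filterQueryParams(original, ["page"])); // https://example.com/docs?page=2
console.log(filterQueryParams(original, [])); // https://example.com/docs
```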
Same thing as afterRender, except it blocks. This is useful for mutating the response headers or body.
Since it blocks, you have to call the next callback when done.
Example use case: use with the withMetadata and/or withScreenshot option to save metadata or the screenshot to disk and add it as an open graph tag.
const fs = require("fs");
const prerendercloud = require("prerendercloud");
prerendercloud.set("afterRenderBlocking", (err, req, res, next) => {
// req: (standard node.js req object)
// res: { statusCode, headers, body, screenshot, meta, links }
console.log({ meta: res.meta, links: res.links });
if (res.screenshot) {
fs.writeFileSync("og.jpg", res.screenshot);
res.body = res.body.replace(
/\<\/head\>/,
"<meta property='og:image' content='/og.jpg' /></head>"
);
}
next();
});
It's a noop because this middleware already takes over the response for your HTTP server. 2 example use cases of this: your own caching layer, or analytics/metrics.
const prerendercloud = require("prerendercloud");
prerendercloud.set("afterRender", (err, req, res) => {
// req: (standard node.js req object)
// res: { statusCode, headers, body }
console.log(`received ${res.body.length} bytes for ${req.url}`);
});
(note: 400 errors are always bubbled up, 429 rate limit errors are never bubbled up. This section is for 5xx errors which are usually either timeouts or Headless-Render-API.com server issues)
This must be enabled if you want your webserver to show a 500 when Headless-Render-API.com throws a 5xx (retriable error). As mentioned in the previous section, by default, 5xx errors are ignored and non-prerendered content is returned so the user is uninterrupted.
Bubbling up the 5xx error is useful if you're using a crawler to trigger prerenders and you want control over retries.
It can take a bool or a function(err, req, res) that returns a bool. The sync function is executed before writing to res, or calling next (depending on what bool is returned). It's useful when:
- you want to bubble up errors only for certain errors, user-agents, IPs, etc...
- or you want to store the errors (analytics)
const prerendercloud = require("prerendercloud");
prerendercloud.set("bubbleUp5xxErrors", true);
const prerendercloud = require("prerendercloud");
prerendercloud.set("bubbleUp5xxErrors", (err, req, res) => {
// err object comes from https://github.com/sindresorhus/got lib
// examples:
// 1. if (err.statusCode === 503) return true;
// 2. if (req.headers['user-agent'] === 'googlebot') return true;
// 3. if (res.body && res.body.match(/timeout/)) return true;
// 4. myDatabase.query('insert into errors(msg) values($1)', [err.message])
// 5. Raven.captureException(err, { req, resBody: res.body })
return false;
});
HTTP errors 500, 503, 504 and network errors are retriable. The default is 1 retry (2 total attempts) but you can change that to 0 or whatever here. There is exponential back-off. When Headless-Render-API.com is over capacity it will return 503 until the autoscaler boots up more capacity so this will address those service interruptions appropriately.
const prerendercloud = require("prerendercloud");
prerendercloud.set("retries", 4);
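To give a feel for how retries and exponential back-off interact, here is a small sketch. The retriable status codes come from the text above; the doubling schedule and the 1000 ms base are assumptions for illustration only, since the package's actual timings are not stated here.

```javascript
const RETRIABLE_STATUS = new Set([500, 503, 504]);

function isRetriable(statusCode) {
  return RETRIABLE_STATUS.has(statusCode);
}

// Delay before each retry, doubling every time (hypothetical base of 1000 ms).
function backoffDelays(retries, baseMs = 1000) {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

console.log(backoffDelays(4)); // [ 1000, 2000, 4000, 8000 ]
console.log(isRetriable(502)); // false
```

Under these assumed numbers, 4 retries would add up to 15 seconds of waiting on top of the render attempts themselves, which is worth keeping in mind before raising the value.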
If a request fails due to a retriable error (500, 503, 504), typically a timeout, then this option will prevent pre-rendering that page for 5 minutes.
It's useful if some of your pages have an issue causing a timeout, so at least the non-prerendered content will be returned most of the time.
Use this option with a function for bubbleUp5xxErrors so you can record the error in your error tracker and eventually fix it.
Note: if you're using this with a bubbleUp5xxErrors function that returns true (or a bool value of true), then a 503 error will be bubbled up.
const prerendercloud = require("prerendercloud");
prerendercloud.set("throttleOnFail", true);
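The behavior described above amounts to a per-URL cool-down. The sketch below illustrates the idea only: `recordFailure` and `isThrottled` are hypothetical names, and keying by URL is an assumption, not a description of the package's internals.

```javascript
const THROTTLE_MS = 5 * 60 * 1000; // 5 minutes, as described above
const failedAt = new Map(); // url -> timestamp (ms) of the last retriable failure

function recordFailure(url, now = Date.now()) {
  failedAt.set(url, now);
}

// True while the URL is still inside its 5-minute cool-down window.
function isThrottled(url, now = Date.now()) {
  const last = failedAt.get(url);
  if (last === undefined) return false;
  if (now - last >= THROTTLE_MS) {
    failedAt.delete(url); // window elapsed: allow prerendering again
    return false;
  }
  return true;
}
```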
- when used as middleware
  - when Headless-Render-API.com service returns
    - 400 client error (bad request)
      - e.g. try to prerender a localhost URL as opposed to a publicly accessible URL
      - the client itself returns the 400 error (the web page will not be accessible)
    - 429 client error (rate limited)
      - the original server payload (not prerendered) is returned, so the request is not interrupted due to unpaid bills or free accounts
      - only happens while on the free tier (paid subscriptions are not rate limited)
      - the error message is written to STDERR
      - if the env var: DEBUG=prerendercloud is set, the error is also written to STDOUT
    - 500, 503, 504 (and network errors)
      - these will be retried, by default, 1 time
      - you can disable retries with .set('retries', 0)
      - you can increase retries with .set('retries', 5) (or whatever)
      - 502 is not retried - it means your origin returned 5xx
    - 5xx (server error)
      - when even the retries fail, the original server payload (not prerendered) is returned, so the request is not interrupted due to server error
      - the error message is written to STDERR
      - if the env var: DEBUG=prerendercloud is set, the error is also written to STDOUT
- when used for screenshots/pdfs
  - retriable errors are retried (500, 503, 504 and network errors)
  - the errors are returned in the promise catch API
  - the errors are from the got library
    - see URL: .catch(err => console.log(err.url))
    - see status code: .catch(err => console.log(err.response.statusCode))
    - see err response body: .catch(err => console.log(err.response.body))
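For screenshots and PDFs, the three fields listed above can be collected in one place. A minimal sketch: `describeError` is a hypothetical helper that only reads `err.url`, `err.response.statusCode`, and `err.response.body`, and it assumes network errors may arrive without a `response`.

```javascript
// Summarize a rejected screenshot/pdf promise for logging.
function describeError(err) {
  const response = err.response || {}; // assumption: may be missing on network errors
  return {
    url: err.url,
    statusCode: response.statusCode,
    // keep log lines short: only the first 200 characters of the body
    body: response.body ? String(response.body).slice(0, 200) : undefined,
  };
}

// const prerendercloud = require("prerendercloud");
// prerendercloud
//   .screenshot("http://example.com")
//   .catch((err) => console.error(describeError(err)));
```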