Node-Fetch is a popular library that brings the Fetch API to Node.js. With it, you can connect to pages, send POST data, and request content, which makes it a suitable tool for many tasks, including Node.js web scraping. In addition, Node.js has included native support for the fetch function since v18.
But there’s a problem. Out of the box, there’s no node-fetch proxy option. Therefore, you can get blocked quickly when using only your own IP address.
There are two main ways to use a node-fetch proxy. You can use a code library that creates a custom agent, or you can use a reverse proxy.
One option for implementing a node-fetch proxy is to use a code library that provides a custom agent. A popular library for this is https-proxy-agent by Nathan Rajlich.
In terms of implementation, it's quite simple. You just need to install node-fetch (you can skip it if you are running Node.js v18 or higher) and https-proxy-agent.
You can check the current version of Node with the node -v command:
node -v
A good way to install Node.js is by using NVM. Once NVM is installed, use the following command to install a specific Node.js version (make sure it's higher than 18 to run the code in this guide):
nvm install 22.20
Run the npm init command to create a new Node.js project:
npm init
Alternatively, you can use the following package.json file:
{
  "name": "nodejs-fetch-with-proxy",
  "type": "module",
  "version": "1.0.0",
  "description": "Use nodejs fetch to make requests with a proxy",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "https-proxy-agent": "^7.0.6",
    "node-fetch": "^3.3.2"
  }
}
Finally, create a file called index.js and open it in your favorite code editor.
touch index.js
If you used the provided package.json file, you can install dependencies by running the following command:
npm install
Otherwise, you can install each dependency manually:
npm install node-fetch
npm install https-proxy-agent
Then, in your index.js file, use the HttpsProxyAgent as the agent parameter of your fetch request, like this:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';
(async () => {
  const proxyAgent = new HttpsProxyAgent('http://geo.iproyal.com:12321');
  const scrape = await fetch('https://ipv4.icanhazip.com/', { agent: proxyAgent });
  const html = await scrape.text();
  console.log(html);
})();
Notice that this code doesn't use any proxy server authentication. So if you want to use it this way, you need to whitelist your own IP address.
You can pass the authentication details to HttpsProxyAgent using a few methods. The simplest one is using the username and password as plain text in your request, like this:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';
(async () => {
  const proxyData = new HttpsProxyAgent('http://username:password@geo.iproyal.com:12321');
  const scrape = await fetch('https://ipv4.icanhazip.com/', { agent: proxyData });
  const html = await scrape.text();
  console.log(html);
})();
Other options for node-fetch proxy authentication with https-proxy-agent also exist, such as using custom headers.
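As a sketch of the custom-header approach (assuming https-proxy-agent v7, whose constructor accepts an options object with a headers field that is sent along with the proxy CONNECT request; check the documentation of your installed version), you can build the Proxy-Authorization header yourself:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

(async () => {
  // Assumption: https-proxy-agent v7 forwards these headers with the proxy
  // CONNECT request, so Proxy-Authorization carries the basic-auth credentials.
  const credentials = Buffer.from('username:password').toString('base64');
  const proxyAgent = new HttpsProxyAgent('http://geo.iproyal.com:12321', {
    headers: { 'Proxy-Authorization': `Basic ${credentials}` },
  });
  const scrape = await fetch('https://ipv4.icanhazip.com/', { agent: proxyAgent });
  console.log(await scrape.text());
})();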
Although web scraping is generally legal and very useful, many sites try to block it. They rely on a few signals to do it, and two of the biggest are the IP address and the request headers.
They check whether a single IP address is performing a large number of requests, or whether many requests arrive at around the same time. Either pattern is a telling sign of a bot rather than a real user.
Regarding the headers, they check whether the request metadata looks like a request made from a real browser. If you don't set any options, your script will just request the URL without sending any of this information.
A real user's browser, however, sends a lot of data, such as the browser name, version, language, and more. Therefore, requests without this metadata are quite suspicious.
You can avoid IP-based detection by using IPRoyal's residential proxy service. With it, you connect using different IP addresses from real residential users around the world. Website owners won't be able to tell that two requests from different IP addresses belong to the same user, so you'll be left alone.
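As a quick illustration, here is a sketch that assumes the geo.iproyal.com:12321 endpoint assigns a new exit IP per connection, which is typical for rotating residential proxies; creating a fresh agent for each request should therefore print different addresses:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

(async () => {
  for (let i = 0; i < 3; i++) {
    // A new agent per request opens a new proxy connection, which a
    // rotating residential endpoint typically maps to a new exit IP.
    const proxyAgent = new HttpsProxyAgent('http://username:password@geo.iproyal.com:12321');
    const response = await fetch('https://ipv4.icanhazip.com/', { agent: proxyAgent });
    console.log((await response.text()).trim());
  }
})();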
In addition to authenticated requests, you can whitelist an IP address if you want. This feature allows you to use your proxy server without sending a username and password.
To improve your scraping efforts, follow these best web scraping practices:
- Rotate IPs regularly. Use services that offer multiple residential IPs to reduce the chance of bans.
- Handle CAPTCHAs. Use headless browsers or CAPTCHA-solving services.
- Implement retry logic. If a request fails, wait for a bit and then try again; see the sketch after this list.
- Respect robots.txt. It reduces your risk of getting blocked or blacklisted.
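For the retry point, here is a minimal sketch; the three-attempt limit and the one-second base delay are arbitrary choices, and fetchWithRetry is a hypothetical helper name:
import fetch from 'node-fetch';

// A simple retry helper: on failure, wait a little longer before each new attempt.
async function fetchWithRetry(url, options = {}, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      const response = await fetch(url, options);
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return response;
    } catch (error) {
      if (i === attempts - 1) throw error;
      // Back off for 1s, then 2s, then 3s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * (i + 1)));
    }
  }
}

const response = await fetchWithRetry('https://ipv4.icanhazip.com/');
console.log(await response.text());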
When using a proxy, it's important to use the right authentication method depending on your proxy server setup. Here's an overview of basic, digest, and SOCKS5 authentication with credentials in a node-fetch context.
For basic authentication, you can embed the credentials in the proxy URL:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

(async () => {
  const proxyAgent = new HttpsProxyAgent('http://username:password@proxy.example.com:8080');
  const response = await fetch('https://ipv4.icanhazip.com/', { agent: proxyAgent });
  const html = await response.text();
  console.log(html);
})();
Digest authentication needs extra handling. You can use the digest-fetch library to help:
import DigestFetch from 'digest-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

const client = new DigestFetch('username', 'password');

(async () => {
  const proxyAgent = new HttpsProxyAgent('http://proxy.example.com:8080'); // optional proxy
  const response = await client.fetch('https://protected.example.com', { agent: proxyAgent });
  const data = await response.text();
  console.log(data);
})();
To use SOCKS5, you can use socks-proxy-agent:
import fetch from 'node-fetch';
import { SocksProxyAgent } from 'socks-proxy-agent';

(async () => {
  const proxyAgent = new SocksProxyAgent('socks5://username:password@proxy.example.com:1080');
  const response = await fetch('https://ipv4.icanhazip.com/', { agent: proxyAgent });
  const html = await response.text();
  console.log(html);
})();
You can set user agents via the options argument in your node-fetch proxy call. So instead of the simple fetch request:
const scrape = await fetch('https://ipv4.icanhazip.com/');
You can do it using the options argument after the URL:
const scrape = await fetch('https://ipv4.icanhazip.com/', { headers: { /** request headers here **/ }});
Therefore, in addition to user agents, you can use other header arguments. Here is a code sample using a node-fetch proxy and a custom user agent at the same time:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';
(async () => {
  const proxyData = new HttpsProxyAgent('http://username:password@geo.iproyal.com:12321');
  const options = {
    agent: proxyData,
    headers: {
      'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
    },
  };
  const scrape = await fetch('https://ipv4.icanhazip.com/', options);
  const html = await scrape.text();
  console.log(html);
})();
In this case, we are using just the User-Agent header, but you can pass any headers you want.
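For example, here is a sketch that adds a couple of extra browser-like headers alongside the user agent; the specific values are illustrative, not required:
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

(async () => {
  const proxyData = new HttpsProxyAgent('http://username:password@geo.iproyal.com:12321');
  const scrape = await fetch('https://ipv4.icanhazip.com/', {
    agent: proxyData,
    headers: {
      // Illustrative browser-like values; adjust them to the browser you want to mimic.
      'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language': 'en-US,en;q=0.9',
    },
  });
  console.log(await scrape.text());
})();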
Now you have a better understanding of how to use node-fetch proxies to scrape websites without getting blocked. In addition, you saw how you can use custom headers to make your scraping functions even better.
Now you can connect your scraper with a parser and extract data from any site you want.
A "fetch is not defined" error happens when Node doesn't find the fetch function. If you are running Node.js under v18, you will need to install node-fetch or a similar module.
If that's the case, make sure that you have installed node-fetch and that you have included it in your script using:
const fetch = require('node-fetch');
If you are facing timeout issues or want to add a timeout option, you can use a timeout promise, or use a library such as hpagent to manually control timeouts.
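As an illustration, here is a minimal timeout sketch using AbortController, which node-fetch v3 accepts via the signal option; the 5-second limit is an arbitrary choice, and the snippet assumes an ES module where top-level await is available:
import fetch from 'node-fetch';

// Abort the request if it takes longer than 5 seconds.
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 5000);

try {
  const response = await fetch('https://ipv4.icanhazip.com/', { signal: controller.signal });
  console.log(await response.text());
} catch (error) {
  console.error('Request failed or timed out:', error.message);
} finally {
  clearTimeout(timer);
}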
Before installing node-fetch, make sure that npm itself is up to date. Try something like this:
npm install -g npm
npm cache clean --force
npm update
Another option is to install it locally instead of globally:
npm i node-fetch
You can also try using import instead of require:
import fetch from 'node-fetch';
Or you can load it dynamically like this:
const fetch = (...args) => import('node-fetch').then(({default: fetch}) => fetch(...args));
Fetch in Node.js is a library that works just like the Fetch API in browsers. The difference is that the browser API is only available on the client side, while Node.js fetch is available on the backend.
window.fetch(), also known as fetch, is a client-side function. Therefore, you can run it from your browser. Node-fetch is a backend library available in Node.js. You can run it programmatically from your Node.js server.