Cheerio Tree is a utility built on Cheerio for efficient DOM parsing. It quickly converts HTML into JSON, and when paired with YAML parsing rules it offers an intuitive, streamlined way to extract and transform data.
Start the development server:
npm run dev
# or
yarn dev
# or
pnpm dev
Now, try your first API scraper:
Localhost:
https://www.proxysites.ai/category
Online:
https://www.proxysites.ai/category
Create a YAML configuration file under data/, for example: data/wordpressCom/tags.yml
Please use camelCase for folder and file naming.
After you save the YAML file, it is automatically converted to JSON in the development environment and saved as app/lib/cheerio-tree/wordpressCom-tags.ts.
Make sure your parsing settings follow the expected format, as in the example below; otherwise file generation may fail.
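The naming rule can be sketched as a small helper. The mapping is inferred from the single example above (data/wordpressCom/tags.yml becomes app/lib/cheerio-tree/wordpressCom-tags.ts); the helper name `generated_ts_path` and its camelCase check are hypothetical and not part of Cheerio Tree.

```python
import re
from pathlib import PurePosixPath

# Hypothetical camelCase rule: starts lowercase, then letters/digits only.
CAMEL_CASE = re.compile(r"[a-z][a-zA-Z0-9]*")


def generated_ts_path(yaml_path: str) -> str:
    """Map data/<folder>/<file>.yml to app/lib/cheerio-tree/<folder>-<file>.ts.

    Assumption: inferred from the single example in this README.
    """
    path = PurePosixPath(yaml_path)
    if path.suffix not in (".yml", ".yaml"):
        raise ValueError(f"not a YAML file: {yaml_path}")
    # Expect exactly data/<folder>/<file>.yml
    if len(path.parts) != 3 or path.parts[0] != "data":
        raise ValueError(f"expected data/<folder>/<file>.yml, got: {yaml_path}")
    folder, name = path.parts[1], path.stem
    for part in (folder, name):
        if not CAMEL_CASE.fullmatch(part):
            raise ValueError(f"not camelCase: {part}")
    return f"app/lib/cheerio-tree/{folder}-{name}.ts"


print(generated_ts_path("data/wordpressCom/tags.yml"))
# app/lib/cheerio-tree/wordpressCom-tags.ts
```

Rejecting non-camelCase names early mirrors the naming rule above, so a misnamed file fails loudly instead of producing an unexpected output path.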
# data/wordpressCom/tags.yml
regexToI: &regexToI
  regex: '[^\d]'
  replace:
regexToF: &regexToF
  regex: '[^\d\.]'
  replace:
regexToK: &regexToK
  regex: 'K'
  replace: "000"
regexToM: &regexToM
  regex: 'M'
  replace: "000000"

# string to int
# e.g. 12K becomes 12000
toI: &toI
  - <<: *regexToK
  - <<: *regexToM
  - <<: *regexToI

addHost: &addHost
  regex: '^(.*)$'
  replace: https://wordpress.com$1

# Main
# ==================================================
# ==================================================
tree:
  # URL to match
  url:
    match: https://wordpress.com/tags
  nodes:
    trending:
      wrapper:
        list: true
        selector: div.trending-tags__container .trending-tags__column
      normal:
        tag:
          selector: a .trending-tags__title
        link:
          selector: a
          attr: href
          after_regular:
            - <<: *addHost
        count:
          to_i:
            selector: .trending-tags__count
            after_regular: *toI
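To make the post-processing rules concrete, here is a sketch of how an `after_regular` list might be applied: each step replaces matches of `regex` with `replace` (an empty `replace:` is treated as an empty string), in order, and `to_i` then parses the result as an integer. This is an assumption about the semantics, not Cheerio Tree's actual implementation; the function names are hypothetical, and the JavaScript-style `$1` backreference is translated to Python's `\g<1>`.

```python
import re
from typing import Mapping, Optional, Sequence

# Rules transcribed from tags.yml; None mirrors an empty `replace:` value.
REGEX_TO_I = {"regex": r"[^\d]", "replace": None}
REGEX_TO_K = {"regex": "K", "replace": "000"}
REGEX_TO_M = {"regex": "M", "replace": "000000"}
TO_I = [REGEX_TO_K, REGEX_TO_M, REGEX_TO_I]
ADD_HOST = [{"regex": r"^(.*)$", "replace": "https://wordpress.com$1"}]

Step = Mapping[str, Optional[str]]


def apply_after_regular(value: str, steps: Sequence[Step]) -> str:
    """Apply each regex/replace step in order (assumed semantics)."""
    for step in steps:
        replacement = step["replace"] or ""
        # Translate JavaScript-style $1 backreferences into Python's \g<1>.
        replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
        value = re.sub(step["regex"], replacement, value)
    return value


def to_i(value: str) -> int:
    """Hypothetical `to_i` node: run the toI chain, then parse an integer."""
    return int(apply_after_regular(value, TO_I))


print(to_i("12K"))  # 12000
print(apply_after_regular("/tag/photography", ADD_HOST))
# https://wordpress.com/tag/photography
```

Under these assumed semantics, a decimal count such as 1.1K would come out as 11000 rather than 1100, because the dot is stripped after K is expanded; whole-number counts like 12K are unaffected.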
Build the project and commit the dist output:
npm run build
# or
pnpm build
git add dist && git commit -m "build"
Create your tests in the tests directory, then run them:
pnpm test
# or
npm run test
You can deploy this project to Vercel. Then set the following environment variables:
# Configure your API key
SECRET_API_KEY=your_api_key
# Optional HTTP proxy; you can find one at https://www.proxysites.ai/
HTTP_PROXY=
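A sketch of how these two variables might be read, assuming the API key is required and an empty HTTP_PROXY means "no proxy". The function `load_settings` is hypothetical; the returned `proxies` dict uses the scheme-to-URL shape accepted by `requests` and by `urllib.request.ProxyHandler`.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: read SECRET_API_KEY and HTTP_PROXY."""
    api_key = env.get("SECRET_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("SECRET_API_KEY is not set")
    # An empty HTTP_PROXY= line is treated as "no proxy" (assumption).
    proxy: Optional[str] = env.get("HTTP_PROXY", "").strip() or None
    # Route both schemes through the same proxy when one is configured.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    return {"api_key": api_key, "proxies": proxies}
```

Passing an explicit mapping (for example, a dict in tests) keeps the helper independent of the real process environment.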
The API supports two authentication methods: a URL parameter and a request header. Each is described below.
URL parameter: add a token query parameter to the request URL and set its value to your API key. For example, if your API endpoint is http://localhost:3000/api/v1/resource and your API key is your_api_key, you can call the API like this:
curl "http://localhost:3000/api/v1/resource?token=your_api_key"
Header: add an X-Api-Key header to the request and set its value to your API key. With curl, you can send the custom header using -H:
curl -H "X-Api-Key: your_api_key" "http://localhost:3000/api/v1/resource"
Suppose you have an API endpoint https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=https://www.proxysites.ai/category/proxy-type. You can authenticate using the following two methods:
API Key: expressapikey
curl "https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=https://www.proxysites.ai/category/proxy-type&token=expressapikey"
curl -H "X-Api-Key: expressapikey" "https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=https://www.proxysites.ai/category/proxy-type"
URL Param Authentication: Add the token parameter to the request URL.
Header Authentication: Add X-Api-Key to the request header.
Choose the method that fits your use case. Header authentication is generally more secure because the key does not appear in the URL, where it could be recorded in browser history or server logs.
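On the server side, accepting either method amounts to checking the `token` query parameter or the `X-Api-Key` header against the configured SECRET_API_KEY. The sketch below is a framework-agnostic assumption about that logic, not this project's code: `is_authorized` is hypothetical, it prefers the header when both are present, and it uses `hmac.compare_digest` so the comparison time does not depend on how much of the key matches.

```python
import hmac
from typing import Mapping


def is_authorized(query: Mapping[str, str], headers: Mapping[str, str], secret: str) -> bool:
    """Hypothetical check: accept a matching X-Api-Key header or token parameter."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Prefer the header; fall back to the URL parameter.
    supplied = lowered.get("x-api-key") or query.get("token")
    if not supplied or not secret:
        return False
    return hmac.compare_digest(supplied.encode(), secret.encode())
```

Rejecting an empty secret guards against a misconfigured deployment where SECRET_API_KEY is blank and an empty token would otherwise match.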
You can call the API in your code as follows:
JavaScript:
// Percent-encode the target page URL so it can be passed as a query parameter
function urlEncode(url) {
  return encodeURIComponent(url);
}

const encodedUrl = urlEncode('https://www.proxysites.ai/category/proxy-type');
const apiKey = 'expressapikey';

// Using URL Param
fetch(`https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=${encodedUrl}&token=${apiKey}`)
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));

// Using Header
fetch(`https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=${encodedUrl}`, {
  headers: {
    'X-Api-Key': apiKey
  }
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
Python:
import requests

target_url = 'https://www.proxysites.ai/category/proxy-type'
api_key = 'expressapikey'
api_url = 'https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category'

# requests percent-encodes values passed in `params`, so pass the raw target URL;
# encoding it yourself first would double-encode it.

# Using URL Param
url_param_response = requests.get(
    api_url,
    params={'url': target_url, 'token': api_key}
)
url_param_response.raise_for_status()
print("URL Param Response:")
print(url_param_response.json())

# Using Header
headers = {
    'X-Api-Key': api_key
}
header_response = requests.get(
    api_url,
    headers=headers,
    params={'url': target_url}
)
header_response.raise_for_status()
print("Header Response:")
print(header_response.json())
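If you prefer to avoid the requests dependency, both request variants can be built with the standard library alone. The sketch below only constructs the requests (it does not send them) so the encoding is easy to inspect; `build_request` is a hypothetical helper and `API_BASE` is the demo endpoint from this README. Percent-encoding matters most when the target URL has its own query string, since an unencoded `&` would split it into separate parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category"


def build_request(target_url: str, api_key: str, use_header: bool = True) -> Request:
    """Hypothetical helper: build (but do not send) an authenticated GET request."""
    params = {"url": target_url}
    headers = {}
    if use_header:
        headers["X-Api-Key"] = api_key  # header authentication
    else:
        params["token"] = api_key  # URL parameter authentication
    # urlencode percent-encodes ':', '/', '?', '&' inside the target URL.
    return Request(f"{API_BASE}?{urlencode(params)}", headers=headers)


req = build_request("https://www.proxysites.ai/category/proxy-type", "expressapikey", use_header=False)
print(req.full_url)
```

To actually send one, pass it to `urllib.request.urlopen(req, timeout=30)` and decode the body with `json.loads`. Note that `Request` normalizes header names with `str.capitalize()`, so the header is stored internally as `X-api-key`.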