Random HTTP Header Generator

Description

The random-header-generator package generates random yet realistic HTTP request headers.

It is inspired by the excellent ua-parser, apify, and fake-http-header repositories, and it emulates real browser behaviour for the following header fields:

  • User-Agent
  • Referer
  • Accept
  • Accept-Language
  • Accept-Encoding
  • Sec-Fetch-Site
  • Sec-Fetch-Mode
  • Sec-Fetch-User
  • Sec-Fetch-Dest
  • Upgrade-Insecure-Requests
  • Connection

In addition, it supports the following Client Hints:

  • Sec-CH-UA
  • Sec-CH-UA-Arch
  • Sec-CH-UA-Bitness
  • Sec-CH-UA-Full-Version-List
  • Sec-CH-UA-Mobile
  • Sec-CH-UA-Model
  • Sec-CH-UA-Platform
  • Sec-CH-UA-Platform-Version

The generated headers conform to HTTP-version-specific ordering and support rules, and are browser-, version-, and country-specific.

In particular, the available headers cover the following:

  • Browsers: Chrome, Edge, Firefox, Safari, Opera
  • Device types: Desktop, Mobile
  • HTTP Versions: 1.x, 2.0
  • Countries: Any of the ISO 3166-1 alpha-2 codes listed in the following table:
Supported alpha-2 ISO codes.
ad ae af ag al am ao ar as at au az ba bb bd
be bf bg bh bi bj bo br bs bt bw by bz ca cd
ch cl cm cn co cr cu cy cz de dj dk dm do dz
ec ee eg er es et fi fj fr ga gb gd ge gh gm
gn gq gr gt gw gy hn hr ht hu id ie il in iq
ir is it jm jo jp ke kg kh ki km kn kw kz lb
lc li lk lr ls lt lu lv ly ma mc md me mg mh
ml mn mr mt mu mw mx my mz ne ng ni nl no np
nr nu nz om pa pe pg ph pk pl ps pt py qa ro
rs ru rw sa sb sc sd se sg si sk sl sm sn so
sr ss sv sz td tg th tj tm tn to tr tt tv tw
tz ua ug us uy uz vc ve vn vu ye za zm zw

If any of the above inputs are not supplied by the user, they are populated according to browser and device market share data, as well as internet usage data per country (see data_notes.md for a complete list of data sources). In addition, for headers that support relative quality factors, the factors have a 50% chance of being included in the header values.
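
The fallback behaviour described above can be pictured with a small sketch. It is not the package's implementation: the market shares are made-up placeholders rather than the package's data, and `pick_browser` and `with_quality_factors` are hypothetical helpers introduced only for illustration.

```python
import random

# Placeholder shares for illustration only; NOT the package's real data.
BROWSER_SHARE = {'chrome': 0.65, 'safari': 0.19, 'edge': 0.05, 'firefox': 0.04, 'opera': 0.03}

def pick_browser(rng):
    """Weighted random choice: browsers with a larger share are picked more often."""
    names = list(BROWSER_SHARE)
    weights = list(BROWSER_SHARE.values())
    return rng.choices(names, weights=weights, k=1)[0]

def with_quality_factors(values, rng):
    """With 50% probability, attach descending q-values to all but the first item."""
    if rng.random() < 0.5:
        return ','.join(values)  # plain list, no quality factors
    parts = [values[0]]
    for i, value in enumerate(values[1:], start=1):
        q = max(0.1, round(1.0 - 0.1 * i, 1))  # assumed spacing of the q-values
        parts.append(f'{value};q={q}')
    return ','.join(parts)

rng = random.Random()
print(pick_browser(rng))
print(with_quality_factors(['en-US', 'en-GB', 'en'], rng))
```

The output varies between runs; the spacing of the q-values in particular is an assumption and may differ from what the package emits.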

User agents can be generated in one of the following ways:

  • programmatically generated utilising most of the templates of the ua-generator repo, with additional modifications and extensions,
  • scraped from the user agent string website,
  • parsed from a user-provided .txt file.

Installation

The package can be easily installed via pip:

pip install random-header-generator

Usage

Generating headers is straightforward and can be done in several ways. The generator can be instantiated with one of the following:

from random_header_generator import HeaderGenerator

# Approach 1
generator = HeaderGenerator() # defaults to user_agents = 'program'

# Approach 2
generator = HeaderGenerator(user_agents = 'program')

# Approach 3
generator = HeaderGenerator(user_agents = 'scrape')

# Approach 4
generator = HeaderGenerator(user_agents = 'file', filename = 'path/to/agents/file.txt')
  • Approaches 1 and 2 are equivalent and indicate that the user agents will be generated programmatically using built-in templates.
  • Approach 3 indicates that the latest user agents will be scraped from https://www.useragentstring.com/
  • Approach 4 indicates that the user agents will be read from the .txt file whose path is provided in the filename argument.

For Approach 4, the user agent .txt file is assumed to contain a list of user agents, one per line, each followed by a newline character, as follows:

Mozilla/5.0 (compatible; U; ABrowse 0.6; Syllable) AppleWebKit/420+ (KHTML, like Gecko)
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36
...
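
Before passing a file to the generator, it can help to confirm that the file follows the one-agent-per-line layout shown above. The sketch below is not the package's own parser; `read_user_agents` is a hypothetical helper, and skipping blank lines is an assumption about what a sensible reader would do.

```python
import tempfile
from pathlib import Path

def read_user_agents(filename):
    """Return the non-empty lines of a user agent file, stripped of surrounding whitespace."""
    text = Path(filename).read_text(encoding='utf-8')
    return [line.strip() for line in text.splitlines() if line.strip()]

# Demo with a temporary file holding two of the sample lines above and one blank line.
sample = (
    'Mozilla/5.0 (compatible; U; ABrowse 0.6; Syllable) AppleWebKit/420+ (KHTML, like Gecko)\n'
    '\n'
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36\n'
)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'agents.txt'
    path.write_text(sample, encoding='utf-8')
    agents = read_user_agents(path)

print(len(agents))  # 2
```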

Once a generator has been instantiated with one of the approaches outlined above, headers can be generated in a variety of ways by specifying any combination of the following input arguments:

  • browser: A string with one of the following values: 'chrome', 'edge', 'firefox', 'safari', 'opera'
  • device: A string indicating the device type, either 'desktop' or 'mobile'
  • httpVersion: An integer indicating whether the headers correspond to HTTP version 1.x (1) or 2.0 (2)
  • country: A string containing a supported alpha-2 ISO code, as defined in the table above.
  • cookies: A dictionary mapping cookie names to their values. If it is not specified or is empty (cookies = {}), no 'Cookie' header is included in the output.
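
To make the cookies argument concrete, the following sketch shows how a dictionary of cookies maps onto a single Cookie header value. `build_cookie_header` is a hypothetical helper, not a function of the package, and it keeps the dictionary's insertion order, which the package itself may not do.

```python
def build_cookie_header(cookies):
    """Join cookie name/value pairs into one Cookie header value, or return None if empty."""
    if not cookies:
        return None  # mirrors the documented behaviour: no cookies, no 'Cookie' header
    return '; '.join(f'{name}={value}' for name, value in cookies.items())

print(build_cookie_header({'cookie_ID_1': 'cookie_Value_1', 'cookie_ID_2': 'cookie_value_2'}))
# cookie_ID_1=cookie_Value_1; cookie_ID_2=cookie_value_2
print(build_cookie_header({}))
# None
```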

Example 1

The simplest approach is to call the HeaderGenerator instance without any arguments:

# generator has been instantiated using one of the 4 approaches defined above...

headers = generator() # returns an ordered dict
    
for k, v in headers.items():
  print(f'{k}: {v}')

headers is an ordered dict whose keys are the HTTP header field names and whose values are the corresponding field values. The code snippet prints the following (your output will most likely differ due to randomization):

Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Linux; Android 7.1; Nexus 9; Build/N9F27G) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.203 Mobile Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Sec-Fetch-Mode: navigate
Sec-Fetch-Dest: document
Sec-Fetch-Site: same-site
Sec-Fetch-User: ?1
Referer: http://www.acoon.de
Accept-Encoding: br, identity, *
Accept-Language: en-US,en-GB;q=0.8,de-DE;q=0.5,en;q=0.2

When no arguments are provided, the country, device, and browser are chosen by weighted random selection according to usage and market data (see /data/notes.md), and the headers are valid for HTTP version 1.1.

Example 2

The following example specifies every supported input:

# generator has been instantiated using one of the 4 approaches defined above...

headers   = generator(
  country     = 'us', 
  device      = 'desktop', 
  browser     = 'chrome',
  httpVersion = 1,
  cookies     = {'cookie_ID_1': 'cookie_Value_1', 'cookie_ID_2': 'cookie_value_2'},
)
    
for k, v in headers.items():
  print(f'{k}: {v}')

with the corresponding output being:

Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_6_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.38 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Sec-Fetch-Mode: navigate
Sec-Fetch-Dest: document
Sec-Fetch-Site: same-site
Sec-Fetch-User: ?1
Referer: http://vector.us
Accept-Encoding: gzip, identity, deflate, compress, *
Accept-Language: es-US,en-US;q=0.8,en-GB;q=0.5,en;q=0.2
Cookie: cookie_ID_2=cookie_value_2; cookie_ID_1=cookie_Value_1
Sec-CH-UA: "Chromium";v="94", "Google Chrome";v="94", " Not A;Brand";v="99"
Sec-CH-UA-Arch: ""
Sec-CH-UA-Bitness: ""
Sec-CH-UA-Mobile: ?0
Sec-CH-UA-Model: "Macintosh"
Sec-CH-UA-Platform: "Mac OS"
Sec-CH-UA-Platform-Version: "11.6.1"

Example 3

The following example specifies only some of the inputs:

# generator has been instantiated using one of the 4 approaches defined above...

headers = generator(country = 'de', httpVersion = 2)
    
for k, v in headers.items():
  print(f'{k}: {v}')

with possible outputs being:

sec-ch-ua: "Chromium";v="104", "Google Chrome";v="104", " Not A;Brand";v="99"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Android"
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Linux; Android 6; Nexus 9; Build/MMB30X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.132 Mobile Safari/537.36
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
sec-fetch-site: same-site
sec-fetch-mode: navigate
sec-fetch-user: ?1
sec-fetch-dest: document
referer: http://www.financiero.de
accept-encoding: gzip, compress
accept-language: en,de-DE;q=0.7,en-US;q=0.5,en-GB;q=0.3
sec-ch-ua-arch: ""
sec-ch-ua-bitness: ""
sec-ch-ua-full-version-list: "Chromium";v="104.0.5112.132", "Google Chrome";v="104.0.5112.132", " Not A;Brand";v="99.0.0.0"
sec-ch-ua-model: "Nexus 9"
sec-ch-ua-platform-version: "6"
connection: keep-alive

or

upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) CriOS/77.0.3865.181 Mobile/15E148 Safari/537.36
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
sec-fetch-site: same-site
sec-fetch-mode: navigate
sec-fetch-user: ?1
referer: https://www.google.com
accept-encoding: compress
accept-language: en-US,de-DE,en-GB,en
connection: keep-alive
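
The HTTP/2 outputs above use lower-case field names, as HTTP/2 requires. According to the examples, the generator already produces them when httpVersion = 2, so the sketch below only illustrates the naming convention. `to_http2_names` is a hypothetical helper, and it does not reproduce the header reordering that the package applies.

```python
def to_http2_names(headers):
    """Return a copy of headers with lower-cased field names, preserving their order."""
    return {name.lower(): value for name, value in headers.items()}

# Stand-in for HTTP/1.x style headers; values copied from the sample outputs above.
http1_headers = {
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Mode': 'navigate',
}

for name, value in to_http2_names(http1_headers).items():
    print(f'{name}: {value}')
```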