Add stealth mode example #66

Merged 3 commits on Sep 9, 2024.
README.md (10 changes: 9 additions & 1 deletion)
@@ -4,7 +4,15 @@

You will need your [API Key](https://dev.agentql.com/) and the [AgentQL SDK](https://docs.agentql.com/installation/sdk-installation). You can get set up in less than five minutes with the [AgentQL Quick Start](https://docs.agentql.com/quick-start).

You may want to [set up a Python virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) before you begin. Then you can [install AgentQL in the virtual environment](https://pypi.org/project/agentql/).
## Virtual Environment

This project uses [Poetry](https://python-poetry.org/docs/) for dependency and virtual environment management.
You don't have to use Poetry to run the examples, but it makes it easier to manage dependencies and isolate the project environment.
If you choose to use Poetry, follow these steps to get everything set up:

- **Install Poetry**. Follow the [official Poetry installation guide](https://python-poetry.org/docs/#installing-with-the-official-installer).
- **Install dependencies**. Run `poetry install` in the project root directory.
- **Activate the virtual environment**. Run `poetry shell` to activate the virtual environment.
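
To confirm that the interpreter you are about to run the examples with really lives inside a virtual environment (Poetry's or any other), you can compare `sys.prefix` with `sys.base_prefix`: inside a venv/virtualenv the first points at the environment, while the second still points at the base Python installation. The following is a minimal sketch; the `in_virtualenv()` helper and the file name `check_venv.py` are hypothetical and not part of this repository.

```python
import sys
from typing import Optional


def in_virtualenv(prefix: Optional[str] = None, base_prefix: Optional[str] = None) -> bool:
    """Return True when the interpreter runs inside a virtual environment.

    venv/virtualenv environments (including the ones Poetry creates) point
    sys.prefix at the environment, while sys.base_prefix keeps pointing at
    the base Python installation. The optional arguments exist only to make
    the check easy to exercise with explicit values.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Saved as `check_venv.py`, running it with `poetry run python check_venv.py` should report an active environment, while running it with the system interpreter should not.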

## Examples

examples/stealth_mode/README.md (21 changes: 21 additions & 0 deletions)
@@ -0,0 +1,21 @@
# Stealth mode: Running AgentQL in stealth mode and avoiding bot detection

This example demonstrates how to lower the risk of being detected by an anti-bot system by running AgentQL in stealth mode.

The script uses several techniques to avoid detection:

- Randomize the HTTP headers the browser sends to the server, such as `User-Agent`, `Accept-Language`, and `Referer`. This makes consecutive requests look as if they come from different users.
- Randomize the browser viewport size. Some websites track the window size, and an identical size across all requests is a sign of a bot.
- Randomize the timezone and geolocation. Some websites track both, and identical values across all requests are a sign of a bot. The script picks them as a matched pair, so the timezone always agrees with the coordinates.
- (Optional) Use a proxy server. You need to obtain a proxy configuration (server address, username, password) separately from an external proxy provider (e.g. [NetNut](https://netnut.io), [BrightData](https://brightdata.com/), or similar) and add it to the `PROXIES` list in the script.
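
The randomization above amounts to drawing a fresh, internally consistent set of parameters for every browser session. The following is a minimal, pure-Python sketch of that idea; it is not part of the example script, and the `Fingerprint` dataclass, the `random_fingerprint()` helper and the trimmed-down value pools are hypothetical. The resulting values would feed Playwright's `browser.new_context()` arguments (`extra_http_headers`, `timezone_id`, `geolocation`, `viewport`), as the script does.

```python
import random
from dataclasses import dataclass

# Hypothetical, trimmed-down value pools for illustration only.
ACCEPT_LANGUAGES = ["en-US,en;q=0.9", "en-GB,en;q=0.9"]
REFERERS = ["https://www.google.com", "https://www.bing.com"]
# Each timezone is stored together with coordinates inside it,
# so the two values can never contradict each other.
LOCATIONS = [
    ("America/New_York", 40.7128, -74.006),
    ("America/Chicago", 41.8781, -87.6298),
]


@dataclass(frozen=True)
class Fingerprint:
    accept_language: str
    referer: str
    timezone_id: str
    latitude: float
    longitude: float
    width: int
    height: int


def random_fingerprint(rng: random.Random) -> Fingerprint:
    """Draw one set of per-session browser parameters from the pools."""
    timezone_id, latitude, longitude = rng.choice(LOCATIONS)
    return Fingerprint(
        accept_language=rng.choice(ACCEPT_LANGUAGES),
        referer=rng.choice(REFERERS),
        timezone_id=timezone_id,
        latitude=latitude,
        longitude=longitude,
        # Jitter a 1920x1080 base viewport by up to +/-50 px in each dimension.
        width=1920 + rng.randint(-50, 50),
        height=1080 + rng.randint(-50, 50),
    )


if __name__ == "__main__":
    rng = random.Random()  # e.g. random.Random(42) to reproduce a run
    for _ in range(3):
        print(random_fingerprint(rng))
```

Passing an explicit `random.Random` instance instead of using the module-level functions lets you reproduce a particular fingerprint by seeding it, which helps when a site blocks one run but not another.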

## Run the script

- [Install AgentQL SDK](https://docs.agentql.com/installation/sdk-installation)
- If you already have the SDK installed, make sure to update it to the latest version: `pip3 install agentql --upgrade`
- Save the Python file locally as **stealth_mode.py**
- Run the following command from the folder that contains the script:

```bash
python3 stealth_mode.py
```
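
If `pip3` and `python3` point at different Python installations, the upgrade may land somewhere the script never sees. A quick way to check which AgentQL and Playwright versions the running interpreter actually picks up is the standard library's `importlib.metadata`. The following is a minimal sketch; the `installed_version()` helper is hypothetical and not part of the example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("agentql", "playwright"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it with the same `python3` you use for `stealth_mode.py`; a "not installed" line means the SDK went into a different environment.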
examples/stealth_mode/stealth_mode.py (99 changes: 99 additions & 0 deletions)
@@ -0,0 +1,99 @@
import asyncio
import logging
import random

import agentql
from playwright.async_api import Geolocation, ProxySettings, async_playwright

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

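# Playwright passes these flags to Chromium by default; dropping them makes the
# browser look less like an automated one (e.g. no "controlled by automated
# test software" infobar).
BROWSER_IGNORED_ARGS = [
    "--enable-automation",
    "--disable-extensions",
]
# Extra Chromium flags; "--disable-blink-features=AutomationControlled" keeps
# navigator.webdriver from reporting an automated browser.
BROWSER_ARGS = [
    "--disable-xss-auditor",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-blink-features=AutomationControlled",
    "--disable-features=IsolateOrigins,site-per-process",
    "--disable-infobars",
]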

# Sample desktop user agents; one is picked at random per run. Keep them
# current: an outdated browser version can itself look suspicious.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4.1 Safari/605.1.15",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:130.0) Gecko/20100101 Firefox/130.0",
]


# (timezone, geolocation) pairs: each IANA timezone is paired with coordinates
# inside it, so the emulated timezone and location agree.
LOCATIONS = [
    ("America/New_York", Geolocation(longitude=-74.006, latitude=40.7128)),  # New York, NY
    ("America/Chicago", Geolocation(longitude=-87.6298, latitude=41.8781)),  # Chicago, IL
    ("America/Los_Angeles", Geolocation(longitude=-118.2437, latitude=34.0522)),  # Los Angeles, CA
    ("America/Denver", Geolocation(longitude=-104.9903, latitude=39.7392)),  # Denver, CO
    ("America/Phoenix", Geolocation(longitude=-112.0740, latitude=33.4484)),  # Phoenix, AZ
    ("America/Anchorage", Geolocation(longitude=-149.9003, latitude=61.2181)),  # Anchorage, AK
    ("America/Detroit", Geolocation(longitude=-83.0458, latitude=42.3314)),  # Detroit, MI
    ("America/Indianapolis", Geolocation(longitude=-86.1581, latitude=39.7684)),  # Indianapolis, IN
    ("America/Boise", Geolocation(longitude=-116.2023, latitude=43.6150)),  # Boise, ID
    ("America/Juneau", Geolocation(longitude=-134.4197, latitude=58.3019)),  # Juneau, AK
]

REFERERS = ["https://www.google.com", "https://www.bing.com", "https://duckduckgo.com"]

ACCEPT_LANGUAGES = ["en-US,en;q=0.9", "en-GB,en;q=0.9", "fr-FR,fr;q=0.9"]

PROXIES: list[ProxySettings] = [
    # TODO: replace with your own proxies
    # {
    #     "server": "http://ip_server:port",
    #     "username": "proxy_username",
    #     "password": "proxy_password",
    # },
]


async def main():
    # Pick one value from each pool so every run presents a different fingerprint.
    user_agent = random.choice(USER_AGENTS)
    header_dnt = random.choice(["0", "1"])
    location = random.choice(LOCATIONS)
    referer = random.choice(REFERERS)
    accept_language = random.choice(ACCEPT_LANGUAGES)
    proxy: ProxySettings | None = random.choice(PROXIES) if PROXIES else None

    async with async_playwright() as playwright, await playwright.chromium.launch(
        headless=False,
        args=BROWSER_ARGS,
        ignore_default_args=BROWSER_IGNORED_ARGS,
    ) as browser:
        context = await browser.new_context(
            proxy=proxy,
            # Playwright expects a single locale; use the primary language of the
            # chosen Accept-Language value (e.g. "en-GB") so they stay consistent.
            locale=accept_language.split(",")[0],
            timezone_id=location[0],
            extra_http_headers={
                "Accept-Language": accept_language,
                "Referer": referer,
                "DNT": header_dnt,
                "Connection": "keep-alive",
                "Accept-Encoding": "gzip, deflate, br",
            },
            geolocation=location[1],
            user_agent=user_agent,
            permissions=["notifications"],
            # Jitter the viewport around 1920x1080 by up to +/-50 px in each dimension.
            viewport={
                "width": 1920 + random.randint(-50, 50),
                "height": 1080 + random.randint(-50, 50),
            },
        )

        page = await agentql.wrap_async(context.new_page())

        # Enable AgentQL's stealth mode with the same user agent as the context.
        await page.enable_stealth_mode(nav_user_agent=user_agent)

        # bot.sannysoft.com runs common bot-detection checks and shows the results.
        await page.goto("https://bot.sannysoft.com/", referer=referer)
        # Keep the browser open for 30 seconds so the results can be inspected.
        await page.wait_for_timeout(30000)


if __name__ == "__main__":
    asyncio.run(main())