Marreta is a tool for analyzing URLs and accessing web content without hassle.
- Automatically cleans and fixes URLs
- Removes annoying tracking parameters
- Forces HTTPS to keep everything secure
- Changes user agent to avoid blocks
- Smart DNS
- Keeps HTML clean and optimized
- Fixes relative URLs automatically
- Allows custom styles
- Removes unwanted elements
- Cache, cache!
- Blocks domains you don't want
- Allows custom headers and cookies configuration
- Everything with SSL/TLS
- PHP-FPM
- OPcache enabled
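To make the URL-cleaning features above concrete, here is a minimal sketch of forcing HTTPS and stripping tracking parameters. It is not Marreta's actual implementation: the function name `clean_url` and the small list of tracking parameters are assumptions chosen for illustration, and the input is assumed to already carry a scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical tracker list for illustration; Marreta's real rules live in its own config.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def clean_url(url: str) -> str:
    """Force HTTPS and drop common tracking parameters from a URL with a scheme."""
    parts = urlsplit(url.strip())
    # Keep only the query parameters that do not look like trackers.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_NAMES
        and not name.lower().startswith(TRACKING_PREFIXES)
    ]
    # Swap the scheme and the filtered query back into the URL.
    return parts._replace(scheme="https", query=urlencode(kept)).geturl()
```

For example, `clean_url("http://example.com/news?id=7&utm_source=x")` keeps the `id` parameter, drops `utm_source`, and switches the scheme to `https`.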
You only need:
- Docker and Docker Compose
curl -o ./docker-compose.yml https://raw.githubusercontent.com/manualdousuario/marreta/main/docker-compose.yml
If needed, edit the file:
nano docker-compose.yml
services:
  marreta:
    container_name: marreta
    image: ghcr.io/manualdousuario/marreta:latest
    ports:
      - "80:80"
    environment:
      - SITE_NAME=
      - SITE_DESCRIPTION=
      - SITE_URL=
      - DNS_SERVERS=
      - SELENIUM_HOST=
- SITE_NAME: Your Marreta's name
- SITE_DESCRIPTION: Tell what it's for
- SITE_URL: Where it will run, full address with https://. If you change the port in docker-compose (e.g., 8080:80), you must also include the port in SITE_URL (e.g., https://yoursite:8080)
- DNS_SERVERS: Which DNS servers to use, e.g. 1.1.1.1, 8.8.8.8
- SELENIUM_HOST: Selenium host server:PORT (e.g., selenium-hub:4444)
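The SITE_URL port rule and the comma-separated DNS_SERVERS format are easy to get wrong, so here is a small sanity-check sketch. The helper names `parse_dns_servers` and `site_url_matches_port` are hypothetical and not part of Marreta. The check also assumes no reverse proxy sits in front of the container; with a TLS-terminating proxy the published port and the public URL can legitimately differ.

```python
from urllib.parse import urlsplit


def parse_dns_servers(value: str) -> list[str]:
    """Split a comma-separated DNS_SERVERS value into a list of addresses."""
    return [item.strip() for item in value.split(",") if item.strip()]


def site_url_matches_port(site_url: str, published_port: int) -> bool:
    """Check that SITE_URL carries the host port published in docker-compose."""
    parts = urlsplit(site_url)
    # An omitted port means the default for the scheme.
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    effective = parts.port if parts.port is not None else default_port
    return effective == published_port
```

With a `8080:80` mapping, `site_url_matches_port("https://yoursite:8080", 8080)` passes, while `https://yoursite` without the port does not.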
Now you can run docker compose up -d
- First, clone the project:
git clone https://github.com/manualdousuario/marreta/
cd marreta
- Create the configuration file:
cp app/.env.sample app/.env
- Configure it your way in app/.env:
SITE_NAME="Marreta"
SITE_DESCRIPTION="Paywall hammer!"
SITE_URL=http://localhost
DNS_SERVERS=1.1.1.1, 8.8.8.8
DEBUG=true
SELENIUM_HOST=selenium-hub:4444
LANGUAGE=pt-br
- Run everything:
docker-compose up -d
Done! It will be running at http://localhost 🎉
When the DEBUG option is true, no cache is generated!
The configurations are organized in data/:
- domain_rules.php: Site-specific rules
- global_rules.php: Rules that apply to all sites
- blocked_domains.php: List of blocked sites
- user_agents.php: User agent configurations
- /languages/: Each language is stored under its ISO id (pt-br, en, es or de-de) and can be selected with the LANGUAGE environment variable
Marreta supports storing its cache in S3. Configure the following variables in your .env:
S3_CACHE_ENABLED=true
S3_ACCESS_KEY=access_key
S3_SECRET_KEY=secret_key
S3_BUCKET=bucket_name
S3_REGION=us-east-1
S3_FOLDER_=cache/
S3_ACL=private
S3_ENDPOINT=
Possible configurations:
## R2
S3_ACCESS_KEY=access_key
S3_SECRET_KEY=secret_key
S3_BUCKET=bucket_name
S3_ENDPOINT=https://{TOKEN}.r2.cloudflarestorage.com
S3_REGION=auto
S3_FOLDER_=cache/
S3_ACL=private
## DigitalOcean
S3_ACCESS_KEY=access_key
S3_SECRET_KEY=secret_key
S3_BUCKET=bucket_name
S3_ENDPOINT=https://{REGION}.digitaloceanspaces.com
S3_REGION=auto
S3_FOLDER_=cache/
S3_ACL=private
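Before starting the container it can help to confirm that the S3 variables are filled in. The sketch below is only an illustration: the function `missing_s3_settings` is hypothetical, and treating these four variables as mandatory is an assumption, not something Marreta documents.

```python
# Assumed-mandatory variables; Marreta itself may apply different rules.
REQUIRED_S3_KEYS = ("S3_ACCESS_KEY", "S3_SECRET_KEY", "S3_BUCKET", "S3_REGION")


def missing_s3_settings(env: dict[str, str]) -> list[str]:
    """Return the S3 variables that are unset or empty when the S3 cache is enabled."""
    if env.get("S3_CACHE_ENABLED", "").strip().lower() != "true":
        return []  # S3 cache is off, so there is nothing to check
    return [key for key in REQUIRED_S3_KEYS if not env.get(key, "").strip()]
```

Calling `missing_s3_settings(dict(os.environ))` after `import os` would check the current environment; an empty list means nothing appears to be missing.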
Marreta integrates with Selenium to process websites that require JavaScript or have more advanced protection barriers. To use this functionality, you need to set up a Selenium environment with Firefox. Add the following configuration to your docker-compose.yml:
services:
  selenium-firefox:
    container_name: selenium-firefox
    image: selenium/node-firefox:4.27.0-20241204
    shm_size: 2gb
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_ENABLE_TRACING=false
      - SE_NODE_MAX_SESSIONS=10
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
    entrypoint: bash -c 'SE_OPTS="--host $$HOSTNAME" /opt/bin/entry_point.sh'
    depends_on:
      - selenium-hub
  selenium-hub:
    image: selenium/hub:4.27.0-20241204
    container_name: selenium-hub
    environment:
      - SE_ENABLE_TRACING=false
      - GRID_MAX_SESSION=10
      - GRID_BROWSER_TIMEOUT=10
      - GRID_TIMEOUT=10
    ports:
      - 4442:4442
      - 4443:4443
      - 4444:4444
Important settings:
- shm_size: Sets the shared memory size for Firefox (2GB recommended)
- SE_NODE_MAX_SESSIONS: Maximum number of concurrent sessions per node
- GRID_MAX_SESSION: Maximum number of concurrent sessions in the hub
- GRID_BROWSER_TIMEOUT and GRID_TIMEOUT: Timeouts in seconds
After setting up Selenium, make sure to set the SELENIUM_HOST variable in your environment to point to the Selenium hub (typically selenium-hub:4444).
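To confirm that the hub is reachable before pointing Marreta at it, you can query the Selenium Grid status endpoint. This is a sketch and not part of Marreta: the helper names `hub_is_ready` and `check_hub` are hypothetical, and it assumes the hub answers plain HTTP on the host and port given in SELENIUM_HOST.

```python
import json
from urllib.request import urlopen


def hub_is_ready(status_payload: dict) -> bool:
    """Interpret the JSON body of a Selenium Grid /status response."""
    return bool(status_payload.get("value", {}).get("ready", False))


def check_hub(selenium_host: str, timeout: float = 5.0) -> bool:
    """Query http://<selenium_host>/status and report whether the grid is ready."""
    with urlopen(f"http://{selenium_host}/status", timeout=timeout) as response:
        return hub_is_ready(json.load(response))
```

From a machine that can reach the published port, `check_hub("localhost:4444")` should return `True` once the Firefox node has registered with the hub.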
Marreta uses Hawk.so, an open-source error monitoring platform. To configure monitoring, add the following variable to your .env or Docker environment:
HAWK_TOKEN=your_token
You can host your own Hawk.so instance or use the hosted service at hawk.so. The source code is available at github.com/codex-team/hawk.
See what's happening:
docker-compose logs app
When you need to clear the cache:
docker-compose exec app rm -rf /app/cache/*
Made with ❤️! If you have questions or suggestions, open an issue and we'll help! 😉
Thanks to the project https://github.com/burlesco/burlesco which served as the basis for several rules!
Public instance at marreta.pcdomanual.com!