README.en.md

🛠️ Marreta


Marreta is a tool that breaks through access barriers and removes elements that hinder reading!

Before and after Marreta

Public instance at marreta.pcdomanual.com!

✨ What's cool about it?

  • Cleans and corrects URLs automatically
  • Removes annoying tracking parameters
  • Forces HTTPS to keep everything secure
  • Changes the user agent to avoid blocks
  • Leaves the HTML clean and optimized
  • Fixes relative URLs on its own
  • Lets you add your own styles and scripts
  • Removes unwanted elements
  • Cache, cache!
  • Blocks domains you don't want
  • Allows you to configure headers and cookies your way
  • PHP-FPM and OPcache

🐳 Installing with Docker

Install Docker and Docker Compose, then download the compose file:

curl -o ./docker-compose.yml https://raw.githubusercontent.com/manualdousuario/marreta/main/docker-compose.yml

Now modify it with your settings:

nano docker-compose.yml

services:
  marreta:
    container_name: marreta
    image: ghcr.io/manualdousuario/marreta:latest
    ports:
      - "80:80"
    environment:
      - SITE_NAME=
      - SITE_DESCRIPTION=
      - SITE_URL=
  • SITE_NAME: Your Marreta's name
  • SITE_DESCRIPTION: What it's for
  • SITE_URL: Where it will run, complete address with https://. If you change the port in docker-compose (e.g. 8080:80), you must also include the port in SITE_URL (e.g. https://yoursite:8080)
  • DNS_SERVERS: Which DNS servers to use (e.g. 1.1.1.1, 8.8.8.8)
  • SELENIUM_HOST: Selenium host server:PORT (e.g. selenium-hub:4444)
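
As a concrete illustration, the sketch below prints a compose file with these variables filled in. The helper name `print_compose` and the sample values (`yoursite`, the site name and description) are placeholders made up for this example and are not part of Marreta. It only shows how the published port and SITE_URL have to agree, and it assumes the default 80:80 mapping needs no port in SITE_URL.

```shell
#!/bin/sh
# Hypothetical helper (not part of Marreta): prints a docker-compose.yml
# in which the published port and SITE_URL agree.
# Usage: print_compose HOST [PORT]
print_compose() {
  host="$1"
  port="${2:-80}"
  # Assumption: with the default 80:80 mapping, SITE_URL carries no port.
  if [ "${port}" = "80" ]; then
    site_url="https://${host}"
  else
    site_url="https://${host}:${port}"
  fi
  cat <<EOF
services:
  marreta:
    container_name: marreta
    image: ghcr.io/manualdousuario/marreta:latest
    ports:
      - "${port}:80"
    environment:
      - SITE_NAME=My Marreta
      - SITE_DESCRIPTION=Reading without barriers
      - SITE_URL=${site_url}
      - DNS_SERVERS=1.1.1.1, 8.8.8.8
EOF
}
```

For example, `print_compose yoursite 8080 > docker-compose.yml` writes a file that publishes port 8080 and sets SITE_URL to https://yoursite:8080.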

Now you can run docker compose up -d

S3 Cache

Support for cache storage in S3. Configure the following variables in your .env:

S3_CACHE_ENABLED=true

S3_ACCESS_KEY=access_key
S3_SECRET_KEY=secret_key
S3_BUCKET=bucket_name
S3_REGION=us-east-1
S3_FOLDER=cache/
S3_ACL=private
S3_ENDPOINT=

Possible configurations:

## R2
S3_ACCESS_KEY=access_key
S3_SECRET_KEY=secret_key
S3_BUCKET=bucket_name
S3_ENDPOINT=https://{TOKEN}.r2.cloudflarestorage.com
S3_REGION=auto
S3_FOLDER=cache/
S3_ACL=private

## DigitalOcean
S3_ACCESS_KEY=access_key
S3_SECRET_KEY=secret_key
S3_BUCKET=bucket_name
S3_ENDPOINT=https://{REGION}.digitaloceanspaces.com
S3_REGION=auto
S3_FOLDER=cache/
S3_ACL=private

Selenium Integration

Integration with Selenium allows processing sites that require JavaScript or have some more advanced protection barriers. To use this feature, you need to set up a Selenium environment with Firefox. Add the following configuration to your docker-compose.yml:

services:
  selenium-firefox:
    container_name: selenium-firefox
    image: selenium/node-firefox:4.27.0-20241204
    shm_size: 2gb
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_ENABLE_TRACING=false
      - SE_NODE_MAX_SESSIONS=10
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
    entrypoint: bash -c 'SE_OPTS="--host $$HOSTNAME" /opt/bin/entry_point.sh'
    depends_on:
      - selenium-hub

  selenium-hub:
    image: selenium/hub:4.27.0-20241204
    container_name: selenium-hub
    environment:
      - SE_ENABLE_TRACING=false
      - GRID_MAX_SESSION=10
      - GRID_BROWSER_TIMEOUT=10
      - GRID_TIMEOUT=10
    ports:
      - 4442:4442
      - 4443:4443
      - 4444:4444

Important settings:

  • shm_size: Sets the shared memory size for Firefox (2GB recommended)
  • SE_NODE_MAX_SESSIONS: Maximum number of concurrent sessions per node
  • GRID_MAX_SESSION: Maximum number of concurrent sessions on the hub
  • GRID_BROWSER_TIMEOUT and GRID_TIMEOUT: Timeouts in seconds

After configuring Selenium, make sure to set the SELENIUM_HOST variable in your environment to point to the Selenium hub (usually selenium-hub:4444).
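
Before pointing Marreta at the hub, it can help to confirm that the grid is up. Selenium Grid 4 exposes a `/status` endpoint whose JSON response includes a `ready` flag. The sketch below checks that flag with a plain text match, so treat it as a rough check rather than a proper JSON parse. The function name `grid_ready` is made up for this example, and `localhost:4444` assumes the port mapping shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads a Selenium Grid /status JSON document on stdin
# and succeeds only if it contains "ready": true.
# This is a crude text match, not a JSON parser.
grid_ready() {
  grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# Usage against the hub published on port 4444:
#   curl -s http://localhost:4444/status | grid_ready && echo "grid is ready"
```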

Development

  1. First, clone the project:
git clone https://github.com/manualdousuario/marreta/
cd marreta/app
  2. Install the project dependencies:
composer install
npm install
  3. Create the configuration file:
cp .env.sample .env
  4. Configure the environment variables in .env
  5. Use default.conf as a base for NGINX, or point your web server to app/

Gulp compiles Sass to CSS and minifies JavaScript. To run it, use: gulp

⚙️ Customizing

The settings are organized in data/:

  • domain_rules.php: Specific rules for each site
  • global_rules.php: Rules that apply to all sites
  • blocked_domains.php: List of blocked sites

Translations

  • /languages/: Each language is stored under its ISO code (pt-br, en, es or de-de); select one with the LANGUAGE environment variable

🛠️ Maintenance

Logging System

Logs are stored in app/logs/*.log with automatic rotation every 7 days.

The log level can be set in .env or in the Docker environment:

LOG_LEVEL=WARNING

Available log levels:

  • DEBUG: Detailed information for debugging
  • INFO: General information about operations
  • WARNING: Warnings that deserve attention (default)
  • ERROR: Errors that do not interrupt operation
  • CRITICAL: Critical errors that need immediate attention

View the application logs:

docker compose logs marreta
# or directly from the log file
cat app/logs/*.log
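
When the files get long, filtering by level helps. This is a small sketch, assuming each log line contains its level name in upper case. The exact line format is not documented here, so adjust the pattern to what the files actually contain. The name `only_errors` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: keeps only lines that mention ERROR or CRITICAL.
# Assumption: the level name appears verbatim somewhere in each log line.
only_errors() {
  grep -E 'ERROR|CRITICAL'
}

# Usage:
#   cat app/logs/*.log | only_errors
```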

Clearing the cache

When you need to clear:

# the wildcard is quoted so it expands inside the container, not on the host
docker compose exec marreta sh -c 'rm -rf /app/cache/*'

🚀 Integrations


Made with ❤️! If you have any questions or suggestions, open an issue and we'll help! 😉

Thanks to the https://github.com/burlesco/burlesco and https://github.com/nang-dev/hover-paywalls-browser-extension/ projects that served as the basis for several rules!
