🐍 SnakyScraper

SnakyScraper is a lightweight, Pythonic web scraping toolkit built on top of BeautifulSoup and Requests. It provides a small interface for extracting HTML elements and page metadata from websites, and returns the results as plain strings, lists, or dictionaries.

Fast. Accurate. Snake-style scraping. 🐍🎯

🚀 Features

✅ Extract metadata: title, description, keywords, author, and more
✅ Built-in support for Open Graph, Twitter Card, canonical, and CSRF tags
✅ Extract HTML structures: h1–h6, p, ul, ol, img, links
✅ Powerful filter() method with class, ID, and tag-based selectors
✅ return_html toggle to return clean text or raw HTML
✅ Simple return values: string, list, or dictionary
✅ Powered by BeautifulSoup4 and Requests

📦 Installation

pip install snakyscraper

Requires Python 3.7 or later.

🛠️ Basic Usage

from snakyscraper import SnakyScraper

scraper = SnakyScraper("https://example.com")

# Get the page title
print(scraper.title())  # "Welcome to Example.com"

# Get meta description
print(scraper.description())  # "This is the example meta description."

# Get all <h1> elements
print(scraper.h1())  # ["Welcome", "Latest News"]

# Extract Open Graph metadata
print(scraper.open_graph())  # {"og:title": "...", "og:description": "...", ...}

# Custom filter: find all div.card elements and extract child tags
print(scraper.filter(
    element="div",
    attributes={"class": "card"},
    multiple=True,
    extract=["h1", "p", ".title", "#desc"]
))

🧪 Available Methods

🔹 Page Metadata

scraper.title()
scraper.description()
scraper.keywords()
scraper.keyword_string()
scraper.charset()
scraper.canonical()
scraper.content_type()
scraper.author()
scraper.csrf_token()
scraper.image()

🔹 Open Graph & Twitter Card

scraper.open_graph()
scraper.open_graph("og:title")

scraper.twitter_card()
scraper.twitter_card("twitter:title")
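To make the dictionary returned by open_graph() easier to picture, here is a minimal sketch of how og: tags can be collected from a page's meta elements. It is not SnakyScraper's implementation: it uses only the standard library's html.parser so it runs without network access, and the names MetaPropertyCollector and SAMPLE_HTML are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class MetaPropertyCollector(HTMLParser):
    """Hypothetical helper: collect <meta> tags whose key starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Open Graph tags use "property"; Twitter Card tags commonly use "name".
        key = attributes.get("property") or attributes.get("name")
        if key and key.startswith(self.prefix):
            self.found[key] = attributes.get("content")


# Inline sample page (an assumption for this sketch, not real site data).
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example OG Title">
<meta property="og:description" content="An example page">
<meta name="twitter:title" content="Example Tweet Title">
</head><body></body></html>
"""

collector = MetaPropertyCollector("og:")
collector.feed(SAMPLE_HTML)
og_tags = collector.found

print(og_tags)                  # all collected og: tags as a dictionary
print(og_tags.get("og:title"))  # a single value looked up by key
```

Passing "twitter:" as the prefix would collect the Twitter Card tags from the same sample page in the same way.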

🔹 Headings & Text

scraper.h1()
scraper.h2()
scraper.h3()
scraper.h4()
scraper.h5()
scraper.h6()
scraper.p()

🔹 Lists

scraper.ul()
scraper.ol()

🔹 Images

scraper.images()
scraper.image_details()

🔹 Links

scraper.links()
scraper.link_details()

🔍 Custom DOM Filtering

Use filter() to target specific DOM elements and extract nested content.

▸ Single element

scraper.filter(
    element="div",
    attributes={"id": "main"},
    multiple=False,
    extract=[".title", "#description", "p"]
)

▸ Multiple elements

scraper.filter(
    element="div",
    attributes={"class": "card"},
    multiple=True,
    extract=["h1", ".subtitle", "#meta"]
)

The extract argument accepts tag names, class selectors (e.g., .title), or ID selectors (e.g., #meta).
Output keys are automatically normalized:
.title → class__title, #meta → id__meta
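The key normalization described above can be sketched as a small pure-Python function. This is an illustration of the stated mapping only, not the library's code: normalize_key is a hypothetical name, and leaving plain tag names unchanged is an assumption.

```python
def normalize_key(selector: str) -> str:
    """Hypothetical helper: map an extract selector to its output key."""
    if selector.startswith("."):
        # Class selector: ".title" becomes "class__title".
        return "class__" + selector[1:]
    if selector.startswith("#"):
        # ID selector: "#meta" becomes "id__meta".
        return "id__" + selector[1:]
    # Assumption: plain tag names such as "h1" are used as-is.
    return selector


selectors = ["h1", ".subtitle", "#meta"]
keys = [normalize_key(s) for s in selectors]
print(keys)  # ['h1', 'class__subtitle', 'id__meta']
```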

▸ Clean Text Output

Set return_html=False to get clean text instead of raw HTML:

scraper.filter(
    element="p",
    attributes={"class": "dark-text"},
    multiple=True,
    return_html=False
)
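The difference between raw HTML and clean text can be shown with a short standalone sketch. It assumes "clean text" means the element's text content with tags removed and whitespace collapsed. The exact output of filter() may differ. TextExtractor and to_clean_text are hypothetical names, built on the standard library's html.parser rather than on SnakyScraper itself.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keep only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_clean_text(raw_html: str) -> str:
    """Strip tags and collapse runs of whitespace into single spaces."""
    extractor = TextExtractor()
    extractor.feed(raw_html)
    extractor.close()  # flush any buffered text
    return " ".join("".join(extractor.parts).split())


raw = '<p class="dark-text">Hello <strong>world</strong>!</p>'
clean = to_clean_text(raw)

print(raw)    # raw HTML, tags included
print(clean)  # Hello world!
```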

📦 Output Example

scraper.title()
# "Welcome to Example.com"

scraper.h1()
# ["Main Heading", "Another Title"]

scraper.open_graph("og:title")
# "Example OG Title"

🤝 Contributing

Contributions are welcome!
Found a bug or want to request a feature? Please open an issue or submit a pull request.

📄 License

🔗 Related Projects

💡 Why SnakyScraper?

Think of it as your Pythonic sniper — targeting HTML content with precision and elegance.
