A spider bot, published as the Python module spiderbot.
pip install spiderbot
Install the Chrome browser and ChromeDriver, and put the chromedriver binary into a directory on your PATH.
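To verify that the chromedriver binary is actually visible on your PATH, a quick check from Python:

import shutil

# prints the full path to chromedriver, or None if it is not on PATH
print(shutil.which("chromedriver"))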
Create config_private.py, using config_private_sample.py as an example, and set the values of XPATHS and DB_NAME. Alternatively, pass the xpaths and db_name arguments when creating a SpiderBot instance (see the sketch below).
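A hypothetical sketch of both configuration styles; the XPATHS keys and expressions are illustrative assumptions, not the module's actual schema:

# config_private.py (illustrative values; adapt to your target site)
XPATHS = {
    "profile": "//h1[@class='username']",
    "post_links": "//div[@class='post']//a",
}
DB_NAME = "spiderbot.db"

# or, equivalently, at construction time:
from spiderbot import SpiderBot
bot = SpiderBot(xpaths=XPATHS, db_name=DB_NAME)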
Initialize the database by passing init=True when creating a SpiderBot instance. On success, spiderbot.db is created:
from spiderbot import SpiderBot

bot = SpiderBot(skip_driver=True, init=True)  # skip_driver=True: no browser session is needed just to create the database
Then add the users to crawl. More users can be added at any time:
from spiderbot import SpiderBot

urls = ["https://example.com/user_a_homepage", "https://example.com/user_b_homepage"]
bot = SpiderBot()
bot.add_users(*urls, working_status=True)
Finally, run the main job:
from spiderbot import SpiderBot

bot = SpiderBot()
bot.get_profiles()  # fetch profile data for the tracked users
bot.get_new_posturls()  # collect URLs of newly published posts
bot.get_history_posturls(1, 9)  # collect historical post URLs (the arguments appear to be a range)
bot.get_posts()  # download the posts themselves
bot.quit()  # close the browser driver
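Because the main job drives a real browser, it can help to make sure the driver is released even if a step fails. A minimal sketch, assuming quit() closes the underlying driver:

from spiderbot import SpiderBot

bot = SpiderBot()
try:
    bot.get_profiles()
    bot.get_new_posturls()
    bot.get_posts()
finally:
    bot.quit()  # release the browser even if a step raised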
For development, sort imports, format the code, and run the linter:
isort .
black .
pylint spiderbot > pylint_spiderbot.log