GitHub - gmanroney/scrapyCraigslistMongo: Uses scrapy to read section of craigslist site and store results in a mongo database

This code written in python uses scrapy to read section of craigslist site and store results in a mongo database

This code is an amalgamation of code and concepts described in a number of webpages

Extracting items with pagination using scrapy http://python.gotrained.com/scrapy-tutorial-web-scraping-craigslist/

https://www.analyticsvidhya.com/blog/2017/07/web-scraping-in-python-using-scrapy/

Storing results of scrap in Mongo https://realpython.com/blog/python/web-scraping-with-scrapy-and-mongodb/

This code uses craigslist as the source and ammends records to the MongoDB collection each time it is run; in later versions the record will be checked to avoid occurence of duplicate records in the database

The code needs python to run; the specific requirements can be found in the file settings.txt

The code also needs the latest version of scrapy (1.4.0) and mongo (3.4.7) installed; for this implementation no username or password should be set in mongo. Again, something for improvement in the future

Once the above pre-requisites are met the scraper can be run using the command: scrapy crawl jobsIE

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
craigslist		craigslist
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
scrapy.cfg		scrapy.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

License

gmanroney/scrapyCraigslistMongo

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages