Edzapp Scraper

This is a little scraper I hacked together to pull job listings off of EdZapp to simplify the search process. It isn't beautiful, but hopefully it can be of some use to you!

Installation

Clone repository:

git clone git://github.com/jamesadney/edzapp-scraper.git

Install dependencies (Scrapy >= 0.14.4):
```
pip install -r requirements.txt
```

Simple Usage

From inside the cloned edzapp-scraper folder:

scrapy crawl edzapp -o jobs.csv -t csv

Django Site

I'm working on a Django frontend for the scraped data. Currently, all it does is store the scraped data in a database.

Setting up the Django project

Generate a `SECRET_KEY`

>>> import os
>>> os.urandom(24)
'\xf3!\xd5\x8d\x07\x98\xa2\x0b\xf4\xc0Y]$\x11\x8aJ\xb3\x8fk\'\xa4"\xe9P'
>>> # Of course, don't use this key!

Add the key to ./edzapp/django_edzapp/django_edzapp/settings.py.

Initialize the database

From ./edzapp/django_edzapp/ run:

python manage.py syncdb

Enable the `DjangoJobPipeline`

Uncomment 'edzapp.pipelines.DjangoJobPipeline', in settings.py.

ITEM_PIPELINES = [
#    'edzapp.pipelines.DjangoJobPipeline',
]

Customization

Disable scraping job pages

Disabling this will probably break the Django site

By default, the scraper opens the link for each job to pull information from the job's page. This significantly increases the amount of time required to scrape the site.

If you don't need the extra information, open settings.py and change

PARSE_JOB_PAGES = True

to

PARSE_JOB_PAGES = False

Change the job "role"

In order to minimize the number of pages scraped, the spider tells EdZapp to only show positions labeled as "TEACHER/CLASSIFIED". Other available search options are:

STUDENT_SUPPORT_SERVICES

ADMINISTRATOR

INSTRUCTIONAL_SUPPORT

NON-INSTRUCTIONAL_SUPPORT

PROFESSIONAL/EXECUTIVE

EXTRACURRICULAR

To change this setting, open settings.py and use the role as listed above as the keyword on this line instead of 'TEACHER/CLASSIFIED'.

ROLE = constants.ROLES['TEACHER/CLASSIFIED']

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
edzapp		edzapp
old_mechanize		old_mechanize
.gitignore		.gitignore
LICENSE		LICENSE
README.rst		README.rst
dev_requirements.txt		dev_requirements.txt
parse_jobs.py		parse_jobs.py
requirements.txt		requirements.txt
scrapy.cfg		scrapy.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Edzapp Scraper

Installation

Simple Usage

Django Site

Setting up the Django project

Generate a `SECRET_KEY`

Initialize the database

Enable the `DjangoJobPipeline`

Customization

Disable scraping job pages

Change the job "role"

About

Releases

Packages

Languages

License

jamesadney/edzapp-scraper

Folders and files

Latest commit

History

Repository files navigation

Edzapp Scraper

Installation

Simple Usage

Django Site

Setting up the Django project

Generate a SECRET_KEY

Initialize the database

Enable the DjangoJobPipeline

Customization

Disable scraping job pages

Change the job "role"

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Generate a `SECRET_KEY`

Enable the `DjangoJobPipeline`

Packages