Commit: Upgrade to Python 3.9
lamle-ea committed Nov 3, 2023
1 parent 105ed65 commit 7a8332a
Showing 8 changed files with 763 additions and 629 deletions.
29 changes: 14 additions & 15 deletions .github/workflows/archive.yml
@@ -31,10 +31,10 @@ jobs:
          sudo openvpn --config /etc/openvpn/ovpn.conf --daemon
          sleep 120
-      - name: Set up Python 3.7
+      - name: Set up Python 3.9
        uses: actions/setup-python@v1
        with:
-          python-version: 3.7
+          python-version: 3.9

      - name: Install Pipenv
        uses: dschep/install-pipenv-action@v1
@@ -43,28 +43,27 @@ jobs:
        uses: actions/cache@v1
        with:
          path: .venv
-          key: pip-3.7-${{ hashFiles('**/Pipfile.lock') }}
+          key: pip-3.9-${{ hashFiles('**/Pipfile.lock') }}
          restore-keys: |
-            pip-3.7-
+            pip-3.9-
            pip-
      - name: Install dependencies
        run: pipenv sync
        env:
-          PIPENV_DEFAULT_PYTHON_VERSION: 3.7
+          PIPENV_DEFAULT_PYTHON_VERSION: 3.9

      - name: Run scrapers
        run: |
          export PYTHONPATH=$(pwd):$PYTHONPATH
          ./.deploy.sh
-      - name: Notify slack on job failure
-        id: slack
-        uses: slackapi/slack-github-action@v1.17.0
-        with:
-          channel-id: 'C01A6HC2FU6'
-          slack-message: 'Archive Cronjob Failed'
-        env:
-          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
-        if: ${{ failure() }}
+      - name: Notify slack on job failure
+        id: slack
+        uses: slackapi/slack-github-action@v1.17.0
+        with:
+          channel-id: 'C01A6HC2FU6'
+          slack-message: 'Archive Cronjob Failed'
+        env:
+          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
+        if: ${{ failure() }}
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -13,9 +13,9 @@ jobs:
  check:
    runs-on: ubuntu-latest
    strategy:
-      max-parallel: 3
+      max-parallel: 2
      matrix:
-        python-version: [3.6, 3.7, 3.8]
+        python-version: [3.9]

    steps:
      - uses: actions/checkout@v1
12 changes: 6 additions & 6 deletions .github/workflows/cron.yml
@@ -36,10 +36,10 @@ jobs:
          sudo openvpn --config /etc/openvpn/ovpn.conf --daemon
          sleep 120
-      - name: Set up Python 3.7
+      - name: Set up Python 3.9
        uses: actions/setup-python@v1
        with:
-          python-version: 3.7
+          python-version: 3.9

      - name: Install Pipenv
        uses: dschep/install-pipenv-action@v1
@@ -48,15 +48,15 @@ jobs:
        uses: actions/cache@v1
        with:
          path: .venv
-          key: pip-3.7-${{ hashFiles('**/Pipfile.lock') }}
+          key: pip-3.9-${{ hashFiles('**/Pipfile.lock') }}
          restore-keys: |
-            pip-3.7-
+            pip-3.9-
            pip-
      - name: Install dependencies
        run: pipenv sync
        env:
-          PIPENV_DEFAULT_PYTHON_VERSION: 3.7
+          PIPENV_DEFAULT_PYTHON_VERSION: 3.9

      - name: Run scrapers
        run: |
@@ -67,7 +67,7 @@ jobs:
        run: |
          export PYTHONPATH=$(pwd):$PYTHONPATH
          pipenv run scrapy combinefeeds -s LOG_ENABLED=False
      - name: Notify slack on job failure
        id: slack
        uses: slackapi/slack-github-action@v1.17.0
2 changes: 1 addition & 1 deletion Pipfile
@@ -19,4 +19,4 @@ freezegun = "*"
pytest = "*"
"flake8" = "*"
isort = "*"
-black = "==19.10b0"
+black = "*"
1,242 changes: 695 additions & 547 deletions Pipfile.lock

Large diffs are not rendered by default.

76 changes: 36 additions & 40 deletions city_scrapers/spiders/cle_design_review.py
@@ -21,40 +21,40 @@ class CleDesignReviewSpider(CityScrapersSpider):

    def parse(self, response):
        """
-        There's no element that wraps both the committee name/time and
-        the dropdown containing the agendas. As such we want to grab
-        each committee name/times and then use the following dropdown
-        to get the agendas. Luckily all of the committee name/times are
-        (and are the only thing in) divs with the class '.mt-3' so we can
-        grab all the divs with those classes and then look for the next sibling
-        div with the ".dropdown" class to get the links to all the agendas.
-        Note that the city planning meeting is handled by a different scraper so
-        we do not look at it here. Luckily the name/times for the city planning
-        meeting are not currently wrapped in a div, so the list of nodes described
-        above won't include it.
-        There are three other points to keep in mind for this scraper:
-        1. The way the data is presented doesn't make it easy to know whether or
-        not a meeting occurred but doesn't have an agenda, or whether a meeting
-        is going to happen on a normal meeting date. The strategy I'm using is
-        to treat the agenda links as authoritative for past (and if listed
-        upcoming) meetings. So previous meetings are just read off of the agenda
-        links. For future meetings we take the date of the most recent agenda
-        and then calculate meetings for 60 days from that date. As dates
-        progress and agendas are added, those tentative meetings will either be
-        confirmed to exist or disappear based on the ways the agendas are
-        updated. For calculated meetings we add a line to the description
-        encouraging users to verify the meeting with staff before attempting to
-        attend.
-        2. There is no mention of the year anywhere in the text of the site. We
-        can extract it from the agenda link - at least for now. But it will
-        be important to keep an eye on how the site is changed in January.
-        3. Meetings are currently not being held in person but over webex. We've
-        included this information in the meeting description.
+        There's no element that wraps both the committee name/time and
+        the dropdown containing the agendas. As such we want to grab
+        each committee name/times and then use the following dropdown
+        to get the agendas. Luckily all of the committee name/times are
+        (and are the only thing in) divs with the class '.mt-3' so we can
+        grab all the divs with those classes and then look for the next sibling
+        div with the ".dropdown" class to get the links to all the agendas.
+        Note that the city planning meeting is handled by a different scraper so
+        we do not look at it here. Luckily the name/times for the city planning
+        meeting are not currently wrapped in a div, so the list of nodes described
+        above won't include it.
+        There are three other points to keep in mind for this scraper:
+        1. The way the data is presented doesn't make it easy to know whether or
+        not a meeting occurred but doesn't have an agenda, or whether a meeting
+        is going to happen on a normal meeting date. The strategy I'm using is
+        to treat the agenda links as authoritative for past (and if listed
+        upcoming) meetings. So previous meetings are just read off of the agenda
+        links. For future meetings we take the date of the most recent agenda
+        and then calculate meetings for 60 days from that date. As dates
+        progress and agendas are added, those tentative meetings will either be
+        confirmed to exist or disappear based on the ways the agendas are
+        updated. For calculated meetings we add a line to the description
+        encouraging users to verify the meeting with staff before attempting to
+        attend.
+        2. There is no mention of the year anywhere in the text of the site. We
+        can extract it from the agenda link - at least for now. But it will
+        be important to keep an eye on how the site is changed in January.
+        3. Meetings are currently not being held in person but over webex. We've
+        included this information in the meeting description.
        """
        committee_metas = response.css(
            "div.mt-3"
@@ -80,12 +80,8 @@ def parse(self, response):

            # Start by looking through the agendas for existing meetings
            for agenda in commitee_agenda_list.css("div.dropdown-menu a.dropdown-item"):
-                month_str = (
-                    agenda.css("*::text").extract_first().strip().split(" ")[0]
-                )
-                day_str = (
-                    agenda.css("*::text").extract_first().strip().split(" ")[1]
-                )
+                month_str = agenda.css("*::text").extract_first().strip().split(" ")[0]
+                day_str = agenda.css("*::text").extract_first().strip().split(" ")[1]
                year_str = self._parse_year_from_agenda_link(agenda)

                start = self._parse_start(year_str, month_str, day_str, time_str)
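The docstring above describes pairing each committee header (a `div.mt-3`) with the next sibling `div.dropdown` that holds its agenda links. A minimal stdlib-only sketch of that pairing strategy follows; the HTML snippet is invented for illustration, and the real spider does this with Scrapy CSS/XPath selectors rather than ElementTree.

```python
# Pair each div.mt-3 committee header with the next sibling div.dropdown,
# as described in the cle_design_review docstring (invented HTML).
import xml.etree.ElementTree as ET

html = """<body>
<div class="mt-3"><h3>Design Review Committee</h3><p>9:00 am</p></div>
<div class="dropdown">
  <div class="dropdown-menu">
    <a class="dropdown-item" href="/2021/agenda.pdf">December 3</a>
  </div>
</div>
</body>"""

root = ET.fromstring(html)
meetings = []
children = list(root)
for i, div in enumerate(children):
    if "mt-3" not in div.get("class", ""):
        continue  # only committee name/time divs anchor a pairing
    # The next sibling div with the "dropdown" class holds the agenda links.
    for sibling in children[i + 1:]:
        if "dropdown" in sibling.get("class", ""):
            for a in sibling.iter("a"):
                text = (a.text or "").strip()
                month_str, day_str = text.split(" ")[0], text.split(" ")[1]
                meetings.append((month_str, day_str))
            break

print(meetings)  # [('December', '3')]
```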
4 changes: 3 additions & 1 deletion city_scrapers/spiders/cle_gateway_economic_development.py
@@ -66,7 +66,9 @@ def _parse_title(self, item):
    def _parse_start(self, item):
        """Parse start datetime as a naive datetime object."""
        item_str = re.sub(
-            r"\s+", " ", " ".join(item.css("td:first-child *::text").extract()),
+            r"\s+",
+            " ",
+            " ".join(item.css("td:first-child *::text").extract()),
        ).strip()
        date_match = re.search(r"[a-zA-Z]{3,10} \d{1,2},? \d{4}", item_str)
        if not date_match:
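The hunk above only reflows the whitespace-collapse and date-regex steps of `_parse_start`. A short sketch of how that extraction behaves on an invented input string; the final `strptime` step is my assumption, since the committed code continues past the excerpt shown here.

```python
# Collapse runs of whitespace, find a "Month D, YYYY"-style date with the
# regex from _parse_start, then parse it (input string invented).
import re
from datetime import datetime

item_str = re.sub(r"\s+", " ", "  January   5,\n 2021  10:00 AM ").strip()
date_match = re.search(r"[a-zA-Z]{3,10} \d{1,2},? \d{4}", item_str)
assert date_match is not None
# Hypothetical parse of the matched date; the real spider's parsing differs.
start = datetime.strptime(date_match.group().replace(",", ""), "%B %d %Y")
print(start.date())  # 2021-01-05
```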
23 changes: 6 additions & 17 deletions city_scrapers/spiders/cle_planning_commission.py
@@ -69,17 +69,13 @@ def parse(self, response):

        # Start by looking through the agendas for existing meetings
        for agenda in commission_agendas.css("div.dropdown-menu a.dropdown-item"):
-            '''
+            """
            month_str, day_str = (
                agenda.css("*::text").extract_first().strip().split(" ")
            )
-            '''
-            month_str = (
-                agenda.css("*::text").extract_first().strip().split(" ")[0]
-            )
-            day_str = (
-                agenda.css("*::text").extract_first().strip().split(" ")[1]
-            )
+            """
+            month_str = agenda.css("*::text").extract_first().strip().split(" ")[0]
+            day_str = agenda.css("*::text").extract_first().strip().split(" ")[1]

            year_str = self._parse_year_from_link(agenda)

@@ -208,14 +204,7 @@ def _dropdown_to_key(self, item):
        Transform a dropdown item into a text key representing the date in the format:
        year-month-day such as 2021-dec-3rd.
        """
-        name = item.css("::text").extract_first()
-        #[month, day] = name.split(" ")
-        #month = month[0:3].lower()
-        month_str = (
-            item.css("*::text").extract_first().strip().split(" ")[0]
-        )
-        day_str = (
-            item.css("*::text").extract_first().strip().split(" ")[1]
-        )
+        month_str = item.css("*::text").extract_first().strip().split(" ")[0]
+        day_str = item.css("*::text").extract_first().strip().split(" ")[1]
        year = self._parse_year_from_link(item)
        return f"{year}-{month_str}-{day_str}"
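One detail worth noting in `_dropdown_to_key`: the docstring's example key `2021-dec-3rd` implies a lowercased three-letter month, but with the commented-out abbreviation lines deleted, the committed code keys on the dropdown text verbatim. A sketch of both behaviors with invented values:

```python
# Compare the verbatim key the committed code produces with the abbreviated
# key implied by the docstring example (input values invented).
text = "December 3rd"
month_str, day_str = text.split(" ")[0], text.split(" ")[1]
year = "2021"

verbatim_key = f"{year}-{month_str}-{day_str}"          # what the code does
abbreviated_key = f"{year}-{month_str[0:3].lower()}-{day_str}"  # docstring style
print(verbatim_key)     # 2021-December-3rd
print(abbreviated_key)  # 2021-dec-3rd
```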
