Commit

v2.5.0
- Added to_sqlite flag argument
- Added logic to prevent scraping the day that already passed for scrape_until_month_end.py and thread_scrape.py
- Added utils.py
- Deleted automated_scraper.py
sakan811 committed Jun 1, 2024
1 parent 773cf42 commit f8c6a75
Showing 7 changed files with 25 additions and 218 deletions.
24 changes: 12 additions & 12 deletions .github/workflows/scrape.yml
@@ -23,40 +23,40 @@ jobs:
    run: pip install -r requirements.txt

  - name: Run Scraper For January
-   run: python automated_scraper.py --month=1
+   run: python main.py --thread_pool=True --month=1

  - name: Run Scraper For February
-   run: python automated_scraper.py --month=2
+   run: python main.py --thread_pool=True --month=2

  - name: Run Scraper For March
-   run: python automated_scraper.py --month=3
+   run: python main.py --thread_pool=True --month=3

  - name: Run Scraper For April
-   run: python automated_scraper.py --month=4
+   run: python main.py --thread_pool=True --month=4

  - name: Run Scraper For May
-   run: python automated_scraper.py --month=5
+   run: python main.py --thread_pool=True --month=5

  - name: Run Scraper For June
-   run: python automated_scraper.py --month=6
+   run: python main.py --thread_pool=True --month=6

  - name: Run Scraper For July
-   run: python automated_scraper.py --month=7
+   run: python main.py --thread_pool=True --month=7

  - name: Run Scraper For August
-   run: python automated_scraper.py --month=8
+   run: python main.py --thread_pool=True --month=8

  - name: Run Scraper For September
-   run: python automated_scraper.py --month=9
+   run: python main.py --thread_pool=True --month=9

  - name: Run Scraper For October
-   run: python automated_scraper.py --month=10
+   run: python main.py --thread_pool=True --month=10

  - name: Run Scraper For November
-   run: python automated_scraper.py --month=11
+   run: python main.py --thread_pool=True --month=11

  - name: Run Scraper For December
-   run: python automated_scraper.py --month=12
+   run: python main.py --thread_pool=True --month=12

  - id: 'auth'
    uses: 'google-github-actions/auth@v2'
9 changes: 7 additions & 2 deletions README.md
@@ -75,7 +75,12 @@ This script can also be used to scrape data from other cities.
```
python main.py --to_sqlite=True
```
+ - Month to scrape can be specified using ```--month=(month number as int)``` for Thread Pool and Month End Scraper.
+ - For example, to scrape data from June of the current year using Thread Pool Scraper, run the following command line:
+ ```
+ python main.py --thread_pool=True --month=6
+ ```
### Dataclass
[set_details.py](set_details.py)
- Dataclass that stores booking details, date, and length of stay.
@@ -105,4 +110,4 @@ This script can also be used to scrape data from other cities.
[automated_scraper.py](automated_scraper.py)
- Scrape Osaka hotel data daily using GitHub action for all 12 months.
- Save to CSV for each month.
- - Save CSV to Google Cloud Storage
+ - Save CSV to Google Cloud Storage.
199 changes: 0 additions & 199 deletions automated_scraper.py

This file was deleted.

2 changes: 1 addition & 1 deletion japan_avg_hotel_price_finder/scrape_until_month_end.py
@@ -62,7 +62,7 @@ def scrape_until_month_end(self, to_sqlite: bool = False) -> None | pd.DataFrame
while current_date <= end_date:
current_date_has_passed: bool = check_if_current_date_has_passed(self.year, self.month, self.start_day)
if current_date_has_passed:
- logger.warning(f'The current day of the month to scrape was passed. Skip this day.')
+ logger.warning(f'The current day of the month to scrape was passed. Skip {self.year}-{self.month}-{self.start_day}.')
else:
check_in = current_date.strftime('%Y-%m-%d')
check_out = (current_date + timedelta(days=self.nights)).strftime('%Y-%m-%d')
4 changes: 2 additions & 2 deletions japan_avg_hotel_price_finder/thread_scrape.py
@@ -48,7 +48,7 @@ def thread_scrape(self, to_sqlite: bool = False) -> None | pd.DataFrame:
results = []

# Define a function to perform scraping for each date
- def scrape_each_date(day: int):
+ def scrape_each_date(day: int) -> None:
"""
Scrape hotel data of the given date.
:param day: Day of the month.
@@ -60,7 +60,7 @@ def scrape_each_date(day: int):

current_date = datetime(self.year, self.month, day)
if current_date_has_passed:
- logger.warning(f'The current day of the month to scrape was passed. Skip this day.')
+ logger.warning(f'The current day of the month to scrape was passed. Skip {self.year}-{self.month}-{day}.')
else:
check_in: str = current_date.strftime('%Y-%m-%d')
check_out: str = (current_date + timedelta(days=self.nights)).strftime('%Y-%m-%d')
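
Reviewer note: the per-day skip logic in this hunk can be illustrated in isolation. The following is a minimal sketch, not the repository's code. The names `stay_dates` and `plan_month`, the `max_workers=5` setting, and the injected `today` parameter are assumptions made for illustration; the real `thread_scrape` method is only partly visible in this diff.

```python
from __future__ import annotations

import calendar
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def stay_dates(year: int, month: int, day: int, nights: int, today: date) -> tuple[str, str] | None:
    """Return (check_in, check_out) as 'YYYY-MM-DD' strings, or None if the day has passed."""
    current = date(year, month, day)
    if current < today:
        # Mirrors the "Skip {year}-{month}-{day}" branch: past days produce no work.
        return None
    return current.isoformat(), (current + timedelta(days=nights)).isoformat()


def plan_month(year: int, month: int, nights: int, today: date) -> list[tuple[str, str]]:
    """Compute stay dates for every remaining day of the month using a thread pool."""
    last_day = calendar.monthrange(year, month)[1]
    days = range(1, last_day + 1)
    with ThreadPoolExecutor(max_workers=5) as pool:
        # pool.map preserves input order, so results stay sorted by day.
        results = list(pool.map(lambda d: stay_dates(year, month, d, nights, today), days))
    return [r for r in results if r is not None]
```

Passing `today` explicitly keeps the function deterministic, which makes the skip behavior easy to check without depending on the system clock.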
1 change: 0 additions & 1 deletion japan_avg_hotel_price_finder/utils.py
@@ -14,7 +14,6 @@ def check_if_current_date_has_passed(year, month, day):
today_for_check = datetime.today().strftime('%Y-%m-%d')
current_date_for_check = datetime(year, month, day).strftime('%Y-%m-%d')
if current_date_for_check < today_for_check:
- logger.warning(f'The current day of the month to scrape was passed. Skip {year}-{month}-{day}.')
return True
else:
return False
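
Reviewer note: the helper above compares `'%Y-%m-%d'` strings, which works because zero-padded ISO dates sort in chronological order. An alternative is to compare `date` objects directly. This is a sketch only: the name `date_has_passed` and the optional `today` parameter are hypothetical and do not appear in the commit.

```python
from __future__ import annotations

from datetime import date


def date_has_passed(year: int, month: int, day: int, today: date | None = None) -> bool:
    """Return True if the given calendar date is strictly before today."""
    # `today` can be injected for testing; by default the system date is used.
    reference = today if today is not None else date.today()
    return date(year, month, day) < reference
```

As in the original helper, a date equal to today is not treated as passed, so the current day would still be scraped.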
4 changes: 3 additions & 1 deletion main.py
@@ -31,9 +31,11 @@
parser.add_argument('--month_end', type=bool, default=False, help='Scrape until month end')
parser.add_argument('--scraper', type=bool, default=True, help='Use basic scraper')
parser.add_argument('--to_sqlite', type=bool, default=False, help='Use basic scraper')
+ parser.add_argument('--month', type=int, help='Month to scrape data for (1-12)', required=True)
args = parser.parse_args()

- details = Details()
+ month = args.month
+ details = Details(month=month)

if args.thread_pool:
logger.info('Using thread pool scraper')
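
Reviewer note: with `type=bool`, argparse calls `bool()` on the raw string, and any non-empty string is truthy, so `--thread_pool=False` would still evaluate to `True`. One possible way to parse these flags is sketched below. The `str_to_bool` converter, the `build_parser` wrapper, and the `choices=range(1, 13)` restriction on `--month` are assumptions for illustration and are not part of this commit.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert common textual booleans to bool; reject anything else."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with explicit boolean parsing and a bounded month argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--thread_pool', type=str_to_bool, default=False,
                        help='Use thread pool scraper')
    # choices restricts the parsed int to 1..12 (an assumption; the commit only sets type=int).
    parser.add_argument('--month', type=int, choices=range(1, 13), required=True,
                        help='Month to scrape data for (1-12)')
    return parser
```

With this converter, `python main.py --thread_pool=False --month=6` would leave the thread pool disabled, while the existing `--thread_pool=True` invocations in the workflow would behave as before.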
