Implement Caching #7 #32
Merged
7 commits
4796a4e (gmanhas12) Add technical documentation to README (task 17)
f7efa55 (gmanhas12) Merge remote-tracking branch 'origin/primary' into update-readme
d714607 (gmanhas12) Merge branch 'primary' of https://github.com/evuventures/cheaper into…
a3b373e (gmanhas12) added caching, used functools already built in python, and used lru c…
84264a7 (gmanhas12) addressed comments on inital commit, creates a test file in the ./src…
ba0a71a (gmanhas12) Merge branch 'primary' of https://github.com/evuventures/cheaper into…
fe6a85e (gmanhas12) fixed failing build
Binary file not shown.
Binary file not shown.
    @@ -0,0 +1,24 @@
    {
        "/": [
            "A Light in the Attic",
            "Tipping the Velvet",
            "Soumission",
            "Sharp Objects",
            "Sapiens: A Brief History of Humankind",
            "The Requiem Red",
            "The Dirty Little Secrets of Getting Your Dream Job",
            "The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull",
            "The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics",
            "The Black Maria",
            "Starving Hearts (Triangular Trade Trilogy, #1)",
            "Shakespeare's Sonnets",
            "Set Me Free",
            "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)",
            "Rip it Up and Start Again",
            "Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991",
            "Olio",
            "Mesaerion: The Best Science Fiction Stories 1800-1849",
            "Libertarianism for Beginners",
            "It's Only the Himalayas"
        ]
    }
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
johnnvij marked this conversation as resolved.
    @@ -0,0 +1,21 @@
    import requests
    import logging
    from functools import lru_cache
    from typing import Optional


    @lru_cache(maxsize=128)
    def cached_get(url: str, user_agent: str) -> Optional[str]:
        print(f"[HTTP Request] Fetching from web: {url}")
        headers = {"User-Agent": user_agent}
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
            return response.text
        except requests.RequestException as e:
            logging.error(f"Error fetching {url}: {e}")
            return None
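
The behaviour of `@lru_cache` on a function shaped like `cached_get` can be checked offline. The sketch below is not the PR's code: `fake_fetch`, `cached_fetch`, `FetchFailed`, `fetch_retrying_failures`, and `demo` are hypothetical names, and `fake_fetch` stands in for the `requests.get` call. It illustrates two points. First, the pair `(url, user_agent)` is the cache key, so the same URL with a different user agent is fetched again. Second, a `None` returned after a failed request is stored like any other return value, so a transient error would keep being served until `cache_clear()` is called or the entry is evicted. One possible way around that, assuming a retry on failure is wanted, is to raise inside the cached function, since `lru_cache` stores only return values.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical stand-in for the network layer so the sketch runs offline.
network_calls = []


def fake_fetch(url: str) -> Optional[str]:
    """Record the call and return None for URLs that contain 'missing'."""
    network_calls.append(url)
    if "missing" in url:
        return None
    return f"<html>{url}</html>"


@lru_cache(maxsize=128)
def cached_fetch(url: str, user_agent: str) -> Optional[str]:
    # Same signature shape as cached_get: (url, user_agent) is the cache key,
    # and a None result is stored like any other return value.
    return fake_fetch(url)


class FetchFailed(Exception):
    """Raised inside the cached function so that failures are not stored."""


@lru_cache(maxsize=128)
def _cached_or_raise(url: str, user_agent: str) -> str:
    body = fake_fetch(url)
    if body is None:
        # lru_cache stores return values only; an exception leaves no entry.
        raise FetchFailed(url)
    return body


def fetch_retrying_failures(url: str, user_agent: str) -> Optional[str]:
    """Like cached_fetch, but a failed URL is fetched again on the next call."""
    try:
        return _cached_or_raise(url, user_agent)
    except FetchFailed:
        return None


def demo() -> dict:
    """Run both variants and report cache statistics and fetch counts."""
    network_calls.clear()
    cached_fetch.cache_clear()
    _cached_or_raise.cache_clear()

    cached_fetch("https://example.test/", "agent-a")
    cached_fetch("https://example.test/", "agent-a")  # served from the cache
    cached_fetch("https://example.test/", "agent-b")  # new key, fetched again
    info = cached_fetch.cache_info()

    before = len(network_calls)
    cached_fetch("https://example.test/missing", "agent-a")
    cached_fetch("https://example.test/missing", "agent-a")  # cached None
    plain_failures = len(network_calls) - before

    before = len(network_calls)
    fetch_retrying_failures("https://example.test/missing", "agent-a")
    fetch_retrying_failures("https://example.test/missing", "agent-a")
    retried_failures = len(network_calls) - before

    return {
        "hits": info.hits,
        "misses": info.misses,
        "plain_failures": plain_failures,
        "retried_failures": retried_failures,
    }
```

Calling `demo()` returns the hit and miss counts of the plain variant together with the number of underlying fetches each variant made for the failing URL. Whether failed responses should be cached is a design choice for the scraper; the sketch only shows that the decorator alone does not distinguish them.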
Binary file not shown.
Binary file not shown.
    @@ -0,0 +1,72 @@
    import unittest
    import time

    from webscraper.src.Cheaper_Scraper import CheaperScraper
    from webscraper.src.fetch_utils import cached_get

    #to test, be in the webscraper directory and use the following command in terminal
    # python -m unittest webscraper.src.tests.test_fetch_and_cache -v



    class TestCheaperScraperFetchCache(unittest.TestCase):

        def setUp(self):
            self.scraper = CheaperScraper("https://books.toscrape.com")
            cached_get.cache_clear()  # Reset cache before each test

        def test_valid_fetch(self):
            html = self.scraper.fetch("/")
            self.assertIsInstance(html, str)
            self.assertIn("<html", html.lower())

        def test_invalid_path_fetch(self):
            html = self.scraper.fetch("/this-page-does-not-exist")
            # Even though it doesn't exist, the site may return a 200 with a 404 page
            self.assertTrue(html is None or "<html" in html.lower())

        def test_cache_effectiveness(self):
            start = time.time()
            self.scraper.fetch("/")  # First fetch
            time1 = time.time() - start

            start = time.time()
            self.scraper.fetch("/")  # Second fetch (should be cached)
            time2 = time.time() - start

            cache_info = cached_get.cache_info()
            self.assertLess(time2, time1)
            self.assertGreaterEqual(cache_info.hits, 1)

        def test_non_http_url(self):
            with self.assertRaises(ValueError):
                CheaperScraper("not_a_real_url")

        def test_cache_timing_and_stats(self):
            print("\n=== Cache Timing and Stats Test ===")

            # First fetch (expected to be slow and hit the network)
            start = time.time()
            html1 = self.scraper.fetch("/")
            time1 = round(time.time() - start, 2)
            print(f"First fetch took: {time1} seconds")

            # Second fetch (expected to be fast due to cache)
            start = time.time()
            html2 = self.scraper.fetch("/")
            time2 = round(time.time() - start, 2)
            print(f"Second fetch took: {time2} seconds")

            # Confirm that the second fetch was faster
            self.assertLess(time2, time1, "Second fetch should be faster due to caching")

            # Print and assert cache stats
            stats = cached_get.cache_info()
            print("Cache stats:", stats)
            self.assertGreaterEqual(stats.hits, 1, "There should be at least 1 cache hit")




    if __name__ == "__main__":
        unittest.main()
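
The tests above reach `https://books.toscrape.com` and compare wall-clock timings, so their outcome depends on network conditions. A cache hit can also be asserted without timing by counting how often the underlying fetcher is called. The following is a minimal sketch under stated assumptions, not the PR's test: `make_cached_fetcher` and `CacheHitTest` are hypothetical names, and a `unittest.mock.Mock` replaces the real HTTP call. Against the real module, a similar effect could probably be had by patching `requests.get` where `webscraper.src.fetch_utils` looks it up, but that is not shown here.

```python
import unittest
from functools import lru_cache
from typing import Callable, Optional
from unittest import mock


def make_cached_fetcher(fetch: Callable[[str, str], Optional[str]]):
    """Wrap a fetch function (hypothetical helper) in an LRU cache."""

    @lru_cache(maxsize=128)
    def cached(url: str, user_agent: str) -> Optional[str]:
        return fetch(url, user_agent)

    return cached


class CacheHitTest(unittest.TestCase):
    def setUp(self):
        # The mock plays the role of the network call and records its calls.
        self.fake_fetch = mock.Mock(return_value="<html></html>")
        self.cached = make_cached_fetcher(self.fake_fetch)

    def test_second_call_is_served_from_cache(self):
        first = self.cached("https://example.test/", "agent")
        second = self.cached("https://example.test/", "agent")

        self.assertEqual(first, second)
        # Only one underlying fetch happened, whatever the timings were.
        self.fake_fetch.assert_called_once_with("https://example.test/", "agent")
        info = self.cached.cache_info()
        self.assertEqual((info.hits, info.misses), (1, 1))

    def test_different_user_agent_is_a_separate_entry(self):
        self.cached("https://example.test/", "agent-a")
        self.cached("https://example.test/", "agent-b")

        self.assertEqual(self.fake_fetch.call_count, 2)
        self.assertEqual(self.cached.cache_info().hits, 0)
```

Saved as a test module, this can be run with `python -m unittest`. Because each test builds its own cached function in `setUp`, no `cache_clear()` call is needed between tests.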