1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
venv/
62 changes: 62 additions & 0 deletions quotes_js_scraper/data.json
@@ -0,0 +1,62 @@
[
{"text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d", "author": "Albert Einstein", "tags": ["change", "deep-thoughts", "thinking", "world"]},
{"text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d", "author": "J.K. Rowling", "tags": ["abilities", "choices"]},
{"text": "\u201cThere are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.\u201d", "author": "Albert Einstein", "tags": ["inspirational", "life", "live", "miracle", "miracles"]},
{"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen", "tags": ["aliteracy", "books", "classic", "humor"]},
{"text": "\u201cImperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.\u201d", "author": "Marilyn Monroe", "tags": ["be-yourself", "inspirational"]},
{"text": "\u201cTry not to become a man of success. Rather become a man of value.\u201d", "author": "Albert Einstein", "tags": ["adulthood", "success", "value"]},
{"text": "\u201cIt is better to be hated for what you are than to be loved for what you are not.\u201d", "author": "Andr\u00e9 Gide", "tags": ["life", "love"]},
{"text": "\u201cI have not failed. I've just found 10,000 ways that won't work.\u201d", "author": "Thomas A. Edison", "tags": ["edison", "failure", "inspirational", "paraphrased"]},
{"text": "\u201cA woman is like a tea bag; you never know how strong it is until it's in hot water.\u201d", "author": "Eleanor Roosevelt", "tags": ["misattributed-eleanor-roosevelt"]},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin", "tags": ["humor", "obvious", "simile"]},
{"text": "\u201cThis life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, because if you don't, then who will, sweetie? So keep your head high, keep your chin up, and most importantly, keep smiling, because life's a beautiful thing and there's so much to smile about.\u201d", "author": "Marilyn Monroe", "tags": ["friends", "heartbreak", "inspirational", "life", "love", "sisters"]},
{"text": "\u201cIt takes a great deal of bravery to stand up to our enemies, but just as much to stand up to our friends.\u201d", "author": "J.K. Rowling", "tags": ["courage", "friends"]},
{"text": "\u201cIf you can't explain it to a six year old, you don't understand it yourself.\u201d", "author": "Albert Einstein", "tags": ["simplicity", "understand"]},
{"text": "\u201cYou may not be her first, her last, or her only. She loved before she may love again. But if she loves you now, what else matters? She's not perfect\u2014you aren't either, and the two of you may never be perfect together but if she can make you laugh, cause you to think twice, and admit to being human and making mistakes, hold onto her and give her the most you can. She may not be thinking about you every second of the day, but she will give you a part of her that she knows you can break\u2014her heart. So don't hurt her, don't change her, don't analyze and don't expect more than she can give. Smile when she makes you happy, let her know when she makes you mad, and miss her when she's not there.\u201d", "author": "Bob Marley", "tags": ["love"]},
{"text": "\u201cI like nonsense, it wakes up the brain cells. Fantasy is a necessary ingredient in living.\u201d", "author": "Dr. Seuss", "tags": ["fantasy"]},
{"text": "\u201cI may not have gone where I intended to go, but I think I have ended up where I needed to be.\u201d", "author": "Douglas Adams", "tags": ["life", "navigation"]},
{"text": "\u201cThe opposite of love is not hate, it's indifference. The opposite of art is not ugliness, it's indifference. The opposite of faith is not heresy, it's indifference. And the opposite of life is not death, it's indifference.\u201d", "author": "Elie Wiesel", "tags": ["activism", "apathy", "hate", "indifference", "inspirational", "love", "opposite", "philosophy"]},
{"text": "\u201cIt is not a lack of love, but a lack of friendship that makes unhappy marriages.\u201d", "author": "Friedrich Nietzsche", "tags": ["friendship", "lack-of-friendship", "lack-of-love", "love", "marriage", "unhappy-marriage"]},
{"text": "\u201cGood friends, good books, and a sleepy conscience: this is the ideal life.\u201d", "author": "Mark Twain", "tags": ["books", "contentment", "friends", "friendship", "life"]},
{"text": "\u201cLife is what happens to us while we are making other plans.\u201d", "author": "Allen Saunders", "tags": ["fate", "life", "misattributed-john-lennon", "planning", "plans"]},
{"text": "\u201cI love you without knowing how, or when, or from where. I love you simply, without problems or pride: I love you in this way because I do not know any other way of loving but this, in which there is no I or you, so intimate that your hand upon my chest is my hand, so intimate that when I fall asleep your eyes close.\u201d", "author": "Pablo Neruda", "tags": ["love", "poetry"]},
{"text": "\u201cFor every minute you are angry you lose sixty seconds of happiness.\u201d", "author": "Ralph Waldo Emerson", "tags": ["happiness"]},
{"text": "\u201cIf you judge people, you have no time to love them.\u201d", "author": "Mother Teresa", "tags": ["attributed-no-source"]},
{"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor", "tags": ["humor", "religion"]},
{"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson", "tags": ["humor"]},
{"text": "\u201cToday you are You, that is truer than true. There is no one alive who is Youer than You.\u201d", "author": "Dr. Seuss", "tags": ["comedy", "life", "yourself"]},
{"text": "\u201cIf you want your children to be intelligent, read them fairy tales. If you want them to be more intelligent, read them more fairy tales.\u201d", "author": "Albert Einstein", "tags": ["children", "fairy-tales"]},
{"text": "\u201cIt is impossible to live without failing at something, unless you live so cautiously that you might as well not have lived at all - in which case, you fail by default.\u201d", "author": "J.K. Rowling", "tags": []},
{"text": "\u201cLogic will get you from A to Z; imagination will get you everywhere.\u201d", "author": "Albert Einstein", "tags": ["imagination"]},
{"text": "\u201cOne good thing about music, when it hits you, you feel no pain.\u201d", "author": "Bob Marley", "tags": ["music"]},
{"text": "\u201cThe more that you read, the more things you will know. The more that you learn, the more places you'll go.\u201d", "author": "Dr. Seuss", "tags": ["learning", "reading", "seuss"]},
{"text": "\u201cOf course it is happening inside your head, Harry, but why on earth should that mean that it is not real?\u201d", "author": "J.K. Rowling", "tags": ["dumbledore"]},
{"text": "\u201cThe truth is, everyone is going to hurt you. You just got to find the ones worth suffering for.\u201d", "author": "Bob Marley", "tags": ["friendship"]},
{"text": "\u201cNot all of us can do great things. But we can do small things with great love.\u201d", "author": "Mother Teresa", "tags": ["misattributed-to-mother-teresa", "paraphrased"]},
{"text": "\u201cTo the well-organized mind, death is but the next great adventure.\u201d", "author": "J.K. Rowling", "tags": ["death", "inspirational"]},
{"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz", "tags": ["chocolate", "food", "humor"]},
{"text": "\u201cWe read to know we're not alone.\u201d", "author": "William Nicholson", "tags": ["misattributed-to-c-s-lewis", "reading"]},
{"text": "\u201cAny fool can know. The point is to understand.\u201d", "author": "Albert Einstein", "tags": ["knowledge", "learning", "understanding", "wisdom"]},
{"text": "\u201cI have always imagined that Paradise will be a kind of library.\u201d", "author": "Jorge Luis Borges", "tags": ["books", "library"]},
{"text": "\u201cIt is never too late to be what you might have been.\u201d", "author": "George Eliot", "tags": ["inspirational"]},
{"text": "\u201cA reader lives a thousand lives before he dies, said Jojen. The man who never reads lives only one.\u201d", "author": "George R.R. Martin", "tags": ["read", "readers", "reading", "reading-books"]},
{"text": "\u201cYou can never get a cup of tea large enough or a book long enough to suit me.\u201d", "author": "C.S. Lewis", "tags": ["books", "inspirational", "reading", "tea"]},
{"text": "\u201cYou believe lies so you eventually learn to trust no one but yourself.\u201d", "author": "Marilyn Monroe", "tags": []},
{"text": "\u201cIf you can make a woman laugh, you can make her do anything.\u201d", "author": "Marilyn Monroe", "tags": ["girls", "love"]},
{"text": "\u201cLife is like riding a bicycle. To keep your balance, you must keep moving.\u201d", "author": "Albert Einstein", "tags": ["life", "simile"]},
{"text": "\u201cThe real lover is the man who can thrill you by kissing your forehead or smiling into your eyes or just staring into space.\u201d", "author": "Marilyn Monroe", "tags": ["love"]},
{"text": "\u201cA wise girl kisses but doesn't love, listens but doesn't believe, and leaves before she is left.\u201d", "author": "Marilyn Monroe", "tags": ["attributed-no-source"]},
{"text": "\u201cOnly in the darkness can you see the stars.\u201d", "author": "Martin Luther King Jr.", "tags": ["hope", "inspirational"]},
{"text": "\u201cIt matters not what someone is born, but what they grow to be.\u201d", "author": "J.K. Rowling", "tags": ["dumbledore"]},
{"text": "\u201cLove does not begin and end the way we seem to think it does. Love is a battle, love is a war; love is a growing up.\u201d", "author": "James Baldwin", "tags": ["love"]},
{"text": "\u201cThere is nothing I would not do for those who are really my friends. I have no notion of loving people by halves, it is not my nature.\u201d", "author": "Jane Austen", "tags": ["friendship", "love"]},
{"text": "\u201cDo one thing every day that scares you.\u201d", "author": "Eleanor Roosevelt", "tags": ["attributed", "fear", "inspiration"]},
{"text": "\u201cI am good, but not an angel. I do sin, but I am not the devil. I am just a small girl in a big world trying to find someone to love.\u201d", "author": "Marilyn Monroe", "tags": ["attributed-no-source"]},
{"text": "\u201cIf I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.\u201d", "author": "Albert Einstein", "tags": ["music"]},
{"text": "\u201cIf you only read the books that everyone else is reading, you can only think what everyone else is thinking.\u201d", "author": "Haruki Murakami", "tags": ["books", "thought"]},
{"text": "\u201cThe difference between genius and stupidity is: genius has its limits.\u201d", "author": "Alexandre Dumas fils", "tags": ["misattributed-to-einstein"]},
{"text": "\u201cHe's like a drug for you, Bella.\u201d", "author": "Stephenie Meyer", "tags": ["drug", "romance", "simile"]},
{"text": "\u201cThere is no friend as loyal as a book.\u201d", "author": "Ernest Hemingway", "tags": ["books", "friends", "novelist-quotes"]},
{"text": "\u201cWhen one door of happiness closes, another opens; but often we look so long at the closed door that we do not see the one which has been opened for us.\u201d", "author": "Helen Keller", "tags": ["inspirational"]},
{"text": "\u201cLife isn't about finding yourself. Life is about creating yourself.\u201d", "author": "George Bernard Shaw", "tags": ["inspirational", "life", "yourself"]}
]
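A quick way to sanity-check the scraped output is to load it and tally authors and tags. The sketch below assumes only the record shape visible in data.json above (`text`, `author`, `tags`); the helper names `load_quotes`, `quotes_per_author` and `tag_counts` are hypothetical, and the inline sample reuses three records from the file so the snippet runs without it.

```python
import json
from collections import Counter


def load_quotes(path):
    """Read a JSON array of {"text", "author", "tags"} objects from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def quotes_per_author(quotes):
    """Count how many quotes each author contributes."""
    return Counter(q["author"] for q in quotes)


def tag_counts(quotes):
    """Count how often each tag appears across all quotes."""
    return Counter(tag for q in quotes for tag in q["tags"])


# Three records copied from data.json; a real check would call
# load_quotes("quotes_js_scraper/data.json") instead.
SAMPLE_JSON = r"""[
{"text": "\u201cTry not to become a man of success. Rather become a man of value.\u201d", "author": "Albert Einstein", "tags": ["adulthood", "success", "value"]},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin", "tags": ["humor", "obvious", "simile"]},
{"text": "\u201cLife is like riding a bicycle. To keep your balance, you must keep moving.\u201d", "author": "Albert Einstein", "tags": ["life", "simile"]}
]"""

sample = json.loads(SAMPLE_JSON)
authors = quotes_per_author(sample)
tags = tag_counts(sample)
```

Records with an empty `tags` list, which do occur in the file, simply contribute nothing to the tag tally.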
Binary file added quotes_js_scraper/screenshot.png
66 changes: 35 additions & 31 deletions quotes_js_scraper/settings.py
@@ -7,82 +7,86 @@
 # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
 # https://docs.scrapy.org/en/latest/topics/spider-middleware.html

-BOT_NAME = 'quotes_js_scraper'
+BOT_NAME = "quotes_js_scraper"

-SPIDER_MODULES = ['quotes_js_scraper.spiders']
-NEWSPIDER_MODULE = 'quotes_js_scraper.spiders'
+SPIDER_MODULES = ["quotes_js_scraper.spiders"]
+NEWSPIDER_MODULE = "quotes_js_scraper.spiders"


 # Crawl responsibly by identifying yourself (and your website) on the user-agent
-#USER_AGENT = 'quotes_js_scraper (+http://www.yourdomain.com)'
+# USER_AGENT = 'quotes_js_scraper (+http://www.yourdomain.com)'

 # Obey robots.txt rules
 ROBOTSTXT_OBEY = True

 # Configure maximum concurrent requests performed by Scrapy (default: 16)
-#CONCURRENT_REQUESTS = 32
+# CONCURRENT_REQUESTS = 32

 # Configure a delay for requests for the same website (default: 0)
 # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
 # See also autothrottle settings and docs
-#DOWNLOAD_DELAY = 3
+# DOWNLOAD_DELAY = 3
 # The download delay setting will honor only one of:
-#CONCURRENT_REQUESTS_PER_DOMAIN = 16
-#CONCURRENT_REQUESTS_PER_IP = 16
+# CONCURRENT_REQUESTS_PER_DOMAIN = 16
+# CONCURRENT_REQUESTS_PER_IP = 16

 # Disable cookies (enabled by default)
-#COOKIES_ENABLED = False
+# COOKIES_ENABLED = False

 # Disable Telnet Console (enabled by default)
-#TELNETCONSOLE_ENABLED = False
+# TELNETCONSOLE_ENABLED = False

 # Override the default request headers:
-#DEFAULT_REQUEST_HEADERS = {
+# DEFAULT_REQUEST_HEADERS = {
 # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 # 'Accept-Language': 'en',
-#}
+# }

 # Enable or disable spider middlewares
 # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
-#SPIDER_MIDDLEWARES = {
+# SPIDER_MIDDLEWARES = {
 # 'quotes_js_scraper.middlewares.QuotesJsScraperSpiderMiddleware': 543,
-#}
+# }

 # Enable or disable downloader middlewares
 # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
-#DOWNLOADER_MIDDLEWARES = {
-# 'quotes_js_scraper.middlewares.QuotesJsScraperDownloaderMiddleware': 543,
-#}

 # Enable or disable extensions
 # See https://docs.scrapy.org/en/latest/topics/extensions.html
-#EXTENSIONS = {
+# EXTENSIONS = {
 # 'scrapy.extensions.telnet.TelnetConsole': None,
-#}
+# }

 # Configure item pipelines
 # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
-#ITEM_PIPELINES = {
+# ITEM_PIPELINES = {
 # 'quotes_js_scraper.pipelines.QuotesJsScraperPipeline': 300,
-#}
+# }

 # Enable and configure the AutoThrottle extension (disabled by default)
 # See https://docs.scrapy.org/en/latest/topics/autothrottle.html
-#AUTOTHROTTLE_ENABLED = True
+# AUTOTHROTTLE_ENABLED = True
 # The initial download delay
-#AUTOTHROTTLE_START_DELAY = 5
+# AUTOTHROTTLE_START_DELAY = 5
 # The maximum download delay to be set in case of high latencies
-#AUTOTHROTTLE_MAX_DELAY = 60
+# AUTOTHROTTLE_MAX_DELAY = 60
 # The average number of requests Scrapy should be sending in parallel to
 # each remote server
-#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
+# AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
 # Enable showing throttling stats for every response received:
-#AUTOTHROTTLE_DEBUG = False
+# AUTOTHROTTLE_DEBUG = False

 # Enable and configure HTTP caching (disabled by default)
 # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
-#HTTPCACHE_ENABLED = True
-#HTTPCACHE_EXPIRATION_SECS = 0
-#HTTPCACHE_DIR = 'httpcache'
-#HTTPCACHE_IGNORE_HTTP_CODES = []
-#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
+# HTTPCACHE_ENABLED = True
+# HTTPCACHE_EXPIRATION_SECS = 0
+# HTTPCACHE_DIR = 'httpcache'
+# HTTPCACHE_IGNORE_HTTP_CODES = []
+# HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

+DOWNLOAD_HANDLERS = {
+    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
+    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
+}

+TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
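The download handlers and the asyncio reactor above are what route requests through Playwright. A few optional scrapy-playwright settings could sit next to them in settings.py; the fragment below is only a sketch, and every value in it is an illustrative assumption rather than something this change sets.

```python
# Optional scrapy-playwright settings (values are assumptions for illustration).

# Browser engine to launch; "firefox" and "webkit" are the other choices.
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Keyword arguments forwarded to the browser launch call.
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Navigation timeout for page loads, in milliseconds.
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 30 * 1000
```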
58 changes: 53 additions & 5 deletions quotes_js_scraper/spiders/quotes.py
@@ -1,11 +1,59 @@
 import scrapy
+from quotes_js_scraper.items import QuoteItem
+from scrapy_playwright.page import PageMethod


 class QuotesSpider(scrapy.Spider):
-    name = 'quotes'
-    allowed_domains = ['quotes.toscrape.com']
-    start_urls = ['http://quotes.toscrape.com/']
+    name = "quotes"
+    allowed_domains = ["quotes.toscrape.com"]

-    def parse(self, response):
-        pass
+    def start_requests(self):
+        url = "https://quotes.toscrape.com/scroll"
+        yield scrapy.Request(
+            url,
+            meta=dict(
+                playwright=True,
+                playwright_include_page=True,
+                playwright_page_methods=[
+                    PageMethod("wait_for_selector", "div.quote"),
+                    # PageMethod(
+                    #     "evaluate", "window.scrollBy(0, document.body.scrollHeight)"
+                    # ),
+                    # PageMethod("wait_for_selector", "div.quote:nth-child(11)"),
+                ],
+            ),
+            errback=self.errback,
+        )

+    async def parse(self, response):
+        page = response.meta["playwright_page"]
+        await page.close()

+        # for quote in response.css("div.quote"):
+        #     quote_item = QuoteItem()
+        #     quote_item["text"] = quote.css("span.text::text").get()
+        #     quote_item["author"] = quote.css("small.author::text").get()
+        #     quote_item["tags"] = quote.css("div.tags a.tag::text").getall()
+        #     self.logger.info(f"Quote: {quote_item}")
+        #     yield quote_item

+        # self.log("Saved all quotes")
+        # next_page = response.css("li.next a::attr(href)").get()
+        # if next_page is not None:
+        #     next_page_url = "https://quotes.toscrape.com" + next_page
+        #     yield scrapy.Request(
+        #         next_page_url,
+        #         meta=dict(
+        #             playwright=True,
+        #             playwright_include_page=True,
+        #             playwright_page_methods=[
+        #                 PageMethod("wait_for_selector", "div.quote")
+        #             ],
+        #         ),
+        #         errback=self.errback,
+        #     )

+    async def errback(self, failure):
+        self.logger.error(f"Failed to load page: {failure}")
+        page = failure.request.meta["playwright_page"]
+        await page.close()
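Both `parse` and `errback` above read `meta["playwright_page"]` by direct indexing. One possible hardening, sketched here on the assumption that a request can fail before a page has been attached to its meta, is a small helper that closes the page only when it is present. The names `close_playwright_page` and `FakePage` are hypothetical; `FakePage` stands in for a real Playwright page so the snippet runs without a browser.

```python
import asyncio


async def close_playwright_page(meta):
    """Close the Playwright page stored in request meta, if there is one.

    Returns True when a page was found and closed, False otherwise.
    """
    page = meta.get("playwright_page")
    if page is None:
        return False
    await page.close()
    return True


class FakePage:
    """Stand-in for a Playwright page, used only to exercise the helper."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


fake = FakePage()
closed_with_page = asyncio.run(close_playwright_page({"playwright_page": fake}))
closed_without_page = asyncio.run(close_playwright_page({}))
```

In the spider, `parse` could call `await close_playwright_page(response.meta)` and `errback` could call `await close_playwright_page(failure.request.meta)`, so a missing key would not turn a logged failure into a second exception.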