This repository has been archived by the owner on Dec 17, 2018. It is now read-only.
Problem when running #11
Comments
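1. On May 29 the website changed the addresses of several search entry points, so they need to be updated.
My skills are limited and I have not managed to fix it yet. I urgently need the data for my graduation project. Is there an updated version? Many thanks!
I did not use scrapy. I am posting my own parsing file here for your reference.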
2018-05-30 15:33:15 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: crawler)
2018-05-30 15:33:15 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.5.0, Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 17.5.0 (OpenSSL 1.0.2n 7 Dec 2017), cryptography 2.1.4, Platform Windows-10-10.0.16299-SP0
2018-05-30 15:33:15 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'crawler', 'COOKIES_DEBUG': True, 'DOWNLOAD_DELAY': 1.0, 'DOWNLOAD_TIMEOUT': 10, 'LOG_FILE': 'C:\Users\myh\Desktop\PatentCrawler-master\output\20180530_153315\PatentCrawler.log', 'NEWSPIDER_MODULE': 'crawler.spiders', 'RETRY_TIMES': 3, 'SPIDER_MODULES': ['crawler.spiders']}
2018-05-30 15:33:15 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2018-05-30 15:33:16 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'crawler.middlewares.PatentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-05-30 15:33:16 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-05-30 15:33:16 [scrapy.middleware] INFO: Enabled item pipelines:
['crawler.pipelines.CrawlerPipeline']
2018-05-30 15:33:16 [scrapy.core.engine] INFO: Spider opened
2018-05-30 15:33:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-05-30 15:33:16 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-05-30 15:33:17 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:17 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/patentsearch/pageIsUesd-pageUsed.shtml HTTP/1.1" 200 None
2018-05-30 15:33:17 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:17 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/patentsearch/tableSearch-showTableSearchIndex.shtml HTTP/1.1" 200 None
2018-05-30 15:33:18 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:18 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/login-showPic.shtml HTTP/1.1" 200 None
2018-05-30 15:33:18 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:18 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/wee/platform/wee_security_check HTTP/1.1" 302 None
2018-05-30 15:33:18 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/uilogin-loginSuccess.shtml?params=991CFE73D4DF553253D44E119219BF31366856FF4B15222669397E093A956A2C&j_loginsuccess_url= HTTP/1.1" 302 None
2018-05-30 15:33:18 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/uiIndex.shtml HTTP/1.1" 200 None
2018-05-30 15:33:19 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:19 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/patentsearch/showViewList-jumpToView.shtml HTTP/1.1" 200 None
2018-05-30 15:33:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml> (failed 1 times): unlogin
2018-05-30 15:33:19 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <POST http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml>
Cookie: JSESSIONID=x1Sv9YxmnHdXesCJk04Y3SMqTX3yBIpnhcwf0uKlEOg9TlE-gYYY!309799008!187544033; IS_LOGIN=true; WEE_SID=x1Sv9YxmnHdXesCJk04Y3SMqTX3yBIpnhcwf0uKlEOg9TlE-gYYY!309799008!187544033!1527665495142
2018-05-30 15:33:19 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:19 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/patentsearch/pageIsUesd-pageUsed.shtml HTTP/1.1" 200 None
2018-05-30 15:33:19 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:19 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/patentsearch/tableSearch-showTableSearchIndex.shtml HTTP/1.1" 200 None
2018-05-30 15:33:19 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:20 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/login-showPic.shtml HTTP/1.1" 200 None
2018-05-30 15:33:20 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:20 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/wee/platform/wee_security_check HTTP/1.1" 302 None
2018-05-30 15:33:20 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/uilogin-loginSuccess.shtml?params=991CFE73D4DF553253D44E119219BF31366856FF4B15222669397E093A956A2C&j_loginsuccess_url= HTTP/1.1" 302 None
2018-05-30 15:33:20 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/uiIndex.shtml HTTP/1.1" 200 None
2018-05-30 15:33:20 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:20 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/patentsearch/showViewList-jumpToView.shtml HTTP/1.1" 200 None
2018-05-30 15:33:20 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml> (failed 2 times): unlogin
2018-05-30 15:33:20 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <POST http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml>
Cookie: JSESSIONID=enOv9ZPDdp7oeLhqlYjU_gHhiJA63dF52InwKDPUfwSJwT4OC0x4!309799008!187544033; IS_LOGIN=true; WEE_SID=enOv9ZPDdp7oeLhqlYjU_gHhiJA63dF52InwKDPUfwSJwT4OC0x4!309799008!187544033!1527665497027
2018-05-30 15:33:20 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:21 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/patentsearch/pageIsUesd-pageUsed.shtml HTTP/1.1" 200 None
2018-05-30 15:33:21 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:21 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/patentsearch/tableSearch-showTableSearchIndex.shtml HTTP/1.1" 200 None
2018-05-30 15:33:21 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:21 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/login-showPic.shtml HTTP/1.1" 200 None
2018-05-30 15:33:21 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:22 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/wee/platform/wee_security_check HTTP/1.1" 302 None
2018-05-30 15:33:22 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/uilogin-loginSuccess.shtml?params=991CFE73D4DF553253D44E119219BF31366856FF4B15222669397E093A956A2C&j_loginsuccess_url= HTTP/1.1" 302 None
2018-05-30 15:33:22 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/uiIndex.shtml HTTP/1.1" 200 None
2018-05-30 15:33:22 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:22 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/patentsearch/showViewList-jumpToView.shtml HTTP/1.1" 200 None
2018-05-30 15:33:22 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml> (failed 3 times): unlogin
2018-05-30 15:33:22 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <POST http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml>
Cookie: JSESSIONID=fdyv9Zmxa7oMcWvdvBHwiuh8nvKhmeaYnZ03iat0rUfX2SfDs-5E!309799008!187544033; IS_LOGIN=true; WEE_SID=fdyv9Zmxa7oMcWvdvBHwiuh8nvKhmeaYnZ03iat0rUfX2SfDs-5E!309799008!187544033!1527665498545
2018-05-30 15:33:22 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:22 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/patentsearch/pageIsUesd-pageUsed.shtml HTTP/1.1" 200 None
2018-05-30 15:33:22 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:23 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/patentsearch/tableSearch-showTableSearchIndex.shtml HTTP/1.1" 200 None
2018-05-30 15:33:23 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:23 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/login-showPic.shtml HTTP/1.1" 200 None
2018-05-30 15:33:23 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:23 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/wee/platform/wee_security_check HTTP/1.1" 302 None
2018-05-30 15:33:23 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/uilogin-loginSuccess.shtml?params=991CFE73D4DF553253D44E119219BF31366856FF4B15222669397E093A956A2C&j_loginsuccess_url= HTTP/1.1" 302 None
2018-05-30 15:33:24 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "GET /sipopublicsearch/portal/uiIndex.shtml HTTP/1.1" 200 None
2018-05-30 15:33:24 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.pss-system.gov.cn
2018-05-30 15:33:24 [urllib3.connectionpool] DEBUG: http://www.pss-system.gov.cn:80 "POST /sipopublicsearch/patentsearch/showViewList-jumpToView.shtml HTTP/1.1" 200 None
2018-05-30 15:33:24 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <POST http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml> (failed 4 times): unlogin
2018-05-30 15:33:24 [scrapy.core.scraper] ERROR: Error downloading <POST http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml>
Traceback (most recent call last):
File "D:\Program Files (x86)\anaconda\lib\site-packages\twisted\internet\defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "D:\Program Files (x86)\anaconda\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "D:\Program Files (x86)\anaconda\lib\site-packages\twisted\internet\defer.py", line 1363, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <404 http://www.pss-system.gov.cn/sipopublicsearch/patentsearch/executeTableSearch0402-executeCommandSearch.shtml>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Program Files (x86)\anaconda\lib\site-packages\twisted\internet\defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "D:\Program Files (x86)\anaconda\lib\site-packages\scrapy\core\downloader\middleware.py", line 56, in process_response
(six.get_method_self(method).__class__.__name__, type(response))
AssertionError: Middleware PatentMiddleware.process_response must return Response or Request, got <class 'NoneType'>
2018-05-30 15:33:24 [scrapy.core.engine] INFO: Closing spider (finished)
2018-05-30 15:33:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 4368,
'downloader/request_count': 4,
'downloader/request_method_count/POST': 4,
'downloader/response_bytes': 6301,
'downloader/response_count': 4,
'downloader/response_status_count/404': 4,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 5, 30, 7, 33, 24, 666286),
'log_count/DEBUG': 56,
'log_count/ERROR': 1,
'log_count/INFO': 7,
'retry/count': 3,
'retry/max_reached': 1,
'retry/reason_count/unlogin': 3,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'start_time': datetime.datetime(2018, 5, 30, 7, 33, 16, 985230)}
2018-05-30 15:33:24 [scrapy.core.engine] INFO: Spider closed (finished)
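The AssertionError in the traceback above says that PatentMiddleware.process_response returned None once retries were exhausted. Scrapy requires that method to return a Response or a Request on every code path. The following is a minimal sketch of that contract, not the project's actual middleware: the class name SafePatentMiddleware, the relogin_attempts meta key, and the retry limit are all hypothetical. It also does not address the 404 itself, which the comment above attributes to the site changing its search URLs.

```python
class SafePatentMiddleware:
    """Hypothetical downloader middleware; every branch returns a value."""

    # Assumed limit, chosen to mirror RETRY_TIMES = 3 in the log above.
    max_relogins = 3

    def process_response(self, request, response, spider):
        # Successful responses pass through unchanged.
        if response.status == 200:
            return response

        attempts = request.meta.get("relogin_attempts", 0)
        if attempts < self.max_relogins:
            # Re-schedule a copy of the request. dont_filter=True keeps the
            # duplicate filter from dropping it. A real middleware would
            # refresh the login cookies before this point.
            new_meta = dict(request.meta, relogin_attempts=attempts + 1)
            return request.replace(meta=new_meta, dont_filter=True)

        # Out of attempts: hand the failing response back instead of
        # falling off the end of the function and returning None.
        return response
```

Returning the failing response on the last branch lets Scrapy's HttpErrorMiddleware handle the 404 in the usual way, so the run ends with a logged HTTP error rather than an assertion failure.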