2024-01-24 06:56:50 [scrapy.utils.log] INFO: Scrapy 2.9.0 started (bot: WalkoverCrawler)
2024-01-24 06:56:50 [scrapy.utils.log] INFO: Versions: lxml 4.9.3.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.1, Twisted 22.10.0, Python 3.8.10 (default, Nov 22 2023, 10:22:35) - [GCC 9.4.0], pyOpenSSL 23.2.0 (OpenSSL 3.1.1 30 May 2023), cryptography 41.0.2, Platform Linux-5.15.0-1038-gcp-x86_64-with-glibc2.29
2024-01-24 06:56:51 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_DEBUG': True,
 'BOT_NAME': 'WalkoverCrawler',
 'CONCURRENT_REQUESTS': 8,
 'CONCURRENT_REQUESTS_PER_DOMAIN': 5,
 'DOWNLOAD_DELAY': 5,
 'FEED_EXPORT_ENCODING': 'utf-8',
 'LOG_FILE': 'logs/WalkoverCrawler/viasocketCrawler/a725611aba8511eea727c3b4b62a9691.log',
 'NEWSPIDER_MODULE': 'WalkoverCrawler.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['WalkoverCrawler.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2024-01-24 06:56:51 [asyncio] DEBUG: Using selector: EpollSelector
2024-01-24 06:56:51 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2024-01-24 06:56:51 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
2024-01-24 06:56:51 [scrapy.extensions.telnet] INFO: Telnet Password: cbd0c969f41d5376
2024-01-24 06:56:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2024-01-24 06:56:54 [root] INFO: urls start url===>"https://developers.brevo.com/reference/createwebhook"
2024-01-24 06:56:56 [WDM] INFO: ====== WebDriver manager ======
2024-01-24 06:57:03 [WDM] INFO: Get LATEST chromedriver version for google-chrome
2024-01-24 06:57:03 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): googlechromelabs.github.io:443
2024-01-24 06:57:04 [urllib3.connectionpool] DEBUG: https://googlechromelabs.github.io:443 "GET /chrome-for-testing/latest-patch-versions-per-build.json HTTP/1.1" 200 3818
2024-01-24 06:57:04 [WDM] INFO: Get LATEST chromedriver version for google-chrome
2024-01-24 06:57:04 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): googlechromelabs.github.io:443
2024-01-24 06:57:05 [urllib3.connectionpool] DEBUG: https://googlechromelabs.github.io:443 "GET /chrome-for-testing/latest-patch-versions-per-build.json HTTP/1.1" 200 3818
2024-01-24 06:57:05 [WDM] INFO: Driver [/home/khangori850/.wdm/drivers/chromedriver/linux64/118.0.5993.70/chromedriver-linux64/chromedriver] found in cache
2024-01-24 06:57:05 [selenium.webdriver.common.service] DEBUG: Started executable: `/home/khangori850/.wdm/drivers/chromedriver/linux64/118.0.5993.70/chromedriver-linux64/chromedriver` in a child process with pid: 2454192
2024-01-24 06:57:06 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://localhost:32877/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "pageLoadStrategy": "normal", "goog:chromeOptions": {"extensions": [], "args": ["--headless", "--no-sandbox", "--disable-dev-shm-usage", "--remote-debugging-port=9222"]}}}}
2024-01-24 06:57:06 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): localhost:32877
2024-01-24 06:57:14 [urllib3.connectionpool] DEBUG: http://localhost:32877 "POST /session HTTP/1.1" 200 0
2024-01-24 06:57:14 [selenium.webdriver.remote.remote_connection] DEBUG: Remote response: status=200 | data={"value":{"capabilities":{"acceptInsecureCerts":false,"browserName":"chrome","browserVersion":"118.0.5993.117","chrome":{"chromedriverVersion":"118.0.5993.70 (e52f33f30b91b4ddfad649acddc39ab570473b86-refs/branch-heads/5993@{#1216})","userDataDir":"/tmp/.org.chromium.Chromium.soviuH"},"fedcm:accounts":true,"goog:chromeOptions":{"debuggerAddress":"localhost:9222"},"networkConnectionEnabled":false,"pageLoadStrategy":"normal","platformName":"linux","proxy":{},"setWindowRect":true,"strictFileInteractability":false,"timeouts":{"implicit":0,"pageLoad":300000,"script":30000},"unhandledPromptBehavior":"dismiss and notify","webauthn:extension:credBlob":true,"webauthn:extension:largeBlob":true,"webauthn:extension:minPinLength":true,"webauthn:extension:prf":true,"webauthn:virtualAuthenticators":true},"sessionId":"951c283187ae1481f42859f485d160fc"}} | headers=HTTPHeaderDict({'Content-Length': '848', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-01-24 06:57:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2024-01-24 06:57:14 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy_selenium_custom.SeleniumMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-24 06:57:15 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-24 06:57:20 [scrapy.middleware] INFO: Enabled item pipelines:
['WalkoverCrawler.pipelines.WalkovercrawlerPipeline']
2024-01-24 06:57:20 [scrapy.core.engine] INFO: Spider opened
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Peer did not staple an OCSP response
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Requesting OCSP data
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Trying http://r3.o.lencr.org
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Peer did not staple an OCSP response
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Requesting OCSP data
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Trying http://r3.o.lencr.org
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Peer did not staple an OCSP response
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Requesting OCSP data
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Trying http://r3.o.lencr.org
2024-01-24 06:57:21 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): r3.o.lencr.org:80
2024-01-24 06:57:21 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): r3.o.lencr.org:80
2024-01-24 06:57:21 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): r3.o.lencr.org:80
2024-01-24 06:57:21 [urllib3.connectionpool] DEBUG: http://r3.o.lencr.org:80 "POST / HTTP/1.1" 200 503
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: OCSP response status:
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Verifying response
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Responder is issuer
2024-01-24 06:57:21 [urllib3.connectionpool] DEBUG: http://r3.o.lencr.org:80 "POST / HTTP/1.1" 200 503
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: OCSP response status:
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Verifying response
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Responder is issuer
2024-01-24 06:57:21 [urllib3.connectionpool] DEBUG: http://r3.o.lencr.org:80 "POST / HTTP/1.1" 200 503
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: OCSP response status:
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Verifying response
2024-01-24 06:57:21 [pymongo.ocsp_support] DEBUG: Responder is issuer
2024-01-24 06:57:23 [pymongo.ocsp_support] DEBUG: Caching OCSP response.
2024-01-24 06:57:23 [pymongo.ocsp_support] DEBUG: OCSP cert status:
2024-01-24 06:57:23 [pymongo.ocsp_support] DEBUG: Caching OCSP response.
2024-01-24 06:57:23 [pymongo.ocsp_support] DEBUG: OCSP cert status:
2024-01-24 06:57:23 [pymongo.ocsp_support] DEBUG: Caching OCSP response.
2024-01-24 06:57:23 [pymongo.ocsp_support] DEBUG: OCSP cert status:
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: Peer did not staple an OCSP response
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: Requesting OCSP data
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: Trying http://r3.o.lencr.org
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: Using cached OCSP response.
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: OCSP cert status:
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: Peer did not staple an OCSP response
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: Requesting OCSP data
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: Trying http://r3.o.lencr.org
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: Using cached OCSP response.
2024-01-24 06:57:24 [pymongo.ocsp_support] DEBUG: OCSP cert status:
2024-01-24 06:57:25 [pymongo.ocsp_support] DEBUG: Peer did not staple an OCSP response
2024-01-24 06:57:25 [pymongo.ocsp_support] DEBUG: Requesting OCSP data
2024-01-24 06:57:25 [pymongo.ocsp_support] DEBUG: Trying http://r3.o.lencr.org
2024-01-24 06:57:25 [pymongo.ocsp_support] DEBUG: Using cached OCSP response.
2024-01-24 06:57:25 [pymongo.ocsp_support] DEBUG: OCSP cert status:
2024-01-24 06:57:28 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-24 06:57:28 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2024-01-24 06:57:29 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading : Unsupported URL scheme '': no handler available for that scheme
Traceback (most recent call last):
  File "/home/khangori850/myenv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
    result = context.run(
  File "/home/khangori850/myenv/lib/python3.8/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/khangori850/myenv/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/home/khangori850/myenv/lib/python3.8/site-packages/scrapy/utils/defer.py", line 74, in mustbe_deferred
    result = f(*args, **kw)
  File "/home/khangori850/myenv/lib/python3.8/site-packages/scrapy/core/downloader/handlers/__init__.py", line 83, in download_request
    raise NotSupported(
scrapy.exceptions.NotSupported: Unsupported URL scheme '': no handler available for that scheme
2024-01-24 06:57:29 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://localhost:32877/session/951c283187ae1481f42859f485d160fc/url {"url": "%22https://developers.brevo.com/reference/createwebhook%22"}
2024-01-24 06:58:46 [urllib3.connectionpool] DEBUG: http://localhost:32877 "POST /session/951c283187ae1481f42859f485d160fc/url HTTP/1.1" 400 0
2024-01-24 06:58:46 [selenium.webdriver.remote.remote_connection] DEBUG: Remote response: status=400 | data={"value":{"error":"invalid argument","message":"invalid argument\n (Session info: headless chrome=118.0.5993.117)","stacktrace":"#0 0x55b24d3d5fb3 \u003Cunknown>\n#1 0x55b24d0a92f6 \u003Cunknown>\n#2 0x55b24d091b75 \u003Cunknown>\n#3 0x55b24d08ff13 \u003Cunknown>\n#4 0x55b24d09035a \u003Cunknown>\n#5 0x55b24d0abb8e \u003Cunknown>\n#6 0x55b24d12c3b5 \u003Cunknown>\n#7 0x55b24d112942 \u003Cunknown>\n#8 0x55b24d12bc02 \u003Cunknown>\n#9 0x55b24d112713 \u003Cunknown>\n#10 0x55b24d0e518b \u003Cunknown>\n#11 0x55b24d0e5f7e \u003Cunknown>\n#12 0x55b24d39b8d8 \u003Cunknown>\n#13 0x55b24d39f800 \u003Cunknown>\n#14 0x55b24d3a9cfc \u003Cunknown>\n#15 0x55b24d3a0418 \u003Cunknown>\n#16 0x55b24d36d42f \u003Cunknown>\n#17 0x55b24d3c44e8 \u003Cunknown>\n#18 0x55b24d3c46b4 \u003Cunknown>\n#19 0x55b24d3d5143 \u003Cunknown>\n#20 0x7f511f23f609 start_thread\n"}} | headers=HTTPHeaderDict({'Content-Length': '856', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-01-24 06:58:46 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2024-01-24 06:58:46 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-24 06:58:46 [scrapy.core.scraper] ERROR: Error downloading
scrapy.exceptions.NotSupported: Unsupported URL scheme '': no handler available for that scheme

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/khangori850/myenv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/home/khangori850/myenv/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    method(request=request, spider=spider)
  File "/home/khangori850/WalkoverCrawler/walkover-crawler/scrapy_selenium_custom/middlewares.py", line 98, in process_request
    self.driver.get(request.url)
  File "/home/khangori850/myenv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 353, in get
    self.execute(Command.GET, {"url": url})
  File "/home/khangori850/myenv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 344, in execute
    self.error_handler.check_response(response)
  File "/home/khangori850/myenv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: headless chrome=118.0.5993.117)
Stacktrace:
#0 0x55b24d3d5fb3
#1 0x55b24d0a92f6
#2 0x55b24d091b75
#3 0x55b24d08ff13
#4 0x55b24d09035a
#5 0x55b24d0abb8e
#6 0x55b24d12c3b5
#7 0x55b24d112942
#8 0x55b24d12bc02
#9 0x55b24d112713
#10 0x55b24d0e518b
#11 0x55b24d0e5f7e
#12 0x55b24d39b8d8
#13 0x55b24d39f800
#14 0x55b24d3a9cfc
#15 0x55b24d3a0418
#16 0x55b24d36d42f
#17 0x55b24d3c44e8
#18 0x55b24d3c46b4
#19 0x55b24d3d5143
#20 0x7f511f23f609 start_thread
2024-01-24 06:58:47 [root] INFO: spider closed!
2024-01-24 06:58:47 [scrapy.core.engine] INFO: Closing spider (finished)
2024-01-24 06:58:47 [selenium.webdriver.remote.remote_connection] DEBUG: DELETE http://localhost:32877/session/951c283187ae1481f42859f485d160fc {}
2024-01-24 06:58:47 [urllib3.connectionpool] DEBUG: http://localhost:32877 "DELETE /session/951c283187ae1481f42859f485d160fc HTTP/1.1" 200 0
2024-01-24 06:58:47 [selenium.webdriver.remote.remote_connection] DEBUG: Remote response: status=200 | data={"value":null} | headers=HTTPHeaderDict({'Content-Length': '14', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-01-24 06:58:47 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2024-01-24 06:58:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 2,
 'downloader/exception_type_count/scrapy.exceptions.NotSupported': 1,
 'downloader/exception_type_count/selenium.common.exceptions.InvalidArgumentException': 1,
 'downloader/request_bytes': 213,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'elapsed_time_seconds': 78.985364,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 1, 24, 6, 58, 47, 270509),
 'log_count/DEBUG': 66,
 'log_count/ERROR': 2,
 'log_count/INFO': 17,
 'memusage/max': 70991872,
 'memusage/startup': 70991872,
 'robotstxt/exception_count/': 1,
 'robotstxt/request_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2024, 1, 24, 6, 57, 28, 285145)}
2024-01-24 06:58:48 [scrapy.core.engine] INFO: Spider closed (finished)
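Note on the failure recorded above: the start URL is logged with literal double quotes around it (urls start url===>"https://developers.brevo.com/reference/createwebhook") and reaches chromedriver percent-encoded as "%22https://developers.brevo.com/reference/createwebhook%22". Because the quoted string has no recognizable scheme, the RobotsTxtMiddleware raises NotSupported ("Unsupported URL scheme ''"), and Chrome rejects the navigation with "invalid argument". Below is a minimal sketch of one possible fix, assuming the URL arrives as a spider argument that may be wrapped in stray quotes; the spider class, argument name, and clean_url helper are hypothetical and not taken from the WalkoverCrawler source.

# Hypothetical sketch: sanitize an incoming start URL before yielding requests,
# so Scrapy sees a real http(s) scheme and Selenium receives a navigable URL.
from urllib.parse import urlparse

import scrapy


def clean_url(raw: str) -> str:
    """Strip surrounding whitespace and literal quote characters, then validate the scheme."""
    url = raw.strip().strip('"').strip("'")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Refusing to crawl URL without an http(s) scheme: {raw!r}")
    return url


class ViasocketSpider(scrapy.Spider):
    # Hypothetical class; the log only shows the project (WalkoverCrawler) and the
    # spider name implied by the log path (viasocketCrawler), not the actual code.
    name = "viasocketCrawler"

    def __init__(self, start_url=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # e.g. start_url='"https://developers.brevo.com/reference/createwebhook"'
        self.start_urls = [clean_url(start_url)] if start_url else []

    def parse(self, response):
        self.logger.info("Fetched %s (%d bytes)", response.url, len(response.body))

With the quotes stripped, the request URL parses with an https scheme, so the robots.txt download can proceed and self.driver.get(request.url) in the custom SeleniumMiddleware receives a URL Chrome will accept.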