Using loginform with scrapy
Mia Lopez
The Scrapy framework provides loginform, a library for use when logging into websites that require authentication.
I have looked through the docs for both, but I cannot figure out how to get Scrapy to call loginform before it starts crawling. The login works fine with loginform on its own.
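For context, loginform's fill_login_form takes a page URL and its HTML body, finds the login form, fills in the credentials, and returns the form data, the submit URL, and the HTTP method. Below is a rough standard-library-only stand-in that shows the idea; fill_login_form_sketch and _FormParser are invented names, and the real loginform is considerably smarter (e.g. it scores candidate forms):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class _FormParser(HTMLParser):
    """Collect the action, method, and input fields of the first <form>."""
    def __init__(self):
        super().__init__()
        self.action = ''
        self.method = 'GET'
        self.fields = []
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        attrs = dict(attrs)
        if tag == 'form' and not self._in_form:
            self._in_form = True
            self.action = attrs.get('action', '')
            self.method = (attrs.get('method') or 'GET').upper()
        elif tag == 'input' and self._in_form:
            self.fields.append((attrs.get('name'),
                                attrs.get('value', ''),
                                attrs.get('type', 'text')))

    def handle_endtag(self, tag):
        if tag == 'form' and self._in_form:
            self._in_form = False
            self._done = True

def fill_login_form_sketch(url, body, username, password):
    # Parse the page, drop the credentials into the first text-like
    # and password inputs, and keep hidden fields (CSRF tokens) as-is.
    parser = _FormParser()
    parser.feed(body)
    data = []
    user_filled = False
    for name, value, type_ in parser.fields:
        if name is None:
            continue
        if type_ == 'password':
            data.append((name, password))
        elif type_ in ('text', 'email') and not user_filled:
            data.append((name, username))
            user_filled = True
        else:
            data.append((name, value))
    return data, urljoin(url, parser.action), parser.method

html = '''<form action="/login" method="post">
  <input type="text" name="user">
  <input type="password" name="pw">
  <input type="hidden" name="csrf" value="abc">
</form>'''
data, action, method = fill_login_form_sketch('http://example.com/', html,
                                              'alice', 's3cret')
# data is now [('user', 'alice'), ('pw', 's3cret'), ('csrf', 'abc')]
```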
Thanks
2 Answers
loginform is just a library, totally decoupled from Scrapy.
You have to write the code to plug it into the spider you want, probably in a callback method.
Here is an example of a structure to do this:
import scrapy
from loginform import fill_login_form
class MySpiderWithLogin(scrapy.Spider):
    name = 'my-spider'

    start_urls = [' ']

    login_url = ' '
    login_user = 'your-username'
    login_password = 'secret-password-here'

    def start_requests(self):
        # let's start by sending a first request to the login page
        yield scrapy.Request(self.login_url, self.parse_login)

    def parse_login(self, response):
        # got the login page, let's fill the login form...
        data, url, method = fill_login_form(response.url, response.body,
                                            self.login_user, self.login_password)

        # ... and send a request with our login data
        return scrapy.FormRequest(url, formdata=dict(data),
                                  method=method, callback=self.start_crawl)

    def start_crawl(self, response):
        # OK, we're in, let's start crawling the protected pages
        for url in self.start_urls:
            yield scrapy.Request(url)

    def parse(self, response):
        # do stuff with the logged-in response
        pass

I managed to get it working without the loginform library; my solution is below.
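The flow above is driven entirely by Scrapy's callback mechanism: each Request names the method that will receive its response, so login, form submission, and crawling chain together without any explicit ordering code. Here is a toy model of that control flow in plain Python (not Scrapy itself; the Request, FakeResponse, and run names are invented for illustration):

```python
class Request:
    """A URL paired with the callback that will handle its response."""
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback

class FakeResponse:
    def __init__(self, url):
        self.url = url

def run(spider):
    """Drive requests to completion, like a miniature Scrapy engine."""
    visited = []
    pending = list(spider.start_requests())
    while pending:
        request = pending.pop(0)
        visited.append(request.url)
        # each callback may yield follow-up requests, which we queue
        result = request.callback(FakeResponse(request.url))
        if result is not None:
            pending.extend(result)
    return visited

class LoginSpider:
    start_urls = ['page-1', 'page-2']
    login_url = 'login'

    def start_requests(self):
        yield Request(self.login_url, self.parse_login)

    def parse_login(self, response):
        # pretend we filled and submitted the login form here
        yield Request('login-submit', self.start_crawl)

    def start_crawl(self, response):
        for url in self.start_urls:
            yield Request(url, self.parse)

    def parse(self, response):
        pass  # no further requests

visited = run(LoginSpider())
# visited == ['login', 'login-submit', 'page-1', 'page-2']
```

The point is that the login request is simply the first link in the chain; the "real" start_urls are only requested once the login callbacks have run.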
import scrapy
import requests
class Spider(scrapy.Spider):
    name = 'spider'

    start_urls = [' ']

    def start_requests(self):
        return [scrapy.FormRequest("login.php",
                                   formdata={'username': 'user', 'password': 'pass'},
                                   callback=self.start_crawl)]

    def start_crawl(self, response):
        # start crawling
        pass
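One caveat with this answer: scrapy.Request and scrapy.FormRequest require absolute URLs, so a bare "login.php" will raise a missing-scheme error. If only a relative form action is known, it can be resolved against the page it came from with the standard library (the base URL here is a hypothetical example):

```python
from urllib.parse import urljoin

base = 'http://example.com/accounts/'   # hypothetical page the form was found on
login_url = urljoin(base, 'login.php')
# -> 'http://example.com/accounts/login.php'
```

Inside a Scrapy callback the same resolution is available as response.urljoin('login.php').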