Asking the experts for help: why can't my simple little crawler get any data? Urgent!

Turn off the ROBOTSTXT_OBEY feature that comes with Scrapy: find this variable in the project's settings.py and set it to False. That solves the problem.

If we watch the output while Scrapy crawls, we can see that it first requests a txt file, robots.txt, from the server's root directory, and only then requests the URL we set.

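This file specifies which parts of the site crawlers are allowed to fetch (for example, if you don't want Baidu to crawl your pages, you can restrict it through robots.txt). Because a project generated by scrapy startproject has ROBOTSTXT_OBEY set to True, Scrapy follows the robots protocol and requests this file first to check what it is permitted to crawl. If the file disallows the target URL, the request is filtered out and no data comes back.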
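To make the permission check concrete, here is a minimal sketch using Python's standard urllib.robotparser. It is not how Scrapy implements the check internally; the robots.txt content, the example.com URLs, and the MyCrawler user agent are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example:
# Baidu's crawler is blocked everywhere, everyone else only from /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the sample rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(SAMPLE_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("Baiduspider", "https://example.com/index.html"))      # False
print(is_allowed("MyCrawler", "https://example.com/index.html"))        # True
print(is_allowed("MyCrawler", "https://example.com/private/data.html")) # False
```

A crawler that obeys the protocol simply skips every URL for which a check like this returns False, which is why the spider appears to get no data.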

We change ROBOTSTXT_OBEY to False in the settings, so that Scrapy no longer follows the robots protocol, and then it can crawl normally.
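The change itself is a single line. A sketch of the relevant part of the project's settings.py, assuming the file was generated by scrapy startproject and the rest of it is left untouched:

```python
# settings.py (excerpt)

# The generated project template sets this to True.
# With False, Scrapy stops checking robots.txt before sending requests.
ROBOTSTXT_OBEY = False
```

Keep in mind that this only stops Scrapy from checking the file; whether crawling the site is acceptable is still up to the site's own rules.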