What can python web crawler do?
A web crawler (also called a web spider or web robot, and in the FOAF community more often called a scutter) is a program or script that automatically retrieves information from the World Wide Web according to certain rules. Less common names include ant, automatic indexer, emulator, and worm. A crawler automatically traverses a website's pages and downloads their content.
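The traversal just described can be sketched in a few lines of Python. This is a minimal breadth-first crawler, not a production implementation; the `fetch` callable (URL in, HTML string or `None` out) is an assumption introduced here so the traversal logic stays independent of any particular HTTP library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def crawl(start_url, fetch, max_pages=10):
    """Breadth-first traversal: fetch each page once, then follow its links."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])
    pages = {}                  # url -> downloaded HTML
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)       # assumed: fetch(url) returns HTML or None
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

In practice `fetch` would wrap an HTTP client; for testing, a dictionary's `get` method over an in-memory "site" works just as well.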
With the rapid development of the network, the World Wide Web has become the carrier of a vast amount of information, and effectively extracting and using that information has become a huge challenge. General-purpose search engines such as AltaVista, Yahoo!, and Google serve as tools to help people retrieve information and have become the entrance and guide for users accessing the World Wide Web. However, these general-purpose search engines have some limitations:
(1) Users in different fields and with different backgrounds often have different retrieval purposes and needs, and the results returned by a general-purpose search engine contain a large number of pages the user does not care about.
(2) The goal of a general-purpose search engine is to cover as much of the web as possible, so the conflict between limited search-engine server resources and effectively unlimited network data resources keeps deepening.
(3) As data formats on the World Wide Web grow richer and network technology develops, large amounts of heterogeneous data such as images, databases, audio, video, and other multimedia appear; general-purpose search engines are often unable to discover and retrieve such information-dense, structured data.
(4) Most general-purpose search engines provide keyword-based retrieval and find it difficult to support queries based on semantic information.
To solve the above problems, the focused crawler came into being, crawling related web resources in a targeted way. A focused crawler is a program that automatically downloads web pages; it selectively visits pages and follows links on the World Wide Web according to an established crawling goal, in order to obtain the required information. Compared with a general-purpose web crawler, a focused crawler does not pursue broad coverage: its aim is to fetch pages related to a specific topic and to prepare data resources for topic-oriented user queries.
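The selectivity of a focused crawler can be sketched by adding a relevance test to the traversal: a page is kept, and its links followed, only if it mentions the topic keywords often enough. This is a deliberately crude sketch; the keyword-counting score and the `fetch` callable are assumptions for illustration (real focused crawlers use much richer relevance models).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def relevance(html, keywords):
    """Crude topic score: total keyword occurrences in the page text."""
    text = html.lower()
    return sum(text.count(k.lower()) for k in keywords)

def focused_crawl(start_url, fetch, keywords, min_score=1, max_pages=10):
    """Keep only pages scoring >= min_score; expand links only from those."""
    seen = {start_url}
    queue = deque([start_url])
    kept = {}                   # url -> relevance score
    while queue and len(kept) < max_pages:
        url = queue.popleft()
        html = fetch(url)       # assumed: fetch(url) returns HTML or None
        if html is None:
            continue
        score = relevance(html, keywords)
        if score < min_score:
            continue            # off-topic page: discard it and its links
        kept[url] = score
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return kept
```

Pruning links from off-topic pages is what keeps the crawl narrow: the frontier only grows through pages that already match the topic.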