What can a Python web crawler do?
A web crawler (also called a web spider or web robot, and in the FOAF community more often a web chaser) is a program or script that automatically fetches information from the World Wide Web according to certain rules. Other, less common names are ant, automatic indexer, emulator, and worm. A crawler automatically traverses the pages of a website and downloads their content.
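The traverse-and-download behavior described above can be sketched with the Python standard library alone. This is a minimal illustration, not a production crawler: the names `LinkExtractor`, `fetch_over_http`, and `crawl`, the breadth-first order, and the `max_pages` limit are all assumptions made for this example. It also ignores robots.txt and rate limiting, which a real crawler should respect.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_over_http(url):
    """Download a page and return its text (performs real network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_over_http, max_pages=50):
    """Breadth-first traversal; returns {url: html} for each downloaded page."""
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing the download function in as the `fetch` parameter is a design choice for this sketch: it lets the traversal logic be tried against an in-memory dictionary of pages before it is pointed at a real site.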
With the rapid growth of the Internet, the World Wide Web has become the carrier of a vast amount of information, and extracting and using that information effectively has become a huge challenge. Search engines, such as the traditional general-purpose engines AltaVista, Yahoo!, and Google, serve as tools that help people retrieve information and have become the entrance and guide for users accessing the World Wide Web. However, these general-purpose search engines have some limitations:
(1) Users in different fields and with different backgrounds often have different retrieval goals and needs, yet the results returned by a general-purpose search engine include many web pages the user does not care about.
(2) A general-purpose search engine aims to cover as much of the web as possible, so the tension between limited search engine server resources and unlimited web data resources will keep deepening.
(3) As the data forms on the World Wide Web grow richer and network technology keeps developing, large amounts of varied data such as pictures, databases, audio, and video appear, and general-purpose search engines are often unable to discover and retrieve such information-dense, structured data.
(4) Most general-purpose search engines provide keyword-based retrieval and have difficulty supporting queries based on semantic information.
To address these problems, the focused crawler, which fetches relevant web resources in a targeted way, came into being. A focused crawler is a program that automatically downloads web pages. It selectively visits web pages and related links on the World Wide Web according to a predefined crawling goal to obtain the required information. Unlike a general-purpose web crawler, a focused crawler does not pursue broad coverage; instead, it aims to fetch the web pages related to a specific topic and to prepare data resources for topic-oriented user queries.
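One way to picture the selective behavior of a focused crawler is the sketch below. It makes several simplifying assumptions that do not come from the text: topic relevance is just the fraction of keywords found in the page (real systems use far better classifiers), links are pulled out with a simple regular expression, and the names `relevance`, `focused_crawl`, `threshold`, and `max_pages` are hypothetical. The `fetch` argument is any function that takes a URL and returns the page text.

```python
import heapq
import re
from urllib.parse import urljoin

# Simplistic link pattern, assumed for this sketch; it only matches
# double-quoted href attributes.
LINK_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def relevance(text, keywords):
    """Fraction of topic keywords that occur in the text (0.0 to 1.0)."""
    lowered = text.lower()
    hits = sum(1 for word in keywords if word.lower() in lowered)
    return hits / len(keywords) if keywords else 0.0


def focused_crawl(start_url, fetch, keywords, threshold=0.5, max_pages=20):
    """Return {url: score} for on-topic pages, visiting promising links first."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, start_url)]
    seen = {start_url}
    relevant = {}
    fetched = 0
    while frontier and fetched < max_pages:
        _priority, url = heapq.heappop(frontier)
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        fetched += 1
        score = relevance(html, keywords)
        if score < threshold:
            continue  # off-topic: do not keep the page or follow its links
        relevant[url] = score
        for href in LINK_RE.findall(html):
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant
```

The difference from a general-purpose crawler lies in two places: off-topic pages are discarded and their links are never followed, and the frontier is ordered by estimated relevance rather than by discovery order. The crawl therefore spends its page budget on the chosen topic instead of on coverage.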