Which libraries do Python crawlers use?

A Python crawler (in full, a Python web crawler) is a program or script that automatically fetches information from the World Wide Web according to a set of rules. It is mainly used to collect securities trading data, weather data, website user data, image data and so on. To support this work, Python offers a large number of libraries, both built-in and third-party, which fall into several categories. The article below introduces them.

First, Python crawler network libraries

Python crawler network libraries mainly include: urllib, requests, grab, pycurl, urllib3, [...]

[...]: cloud execution of R, Python and MATLAB code.

Twelve, Email

● flanker: an email address and MIME parsing library;

● talon: a Mailgun library for extracting quotations and signatures from messages.

Thirteen, Website and URL operations

● furl: a small Python library that simplifies URL manipulation;

● purl: a simple, immutable URL class with a clean API for inspection and manipulation;

● urllib.parse: splits a Uniform Resource Locator (URL) string into its components, combines components back into a URL string, and converts a relative URL into an absolute URL given a base URL;

● tldextract: accurately separates the TLD from the registered domain and subdomains of a URL, using the Public Suffix List;

● netaddr: a Python library for representing and manipulating network addresses.
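
The URL operations described for urllib.parse can be sketched with the standard library alone. This is a minimal illustration, not a recommended crawler design; the URLs and variable names below are hypothetical and chosen only for the example.

```python
from urllib.parse import urlsplit, urlunsplit, urljoin

# Hypothetical page URL, used only for illustration.
page_url = "https://example.com/news/index.html?page=2#top"

# Split the URL into its components (scheme, host, path, query, fragment).
parts = urlsplit(page_url)
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Combine components back into a URL string, here with the fragment dropped.
without_fragment = urlunsplit(parts._replace(fragment=""))
print(without_fragment)

# Resolve a relative link found on the page against the page URL (the base URL).
absolute_link = urljoin(page_url, "../images/logo.png")
print(absolute_link)
```

A crawler typically applies the last step to every link it finds on a page, so that relative links can be queued as absolute URLs.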

Fourteen, Web content extraction

● newspaper: news extraction, article extraction and content curation in Python;

● html2text: converts HTML into Markdown-formatted text;

● python-goose: an HTML content and article extractor;

● lassie: a human-friendly web content retrieval tool;

● micawber: a small library that extracts rich content from URLs;

● sumy: a module for automatically summarizing text documents and HTML pages;

● Haul: an extensible image crawler;

● python-readability: a fast Python port of arc90's readability tool;

● scrapely: a library for extracting structured data from HTML pages;

● youtube-dl: a small command-line program for downloading videos from YouTube;

● you-get: a YouTube/Youku/Niconico video downloader written in Python 3;

● WikiTeam: tools for downloading and preserving wikis.
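
To give a rough idea of what content extraction involves, the sketch below pulls visible text out of an HTML string. It deliberately uses the standard library's html.parser rather than any of the libraries listed above, so it is only a crude stand-in for what they do; the class name, function name and sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, used only for illustration.
sample_html = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><h1>Headline</h1><p>First paragraph.</p>
<script>var x = 1;</script></body></html>
"""
print(extract_text(sample_html))
```

Dedicated extractors go much further than this, for example by separating the article body from navigation and advertisements.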

Fifteen, WebSocket

● Crossbar: an open-source application message router;

● AutobahnPython: open-source Python implementations of the WebSocket and WAMP protocols;

● WebSocket-for-Python: WebSocket client and server libraries for Python 2, Python 3 and PyPy.

Sixteen, DNS resolution

● dnsyo: checks your DNS against more than 1,500 DNS servers around the world;

● pycares: a Python interface to c-ares.

Seventeen, Computer vision

● OpenCV: an open-source computer vision library;

● SimpleCV: a concise, readable interface for cameras, image processing, feature extraction and format conversion;

● mahotas: fast computer vision and image processing algorithms that operate entirely on numpy arrays.

Eighteen, Proxy servers

● shadowsocks: a fast tunnel proxy that helps you get through firewalls;

● tproxy: a simple TCP routing proxy, built on Gevent and configurable in Python.

Nineteen, Other lists of Python tools

● awesome-python

● pycrumbs

● python-github-projects

● python_reference

● pythonidae