
Zhihu Core User Big Data Report

I recently wrote a crawler that collected the public profile information of about 30,000 (3W) core Zhihu users. Although Zhihu claims 65 million registered users and 18.5 million daily active users, a large share of them are "three-no" users who disclose almost no profile data or activity. Because these users expose so little data, and because the new version of Zhihu's server limits the request rate from a single IP (to about one request per second), I crawled only these 30,000 core users.
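With a per-IP limit of roughly one request per second, a crawler has to throttle itself. The following is a minimal sketch of such a throttle, assuming a fixed minimum interval between requests; the class name `Throttle` and its parameters are illustrative and are not taken from the author's crawler.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        now = time.monotonic()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.2)
    start = time.monotonic()
    for _ in range(3):
        throttle.wait()
        # a real crawler would issue its HTTP request here
    print(f"3 requests took {time.monotonic() - start:.2f}s")
```

Using a monotonic clock keeps the spacing correct even if the system clock is adjusted during a long crawl.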

My crawling rule is as follows: randomly select seed users from among Zhihu's big "V" accounts with tens of thousands of followers, crawl the people each seed follows, then the people those users follow, and so on recursively. In other words, the rule guarantees that everyone who enters the database has at least one follower. The data analysis below is based on the information gathered by the crawler, so please forgive any deviations in the report.
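The rule above amounts to a traversal of the "following" graph starting from the seed accounts. Below is a minimal breadth-first sketch of it; the author describes the crawl as recursive, so the iterative BFS order here is a swapped-in choice. It assumes a hypothetical `fetch_followees(user_id)` function that returns the IDs a user follows (the real fetching and parsing code is not shown in the report), backed here by a small made-up in-memory graph so the example runs offline. The `limit` parameter caps the crawl, much as the author stopped at about 30,000 users.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str],
          fetch_followees: Callable[[str], List[str]],
          limit: int) -> List[str]:
    """Breadth-first crawl of the 'following' graph.

    Every user collected after the seeds is followed by someone already
    collected, so each of them has at least one follower.
    """
    seen = set()
    order: List[str] = []
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    while queue and len(order) < limit:
        user = queue.popleft()
        order.append(user)
        for followee in fetch_followees(user):
            if followee not in seen:  # never enqueue the same user twice
                seen.add(followee)
                queue.append(followee)
    return order


# Hypothetical toy graph: user -> users they follow.
FOLLOWS = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["a"],
}

if __name__ == "__main__":
    print(crawl(["seed"], lambda u: FOLLOWS.get(u, []), limit=10))
```

The `seen` set is what keeps cycles such as `a -> c`, `d -> a` from looping forever; in a real crawler each `fetch_followees` call would also go through a rate limiter.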

First, I ran a word-cloud analysis on users' job descriptions and listed the most frequent words. The results are as follows.

Among the high-frequency words in job descriptions, "Internet" came first with 4,552 occurrences, followed by "University" with 2,163. This matches the common impression that Internet practitioners and students at elite universities produce most of the content. The high-frequency words also reveal users' interests, places of residence and so on, which are examined more closely later.
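The report does not say how the job descriptions were tokenized; Chinese text would need a word segmenter first. The sketch below assumes the words can be split out with a simple regex, uses made-up English sample strings and an illustrative stopword list, and counts frequencies with `collections.Counter` to take the top N.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Words too common to be informative; an illustrative, not exhaustive, list.
STOPWORDS = {"a", "an", "and", "at", "of", "the", "in"}


def top_words(descriptions: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Count lower-cased words across all descriptions and return the n most frequent."""
    counts: Counter = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "Internet product manager",
        "Student at Peking University",
        "Internet startup, backend engineer",
        "University lecturer",
    ]
    print(top_words(sample, 2))  # [('internet', 2), ('university', 2)]
```

`most_common` breaks ties by first appearance, which is why "internet" is listed before "university" here.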

Next, let's look at the various "bests" on Zhihu: who has the most upvotes, the most followers and the most answers?

First, the ranking by number of upvotes.

In upvotes, Teacher @Zhang Jiawei single-handedly far outpaces second place, a clear champion. The rest of the top five are @Fat Cat, @Zhu Xuan, @Tang Que and @Pawn. Brother Wheel ranks sixth.

Next, the list of users with the most followers.

In the follower list, Teacher @Zhang Jiawei is again well ahead, followed by Teacher @Kaifu Lee. Further down are the Zhihu bigwigs @Huang Jixin and @Kelly Y Zhou, and then @yolfil.

Now the list of users with the most answers.

@Phil takes first place in number of answers with extremely high output, while @vczh, known as "walking around", can only manage second. The rest of the top five are @Wang Ruofeng, @Chai Jianyi and @zhen-liang.

Next, the list of users who asked the most questions.

@David Chang ranks first with 2,684 questions, and @Turing Don, known for the "future knowledge map", ranks second. The rest of the top five are @Xin Yan, @Cheng Han and @Sean.
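Each of these lists is a top-N query over one numeric field of the crawled profiles. The following is a minimal sketch, assuming each profile is a dict with hypothetical keys such as `upvotes`, `followers`, `answers` and `questions` (the report does not describe its storage schema). `heapq.nlargest` picks the top entries without sorting the whole list of 30,000 users.

```python
import heapq
from typing import Any, Dict, List

Profile = Dict[str, Any]


def leaderboard(profiles: List[Profile], field: str, n: int = 5) -> List[str]:
    """Return the names of the n profiles with the largest value in `field`.

    Profiles missing the field are treated as having 0.
    """
    top = heapq.nlargest(n, profiles, key=lambda p: p.get(field, 0))
    return [str(p["name"]) for p in top]


if __name__ == "__main__":
    # Made-up numbers for illustration only; they are not the report's data.
    users = [
        {"name": "user_a", "upvotes": 900, "answers": 40},
        {"name": "user_b", "upvotes": 300, "answers": 700},
        {"name": "user_c", "upvotes": 500, "answers": 120},
    ]
    print(leaderboard(users, "upvotes", n=2))  # ['user_a', 'user_c']
    print(leaderboard(users, "answers", n=2))  # ['user_b', 'user_c']
```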

Next is a comparison of the number of employees at the three BAT companies (Baidu, Alibaba and Tencent), based on how often each appears in the crawled job descriptions.

Goose Factory (Tencent) has the largest share of employees on Zhihu, followed by Ali (word frequency: .4554), with Xiong Factory (Baidu) slightly behind.
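The comparison rests on how often each company is mentioned in job descriptions. The report does not define its frequency metric, so the sketch below is one plausible version: the share of descriptions that mention each company at least once. The alias table `ALIASES` is hypothetical (the report does not list the names it matched), and the sample descriptions are invented.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical alias table; the report does not list the names it matched,
# and real Zhihu job descriptions would mostly be written in Chinese.
ALIASES: Dict[str, List[str]] = {
    "Tencent": ["tencent", "goose factory"],
    "Alibaba": ["alibaba", "ali"],
    "Baidu": ["baidu", "xiong factory"],
}


def mention_shares(descriptions: Iterable[str],
                   aliases: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of descriptions that mention each key at least once.

    Aliases are matched as whole words, case-insensitively, so "ali" does not
    match inside "quality".
    """
    patterns = {
        key: re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b",
                        re.IGNORECASE)
        for key, names in aliases.items()
    }
    texts = list(descriptions)
    if not texts:
        return {key: 0.0 for key in aliases}
    return {
        key: sum(1 for text in texts if pattern.search(text)) / len(texts)
        for key, pattern in patterns.items()
    }


if __name__ == "__main__":
    sample = [
        "Engineer at Tencent",
        "Ali cloud ops",
        "Alibaba P7",
        "Baidu search team",
        "Quality assurance freelancer",
    ]
    print(mention_shares(sample, ALIASES))
    # {'Tencent': 0.2, 'Alibaba': 0.4, 'Baidu': 0.2}
```

Swapping in a different alias table would give the same kind of comparison for the universities and for front-end versus Android and iOS developers.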

Zhihu is said to be a place where graduates of 985/211 universities are everywhere. So among Tsinghua, Peking, Fudan, Shanghai Jiao Tong and Zhejiang universities, which is strongest?

Peking University and Tsinghua University have similar word frequencies, while the other three need to work harder.

In the mobile era, Android, iOS and web front-end engineers are in the spotlight of software development. So which kind of programmer is most common on Zhihu?

The front-end word frequency turns out to be much higher than that of Android or iOS; in fact, it is about the sum of the two. Put another way: you may be an Apple fan who firmly believes in Jobs's "less is more", or an Android fan who embraces open source, but everyone needs to browse the web, right?

Next, I was curious about Zhihu users' general interests.

Fitness tops the list. It seems that fitness is widely promoted on Zhihu as a way to improve one's looks and attractiveness. But why does reading rank lowest? I can only assume that Zhihu's students learn efficiently and, after finishing their basic reading, go on to explore a bigger world in other fields. Or perhaps reading is less rewarding than travel, fitness or photography, so people are more inclined to go to the gym, travel and take photos.

Next, the geographical distribution of Zhihu users.

Word frequency is concentrated in Beijing, Shanghai, Guangzhou, Shenzhen, Hangzhou, Sichuan, Zhejiang, Jiangsu and other places. This matches my subjective impression: the darker-colored regions are all provinces with relatively developed Internet industries.

Then comes the question everyone cares about most: the ratio of men to women on Zhihu.

In the crawled user data, men account for 67.8% and women for only 32.2%, a male-to-female ratio greater than 2:1.

You might object that the default gender for Zhihu users is male, so this figure cannot be treated as big news. That seemed reasonable to me, so I further screened for the core of the core users: those with more than 2 followers and more than 4 upvotes. This subsample should be more accurate. The result is shown in the figure below.

The share of women drops to 30.1%, while men rise to 69.9%, which is even more unbalanced than before. Female users are thus even scarcer, and more precious, on Zhihu.
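The two gender figures come from the same computation run over the full data set and over a filtered subset. The following is a minimal sketch, assuming profiles with hypothetical `gender`, `followers` and `upvotes` fields. Users without a stated gender are left out of the denominator here, a choice the report does not spell out. The example thresholds are the ones quoted in the report, and the sample users are made up.

```python
from typing import Any, Dict, List

Profile = Dict[str, Any]


def gender_shares(profiles: List[Profile]) -> Dict[str, float]:
    """Percentage of male and female users among those with a stated gender."""
    known = [p for p in profiles if p.get("gender") in ("male", "female")]
    if not known:
        return {"male": 0.0, "female": 0.0}
    male = sum(1 for p in known if p["gender"] == "male")
    return {
        "male": 100.0 * male / len(known),
        "female": 100.0 * (len(known) - male) / len(known),
    }


def core_of_core(profiles: List[Profile],
                 min_followers: int, min_upvotes: int) -> List[Profile]:
    """Keep users strictly above both thresholds ('more than', as in the report)."""
    return [
        p for p in profiles
        if p.get("followers", 0) > min_followers and p.get("upvotes", 0) > min_upvotes
    ]


if __name__ == "__main__":
    users = [
        {"gender": "male", "followers": 10, "upvotes": 50},
        {"gender": "female", "followers": 1, "upvotes": 2},
        {"gender": "male", "followers": 3, "upvotes": 8},
        {"gender": "female", "followers": 7, "upvotes": 20},
    ]
    print(gender_shares(users))  # {'male': 50.0, 'female': 50.0}
    core = core_of_core(users, min_followers=2, min_upvotes=4)
    print(gender_shares(core))   # male about 66.7, female about 33.3
```

For the report's full-sample figures, 67.8 / 32.2 ≈ 2.11, which is where the "greater than 2:1" ratio comes from.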

Therefore, Zhihu is not so much a high-quality Q&A community as:

Author: Peng Jiajin Source: Zhihu.