How to establish a thinking framework for data analysis

Someone once asked me, what is data analysis thinking? If analytical thinking is a manifestation of structure, then data analysis thinking adds another criterion on top of it:

Not "I think," but "the data shows."

This rule is a watershed. "I think" is intuitive, experience-based thinking. You cannot rely on intuition alone at work, and a company's development can depend on it even less. "The data shows" is the most direct expression of data analysis. It rests on a data-oriented mindset rather than on skills: the mindset provides guidance, while skills are merely its application.

As an individual, how should we establish data analysis thinking?

丨Build your indicator system

Before we talk about indicators, let's go back a few decades. Peter Drucker, the father of modern management, is credited with a classic saying:

If you can’t measure it, then you can’t grow it effectively.

The so-called measurement means that a unified standard is needed to define and evaluate the business. This standard is the indicator. Suppose Lao Wang opens a fruit shop next door. If you ask him how business is going, he might answer that sales are good, very good, or that things have been sluggish lately. These are empty words, because to him good sales may mean 50 units, while to you it means 100.

This is the cognitive trap created by "I think." Move the case into a company and the problems multiply. Suppose one operator tells you the product is performing well because many people comment on and praise it every day, and shows you a few screenshots. Another operator says the product has problems, because the items he promotes are not selling well. Whom should you believe?

In fact, it is hard to believe either of them. These conflicting judgments all stem from a lack of data analysis thinking.

If Lao Wang wants to describe his business, he should use sales volume, which is his indicator. An Internet company that wants to describe its product should likewise use indicators such as activity rate, usage rate, and conversion rate.

If you can't describe your business using metrics, then you can't grow it effectively.

Understanding and using indicators is the first step in data analysis thinking. The next step is to build an indicator system, because isolated indicators cannot bring out the value of data. As with analytical thinking, indicators can be structured, and should be.

Take Internet products. A user goes through these steps from first arriving to finally leaving, and this holds for e-commerce apps and content platforms alike. Think about it: which indicators would you need at each step?

The picture below explains what turning a process into indicators looks like. This is the difference between thinking with and without data analysis, and it is also typical data-driven operation. I can go into more depth on this when I have time.

There is no one-size-fits-all template for an indicator system. Different business forms call for different indicator systems: mobile apps differ from websites, SaaS differs from e-commerce, and low-frequency consumption differs from high-frequency consumption. For example, a wedding-related app does not need to consider repurchase rate; Internet finance requires risk-control indicators; e-commerce has different indicators for sellers and buyers.

Learning and mastering these takes industry experience and business knowledge. Are there any general techniques and pitfalls to keep in mind?

丨Good indicators and bad indicators

Not all indicators are good. This is a common mistake that newcomers make. Let’s go back to Lao Wang’s fruit shop and think about it, is the sales indicator a good one?

丨Prices have risen recently. Lao Wang raised his fruit prices accordingly, but did not dare raise them by much. Although his sales volume has not changed significantly, he found that he made very little over the month, not even enough to cover his personal expenses.

丨Lao Wang sold 2,000 units of fruit this month, yet he still lost money in the end. Looking closely, he found that although sales were high, his inventory was also high, with hundreds of units left over every month. The unsold fruit eventually spoils, and that is where the money is lost.

Both examples show how unreliable it is to look only at sales. Sales volume is an indicator, but not a good one. For a self-employed owner like Lao Wang, the profit of the fruit shop should be the core indicator.
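To make the gap between sales and profit concrete, here is a minimal Python sketch of Lao Wang's month. Only the 2,000 units sold comes from the example; the unsold count, unit price, and unit cost are numbers I made up purely for illustration.

```python
# Hypothetical month for Lao Wang's fruit shop.
# Only units_sold (2,000) comes from the text; every other number is assumed.
units_sold = 2000
units_unsold = 300   # assumed: "hundreds" of units left over
unit_price = 5.0     # assumed selling price per unit
unit_cost = 4.5      # assumed purchase cost per unit

revenue = units_sold * unit_price
# Lao Wang pays for everything he stocks, including fruit that never sells.
total_cost = (units_sold + units_unsold) * unit_cost
profit = revenue - total_cost

print(f"units sold: {units_sold}")
print(f"profit: {profit:.2f}")  # a negative value is a loss despite healthy sales
```

With these assumed numbers the sales figure looks fine while the profit is negative, which is exactly the situation sales volume alone cannot reveal.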

Good indicators should be core driving indicators. All indicators matter, but some matter more. Compare sales with profit, or the number of users with the number of active users: in each pair the latter is more important than the former.

Core indicators are not just numbers written in the weekly report, but goals that the entire operations team, product team and even the R&D team work towards.

Core driving indicators are tied to the company's development and represent its key direction during a given stage. Remember, it is a stage: core driving indicators differ from period to period, and they also differ from business to business.

The common core indicators for Internet companies are the number of users and the activity rate. The number of users represents market size and market share, and the activity rate represents the health of the product, but these are core indicators for the growth stage. At the product 1.0 stage, the focus should be on polishing the product and improving its quality before any large-scale promotion, so retention rate is the core indicator. Later, once the product has a certain user base, commercialization matters more than activity, and the focus shifts to money-related indicators such as advertising click-through rate and profit margin.

Core driving indicators are generally the company's overall goals. If you look at individual job responsibilities, you can also find your own core indicators. For example, content operations can focus on the number of readings and reading time.

Core driving indicators bring the greatest advantage and benefit to the company and to individuals. Remember the 80/20 rule? Roughly 20% of the indicators bring 80% of the effect, and that 20% is the core.

A good indicator has another characteristic: it should be a ratio or proportion.

Take the number of active users as an example. Say we have 100,000 active users. What does that tell us? On its own, nothing. If the product has tens of millions of registered users, then 100,000 active users means it is very unhealthy and in decline. If the product has only four to five hundred thousand users, the same number means it is very sticky.

Precisely because the number of active users alone says little, operations and product teams pay more attention to the activity rate, a ratio calculated by dividing the number of active users by the total number of users. So when setting up an indicator, always consider whether it can be expressed as a ratio.
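The activity rate is simple enough to sketch in a few lines of Python. The 100,000 active users come from the example; the two registered-user bases are round numbers I assume for illustration, and the function name is my own.

```python
def activity_rate(active_users: int, total_users: int) -> float:
    """Return active users as a fraction of all registered users."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return active_users / total_users

active = 100_000          # from the example in the text
large_base = 10_000_000   # assumed: a product with tens of millions of users
small_base = 400_000      # assumed: a much smaller product

print(f"{activity_rate(active, large_base):.1%}")  # 1.0%
print(f"{activity_rate(active, small_base):.1%}")  # 25.0%
```

The same absolute number turns into two very different ratios, which is why the ratio is the better indicator.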

What are the bad indicators?

One is the vanity indicator, which has no practical meaning.

Suppose the product gets hundreds of thousands of exposures in the app store. Does that mean anything? No, what I need are actual downloads. Do downloads mean anything? Not much either: what I really want is for users to register successfully. Exposure and downloads are both vanity indicators, differing only in degree.

New media teams chase read counts on WeChat public accounts. If you rely on read counts to sell advertising, then read counts are meaningful. If you rely on articles to sell products, you should pay more attention to conversion rate and product sales. After all, a sensational title can bring a high read count, and in that case the read count is a vanity indicator. It is a pity that many bosses still tirelessly pursue 10W+ (100,000+ reads), even when the numbers are artificially inflated.

Vanity indicators are meaningless indicators. They often look good and can disguise the performance of operations and products, but we must avoid using them.
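One way to see past vanity indicators is to look at how much of each stage survives into the next. The sketch below uses the exposure, download, and registration stages from the app store example; all the counts are invented, and the function name is hypothetical.

```python
# Hypothetical funnel counts; none of these numbers come from the article.
funnel = [
    ("exposure", 500_000),
    ("download", 20_000),
    ("registration", 8_000),
]

def step_conversion(stages):
    """Return (from_stage, to_stage, rate) for each adjacent pair of stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates

conversions = step_conversion(funnel)
for src, dst, rate in conversions:
    print(f"{src} -> {dst}: {rate:.1%}")
```

A large exposure count says little once you see how few of those exposures end in a registration.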

The second bad indicator is the posterior indicator, which often only reflects what has already happened.

For example, suppose I define a lost user as one who has not opened the app for three months. Then the lost users that operations counts each day are, by definition, users who stopped opening the app long ago. The loss happened well in the past, and it is hard to win those users back with any measure. I may learn that a bad operational move drove users away, but what use is that knowledge by then?
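The lost-user definition can be sketched as follows. I approximate "three months" as 90 days, which is my assumption, and the users and dates are invented.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=90)  # "three months", approximated as 90 days

def is_lost(last_open: date, today: date) -> bool:
    """A user counts as lost if the app was not opened within the window."""
    return today - last_open > CHURN_WINDOW

# Hypothetical users and their last-open dates.
today = date(2024, 6, 30)
last_open_by_user = {
    "u1": date(2024, 6, 25),
    "u2": date(2024, 2, 1),
    "u3": date(2024, 4, 15),
}

lost_users = sorted(u for u, d in last_open_by_user.items() if is_lost(d, today))
print(lost_users)
```

Note that a user only shows up in `lost_users` once more than 90 days have already passed, which is why this indicator arrives too late to act on.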

The ROI (return on investment) of a campaign is also a posterior indicator. The return on a campaign is known only after the cost has been paid, and by then the money is spent and the campaign's success or failure is already decided. If the campaign cycle is long, there is still room to adjust along the way. If the campaign is short, ROI can only be used for review and cannot drive the business.
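For reference, a minimal sketch of ROI using one common definition, net gain divided by cost. The article does not give a formula, so this definition and the figures below are my assumptions.

```python
def roi(revenue: float, cost: float) -> float:
    """Return on investment: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) / cost

# Hypothetical figures for one activity (campaign).
campaign_cost = 20_000.0
campaign_revenue = 26_000.0
print(f"ROI: {roi(campaign_revenue, campaign_cost):.0%}")
```

Both inputs are only known once the money has been spent, which is what makes ROI a posterior indicator.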

The third bad metric is the complexity metric, which traps data analysis in a trap created by a bunch of metrics.

Indicators can be subdivided and broken down. For example, the activity rate can be subdivided into daily activity rate, weekly activity rate, monthly activity rate, activity rate of existing users, and so on. Data analysis should select indicators according to the specific situation. For a weather tool, choose the daily activity rate. For a social app, the weekly activity rate may fit. For products used less frequently, choose the monthly activity rate.
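The daily, weekly, and monthly activity rates differ only in the length of the time window. Here is a rough sketch over an invented open-app log; the window lengths of 1, 7, and 30 days are my assumption, and real products may define them differently.

```python
from datetime import date, timedelta

# Hypothetical open-app log: (user_id, day the app was opened).
events = [
    ("u1", date(2024, 6, 30)),
    ("u1", date(2024, 6, 29)),
    ("u2", date(2024, 6, 26)),
    ("u3", date(2024, 6, 5)),
    ("u4", date(2024, 5, 1)),
]
total_users = 5  # assumed registered-user count (a fifth user never opened the app)

def active_rate(events, end_day, window_days, total_users):
    """Share of users with at least one open in the window ending on end_day."""
    start_day = end_day - timedelta(days=window_days - 1)
    active = {user for user, day in events if start_day <= day <= end_day}
    return len(active) / total_users

end = date(2024, 6, 30)
daily = active_rate(events, end, 1, total_users)
weekly = active_rate(events, end, 7, total_users)
monthly = active_rate(events, end, 30, total_users)
print(daily, weekly, monthly)
```

The same log gives three different rates, so the window has to match how often the product is naturally used.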

Each product has a handful of indicators that suit it. Do not pile on a bunch of indicators all at once. If you prepare twenty or thirty indicators for analysis, you will find you do not know where to start.

丨Indicator Structure

Since having too many indicators, or overly complex ones, is a bad thing, how should we choose indicators correctly?

Like the pyramid structure of analytical thinking, indicators have an inherent structure, shaped like a tree. The core of building an indicator structure is to follow the business process and to be structure-oriented.

Suppose you are a content operator and need to analyze your existing business and improve content-related data. What would you do?

We transform pyramid thinking into a data analysis method.

Start from the content operation process, which is: content collection – content editing and publishing – user browsing – user clicking – user reading – user commenting or forwarding – continuing to browse the next article.

This is a standard process, and an indicator can be established for each step. Content collection can use a hotspot index to see which content is more popular. User browsing and user clicks are covered by standard PV (page view) and UV (unique visitor) statistics, and user reading by reading time.
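As a rough illustration of the browsing and reading steps, the sketch below computes PV (page views), UV (unique visitors), and average reading time per article. The log format and every value in it are invented for the example.

```python
# Hypothetical reading log: (user_id, article_id, seconds spent reading).
views = [
    ("u1", "a1", 120),
    ("u1", "a1", 30),
    ("u2", "a1", 60),
    ("u2", "a2", 200),
    ("u3", "a2", 10),
]

def article_metrics(views):
    """Return {article_id: {"pv", "uv", "avg_read_seconds"}}."""
    grouped = {}
    for user, article, seconds in views:
        entry = grouped.setdefault(article, {"pv": 0, "users": set(), "seconds": 0})
        entry["pv"] += 1          # every view counts toward PV
        entry["users"].add(user)  # each user counts once toward UV
        entry["seconds"] += seconds
    return {
        article: {
            "pv": e["pv"],
            "uv": len(e["users"]),
            "avg_read_seconds": e["seconds"] / e["pv"],
        }
        for article, e in grouped.items()
    }

metrics = article_metrics(views)
print(metrics["a1"])
print(metrics["a2"])
```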

Building the indicator framework from the process perspective covers user-related data comprehensively, without omissions.

The indicators listed in this framework must still follow the principles above: there must be core driving indicators, vanity indicators should be removed, and the list should be pruned where appropriate. Do not add indicators for the sake of adding indicators.

丨Dimensional analysis method

Once you have indicators, you can start to analyze. Data analysis can be roughly divided into three categories: the first uses dimensions to analyze data, the second uses statistical knowledge such as data distributions and hypothesis testing, and the third uses machine learning. Let's look at dimensional analysis first.

Dimensions are parameters that describe objects. In specific analysis, we can think of them as the angle from which to analyze things. Sales volume is a perspective, activity rate is a perspective, and time is also a perspective, so they can all be counted as dimensions.

When we have dimensions, we can form a data model through different combinations of dimensions. The data model is not a sophisticated concept, it is just a data cube.

The picture above is a data model/data cube composed of three dimensions. They are product type, time and region. We can not only obtain the sales volume of electronic products in the Shanghai area in the second quarter of 2010, but also know the sales volume of books in the Jiangsu area in the first quarter of 2010.
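A data cube really is nothing sophisticated. As a minimal sketch, it can be written as a Python dictionary keyed by the three dimensions. All sales figures below are made up, and the `sales` helper is a hypothetical name.

```python
# A tiny data cube: (product_type, period, region) -> sales.
# All sales figures are made up for illustration.
cube = {
    ("electronics", "2010-Q1", "Shanghai"): 120,
    ("electronics", "2010-Q2", "Shanghai"): 150,
    ("electronics", "2010-Q2", "Jiangsu"): 90,
    ("books", "2010-Q1", "Jiangsu"): 40,
    ("books", "2010-Q2", "Shanghai"): 55,
}

def sales(product_type, period, region):
    """Look up one cell of the cube; missing cells count as zero sales."""
    return cube.get((product_type, period, region), 0)

print(sales("electronics", "2010-Q2", "Shanghai"))
print(sales("books", "2010-Q1", "Jiangsu"))
```

Each combination of dimension values points to exactly one cell, which is all the two lookups in the example amount to.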

Data models organize complex data in a structured form. The indicators we talked about before can all be used as dimensions. The following is an example:

丨 Combine the three dimensions of user type, activity, and time to observe how different user groups use the product. For example, does group A's usage time stand out?

丨 Combine the three dimensions of product type, order amount, and region to observe whether sales of different products differ across regions.

A data model lets us observe data from different angles and levels, which makes analysis more flexible and able to meet different needs. This process is called OLAP (Online Analytical Processing). It also involves more complex data modeling and data warehousing, which we do not need to know in detail here.

The data model also comes with several common techniques: drilling down, rolling up, and slicing.

Drilling down means subdividing a dimension further. For example, Zhejiang Province is subdivided into Hangzhou, Wenzhou, Ningbo, and so on, and the first quarter of 2010 becomes January, February, and March. Rolling up is the opposite of drilling down: it aggregates dimension values, for example combining Zhejiang, Shanghai, and Jiangsu into a single Jiangsu-Zhejiang-Shanghai region. Slicing means selecting a specific value of a dimension, such as only Shanghai or only the first quarter of 2010. The data cube is multi-dimensional, but we can only observe and compare data in two dimensions, that is, in tables, which is why slicing is needed.
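The three operations can be sketched on a small table of facts. This is only an illustration under my own assumptions: the sales figures are invented, and `quarter` and `aggregate` are hypothetical helper names, not part of any OLAP tool.

```python
from collections import defaultdict

# Finest-grained hypothetical facts: (province, city, month) -> sales.
facts = {
    ("Zhejiang", "Hangzhou", "2010-01"): 30,
    ("Zhejiang", "Hangzhou", "2010-02"): 20,
    ("Zhejiang", "Ningbo", "2010-01"): 10,
    ("Shanghai", "Shanghai", "2010-01"): 50,
    ("Shanghai", "Shanghai", "2010-04"): 60,
    ("Jiangsu", "Nanjing", "2010-03"): 25,
}

def quarter(month):
    """Map 'YYYY-MM' to 'YYYY-Qn'."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

def aggregate(facts, key_fn):
    """Sum sales after mapping each fact's key through key_fn."""
    totals = defaultdict(int)
    for key, value in facts.items():
        totals[key_fn(*key)] += value
    return dict(totals)

# Roll up: aggregate cities into provinces and months into quarters.
by_province_quarter = aggregate(facts, lambda p, c, m: (p, quarter(m)))

# Drill down: split Zhejiang's first quarter back into cities and months.
zhejiang_detail = {
    k: v for k, v in facts.items()
    if k[0] == "Zhejiang" and quarter(k[2]) == "2010-Q1"
}

# Slice: fix one dimension value (Shanghai), leaving only the time dimension.
shanghai_slice = {k[2]: v for k, v in facts.items() if k[0] == "Shanghai"}

print(by_province_quarter)
print(zhejiang_detail)
print(shanghai_slice)
```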

The tree structure in the figure above represents drilling down (subdividing source and time); specific data is then obtained by slicing the Route dimension on the value air.

Sharp readers may already have realized that the familiar pivot table is a form of dimensional analysis: you place the dimensions to be analyzed in rows and columns and compute sums, counts, averages, and so on. Here is a picture from a case used before, which uses the city dimension and the years-of-experience dimension to calculate average salary.
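The same kind of pivot can be sketched in plain Python. The rows below are invented and are not the data from the original case; the point is only to show city as rows, years of experience as columns, and the average as the aggregation.

```python
from collections import defaultdict

# Hypothetical records: (city, years-of-experience band, monthly salary).
rows = [
    ("Shanghai", "1-3", 10_000),
    ("Shanghai", "1-3", 12_000),
    ("Shanghai", "3-5", 18_000),
    ("Hangzhou", "1-3", 9_000),
    ("Hangzhou", "3-5", 15_000),
    ("Hangzhou", "3-5", 17_000),
]

def pivot_mean(rows):
    """Average salary with city as rows and experience band as columns."""
    sums = defaultdict(lambda: [0, 0])  # (city, band) -> [total, count]
    for city, band, salary in rows:
        cell = sums[(city, band)]
        cell[0] += salary
        cell[1] += 1
    table = defaultdict(dict)
    for (city, band), (total, count) in sums.items():
        table[city][band] = total / count
    return dict(table)

table = pivot_mean(rows)
for city, cells in table.items():
    print(city, cells)
```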

Besides Excel, dimensional analysis can also be done with BI tools, R, and Python. BI tools are relatively the easiest.

In discussing the dimensional method, I want to emphasize one of the core ideas of analysis: comparison, specifically comparison across dimensions. This is probably one of the best shortcuts for newcomers who want to improve quickly. Examples include comparing time trends between past and present, comparing regions, comparing product types, and comparing user groups. A single number has no analytical significance. Only by combining multiple data points can the value of the data be fully realized.

Suppose I want to analyze the company's profit, where profit = sales - cost. I would identify the indicators and dimensions involved in sales, such as product type, region, and user group, and then, by repeatedly combining and breaking them down, find the reasons behind problems or good performance. The same goes for cost.
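Here is a small sketch of that combine-and-break-down process, using profit = sales - cost. The order records, field names, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical order records; every figure is invented for illustration.
orders = [
    {"product": "fruit", "region": "Shanghai", "sales": 500, "cost": 300},
    {"product": "fruit", "region": "Jiangsu", "sales": 200, "cost": 260},
    {"product": "snacks", "region": "Shanghai", "sales": 300, "cost": 200},
    {"product": "snacks", "region": "Jiangsu", "sales": 150, "cost": 100},
]

def profit_by(orders, dimension):
    """Sum profit (sales - cost) for each value of one dimension."""
    totals = defaultdict(int)
    for order in orders:
        totals[order[dimension]] += order["sales"] - order["cost"]
    return dict(totals)

def profit_by_pair(orders, dim_a, dim_b):
    """Sum profit for each combination of two dimensions."""
    totals = defaultdict(int)
    for order in orders:
        totals[(order[dim_a], order[dim_b])] += order["sales"] - order["cost"]
    return dict(totals)

print(profit_by(orders, "product"))
print(profit_by(orders, "region"))
print(profit_by_pair(orders, "product", "region"))
```

In this made-up data, both products look profitable by product type, the region view shows Jiangsu losing money, and only the combined view pins the loss on fruit in Jiangsu. That is the kind of comparison across dimensions the method relies on.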

This is correct data analysis thinking. Let’s summarize: We establish and filter out indicators through business, use indicators as dimensions, and use dimensions for analysis.

Many people will ask, what is the difference between indicators and dimensions?

Dimensions are the angles from which things are described and observed, while indicators are the standards for measuring data. Dimensions are the broader concept and are not limited to data. For example, the time dimension and the city dimension cannot be expressed as indicators, but indicators (retention rate, bounce rate, browsing time, and so on) can serve as dimensions. Put simply: dimension > indicator.

At this point, you already have a thinking framework for data analysis. It is only a framework because specific skills are still missing, such as how to verify that a certain dimension is the key factor affecting the data, or how to use machine learning to improve the business. These involve data and statistical knowledge, which will be explained later.

I would like to emphasize that data analysis is not a result but a process. Remember the saying "If you can't measure it, you can't grow it effectively"? The ultimate goal of data analysis is to grow the business. If data analysis itself needs a performance indicator, it is not whether the analysis was right or wrong, but how much the final data improved.

Data analysis requires feedback. When my analysis concludes that a certain factor affects business results, I need to verify it: tell the operations and product teams, watch how the data looks after the improvement, and let the results decide. If the results do not improve, it is time to rethink the analysis process.

This is another element of data analysis: it is result-oriented. If an analysis ends as a report with no follow-up and no improvement measures, its value is zero.

Business guides data, and data drives business. This is the one and only way.

Author: Qin Lu