
The inter-group difference test is finally clear!

What is a between-group difference test? It is the analysis and significance testing of differences between groups: statistical hypothesis tests are used to check whether groups differ and by how much. Frankly speaking, every such test rests on one assumption: that there is no difference between the groups and no relationship between the variables (the null hypothesis). As Teacher Wang Cheng of Shanghai Jiaotong University puts it, ANOVA is really a hypothesis test of whether different levels of a factor differ. A hypothesis test is the process of putting forward assumptions about population parameters and then using sample information to judge whether those assumptions hold.

Therefore, to be responsible, it is worth reviewing the basic concepts of hypothesis testing from probability theory and mathematical statistics at the start of this article.

Among these, the concept of a parameter deserves the most attention, because today's protagonist, the between-group difference test, splits into two categories: parametric tests and nonparametric tests. So what are parametric and nonparametric tests, and how do they differ? To answer that, we first need the concept of statistical inference.

Statistical inference is the statistical study of how to infer characteristics of a population from sample data; it includes parameter estimation and hypothesis testing. Population parameters are generally unknown and are usually estimated from sample statistics: for example, the sample mean gives a point estimate, and the sampling distribution of the sample mean gives an interval estimate. This is parameter estimation.

The difference between parametric tests and nonparametric tests:

So when should each be used? A nonparametric test does not analyze the observed sample values directly: its statistics are computed from the ranks of the raw data within the whole sample, and the actual observed values are discarded. Because information is thrown away, a parametric test should be the first choice whenever the data are suitable for one. When it is unclear whether the data meet the assumptions of a parametric test, a nonparametric test should be used instead.
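Whether data suit a parametric test can itself be checked with a normality test. A minimal sketch using SciPy's Shapiro-Wilk test (the data below are simulated purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_data = rng.normal(loc=0, scale=1, size=50)  # plausibly normal sample
skewed_data = rng.exponential(scale=1.0, size=50)  # clearly non-normal sample

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution
_, p_normal = stats.shapiro(normal_data)
_, p_skewed = stats.shapiro(skewed_data)
```

A small p-value (say, below 0.05) rejects normality and argues for a nonparametric test; a large one gives no evidence against using a parametric test.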

Perhaps readers here expect the author to walk back through the four steps of hypothesis testing (state the hypotheses; construct the test statistic; determine the critical value and rejection region from the significance level; make the decision). But this haughty author does not re-teach the lessons we skipped; we can make them up ourselves. He turns instead to what he considers important: sampling distributions.

It is very important to know the distributional state of the object we study. The three sampling distributions (the χ²-distribution, the t-distribution and the F-distribution), together with the normal distribution, form the foundation of modern mathematical statistics. The normal and t-distributions concern the mean; the χ²- and F-distributions concern the variance. Many students have done statistics for years without knowing why almost every analysis of variance works. Clearly, the last spell of statistics is its foundations.

Without these distributions there is no hypothesis testing; without hypothesis testing, ANOVA has no basis. So once again, for humanitarian reasons, let us review the sampling distributions.

Let X₁, X₂, …, Xₙ be mutually independent and each follow the standard normal distribution N(0, 1). Then the random variable χ² = X₁² + X₂² + … + Xₙ² is said to follow the χ²-distribution with n degrees of freedom.

Let X follow the standard normal distribution N(0, 1) and Y follow the χ²-distribution with n degrees of freedom, with X and Y independent. Then the variable t = X / √(Y/n) is said to follow the t-distribution with n degrees of freedom.

Let U follow a χ²-distribution with n₁ degrees of freedom and V a χ²-distribution with n₂ degrees of freedom, with U and V independent. Then the variable F = (U/n₁) / (V/n₂) follows the F-distribution F(n₁, n₂), where n₁ is the first (numerator) degree of freedom and n₂ the second (denominator) degree of freedom. Loosely speaking, F is a ratio of mean squares.
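These definitions can be checked numerically. A small sketch with NumPy/SciPy (the simulation size, seed and degrees of freedom are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5
z = rng.standard_normal((100_000, n))

# chi-square(n) built as a sum of n squared standard normals; its mean is n
chi2_samples = (z ** 2).sum(axis=1)

# quantile identity t_n^2 = F(1, n): the 0.975 quantile of t(10), squared,
# equals the 0.95 quantile of F(1, 10)
t_q = stats.t.ppf(0.975, df=10)
f_q = stats.f.ppf(0.95, dfn=1, dfd=10)
```

The second check is exactly the relation behind two-sided t-tests appearing as F-tests with one numerator degree of freedom.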

Whether parametric or nonparametric, a hypothesis test must be based on a specific distribution. When the population distribution is known (for example, normal), we can look up the critical value in a table for a given significance level (usually 0.01 or 0.05). When the population distribution is unknown, an empirical distribution can first be built with a permutation test, and the critical value then obtained at the chosen significance level.

The traditional approach fixes the significance level before the test, that is, determines the critical value and rejection region in advance. Then, however large or small the test statistic, the null hypothesis is rejected whenever the statistic falls in the rejection region, and not rejected otherwise. A fixed significance level cannot give an exact measure of how inconsistent the observed data are with the null hypothesis. To measure the deviation of the observed data from the value assumed under the null hypothesis, we compute the p-value. The p-value, also called the observed significance level, is the probability of obtaining a result at least as extreme as the actual observed sample, assuming the null hypothesis is true. The smaller the p-value, the greater the inconsistency between the observed data and the null hypothesis, and the more significant the test result.
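As a concrete sketch, here is the two-sided p-value for a z statistic under a standard normal null distribution (the statistic value of 2.5 is made up for illustration):

```python
from scipy import stats

z = 2.5  # hypothetical observed test statistic

# probability, under the null, of a result at least this extreme in either direction;
# sf is the survival function, 1 - cdf (the upper tail)
p_value = 2 * stats.norm.sf(abs(z))
```

Here the p-value is about 0.012, so at the usual 0.05 level the null hypothesis would be rejected.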

In metagenomic and amplicon difference analysis there are many variables, and it is very common that judging between-group differences requires many tests at once. In that case, a criterion based on a single comparison becomes far too loose, and the false discovery rate (FDR) among the positive results becomes unbearably large. What can we do? The remedy is to tighten the judgment criterion (the p-value threshold): the probability of a single wrong call drops, and so does the probability of any wrong call overall. Tightening the criterion under multiple testing is called multiple testing correction. Since 1979, statisticians have proposed many multiple-testing correction methods, and correspondingly the corrected p-values go by different names, such as FDR, q-value and adjusted p-value; they all mean that multiple tests need correcting. The author will teach the specific usage when he has time (this author really is a tease ~ ~).
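The Benjamini-Hochberg procedure behind the FDR/q-value idea can be sketched in a few lines. This is a minimal illustrative implementation, not the article's code; in practice tools such as p.adjust in R or statsmodels in Python do this:

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (the usual 'FDR' correction)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)   # p_(i) * m / i for sorted p-values
    # enforce monotonicity: an adjusted p-value never exceeds the next larger one
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

q = bh_adjust([0.005, 0.010, 0.030, 0.500])  # adjusted: 0.02, 0.02, 0.04, 0.5
```

A species is then called differentially abundant when its adjusted p-value (q-value) falls below the chosen FDR level.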

That is the end of the theoretical knowledge behind metagenome and amplicon between-group difference tests. The author thinks the points above are worth understanding; they also tell us that what we are discussing today is statistical inference. In other words, we are experts at finding differences.

People learn statistics in order to find differences, and to show how large the between-group differences in the data are, many plots still in use today were developed. Let us unveil these difference plots together.

In the data scientist's toolbox, this is a durable, everyday Swiss army knife: think of analysis of variance and you will almost always think of the box plot. Similar tools, such as the violin plot, were later developed from it.

Tree diagrams generally come as evolutionary trees and hierarchical clustering trees. For expressing distances between objects, a tree diagram may be the most intuitive choice. To express kinship graphically, the taxonomic units are placed at the tips of the branches, and relatedness is read from the branching pattern; such trees come in two- and three-dimensional forms. In numerical taxonomy, a tree used for phenotypic classification is called a phenogram, and one mixed with phylogenetic inference is called a cladogram.

The author has listed the commonly used R packages that draw these plots; they can be used once installed.

Species difference analysis here means using statistics to find species whose abundance differs significantly between groups and to obtain how strongly each species is enriched in each group. At the same time, differences within and between groups can be compared to judge whether the community structures of the groups differ significantly. In other words, a biomarker can be found that distinguishes the groups.

Generally, these tests output only a p-value, and their purpose is simple: to test whether the similarities or distances differ between the compared groups. Commonly used methods include the chi-square test, Student's t-test, the Wilcoxon rank-sum test and so on.

If only two single samples are compared, the chi-square test is applicable, but to be honest the result is not very reliable: at this stage, 16S studies without replicates can hardly convince anyone. The cost of replication is low and it is not difficult to do; from both a biological and a statistical standpoint, replication is necessary.

If there are two groups of samples (at least 3 replicates each), you can try Student's t-test, Welch's t-test or the Wilcoxon rank-sum test. Student's t-test requires the samples to be normally distributed with equal variances. When the group sizes differ and the variances are unequal, Welch's t-test is a good choice.

The Wilcoxon rank-sum test, also called the Mann-Whitney U test, is a statistical method based on the ranks of the values; it requires neither normality nor equal variances. It is a more broadly applicable test, but because it is so permissive it can easily produce many false positives.
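All three two-group tests above can be run with SciPy. A sketch on simulated data (the group sizes, means and spreads are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(10.0, 1.0, size=20)  # e.g. abundances of one taxon in group A
group_b = rng.normal(11.0, 1.5, size=25)  # group B: shifted mean, larger spread

t_stat, p_student = stats.ttest_ind(group_a, group_b)                 # Student's t: assumes equal variances
w_stat, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t: unequal variances allowed
u_stat, p_wilcoxon = stats.mannwhitneyu(group_a, group_b)             # Wilcoxon rank-sum / Mann-Whitney U
```

With unequal group sizes and spreads, as here, the Welch and Wilcoxon p-values are the ones whose assumptions actually hold.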

For comparisons of more than two groups, one-way ANOVA, Tukey's test and the Kruskal-Wallis H test can be selected. One-way ANOVA and Tukey's test are both built on analysis of variance, but the latter adds post-hoc pairwise comparisons, so we can tell how much each pair of groups contributes to the overall difference.

The Kruskal-Wallis H test is in essence also a rank-sum test. Unlike the previous two, it requires neither equal group sizes nor equal variances, and so applies more widely. The Kruskal-Wallis test is also called one-way nonparametric analysis of variance.
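The multi-group tests can likewise be sketched with SciPy. The numbers below are invented for illustration, and scipy.stats.tukey_hsd needs a reasonably recent SciPy (1.8 or later):

```python
from scipy import stats

# hypothetical abundances of one taxon in three groups of five replicates
g1 = [18.2, 20.1, 17.6, 16.8, 18.8]
g2 = [25.6, 26.3, 24.0, 21.2, 24.5]
g3 = [20.2, 19.4, 22.4, 19.1, 21.7]

f_stat, p_anova = stats.f_oneway(g1, g2, g3)   # one-way ANOVA
h_stat, p_kruskal = stats.kruskal(g1, g2, g3)  # Kruskal-Wallis H test
tukey = stats.tukey_hsd(g1, g2, g3)            # Tukey post-hoc pairwise comparisons
```

ANOVA and Kruskal-Wallis answer "do the groups differ at all?"; the Tukey result's pairwise p-value matrix then says which pairs of groups drive the difference.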

Frankly speaking, rank-sum tests and permutation tests in general are nonparametric tests. Within this family of difference tests, two integrated methods deserve our special attention: LEfSe and Metastats.

The results are as follows, with the differences shown in a bar chart and a tree (cladogram). The LDA score distribution bar chart shows the species whose LDA score exceeds the set threshold (4 by default), i.e. the biomarkers that differ statistically between groups. It displays the species with significantly different abundance in the different groups, and the bar length represents each species' effect size (its LDA score).

On the cladogram, the circles radiating from inside to outside represent the taxonomic levels from phylum to genus (or species). Each small circle at a given level is one taxon at that level, and its diameter is proportional to the relative abundance. Colouring rule: species with no significant difference are uniformly coloured yellow, and biomarker species are coloured by group. Red nodes represent microbial taxa that play an important role in the red group, and green nodes taxa important in the green group. If a group is missing from the figure, that group simply has no significantly different species. The species names abbreviated with letters in the figure are spelled out in the legend on the right.

The Metastats results give the p- and q-values of the differing species (the data in the table are made up!)

"Distance-based" means that what is tested is the community difference, not individual species. The methods above can only tell you whether the groups differ significantly (roughly, whether there is a difference at all). If you also want to know how large the difference is (roughly, how much), you need Anosim, Adonis or MRPP. These methods output not only the significance result (p-value) but also a degree result (R value) that measures the contribution of the grouping. Anosim and Adonis are well suited to multivariate statistical testing. Note that Anosim is essentially a rank-based algorithm, so it pairs most naturally with NMDS; for PCoA analysis, it is recommended to test the results with Adonis instead.

Anosim (analysis of similarities) is a nonparametric test. It first computes dissimilarities (or similarities) between samples from the variables, then ranks them, and finally judges through a rank permutation test whether the between-group differences are significant. The test has two important values: the p-value, which judges whether the between-group comparison is significant, and the R value, which measures the degree of difference between and within groups. Anosim tests whether the between-group differences are significantly greater than the within-group differences, and hence whether the grouping is meaningful. Anosim analysis uses the anosim function of the R vegan package and is generally run on the ranks of Bray-Curtis distance values; see the anosim documentation for the detailed calculation.

The method has two main numerical outputs: R, which measures whether different groups differ, and p, which states whether that difference is significant. The two values are explained separately below:

The R value is calculated as R = (rB − rW) / (M/2), where M = n(n−1)/2 is the number of pairs of samples:

rB: the mean rank of the between-group dissimilarities.

rW: the mean rank of the within-group dissimilarities.

n: the number of samples.

The range of R is [−1, 1].

R > 0 means the between-group differences are greater than the within-group differences; R < 0 means the within-group differences are greater than the between-group differences.

R only quantifies whether groups differ; by itself it does not say whether that difference is statistically meaningful.

The p-value indicates whether the difference between the groups is significant; it is obtained by a permutation test.

General procedure of the permutation test (suppose the original grouping is an experimental group and a control group):

1. Randomly reassign all samples to an experimental group and a control group.

2. Calculate the R value of the current assignment, denoted Ri.

3. Repeat steps 1-2 n times, then sort all the Ri together with the original R from largest to smallest. The p-value of the permutation test is the rank position of the original R divided by n.
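The steps above can be sketched directly. This is a minimal illustrative ANOSIM with a permutation p-value, not the vegan code the article refers to (it ignores tie handling, and the toy data are made up); in practice use vegan::anosim or an equivalent library:

```python
import numpy as np

def anosim(dist, groups, n_perm=999, seed=0):
    """Minimal ANOSIM sketch: R statistic and permutation p-value.

    dist   -- square symmetric distance matrix (n x n)
    groups -- length-n sequence of group labels
    """
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = len(groups)
    i, j = np.triu_indices(n, k=1)
    d = dist[i, j]                            # all pairwise distances
    ranks = np.argsort(np.argsort(d)) + 1.0   # ranks (ties broken arbitrarily here)
    m = len(d)                                # m = n(n-1)/2 pairs

    def r_stat(g):
        within = g[i] == g[j]
        rb = ranks[~within].mean()            # mean rank of between-group distances
        rw = ranks[within].mean()             # mean rank of within-group distances
        return (rb - rw) / (m / 2.0)

    r_obs = r_stat(groups)
    rng = np.random.default_rng(seed)
    perm_r = np.array([r_stat(rng.permutation(groups)) for _ in range(n_perm)])
    p = (np.sum(perm_r >= r_obs) + 1) / (n_perm + 1)
    return r_obs, p

# toy data: two well-separated groups of 5 points on a line
x = np.array([0., 1., 2., 3., 4., 10., 11., 12., 13., 14.])
dist = np.abs(x[:, None] - x[None, :])
r, p = anosim(dist, ["A"] * 5 + ["B"] * 5)
```

For these perfectly separated groups every between-group distance outranks every within-group distance, so R comes out exactly 1 and the permutation p-value is small.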

Adonis, also known as permutational multivariate analysis of variance (PERMANOVA) or nonparametric MANOVA, is a nonparametric multivariate analysis of variance based on Bray-Curtis distances. Its use is in fact similar to Anosim: it likewise reports how much of the variation in sample differences each grouping factor explains (R² value) and the significance of the grouping (p-value). The difference lies in the test itself: Adonis is essentially an analysis of variance built on an F statistic, so many details parallel the analyses of variance above. The method quantifies the explanatory power of each grouping factor for the sample differences and assesses the statistical significance of the grouping by permutation. Adonis analysis uses the adonis function of the R vegan package; see its documentation for the detailed calculation.

MRPP (multi-response permutation procedure) analysis is similar to Anosim; it is based on Bray-Curtis distances and is used to test whether the microbial community structures of the groups differ significantly. It is usually used together with ordination plots such as PCA, PCoA and NMDS. MRPP analysis uses the mrpp function of the R vegan package; see its documentation for the detailed calculation.

Analysis of molecular variance (AMOVA), similar in spirit to ANOVA, is a nonparametric method based on a weighted or unweighted UniFrac distance matrix, used to test the significance of differences between groups. It is generally run on UniFrac distances with the amova function of the mothur software; see its documentation for the detailed calculation.

The Mantel test, as the name implies, is a test of the correlation between two matrices. Being a test, it has a null hypothesis: that there is no correlation between the two matrices. The procedure is as follows: unfold the two matrices into two corresponding columns of variables and compute a correlation coefficient between them (in theory any correlation coefficient will do, though Pearson's is usual); then permute one of the matrices (or both), recompute the coefficient, and repeat many times (often a thousand or more). The significance is read from the position of the actual r value within the distribution of permuted r values: if it sits among the random results, the matrices are unrelated; if it lies far beyond them, the correlation is significant. See the Mantel test documentation for the detailed calculation.
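That procedure can be sketched as follows. This is a minimal illustration on made-up matrices, not the vegan::mantel implementation the article refers to:

```python
import numpy as np

def mantel(a, b, n_perm=999, seed=0):
    """Minimal Mantel test sketch: Pearson correlation of two distance
    matrices, with a one-sided permutation p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)              # unfold the upper triangles into columns

    def corr(m):
        return np.corrcoef(a[iu], m[iu])[0, 1]

    r_obs = corr(b)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)             # permute the samples of one matrix
        if corr(b[np.ix_(perm, perm)]) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# toy example: the second distance matrix is an exact multiple of the first
pts = np.array([0., 1., 3., 7., 12., 20., 31., 45.])
dist_a = np.abs(pts[:, None] - pts[None, :])
dist_b = 2.0 * dist_a
r, p = mantel(dist_a, dist_b)
```

Note that rows and columns are permuted together: the null keeps each matrix's internal structure and only breaks the correspondence between the two.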

The author is too lazy to restate what others have already said well, so he quotes it directly. To close the article, here is the original passage from Zhao:

Whatever scientific research or statistical investigation you are engaged in, significance testing is widely used across scientific fields as a way to judge whether two or more data sets differ. As a newcomer to research, the author suffered a great deal over significance testing; only after being immersed in statistical theory for more than half a year did I touch its surface, and I was deeply impressed by the exquisiteness, diversity and rigour of its theory. This piece is offered for the reference of fellow non-statistician researchers still struggling in the quagmire of significance testing. Since the author is not a statistics graduate, the views here are rough and shallow; I hope seniors in the industry and leaders in the field will offer corrections. This humble student thanks you all.
