R data visualization: PCA principal component analysis diagram

Here is an example. Suppose you are the photographer for a gardening tools catalog and you have to shoot a watering can. The watering can is three-dimensional, but a photograph is two-dimensional. To show the watering can to customers more completely, you need to take photos from several angles. The picture below shows the photos you took from four directions:

In the first picture, you can see the back of the watering can, but not the front.

The second picture was taken from the front, so you can see the spout. It supplies the information missing from the first picture, but the handle cannot be seen.

The third picture is a top view. You can see the spout and the handle, but you cannot tell how tall the watering can is.

The fourth picture is the one you plan to put in the catalog. The height, top, spout and handle of the watering can are all clearly visible.

The idea behind PCA is similar. It maps a high-dimensional data set to a low-dimensional space while retaining as much of the variance in the data as possible.
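As a minimal sketch of this idea, the snippet below runs PCA with base R's `prcomp` and reports how much of the total variance the first two components keep. The built-in `iris` data set is an assumed stand-in, since the article's own data is not shown.

```r
# Assumption: iris (4 numeric columns, 150 samples) stands in for the real data.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Scores of every sample on the first two principal components (4-D -> 2-D).
scores <- as.data.frame(pca$x[, 1:2])

# Share of the total variance captured by each component.
explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(explained[1:2], 3))
```

The closer the first two shares add up to 1, the less is lost by looking at the data in two dimensions only.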

Can R produce a PCA plot like the one from SIMCA-P?

The answer is yes. With R we can not only reproduce the SIMCA-P style of PCA plot but also make it better-looking; how good it looks is limited only by your own taste.

A PCA score plot = scatter plot + confidence ellipse. The x and y coordinates of each point are its scores on the first and second principal components.
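A minimal sketch of this recipe with ggplot2, assuming the built-in `iris` data as a stand-in: `geom_point` draws the scores and `stat_ellipse` adds a 95% ellipse. Note that `stat_ellipse` defaults to `type = "t"` (a multivariate t-distribution), which may not match exactly how SIMCA-P computes its ellipse.

```r
library(ggplot2)

# Assumption: iris stands in for the article's data set.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
scores <- data.frame(pca$x[, 1:2], group = iris$Species)

# Scatter plot of PC1 vs PC2 plus one 95% ellipse around all samples.
p <- ggplot(scores, aes(x = PC1, y = PC2)) +
  geom_point() +
  stat_ellipse(level = 0.95)
print(p)
```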

Next, color the points by class:
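A sketch of this step, again assuming `iris` with `Species` as the class label. Mapping `color` inside the top-level `aes()` passes it on to every layer, which is what produces the three ellipses discussed next.

```r
library(ggplot2)

# Assumption: iris stands in for the article's data; Species is the class.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
scores <- data.frame(pca$x[, 1:2], group = iris$Species)

# color is mapped in ggplot(), so both geom_point and stat_ellipse inherit it.
p <- ggplot(scores, aes(x = PC1, y = PC2, color = group)) +
  geom_point() +
  stat_ellipse(level = 0.95)
print(p)
```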

The colors are there, but why has the single ellipse turned into three?

It turns out that stat_ellipse computes a separate confidence ellipse for each class by default. How can we compute a single confidence ellipse for samples from several classes? Check the help documentation for stat_ellipse:

By default, stat_ellipse inherits the aes settings made in ggplot(). If you want stat_ellipse to use its own aes settings, set the parameter inherit.aes to FALSE.
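The fix can be sketched as follows, under the same `iris` assumption. With `inherit.aes = FALSE` the ellipse layer ignores the `color` mapping and needs its own `x` and `y`; the plot data itself is still passed down, so a single ellipse is drawn over all samples.

```r
library(ggplot2)

# Assumption: iris stands in for the article's data; Species is the class.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
scores <- data.frame(pca$x[, 1:2], group = iris$Species)

p <- ggplot(scores, aes(x = PC1, y = PC2, color = group)) +
  geom_point() +
  # Own aes without color: no grouping by class, hence one ellipse.
  stat_ellipse(aes(x = PC1, y = PC2), inherit.aes = FALSE,
               level = 0.95, color = "grey40")
print(p)
```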

Next, fine-tune the style: set custom colors for the sample classes, add X-axis and Y-axis titles, and add a plot title:
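One possible styling, as a sketch: the palette, the title text and the choice to show explained variance in the axis titles are illustrative assumptions, not the article's original settings, and `iris` again stands in for the real data.

```r
library(ggplot2)

# Assumption: iris stands in for the article's data; Species is the class.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
scores <- data.frame(pca$x[, 1:2], group = iris$Species)
explained <- pca$sdev^2 / sum(pca$sdev^2)

# Hypothetical palette: one color per class level.
group_colors <- c(setosa = "#1B9E77", versicolor = "#D95F02",
                  virginica = "#7570B3")

p <- ggplot(scores, aes(x = PC1, y = PC2, color = group)) +
  geom_point(size = 2) +
  stat_ellipse(aes(x = PC1, y = PC2), inherit.aes = FALSE,
               level = 0.95, color = "grey40", linetype = "dashed") +
  scale_color_manual(values = group_colors) +
  labs(x = sprintf("PC1 (%.1f%%)", 100 * explained[1]),
       y = sprintf("PC2 (%.1f%%)", 100 * explained[2]),
       title = "PCA score plot", color = "Group") +
  theme_bw()
print(p)
```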

Compared with the SIMCA-P output, the points and the ellipse are essentially the same, but the R version is easier on the eye ~

Feel free to leave a comment to discuss. If this article helped you, a like would be appreciated!

[1] Mastering Machine Learning with scikit-learn