
Understand AR and MA models

Recently I have been working on a bit of forecasting: predicting future data from past data.

For example, over the past week, from Monday to Sunday, Wu Dalang sold 20, 20, 20, 20, 20, 20, 20 cakes. How many will he sell each day next week? He has to predict it, or Pan Jinlian will scold him whether he makes too many or too few.

Dalang's cake business, as you can imagine, is simple. But Master Ximen owns many shops, and those are not so easy to forecast.

I tried many methods and finally settled on XGBoost, because I had to account for the influence of wind and rain and the holidays of the Song Dynasty.

Before that, I still spent a lot of time on ARIMA. After all, cake sales are time series data, and ARIMA is what you reach for with stationary time series.

To use ARIMA, you have to determine p and q, which means looking at where the ACF and PACF cut off or tail off. But the tutorials online, and the college textbooks too, are all dry formulas with no examples.

I suspect many of the people who write textbooks don't really understand this themselves.

The more I read, the dizzier I got. Only after reading the English materials did I understand. Foreign authors are patient to a fault: every detail is spelled out clearly, as if teaching primary school students.

Let's start with the AR model, the autoregressive model. The idea is to find a function so that each observation (here, the number of cakes Dalang sells each day) depends on past observations. For Wu Dalang's 20, 20, 20, ..., you can design a function that says the cakes made today equal the cakes made yesterday. The general formula is:

X_{t} = c + φ_{1} X_{t-1} + ε_{t}

This is the order-1 formula, AR(1). Do you see it? X_{t} and X_{t-1} are directly linked through the coefficient φ_{1}.

And AR(2) is:

X_{t} = c + φ_{1} X_{t-1} + φ_{2} X_{t-2} + ε_{t}

For example, we can use statsmodels.tsa.arima_process to simulate time series data that follows an AR(1), and plot it:
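A minimal sketch of that simulation, assuming φ_{1} = 0.9 (the value used in the discussion below); the seed and sample size are my own choices:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(42)

# AR(1): X_t = 0.9 * X_{t-1} + eps_t
# ArmaProcess takes lag-polynomial coefficients, so the AR side
# is [1, -phi_1] and the MA side is just [1].
ar = np.array([1.0, -0.9])
ma = np.array([1.0])
process = ArmaProcess(ar, ma)

series = process.generate_sample(nsample=500)

plt.plot(series)
plt.title("Simulated AR(1), phi_1 = 0.9")
plt.show()
```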

Note that only φ_{1} is specified by us; the noise ε_{t} is generated randomly by arima_process. φ_{1} is a fixed value, while ε_{t} is different on every run.

Can you guess? The correlation between X_{t} and X_{t-1} is 0.9, and the correlation between X_{t} and X_{t-2} is 0.9 × 0.9 = 0.81. This association keeps getting passed down the chain. So the autocorrelation (ACF) plot looks like this:

Look, a beautiful tail, long and neat.
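You can check that tail against theory, a quick sketch: ArmaProcess can return the theoretical ACF directly, which for an AR(1) with φ_{1} = 0.9 is 0.9 to the power k at lag k:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# The same AR(1) process as before: X_t = 0.9 * X_{t-1} + eps_t
process = ArmaProcess(np.array([1.0, -0.9]), np.array([1.0]))

# Theoretical autocorrelations at lags 0..4: 1, 0.9, 0.81, 0.729, 0.6561
theoretical_acf = process.acf(lags=5)
print(theoretical_acf)
```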

So what should the partial autocorrelation of this AR(1) look like? This is the concept that confuses newcomers.

Here is a metaphor. Xiao Ming, Xiao Ming's father, Xiao Ming's grandfather, and their ancestors form an AR(1). Remember, they never form an AR(2). Their DNA is passed down generation by generation: the correlation between Xiao Ming and his father is 1/2, between his father and his grandfather is 1/2, and between Xiao Ming and his grandfather is 1/2 × 1/2 = 1/4.

But what about the partial correlation between Xiao Ming and his grandfather? Sadly, it can only be zero. Here you have to bend your head around a corner: remove the influence of Xiao Ming's father, that is, remove the 1/2 of the genes the father passed to Xiao Ming, and the partial autocorrelation between Xiao Ming and his grandfather is zero.

If it were not zero, something would be wrong.

But the partial correlation between Xiao Ming and his father, and between his father and his grandfather, is just their ordinary autocorrelation, which is 1/2.

So the partial autocorrelation plot (PACF) of the AR(1) above is:

Look, a classic cutoff: it truncates at order 1, dropping off like a cliff.

If it were an AR(2), the PACF would cut off after order 2. At this point, though, the metaphor above gets into very awkward territory.

We would have to assume that 1/2 of Xiao Ming's DNA comes from his father, and another 1/4 comes directly from his grandfather. It is a little painful even to type this. Note the word "directly": grandpa would be both a grandfather and a father.

Well, consider it science fiction.

Now let's talk about the MA model, the moving average model. First, note that the moving average model here is not as simple as just computing a moving average.

Imagine an uncorrelated time series. In the world of The Three-Body Problem, the weather swings wildly from day to day, with none of Earth's four distinct seasons. On Trisolaris it is 40 degrees on the first day, very hot but still breathable. The next day it is 200 degrees below zero and everyone freezes to death. The third day? A spring-like 23 degrees. The fourth day it is a 3000-degree steel furnace and the planet turns to liquid.

In short, every day is a random number. There is no law between heaven and earth; the days have nothing to do with each other.

So what do you do? How do you predict tomorrow's temperature, and decide whether to keep living tomorrow or give up?

There is no way around it; we can only take a rough moving average, say adding up the past 10 days' temperatures and dividing by 10.

This three-body weather is a pure MA (10) model.
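That rough average, sketched in numpy (the temperatures are made up, just a uniform random draw):

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 days of hypothetical Trisolaran weather: pure independent noise
temps = rng.uniform(-200.0, 3000.0, size=30)

# The crude forecast: the mean of the last 10 days
forecast = temps[-10:].mean()
print(forecast)
```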

The general formula of MA is:

R_{t} = μ + ε_{t} + θ_{1} ε_{t-1} + θ_{2} ε_{t-2} + … + θ_{q} ε_{t-q}

Look carefully: R_{t} and R_{t-1} have no direct relationship with each other. This is not AR; the relationship is built only through the noise terms ε.

Again we can use statsmodels.tsa.arima_process to simulate time series data that follows an MA(1), and plot it:
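A sketch of that MA(1) simulation. The source does not say which θ_{1} it used; I assume θ_{1} = -0.75 here, since θ_{1}/(1 + θ_{1}²) = -0.48, close to the lag-1 autocorrelation of -0.479 quoted below:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(42)

# MA(1): R_t = eps_t + theta_1 * eps_{t-1}, with theta_1 = -0.75 (assumed)
# Lag polynomials: ar = [1], ma = [1, theta_1]
process = ArmaProcess(np.array([1.0]), np.array([1.0, -0.75]))
series = process.generate_sample(nsample=500)

# Theoretical ACF: lag 1 = theta_1 / (1 + theta_1^2) = -0.48, lag 2 = 0
print(process.acf(lags=3))
```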

As before, note that only θ_{1} is specified by us; the ε_{t} are all random values.

So, can you see it? The autocorrelation between R_{t} and R_{t-1} is θ_{1}/(1 + θ_{1}²), in this case -0.479. And the autocorrelation between R_{t} and R_{t-2} is 0. (This confused me at first: why is it exactly 0 from lag 2 onward? Write the two out: R_{t} = ε_{t} + θ_{1} ε_{t-1} and R_{t-2} = ε_{t-2} + θ_{1} ε_{t-3} share no ε term at all, so their covariance is zero.)

Therefore, the ACF plot of this MA(1) is:

A clean cutoff after order 1.

And the PACF plot is:

It is puzzling at first why the partial autocorrelation of a standard MA(1) actually tails off. The way I understand it now: an invertible MA(1) can be rewritten as an AR(∞), an infinite regression on its own past, so the partial autocorrelations never cut off cleanly; they decay away gradually instead.

If the partial correlation still shows a relationship at higher lags, why does the autocorrelation show none?

In the ARMA and ARIMA models of time series analysis, the MA part is a regression on the residuals of the AR part.