Polynomial Regression

April 27, 2018

Recipe 44: Polynomial Regression

1. When to use: to predict a dependent variable from a polynomial in an independent variable.

2. Analysis type: regression analysis.
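In formula form, a polynomial regression of degree k models the dependent variable y (here Conc) as a polynomial in the independent variable x (here Dist). The notation below is standard textbook notation rather than the post's own. The model is still linear in the coefficients β, which is why R's ordinary lm() can fit it by least squares (lm() solves the problem with a QR decomposition rather than by inverting XᵀX explicitly).

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i, \qquad i = 1, \dots, n \\
\hat{\boldsymbol{\beta}} &= (X^\top X)^{-1} X^\top \mathbf{y}, \qquad X_{ij} = x_i^{\,j}, \quad j = 0, 1, \dots, k
\end{aligned}
```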

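3. Example data: 咪路 measured the lead concentration in river water at sampling points located at different distances from the river mouth. The data are as follows.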

Sample	Dist. (km)	Conc. (mg/L)
1	1.22	40.9
2	1.34	41.8
3	1.51	42.4
4	1.66	43.0
5	1.72	43.4
6	1.93	43.9
7	2.14	44.3
8	2.39	44.7
9	2.51	45.0
10	2.78	45.1
11	2.97	45.4
12	3.17	46.2
13	3.32	47.0
14	3.50	48.6
15	3.53	49.0
16	3.85	49.7
17	3.95	50.0
18	4.11	50.8
19	4.18	51.1

Goal: find the polynomial regression equation of Conc on Dist.

4. Enter the data.

Step 1: Use read.table() from base R to read in the data and store them in the variable m.

m <- read.table(header = TRUE, text = "
Dist Conc
1.22 40.9
1.34 41.8
1.51 42.4
1.66 43.0
1.72 43.4
1.93 43.9
2.14 44.3
2.39 44.7
2.51 45.0
2.78 45.1
2.97 45.4
3.17 46.2
3.32 47.0
3.50 48.6
3.53 49.0
3.85 49.7
3.95 50.0
4.11 50.8
4.18 51.1") # values on each line are separated by whitespace

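Step 2: Use attach() from base R so the columns of m can be referred to directly by name, and names() to list the column names (headers) of m.

attach(m) # lets R find the columns by name (Dist, Conc)

names(m)  # lists the column names: "Dist" "Conc"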
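attach() is convenient, but it puts a copy of m on R's search path: later changes to m are not seen through the attached names, and attached columns can mask other objects with the same name. A minimal sketch of the same setup without attach(), rebuilding m with data.frame() so the snippet runs on its own; the object name fit1 here is just for illustration. If you pass data = m to lm(), the printed Call line in summary() will also include data = m.

```r
# Lead data from section 3, rebuilt so this snippet is self-contained
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)

str(m)    # structure check: 19 observations of 2 numeric variables
names(m)  # "Dist" "Conc"

# Instead of attach(m): hand the data frame to lm() explicitly ...
fit1 <- lm(Conc ~ Dist, data = m)

# ... or evaluate a one-off expression inside m with with()
print(with(m, range(Dist)))
```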

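5. Plot the data to see how they are distributed.

Step 1: Install the ggplot2 package (needed only once): install.packages("ggplot2")

Step 2: Load the ggplot2 package.

library(ggplot2)

Step 3: Draw the plot.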

ggplot(m, aes(x = Dist, y = Conc)) +
  geom_point(shape = 1) +  # open circles
  geom_smooth()            # smoothed curve (loess by default for fewer than 1,000 points) with a 95% confidence band

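Called without arguments, geom_smooth() draws a loess curve, not a polynomial. To see a polynomial on the same plot, the smoother can be told to use lm() with a polynomial formula. A sketch, assuming degree 4 (the degree of fit4, selected in section 6); inside geom_smooth() the formula is written in the aesthetic names x and y rather than the column names. The axis labels are illustrative.

```r
library(ggplot2)

# Lead data from section 3, rebuilt so this snippet is self-contained
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)

p <- ggplot(m, aes(x = Dist, y = Conc)) +
  geom_point(shape = 1) +                        # open circles
  geom_smooth(method = "lm",                     # fit with lm() ...
              formula = y ~ poly(x, 4, raw = TRUE),  # ... as a quartic in x
              se = TRUE) +                       # 95% confidence band (default level)
  labs(x = "Distance from river mouth (km)",
       y = "Lead concentration (mg/L)")

print(p)
```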

6. Run the regressions.

Step 1: Use lm() from base R to fit polynomials of degree 1 to 6 to the data in m, storing the results in fit1 to fit6.

fit1 <- lm(Conc ~ Dist)
fit2 <- lm(Conc ~ Dist + I(Dist^2))
fit3 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3))
fit4 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4))
fit5 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4) + I(Dist^5))
fit6 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4) + I(Dist^5) + I(Dist^6))

# Conc is the dependent variable and Dist the independent variable. Inside a
# formula ^ has a special meaning, so powers are wrapped in I(): I(Dist^2) is
# Dist squared, I(Dist^3) is Dist cubed, and so on.
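Writing out I(Dist^2), I(Dist^3), and so on by hand gets long. A more compact sketch, assuming the same data, fits all six degrees in one lapply() call with poly(Dist, k, raw = TRUE), which builds the same power columns. The names fits, fit4_manual and fit4_orth are illustrative, not from the post.

```r
# Lead data from section 3, rebuilt so this snippet is self-contained
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)

# Degrees 1 to 6 in one call; poly(Dist, k, raw = TRUE) expands to
# Dist, Dist^2, ..., Dist^k, the same columns as the I(Dist^j) terms.
fits <- lapply(1:6, function(k) lm(Conc ~ poly(Dist, k, raw = TRUE), data = m))

# The degree-4 fit reproduces the hand-written fit4 formula.
fit4_manual <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4), data = m)
stopifnot(isTRUE(all.equal(unname(coef(fits[[4]])), unname(coef(fit4_manual)))))

# Orthogonal polynomials (raw = FALSE, the default) give different, less
# correlated coefficients, but the same fitted curve.
fit4_orth <- lm(Conc ~ poly(Dist, 4), data = m)
stopifnot(isTRUE(all.equal(fitted(fit4_orth), fitted(fit4_manual))))
```

Orthogonal polynomials are numerically better behaved at high degrees such as fit5 and fit6, but their coefficients no longer read directly as the equation in step 3, which is why raw = TRUE is the choice when you want that equation.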

Step 2: Display each model's summary and interpret the results.

summary(fit1)

Call:
lm(formula = Conc ~ Dist)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2758 -0.2706  0.2599  0.4483  0.6407

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.3890     0.4364   85.67  < 2e-16 ***
Dist          3.1269     0.1510   20.71 1.69e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6335 on 17 degrees of freedom
Multiple R-squared: 0.9619, Adjusted R-squared: 0.9596
F-statistic: 428.9 on 1 and 17 DF, p-value: 1.693e-13

summary(fit2)

Call:
lm(formula = Conc ~ Dist + I(Dist^2))

Residuals:
     Min       1Q   Median       3Q      Max
-0.89062 -0.36433  0.09016  0.37248  0.68840

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  40.3017     1.1335   35.55   <2e-16 ***
Dist          0.6666     0.9135    0.73   0.4761
I(Dist^2)     0.4540     0.1669    2.72   0.0151 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5399 on 16 degrees of freedom
Multiple R-squared: 0.9739, Adjusted R-squared: 0.9707
F-statistic: 298.9 on 2 and 16 DF, p-value: 2.134e-13

summary(fit3)

Call:
lm(formula = Conc ~ Dist + I(Dist^2) + I(Dist^3))

Residuals:
     Min       1Q   Median       3Q      Max
-0.72883 -0.27353  0.00466  0.19039  1.00640

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  32.7673     3.1132  10.525 2.54e-08 ***
Dist         10.4109     3.9030   2.667   0.0176 *
I(Dist^2)    -3.3868     1.5136  -2.238   0.0408 *
I(Dist^3)     0.4701     0.1844   2.549   0.0222 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4658 on 15 degrees of freedom
Multiple R-squared: 0.9818, Adjusted R-squared: 0.9782
F-statistic: 269.9 on 3 and 15 DF, p-value: 2.859e-13

summary(fit4)

Call:
lm(formula = Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4))

Residuals:
     Min       1Q   Median       3Q      Max
-0.45159 -0.24518 -0.00307  0.14294  0.70094

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.9265     7.2855   0.951  0.35787
Dist         55.8348    12.4946   4.469  0.00053 ***
I(Dist^2)   -31.4866     7.6054  -4.140  0.00100 **
I(Dist^3)     7.7625     1.9573   3.966  0.00141 **
I(Dist^4)    -0.6751     0.1808  -3.735  0.00222 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3413 on 14 degrees of freedom
Multiple R-squared: 0.9909, Adjusted R-squared: 0.9883
F-statistic: 380.6 on 4 and 14 DF, p-value: 4.141e-14

summary(fit5)

Call:
lm(formula = Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4) +
    I(Dist^5))

Residuals:
     Min       1Q   Median       3Q      Max
-0.41950 -0.16702 -0.02428  0.13190  0.69400

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  36.2391    22.7989   1.590    0.136
Dist         -9.1615    49.5645  -0.185    0.856
I(Dist^2)    23.3871    41.2381   0.567    0.580
I(Dist^3)   -14.3460    16.4561  -0.872    0.399
I(Dist^4)     3.5936     3.1609   1.137    0.276
I(Dist^5)    -0.3174     0.2347  -1.353    0.199

Residual standard error: 0.3316 on 13 degrees of freedom
Multiple R-squared: 0.992, Adjusted R-squared: 0.9889
F-statistic: 322.9 on 5 and 13 DF, p-value: 3.726e-13

summary(fit6)

Call:
lm(formula = Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4) +
    I(Dist^5) + I(Dist^6))

Residuals:
     Min       1Q   Median       3Q      Max
-0.33153 -0.18574  0.00889  0.11797  0.57277

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  157.8822    73.6834   2.143   0.0533 .
Dist        -330.9759   192.2850  -1.721   0.1109
I(Dist^2)    364.0428   201.2862   1.809   0.0956 .
I(Dist^3)   -199.3612   108.4010  -1.839   0.0908 .
I(Dist^4)     58.1131    31.7588   1.830   0.0922 .
I(Dist^5)     -8.6070     4.8130  -1.788   0.0990 .
I(Dist^6)      0.5096     0.2956   1.724   0.1103
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.309 on 12 degrees of freedom
Multiple R-squared: 0.9936, Adjusted R-squared: 0.9904
F-statistic: 310.4 on 6 and 12 DF, p-value: 1.907e-12
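Rather than reading six summaries one by one, the numbers used to compare the models can be gathered into one table, and nested models can be compared with F-tests through anova(). A sketch under the same data; AIC and BIC are extra criteria not used in the post, and the names fits, comparison and p45 are illustrative.

```r
# Lead data from section 3, rebuilt so this snippet is self-contained
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)

# Fit degrees 1 to 6; building each formula as text keeps the model
# formulas printed by anova() readable (the degree appears as a number).
fits <- lapply(1:6, function(k) {
  lm(as.formula(paste0("Conc ~ poly(Dist, ", k, ", raw = TRUE)")), data = m)
})

# One row per degree: adjusted R^2, residual standard error, AIC, BIC
comparison <- data.frame(
  degree = 1:6,
  adj_r2 = sapply(fits, function(f) summary(f)$adj.r.squared),
  sigma  = sapply(fits, sigma),
  AIC    = sapply(fits, AIC),
  BIC    = sapply(fits, BIC)
)
print(comparison, digits = 4)

# Sequential F-tests (degree 2 vs 1, 3 vs 2, ...). With several models,
# anova() uses the residual mean square of the largest model (degree 6)
# as the denominator of every F statistic.
print(do.call(anova, fits))

# Pairwise test: does adding Dist^5 to the quartic model help?
p45 <- anova(fits[[4]], fits[[5]])[2, "Pr(>F)"]
print(p45)
```

Because the pairwise test adds a single term, its F statistic equals the square of that term's t value in the larger model, so p45 should match the p-value of 0.199 shown for I(Dist^5) in summary(fit5).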

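Step 3: From the results above, choose fit4 as the best model. It is the highest-degree model in which every Dist term is significant (all p < 0.01). In fit5 and fit6 none of the Dist terms reaches p < 0.05, their standard errors grow sharply, and adjusted R-squared rises only marginally (0.9883 for fit4, 0.9889 for fit5, 0.9904 for fit6), so the extra terms are not justified.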

Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4)

Conc = 6.9265 + 55.8348 Dist - 31.4866 Dist² + 7.7625 Dist³ - 0.6751 Dist⁴
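With the equation in hand, predictions at new distances can be made with predict(). A minimal sketch: the three distances are arbitrary illustration values inside the sampled range (1.22 to 4.18 km), and the names new_d, pred, b and manual are hypothetical. The last lines check that evaluating the equation term by term gives the same point estimates as predict().

```r
# Lead data from section 3, rebuilt so this snippet is self-contained
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)

fit4 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4), data = m)

# Illustrative distances (km), all inside the sampled range
new_d <- data.frame(Dist = c(1.5, 2.5, 3.5))

# Point predictions with 95% confidence intervals for the mean concentration
pred <- predict(fit4, newdata = new_d, interval = "confidence", level = 0.95)
print(cbind(new_d, pred))

# The same point estimates from the fitted equation, term by term
b <- coef(fit4)
manual <- b[1] + b[2] * new_d$Dist + b[3] * new_d$Dist^2 +
  b[4] * new_d$Dist^3 + b[5] * new_d$Dist^4
stopifnot(isTRUE(all.equal(unname(manual), unname(pred[, "fit"]))))
```

Outside the sampled range a fourth-degree polynomial can bend sharply, so extrapolated predictions from fit4 should be treated with great caution.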

Intrigued? Want to learn more? Further reading (links):

1. Regression analysis (https://en.wikipedia.org/wiki/Regression_analysis)

2. Polynomial regression (https://en.wikipedia.org/wiki/Polynomial_regression)

3. Polynomial regression techniques (https://www.r-bloggers.com/polynomial-regression-techniques/)

4. Quick introductions to R basics, R graphics, and statistics:

a. R Tutorial: https://www.tutorialspoint.com/r/index.htm

b. Cookbook for R: http://www.cookbook-r.com/

c. Quick-R: https://www.statmethods.net/

d. Statistical tools for high-throughput data analysis (STHDA): http://www.sthda.com/english/

e. The Handbook of Biological Statistics: http://www.biostathandbook.com/

f. An R Companion for the Handbook of Biological Statistics: http://rcompanion.org/rcompanion/index.html

5. Zar, JH. 2010. Biostatistical Analysis, Fifth Edition, Pearson.

統計不球人
