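Polynomial Regression
Recipe 44: Polynomial Regression
1. When to use: to predict one dependent variable from a polynomial in one independent variable.
2. Type of analysis: regression analysis.
3. Example data: Milu measured the lead concentration in river water at sampling points at increasing distances from the river mouth. The data are as follows.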
Sample | Dist. (km) | Conc. (mg/L)
1 | 1.22 | 40.9
2 | 1.34 | 41.8
3 | 1.51 | 42.4
4 | 1.66 | 43.0
5 | 1.72 | 43.4
6 | 1.93 | 43.9
7 | 2.14 | 44.3
8 | 2.39 | 44.7
9 | 2.51 | 45.0
10 | 2.78 | 45.1
11 | 2.97 | 45.4
12 | 3.17 | 46.2
13 | 3.32 | 47.0
14 | 3.50 | 48.6
15 | 3.53 | 49.0
16 | 3.85 | 49.7
17 | 3.95 | 50.0
18 | 4.11 | 50.8
19 | 4.18 | 51.1
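Find the polynomial regression equation.
4. Create the data.
Step 1: Use the read.table function (utils package, loaded by default) to enter the data and store them in the variable m.
m <- read.table(header = TRUE, text = "
Dist Conc
1.22 40.9
1.34 41.8
1.51 42.4
1.66 43.0
1.72 43.4
1.93 43.9
2.14 44.3
2.39 44.7
2.51 45.0
2.78 45.1
2.97 45.4
3.17 46.2
3.32 47.0
3.50 48.6
3.53 49.0
3.85 49.7
3.95 50.0
4.11 50.8
4.18 51.1")
# Values are separated by whitespace; header = TRUE reads the first line as column names.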
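Step 2: Use the attach and names functions from the base package to make the columns of m accessible by name and to check the column names (headers).
attach(m) # Add m to the search path so that its columns (Dist, Conc) can be referred to by name.
names(m) # Display the column names (headers) of m.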
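As an optional alternative to attach(), the columns can be reached by passing data = m to modelling functions or by wrapping an expression in with(). The sketch below rebuilds the same 19 observations as a data frame so that it runs on its own; the names fit_line and r are illustrative and not part of the original recipe.

```r
# Same 19 observations as in Step 1, built directly as a data frame.
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)

str(m)  # Check the structure: 19 rows, two numeric columns.

# data = m tells lm() where to find Dist and Conc, so attach(m) is not needed.
fit_line <- lm(Conc ~ Dist, data = m)

# with() evaluates an expression using the columns of m.
r <- with(m, cor(Dist, Conc))
r
```

Leaving the search path untouched avoids name clashes when another object called Dist or Conc already exists in the workspace.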
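5. Plot the data to inspect their distribution:
Step 1: Install the ggplot2 package.
install.packages("ggplot2") # Needed only once.
Step 2: Load the ggplot2 package.
library(ggplot2)
Step 3: Draw the plot.
ggplot(m, aes(x = Dist, y = Conc)) +
  geom_point(shape = 1) + # Draw hollow circles.
  geom_smooth() # Add a smoothed trend curve (loess by default for a sample this small) with its 95% confidence band.
Result: a scatter plot of Conc against Dist, overlaid with the smoothed curve and its confidence band.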
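6. Run the regression.
Step 1: Use the lm function (stats package, loaded by default) to fit regression equations to the data in m, storing the results in the variables fit1 to fit6.
fit1 <- lm(Conc ~ Dist)
fit2 <- lm(Conc ~ Dist + I(Dist^2))
fit3 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3))
fit4 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4))
fit5 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4) + I(Dist^5))
fit6 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4) + I(Dist^5) + I(Dist^6))
# Conc is the dependent variable and Dist the independent variable. ^2 is the square, ^3 the cube, and so on.
# I() makes R treat ^ as arithmetic rather than as a formula operator.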
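The same models can also be written more compactly with poly(). The sketch below is self-contained (it rebuilds the data and uses data = m instead of attach), and the object names fit4_terms, fit4_raw, fit4_orth and fits are illustrative. It checks that the term-by-term formula and poly(..., raw = TRUE) give the same coefficients, and that the default orthogonal polynomials give the same fitted values even though their coefficients differ.

```r
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)

# Degree-4 model written out term by term, as in the text.
fit4_terms <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4), data = m)

# Same model via poly(); raw = TRUE uses Dist, Dist^2, ... as the columns.
fit4_raw <- lm(Conc ~ poly(Dist, 4, raw = TRUE), data = m)

# Default poly() uses orthogonal polynomials: different coefficients,
# same fitted curve.
fit4_orth <- lm(Conc ~ poly(Dist, 4), data = m)

same_coef <- all.equal(unname(coef(fit4_terms)), unname(coef(fit4_raw)))
same_fit  <- all.equal(unname(fitted(fit4_terms)), unname(fitted(fit4_orth)))
same_coef
same_fit

# Degrees 1 to 6 fitted in one call and kept in a list.
fits <- lapply(1:6, function(d) lm(Conc ~ poly(Dist, d, raw = TRUE), data = m))
```

Raw polynomials keep the coefficients directly interpretable as the terms of the regression equation; orthogonal polynomials are usually better conditioned numerically when the degree is high.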
Step 2: Display and interpret the results.
summary(fit1)
Call:
lm(formula = Conc ~ Dist)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2758 -0.2706  0.2599  0.4483  0.6407

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.3890     0.4364   85.67  < 2e-16 ***
Dist          3.1269     0.1510   20.71 1.69e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6335 on 17 degrees of freedom
Multiple R-squared:  0.9619,  Adjusted R-squared:  0.9596
F-statistic: 428.9 on 1 and 17 DF,  p-value: 1.693e-13
summary(fit2)
Call:
lm(formula = Conc ~ Dist + I(Dist^2))

Residuals:
     Min       1Q   Median       3Q      Max
-0.89062 -0.36433  0.09016  0.37248  0.68840

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  40.3017     1.1335   35.55   <2e-16 ***
Dist          0.6666     0.9135    0.73   0.4761
I(Dist^2)     0.4540     0.1669    2.72   0.0151 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5399 on 16 degrees of freedom
Multiple R-squared:  0.9739,  Adjusted R-squared:  0.9707
F-statistic: 298.9 on 2 and 16 DF,  p-value: 2.134e-13
summary(fit3)
Call:
lm(formula = Conc ~ Dist + I(Dist^2) + I(Dist^3))

Residuals:
     Min       1Q   Median       3Q      Max
-0.72883 -0.27353  0.00466  0.19039  1.00640

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  32.7673     3.1132  10.525 2.54e-08 ***
Dist         10.4109     3.9030   2.667   0.0176 *
I(Dist^2)    -3.3868     1.5136  -2.238   0.0408 *
I(Dist^3)     0.4701     0.1844   2.549   0.0222 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4658 on 15 degrees of freedom
Multiple R-squared:  0.9818,  Adjusted R-squared:  0.9782
F-statistic: 269.9 on 3 and 15 DF,  p-value: 2.859e-13
summary(fit4)
Call:
lm(formula = Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4))

Residuals:
     Min       1Q   Median       3Q      Max
-0.45159 -0.24518 -0.00307  0.14294  0.70094

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.9265     7.2855   0.951  0.35787
Dist         55.8348    12.4946   4.469  0.00053 ***
I(Dist^2)   -31.4866     7.6054  -4.140  0.00100 **
I(Dist^3)     7.7625     1.9573   3.966  0.00141 **
I(Dist^4)    -0.6751     0.1808  -3.735  0.00222 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3413 on 14 degrees of freedom
Multiple R-squared:  0.9909,  Adjusted R-squared:  0.9883
F-statistic: 380.6 on 4 and 14 DF,  p-value: 4.141e-14
summary(fit5)
Call:
lm(formula = Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4) +
    I(Dist^5))

Residuals:
     Min       1Q   Median       3Q      Max
-0.41950 -0.16702 -0.02428  0.13190  0.69400

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  36.2391    22.7989   1.590    0.136
Dist         -9.1615    49.5645  -0.185    0.856
I(Dist^2)    23.3871    41.2381   0.567    0.580
I(Dist^3)   -14.3460    16.4561  -0.872    0.399
I(Dist^4)     3.5936     3.1609   1.137    0.276
I(Dist^5)    -0.3174     0.2347  -1.353    0.199

Residual standard error: 0.3316 on 13 degrees of freedom
Multiple R-squared:  0.992,  Adjusted R-squared:  0.9889
F-statistic: 322.9 on 5 and 13 DF,  p-value: 3.726e-13
summary(fit6)
Call:
lm(formula = Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4) +
    I(Dist^5) + I(Dist^6))

Residuals:
     Min       1Q   Median       3Q      Max
-0.33153 -0.18574  0.00889  0.11797  0.57277

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  157.8822    73.6834   2.143   0.0533 .
Dist        -330.9759   192.2850  -1.721   0.1109
I(Dist^2)    364.0428   201.2862   1.809   0.0956 .
I(Dist^3)   -199.3612   108.4010  -1.839   0.0908 .
I(Dist^4)     58.1131    31.7588   1.830   0.0922 .
I(Dist^5)     -8.6070     4.8130  -1.788   0.0990 .
I(Dist^6)      0.5096     0.2956   1.724   0.1103
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.309 on 12 degrees of freedom
Multiple R-squared:  0.9936,  Adjusted R-squared:  0.9904
F-statistic: 310.4 on 6 and 12 DF,  p-value: 1.907e-12
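Reading six summaries one after another is tedious, so the comparison can also be collected in one place. The sketch below is one possible way to do this and is not part of the original recipe: it refits the six models in a list (the names fits, adj_r2 and seq_tests are illustrative), extracts the adjusted R-squared of each, and runs sequential F tests with anova().

```r
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)

# Polynomial models of degree 1 to 6 (raw = TRUE matches the I(Dist^k) terms).
fits <- lapply(1:6, function(d) lm(Conc ~ poly(Dist, d, raw = TRUE), data = m))
names(fits) <- paste0("fit", 1:6)

# Adjusted R-squared of each model, side by side.
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
round(adj_r2, 4)

# Sequential F tests: each row asks whether adding the next power
# improves on the model in the row above it.
# unname() is needed so the list elements are passed positionally.
seq_tests <- do.call(anova, unname(fits))
seq_tests
```

A large p-value in a row suggests that the added power does not improve the fit. Note that anova() scales every F statistic by the residual variance of the largest model, so its p-values need not match the t tests in the individual summaries exactly.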
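Step 3: From the results above, select fit4 as the best-fitting model. It is the highest-degree model in which every polynomial term is significant (p < 0.05); in fit5 and fit6 no term is significant at the 0.05 level, and the adjusted R-squared rises only from 0.9883 to 0.9889 and 0.9904.
Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4)
Conc = 6.9265 + 55.8348 Dist - 31.4866 Dist^2 + 7.7625 Dist^3 - 0.6751 Dist^4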
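To see the equation in use, the sketch below predicts the lead concentration at a distance of 2 km in two ways: by evaluating the equation by hand with the rounded coefficients, which gives about 43.95 mg/L, and with predict() on the fitted model. The distance of 2 km and the names conc_by_hand, by_hand and by_model are chosen here for illustration only.

```r
m <- data.frame(
  Dist = c(1.22, 1.34, 1.51, 1.66, 1.72, 1.93, 2.14, 2.39, 2.51, 2.78,
           2.97, 3.17, 3.32, 3.50, 3.53, 3.85, 3.95, 4.11, 4.18),
  Conc = c(40.9, 41.8, 42.4, 43.0, 43.4, 43.9, 44.3, 44.7, 45.0, 45.1,
           45.4, 46.2, 47.0, 48.6, 49.0, 49.7, 50.0, 50.8, 51.1)
)
fit4 <- lm(Conc ~ Dist + I(Dist^2) + I(Dist^3) + I(Dist^4), data = m)

# Hand evaluation with the rounded coefficients of the fit4 equation.
conc_by_hand <- function(d) {
  6.9265 + 55.8348 * d - 31.4866 * d^2 + 7.7625 * d^3 - 0.6751 * d^4
}
by_hand <- conc_by_hand(2)

# The same prediction from the fitted model (unrounded coefficients).
by_model <- predict(fit4, newdata = data.frame(Dist = 2))

c(by_hand = by_hand, by_model = unname(by_model))
```

The two values differ slightly because the printed coefficients are rounded to four decimal places. Predictions of this kind are best kept within the sampled range (1.22 to 4.18 km), since a high-degree polynomial can behave erratically outside the data.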
Excited? Want to know more? Supplementary material (links):
3. On polynomial regression techniques (https://www.r-bloggers.com/polynomial-regression-techniques/)
4. Quick introductions to R basics, R graphics, and statistics:
a. R Tutorial: https://www.tutorialspoint.com/r/index.htm
b. Cookbook for R: http://www.cookbook-r.com/
c. Quick-R: https://www.statmethods.net/
d. Statistical tools for high-throughput data analysis (STHDA): http://www.sthda.com/english/
e. The Handbook of Biological Statistics: http://www.biostathandbook.com/
f. An R Companion for the Handbook of Biological Statistics: http://rcompanion.org/rcompanion/index.html
5. Zar, JH. 2010. Biostatistical Analysis, Fifth Edition, Pearson.