Chi-square Goodness-of-Fit Test

May 3, 2018

Recipe 46: Chi-square Goodness-of-Fit Test

1. When to use: The chi-square goodness-of-fit test checks whether data follow a hypothesized distribution or a specified set of proportions.

2. Type of analysis: nonparametric analysis.
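For reference, every example below rests on Pearson's chi-square statistic. With k categories, observed counts O_i, hypothesized proportions p_i, and sample size n, the statistic and its degrees of freedom are:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n\,p_i, \qquad df = k - 1 - m
```

Here m is the number of distribution parameters estimated from the data; m = 0 when every p_i is specified in advance, as in Examples 2 and 3.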

3. Example 1: chi-square goodness-of-fit test in R

A study observed a certain moth species at different heights (m) on the trunk of a 25 m tall tree. The data are:

No.	1	2	3	4	5	6	7	8	9	10	11	12	13	14
Height (m)	1.4	2.6	3.3	4.2	4.7	5.6	6.4	7.7	9.3	10.6	11.5	12.4	18.6	22.3
No. of moths	1	1	1	1	1	2	1	1	1	1	1	1	1	1

Are the moths uniformly distributed over the different heights on the tree?

H₀: The moths are uniformly distributed over heights on the tree.

H_A: The moths are not uniformly distributed over heights on the tree.

Step 1: Install the EnvStats package (for example with install.packages("EnvStats")).

Step 2: Load the EnvStats package.

library(EnvStats)

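Step 3: Enter the data. (The table records two moths at 5.6 m, but the vector below lists each height once, which gives the sample size of 14 shown in the output. To count every moth, each height could be repeated by its count with rep().)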

y <- c(1.4, 2.6, 3.3, 4.2, 4.7, 5.6, 6.4, 7.7, 9.3, 10.6, 11.5, 12.4, 18.6, 22.3)

Step 4: Run the chi-square goodness-of-fit test on y with the gofTest function from the EnvStats package.

gofTest(y~1, test = "chisq", distribution = "unif", alternative = "two.sided")

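# y ~ 1: a single sample (one-sample test).

# test = "chisq": run a one-sample chi-square goodness-of-fit test.

# distribution = "unif": test whether the data follow a uniform distribution. Because no parameters are supplied, gofTest estimates min and max from the data (see Estimated Parameter(s) in the output), so the reference is a uniform distribution over 1.4 to 22.3 m rather than over the whole 25 m trunk.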

Step 5: Interpret the results.

Results of Goodness-of-Fit Test
-------------------------------

Test Method: Chi-square GOF

Hypothesized Distribution: Uniform

Estimated Parameter(s): min = 1.4
                        max = 22.3

Estimation Method: mle

Data: y

Sample Size: 14

Test Statistic: Chi-square = 5.714286

Test Statistic Parameter: df = 3

P-value: 0.1263692

Alternative Hypothesis: True cdf does not equal the Uniform Distribution.

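# P-value < 0.05: reject H₀ (the moths are uniformly distributed over heights on the tree).

# P-value > 0.05: do not reject H₀.

# Here P-value = 0.1263692 > 0.05, so H₀ is not rejected: the data are consistent with a uniform distribution.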
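To see where Chi-square = 5.714286 and df = 3 come from, the base-R sketch below recomputes them by hand. It rests on an assumption about the EnvStats defaults that is not stated above: that gofTest splits the estimated range [min(y), max(y)] into ceiling(2 * n^(2/5)) = 6 equal-probability classes and subtracts the 2 estimated parameters from the degrees of freedom. Under that assumption the class counts reproduce the printed statistic. The variable names are our own.

```r
# Moth heights (m), same vector as in the example above
y <- c(1.4, 2.6, 3.3, 4.2, 4.7, 5.6, 6.4, 7.7, 9.3, 10.6, 11.5, 12.4, 18.6, 22.3)
n <- length(y)

# Assumed number of classes: ceiling(2 * n^(2/5)) gives 6 for n = 14
k <- ceiling(2 * n^(2/5))

# For a uniform distribution, equal-probability classes are equal-width classes
breaks <- seq(min(y), max(y), length.out = k + 1)
observed <- as.vector(table(cut(y, breaks = breaks, include.lowest = TRUE)))
expected <- rep(n / k, k)   # 14 / 6 moths expected per class

# Pearson statistic; df = k - 1 - 2 because min and max were estimated from y
chi_sq <- sum((observed - expected)^2 / expected)
df <- k - 1 - 2
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

observed   # 5 3 3 1 1 1
c(chi_sq = chi_sq, df = df, p_value = p_value)
```

Most of the misfit comes from the first class (1.4 to 4.9 m), which holds 5 moths against about 2.3 expected.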

4. Example 2: chi-square goodness-of-fit test in R

Mendel counted yellow round, yellow wrinkled, green round, and green wrinkled peas. The data are:

Trait	Yellow round	Yellow wrinkled	Green round	Green wrinkled	Total
Observed	152	39	53	6	250
Expected	(9/16) x 250 = 140.625	(3/16) x 250 = 46.875	(3/16) x 250 = 46.875	(1/16) x 250 = 15.625	250

Do yellow round, yellow wrinkled, green round, and green wrinkled peas occur in a 9:3:3:1 ratio?

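H₀: Yellow round, yellow wrinkled, green round, and green wrinkled peas occur in a 9:3:3:1 ratio.

H_A: Yellow round, yellow wrinkled, green round, and green wrinkled peas do not occur in a 9:3:3:1 ratio.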

Step 1: Read the documentation for the chisq.test function in the stats package (part of base R and loaded by default).

help(chisq.test)

Step 2: Enter the data.

x <- c(152,39,53,6)

Step 3: Call chisq.test from the stats package with x and the expected proportions.

chisq.test(x, p = c(9/16, 3/16, 3/16, 1/16), correct = FALSE)

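# p = c(9/16, 3/16, 3/16, 1/16): expected proportions; they must sum to 1.

# p = c(9, 3, 3, 1) does not work with the default rescale.p = FALSE, because the proportions must sum to 1.

# correct = FALSE: no Yates' continuity correction. chisq.test applies that correction only to 2 x 2 contingency tables, so it has no effect in a one-sample goodness-of-fit test like this one.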
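The rule that p must sum to 1 holds under chisq.test's default rescale.p = FALSE. chisq.test also accepts rescale.p = TRUE, in which case ratios such as c(9, 3, 3, 1) are divided by their sum before the test. A quick sketch comparing both forms on the pea counts (the object names are our own):

```r
x <- c(152, 39, 53, 6)

# Proportions that already sum to 1
res_prob <- chisq.test(x, p = c(9/16, 3/16, 3/16, 1/16))

# Raw 9:3:3:1 ratios, rescaled to sum to 1 by chisq.test itself
res_ratio <- chisq.test(x, p = c(9, 3, 3, 1), rescale.p = TRUE)

# Both calls test the same hypothesis, so statistic and p-value agree
c(res_prob$statistic, res_ratio$statistic)
c(res_prob$p.value, res_ratio$p.value)
```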

Step 4: Interpret the results.

Chi-squared test for given probabilities

data: x

X-squared = 8.9724, df = 3, p-value = 0.02966

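# p-value < 0.05: reject H₀ (yellow round, yellow wrinkled, green round, and green wrinkled peas occur in a 9:3:3:1 ratio).

# p-value > 0.05: do not reject H₀.

# Here p-value = 0.02966 < 0.05, so H₀ is rejected: the observed counts are not consistent with a 9:3:3:1 ratio.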
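As a check on the printed X-squared, the same statistic can be computed directly from the formula sum((O - E)^2 / E), with the expected counts from the table above. A minimal base-R sketch (variable names are our own):

```r
observed <- c(152, 39, 53, 6)
p0 <- c(9, 3, 3, 1) / 16            # 9:3:3:1 as proportions
expected <- sum(observed) * p0      # 140.625 46.875 46.875 15.625

x_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1          # no parameters estimated from the data
p_value <- pchisq(x_sq, df = df, lower.tail = FALSE)

round(c(X_squared = x_sq, df = df, p_value = p_value), 5)
```

The green wrinkled class contributes most of the statistic: (6 - 15.625)^2 / 15.625 is about 5.93 of the total 8.97.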

5. Example 3: chi-square goodness-of-fit test in R

Milu counted the male students in years 1 to 4 of university. The data are:

Year	Year 1	Year 2	Year 3	Year 4	Total
Male students	32	43	36	39	150

Are the shares of all male students in the four years (p₁, p₂, p₃, and p₄) equal?

H₀: p₁ = p₂ = p₃ = p₄.

H_A: The shares of male students in the four years are not all equal.

Step 1: Read the documentation for the chisq.test function in the stats package.

help(chisq.test)

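Step 2: Enter the data as counts. chisq.test needs counts rather than proportions: the statistic scales with the sample size, so entering 32/150, 43/150, ... would shrink it by a factor of 150.

x <- c(32, 43, 36, 39)

Step 3: Call chisq.test from the stats package with x and the expected proportions.

chisq.test(x, p = c(1/4, 1/4, 1/4, 1/4))

# Under H₀ the four shares are equal, so the expected proportions are p = c(1/4, 1/4, 1/4, 1/4).

# p = c(1, 1, 1, 1) does not work with the default rescale.p = FALSE, because the proportions must sum to 1.

Step 4: Interpret the results.

Chi-squared test for given probabilities

data: x

X-squared = 1.7333, df = 3, p-value = 0.6295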

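# p-value < 0.05: reject H₀ (p₁ = p₂ = p₃ = p₄).

# p-value > 0.05: do not reject H₀.

# Here the p-value is above 0.05, so H₀ is not rejected: the data are consistent with equal shares of male students in the four years.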
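Because Pearson's statistic depends on the sample size, the values passed to chisq.test as x should be counts. The sketch below compares counts with proportions (counts divided by 150) for this example; the object names are our own, and suppressWarnings only hides the small-expected-count warning that the proportion input triggers.

```r
boys <- c(32, 43, 36, 39)   # counts for years 1 to 4
n <- sum(boys)              # 150

# Correct input: counts
res_counts <- chisq.test(boys, p = rep(1/4, 4))

# Proportions instead of counts: every expected value is 0.25, far below 5
res_props <- suppressWarnings(chisq.test(boys / n, p = rep(1/4, 4)))

# The proportion-based statistic is the count-based one divided by n
res_counts$statistic          # about 1.7333 with df = 3
res_props$statistic * n       # same value
```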

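6. Note 1: A different situation is using a chi-square test of equal proportions (prop.test) to check whether three or more proportions are equal, as in the example below. Do not confuse it with Example 3: there the four proportions are shares of one total, whereas here each proportion has its own denominator (the number of students surveyed in that year).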

Milu counted the smokers in years 1 to 4 of university. The data are:

Year	Year 1	Year 2	Year 3	Year 4
Smokers	32	43	16	9
Students surveyed	87	108	80	25

Are the proportions of smokers in the four years (p₁, p₂, p₃, and p₄) equal?

H₀: p₁ = p₂ = p₃ = p₄.

H_A: The proportions of smokers in the four years are not all equal.

Step 1: Read the documentation for the prop.test function in the stats package.

help(prop.test)

Step 2: Enter the data.

smokers <- c(32, 43, 16, 9)

students <- c(87, 108, 80, 25)

Step 3: Call prop.test from the stats package with smokers and students.

prop.test(smokers, students, alternative = "two.sided")

Step 4: Interpret the results.

4-sample test for equality of proportions without continuity correction

data: smokers out of students

X-squared = 8.9872, df = 3, p-value = 0.02946

alternative hypothesis: two.sided

sample estimates:
   prop 1    prop 2    prop 3    prop 4
0.3678161 0.3981481 0.2000000 0.3600000

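# p-value < 0.05: reject H₀ (p₁ = p₂ = p₃ = p₄).

# p-value > 0.05: do not reject H₀.

# Here p-value = 0.02946 < 0.05, so H₀ is rejected: the proportions of smokers are not the same in all four years.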
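For k groups, prop.test's statistic is Pearson's chi-square computed on a 2 x k table of smokers and non-smokers; with more than two groups no continuity correction is applied, as the output header says. A minimal sketch that builds that table and compares the two functions (the table layout and names are our own):

```r
smokers <- c(32, 43, 16, 9)
students <- c(87, 108, 80, 25)

# 2 x 4 contingency table: row 1 = smokers, row 2 = non-smokers
tab <- rbind(smokers = smokers, non_smokers = students - smokers)

res_prop <- prop.test(smokers, students)
res_chisq <- chisq.test(tab)

# Same X-squared (about 8.9872), same df = 3, same p-value
c(prop.test = unname(res_prop$statistic), chisq.test = unname(res_chisq$statistic))
```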

7. Note 2: When there are only two categories (degrees of freedom = 1), Yates' correction should be applied, as in the example below:

Mendel counted yellow and green peas. The data are:

	Yellow	Green	Total
Peas	84	16	100

Do yellow and green peas occur in a 3:1 ratio?

H₀: Yellow and green peas occur in a 3:1 ratio.

H_A: Yellow and green peas do not occur in a 3:1 ratio.

Step 1: Read the documentation for the chisq.test function in the stats package.

help(chisq.test)

Step 2: Enter the data.

x <- c(84, 16)

Step 3: Call chisq.test from the stats package with x and the expected proportions.

chisq.test(x, p = c(3/4, 1/4), correct = TRUE)

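# p = c(3/4, 1/4): expected proportions; they must sum to 1.

# p = c(3, 1) does not work with the default rescale.p = FALSE, because the proportions must sum to 1.

# correct = TRUE requests Yates' correction, but chisq.test applies it only to 2 x 2 contingency tables. For a one-sample goodness-of-fit test it is ignored, so the X-squared below is the uncorrected statistic.

# Optionally, simulate.p.value = TRUE, B = 2000 estimates the p-value by Monte Carlo simulation with 2000 replicates.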

Step 4: Interpret the results.

Chi-squared test for given probabilities

data: x

X-squared = 4.32, df = 1, p-value = 0.03767

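# p-value < 0.05: reject H₀ (yellow and green peas occur in a 3:1 ratio).

# p-value > 0.05: do not reject H₀.

# Here p-value = 0.03767 < 0.05, so H₀ is rejected: the yellow-to-green ratio differs from 3:1.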
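Because chisq.test ignores correct in a one-sample goodness-of-fit test (the X-squared = 4.32 above equals the uncorrected sum 81/75 + 81/25), a Yates-corrected statistic has to be computed by hand: subtract 0.5 from each |O - E| before squaring. A minimal base-R sketch (variable names are our own):

```r
observed <- c(84, 16)
expected <- sum(observed) * c(3/4, 1/4)   # 75 and 25

# Uncorrected Pearson statistic, as printed by chisq.test above
x_sq <- sum((observed - expected)^2 / expected)

# Yates' continuity correction: shrink each |O - E| by 0.5 before squaring
x_sq_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)
p_yates <- pchisq(x_sq_yates, df = 1, lower.tail = FALSE)

c(uncorrected = x_sq, yates = x_sq_yates, p_yates = p_yates)
```

With these data the corrected statistic (about 3.853) still exceeds the df = 1 critical value of 3.841, so the corrected p-value stays just below 0.05 and the conclusion does not change.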

Intrigued? Want to know more? Further reading (links):

1. About Karl Pearson (https://en.wikipedia.org/wiki/Karl_Pearson)

2. About goodness of fit (https://en.wikipedia.org/wiki/Goodness_of_fit)

3. About Pearson's chi-squared test (https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test)

4. R basics, R graphics, and a quick start to statistics:

a. R Tutorial: https://www.tutorialspoint.com/r/index.htm

b. Cookbook for R: http://www.cookbook-r.com/

c. Quick-R: https://www.statmethods.net/

d. Statistical tools for high-throughput data analysis (STHDA): http://www.sthda.com/english/

e. The Handbook of Biological Statistics: http://www.biostathandbook.com/

f. An R Companion for the Handbook of Biological Statistics: http://rcompanion.org/rcompanion/index.html

5. Zar, JH. 2010. Biostatistical Analysis, Fifth Edition, Pearson.

統計不球人
