Two-Sample t Test with Unequal Variances (Welch Two Sample t-test)

April 30, 2018

Recipe 12: Two-Sample t Test with Unequal Variances (t Test for Two-Samples with Unequal Variances, Welch Two Sample t test)

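What is a two-independent-sample hypothesis test? Put simply, it is a hypothesis test that compares two sets of data, each sampled independently. What does a statistical hypothesis test actually test? Look at H₀. For example, the two-independent-sample test with H₀: μ₁ = μ₂, H_A: μ₁ ≠ μ₂ tests whether the two groups have the same mean. Likewise, the test with H₀: μ₁ ≥ μ₂, H_A: μ₁ < μ₂ tests whether the mean of the first group is smaller than the mean of the second group. When the hypotheses are about equality versus inequality, the test is two-tailed; when they specify a direction (greater than or less than), the test is one-tailed, as shown in the figure below: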

1. When to use: to compare the means of two independent samples whose variances differ. For large samples a Z test is used; for small samples, a t test.

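2. Type of analysis: parametric analysis. Methods that compute statistics directly from the data values are called parametric; methods that first rank the data and compute statistics from the ranks are called non-parametric.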

3. Assumptions: the two samples are independent, and both are normally distributed or close to normal, but their variances differ (unequal variances).
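For reference, the Welch statistic and its approximate degrees of freedom (the Welch-Satterthwaite approximation) are usually written as below. The symbols are introduced here and are not part of the original post: x̄₁, x̄₂ are the sample means, s₁², s₂² the sample variances, and n₁, n₂ the sample sizes. Because the two variances are kept separate instead of pooled, ν is generally not an integer, which is why R reports df = 14.069 in section 9.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```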

4. Example data: Milu compared the tomato yield per unit area (kg/m²) of plants grown outdoors and inside a greenhouse. The data are:

Outdoor	69.3	75.5	81.0	74.7	72.3	78.7	76.4	70.5	77.9
Greenhouse	69.5	64.6	74.0	84.8	76.0	93.9	81.2	73.4	88.0	79.5	90.2

Is the mean tomato yield the same outdoors and in the greenhouse? With μ₁ the outdoor mean and μ₂ the greenhouse mean: H₀: μ₁ = μ₂, H_A: μ₁ ≠ μ₂.

5. Enter the data:

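Step 1: Use c() to put the data into vectors named h1 (outdoor) and u1 (greenhouse); the vector is R's most basic data structure. Use the rep function to create vectors h2 and u2 that repeat the group labels "Out" and "Inn" as many times as there are observations (9 and 11), then combine everything into a data frame named dat.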

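h1 <- c(69.3, 75.5, 81.0, 74.7, 72.3, 78.7, 76.4, 70.5, 77.9)  # outdoor yields (kg/m²)
u1 <- c(69.5, 64.6, 74.0, 84.8, 76.0, 93.9, 81.2, 73.4, 88.0, 79.5, 90.2)  # greenhouse yields (kg/m²)
h2 <- rep("Out", 9)   # one "Out" label per outdoor value
u2 <- rep("Inn", 11)  # one "Inn" label per greenhouse value
Weight <- c(h1, u1)   # all yields in one vector
Place <- c(h2, u2)    # matching group labels
dat <- data.frame(Weight, Place)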
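A quick check of the entered data can catch counting mistakes before any test is run. The sketch below is an illustrative addition, not part of the original recipe: it rebuilds dat using length() instead of hard-coded counts, so the labels always match the data, and then inspects it with the base R functions str() and table().

```r
# Outdoor (h1) and greenhouse (u1) yields in kg/m², as in step 5
h1 <- c(69.3, 75.5, 81.0, 74.7, 72.3, 78.7, 76.4, 70.5, 77.9)
u1 <- c(69.5, 64.6, 74.0, 84.8, 76.0, 93.9, 81.2, 73.4, 88.0, 79.5, 90.2)

# Build the group labels from length() so the counts always match the data
dat <- data.frame(
  Weight = c(h1, u1),
  Place  = c(rep("Out", length(h1)), rep("Inn", length(u1)))
)

str(dat)          # 20 observations of 2 variables: Weight and the group label Place
table(dat$Place)  # number of observations in each group
```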

6. Plot the data to see their distribution:

Step 1: Install the ggplot2 package (for example with install.packages("ggplot2")).

Step 2: Load the ggplot2 package.

library(ggplot2)

Step 3: Draw the plot.

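ggplot(dat, aes(x = Place, y = Weight)) +
  geom_boxplot(color = "red") +
  geom_jitter(position = position_jitter(width = 0.05, height = 0))
# Draws the individual data points (black dots) on top of a box plot (red).
# geom_jitter() shifts overlapping points (identical values) slightly sideways so they are not mistaken for a single point; height = 0 keeps the Weight values themselves unchanged.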
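The box plot shows the spread of each group visually; the numbers behind it can also be printed directly. A minimal sketch using only base R (mean, var, sd); the object names summ and var_ratio are illustrative. A variance ratio far from 1 already hints at the unequal variances that section 8 tests formally.

```r
h1 <- c(69.3, 75.5, 81.0, 74.7, 72.3, 78.7, 76.4, 70.5, 77.9)              # outdoor
u1 <- c(69.5, 64.6, 74.0, 84.8, 76.0, 93.9, 81.2, 73.4, 88.0, 79.5, 90.2)  # greenhouse

# One row per group: sample size, mean, variance, standard deviation
summ <- data.frame(
  group = c("Out", "Inn"),
  n     = c(length(h1), length(u1)),
  mean  = c(mean(h1), mean(u1)),
  var   = c(var(h1), var(u1)),
  sd    = c(sd(h1), sd(u1))
)
print(summ, digits = 4)

# Ratio of the sample variances (outdoor / greenhouse); 1 would mean equal spread
var_ratio <- var(h1) / var(u1)
print(var_ratio)
```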

7. Check whether the data are normally distributed:

Step 1: Read the help page for the shapiro.test function in the stats package (loaded by default).

help(shapiro.test)

Step 2: Use shapiro.test from the stats package to check whether the data in h1 and u1 are normally distributed.

shapiro.test(h1)
shapiro.test(u1)

Step 3: Interpret the results.

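Shapiro-Wilk normality test
data: h1
W = 0.97326, p-value = 0.9211 # p-value > 0.05: consistent with a normal distribution.

Shapiro-Wilk normality test
data: u1
W = 0.98112, p-value = 0.9721 # p-value > 0.05: consistent with a normal distribution.

# p-value > 0.05: normality is not rejected; the data are consistent with a normal distribution.
# p-value < 0.05: normality is rejected; the data are not normally distributed.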
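When several groups need checking, the p-values can be collected in one step. This sketch (the names groups and sw_p are illustrative) applies shapiro.test to each vector with sapply and keeps only the p.value component of the returned test object.

```r
h1 <- c(69.3, 75.5, 81.0, 74.7, 72.3, 78.7, 76.4, 70.5, 77.9)              # outdoor
u1 <- c(69.5, 64.6, 74.0, 84.8, 76.0, 93.9, 81.2, 73.4, 88.0, 79.5, 90.2)  # greenhouse
groups <- list(Out = h1, Inn = u1)

# shapiro.test() returns an "htest" object; keep only its p.value component
sw_p <- sapply(groups, function(x) shapiro.test(x)$p.value)
print(round(sw_p, 4))

# TRUE means p > 0.05, i.e. normality is not rejected at the 5% level
print(sw_p > 0.05)
```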

8. Check whether the two groups have equal variances (H₀: σ₁² = σ₂², H_A: σ₁² ≠ σ₂²):

Step 1: Read the help page for the var.test function in the stats package.

help(var.test)

Step 2: Run var.test from the stats package on the data in h1 and u1.

var.test(h1, u1, ratio = 1, alternative = "two.sided")
# ratio = 1 sets H₀: σ₁² / σ₂² = 1, i.e. σ₁² = σ₂².

Step 3: Interpret the results.

F test to compare two variances
data: h1 and u1
F = 0.1818, num df = 8, denom df = 10, p-value = 0.02378 # p-value < 0.05: reject H₀: σ₁² = σ₂².
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.04715977 0.78083583
sample estimates:
ratio of variances
0.1817958

# p-value > 0.05: H₀: σ₁² = σ₂² is not rejected; treat the variances as equal.
# p-value < 0.05: H₀: σ₁² = σ₂² is rejected; the variances differ.
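To see what var.test computes, the F statistic and its two-sided p-value can be reproduced by hand. A sketch assuming the default ratio = 1 (the variable names are illustrative): F is the ratio of the sample variances with n₁ - 1 and n₂ - 1 degrees of freedom, and the two-sided p-value is twice the smaller tail area of the F distribution, obtained with pf.

```r
h1 <- c(69.3, 75.5, 81.0, 74.7, 72.3, 78.7, 76.4, 70.5, 77.9)              # outdoor
u1 <- c(69.5, 64.6, 74.0, 84.8, 76.0, 93.9, 81.2, 73.4, 88.0, 79.5, 90.2)  # greenhouse

# F statistic under H0: sigma1^2 / sigma2^2 = 1
F_stat <- var(h1) / var(u1)
df1 <- length(h1) - 1   # numerator degrees of freedom
df2 <- length(u1) - 1   # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail area of the F(df1, df2) distribution
p_lower <- pf(F_stat, df1, df2)
p_two   <- 2 * min(p_lower, 1 - p_lower)
print(c(F = F_stat, num.df = df1, denom.df = df2, p.value = p_two))

# Same p-value from the built-in function, for comparison
print(var.test(h1, u1, ratio = 1, alternative = "two.sided")$p.value)
```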

9. Run the two-sample t test with unequal variances in R:

Step 1: Read the help page for the t.test function in the stats package.

help(t.test)

Step 2: Run t.test from the stats package on the data.

t.test(h1, u1, alternative = "two.sided", paired = FALSE, var.equal = FALSE)
# var.equal = FALSE: the groups have unequal variances, so R runs the Welch test (this is also the default).
# paired = FALSE: independent samples, not a paired t test.
# alternative = "two.sided": two-tailed test.
# To test H₀: μ₁ ≥ μ₂ against H_A: μ₁ < μ₂, use alternative = "less".
# To test H₀: μ₁ ≤ μ₂ against H_A: μ₁ > μ₂, use alternative = "greater".
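The choice of alternative only changes how the p-value is read off the same t statistic. A sketch comparing the three options on this data set (variable names are illustrative): because the observed t is negative here, the "less" p-value should be half of the two-sided one, and the "less" and "greater" p-values should add up to 1.

```r
h1 <- c(69.3, 75.5, 81.0, 74.7, 72.3, 78.7, 76.4, 70.5, 77.9)              # outdoor
u1 <- c(69.5, 64.6, 74.0, 84.8, 76.0, 93.9, 81.2, 73.4, 88.0, 79.5, 90.2)  # greenhouse

# Same Welch test (var.equal = FALSE), three different alternative hypotheses
p_two     <- t.test(h1, u1, alternative = "two.sided", var.equal = FALSE)$p.value  # H_A: mu1 != mu2
p_less    <- t.test(h1, u1, alternative = "less",      var.equal = FALSE)$p.value  # H_A: mu1 <  mu2
p_greater <- t.test(h1, u1, alternative = "greater",   var.equal = FALSE)$p.value  # H_A: mu1 >  mu2
print(round(c(two.sided = p_two, less = p_less, greater = p_greater), 4))
```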

Step 3: Interpret the results.

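Welch Two Sample t-test
data: h1 and u1
t = -1.4551, df = 14.069, p-value = 0.1676 # p-value > 0.05: H₀: μ₁ = μ₂ is not rejected.
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.907639 2.087437
sample estimates:
mean of x mean of y
75.14444 79.55455

# p-value < 0.05: reject H₀: μ₁ = μ₂; the mean yields differ.
# p-value > 0.05: H₀: μ₁ = μ₂ is not rejected; no evidence that the mean yields differ.
# Here the 95% confidence interval for μ₁ - μ₂ (-10.91 to 2.09 kg/m²) contains 0, which agrees with p > 0.05.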
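To see where the numbers in the output come from, the Welch statistic, its Welch-Satterthwaite degrees of freedom, the p-value, and the 95% confidence interval can be computed directly. A minimal sketch in base R (variable names are illustrative); it should reproduce t = -1.4551 and df = 14.069 from the output above up to rounding.

```r
h1 <- c(69.3, 75.5, 81.0, 74.7, 72.3, 78.7, 76.4, 70.5, 77.9)              # outdoor
u1 <- c(69.5, 64.6, 74.0, 84.8, 76.0, 93.9, 81.2, 73.4, 88.0, 79.5, 90.2)  # greenhouse

n1 <- length(h1)
n2 <- length(u1)
v1 <- var(h1) / n1   # squared standard error of mean(h1)
v2 <- var(u1) / n2   # squared standard error of mean(u1)
se <- sqrt(v1 + v2)  # standard error of the difference in means (variances not pooled)
diff_means <- mean(h1) - mean(u1)

t_stat <- diff_means / se  # Welch t statistic for H0: mu1 - mu2 = 0

# Welch-Satterthwaite approximate degrees of freedom
welch_df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

p_value <- 2 * pt(-abs(t_stat), welch_df)               # two-sided p-value
ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se  # 95% CI for mu1 - mu2

print(round(c(t = t_stat, df = welch_df, p.value = p_value,
              lower = ci[1], upper = ci[2]), 4))
```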

Hooked? Want to know more? Further reading (links):

1. Bernard Lewis Welch (https://en.wikipedia.org/wiki/Bernard_Lewis_Welch)

2. Welch's t-test (https://en.wikipedia.org/wiki/Welch%27s_t-test)

3. Statistical hypothesis testing (https://en.wikipedia.org/wiki/Statistical_hypothesis_testing)

4. Test statistic (https://en.wikipedia.org/wiki/Test_statistic)

5. Quick-start resources on R basics, R graphics, and statistics:

a. R Tutorial: https://www.tutorialspoint.com/r/index.htm

b. Cookbook for R: http://www.cookbook-r.com/

c. Quick-R: https://www.statmethods.net/

d. Statistical tools for high-throughput data analysis (STHDA): http://www.sthda.com/english/

e. The Handbook of Biological Statistics: http://www.biostathandbook.com/

f. An R Companion for the Handbook of Biological Statistics: http://rcompanion.org/rcompanion/index.html

6. Zar, JH. 2010. Biostatistical Analysis, Fifth Edition, Pearson.

統計不球人
