Week 5 figures - Lectures 9 and 10

analysis of variance
eta squared
shapiro-wilk
levene
mann-whitney U
wilcoxon signed-rank
kruskal-wallis
cliffs delta
Author
Affiliation
Lecture date

February 3, 2026

Modified

February 17, 2026

0. Set up

Code
# cleaning
library(tidyverse)

# visualization
theme_set(theme_classic() +
            theme(panel.grid = element_blank(),
                  axis.text = element_text(size = 18),
                  axis.title = element_text(size = 18),
                  text = element_text(family = "Lato")))
library(patchwork)

# data
library(palmerpenguins)

# analysis
library(car)
library(effectsize)
library(rstatix)
library(dunn.test)

1. Math

a. sum of squares

Among groups:

\[ \sum^k_{i = 1}\sum^n_{j = 1}(\bar{y_i} - \bar{y})^2 \]

where k is the number of groups, i is ith group, n is the number of observations per group, j is the jth observation. \(\bar{y_i}\) is the mean of group i, while \(\bar{y}\) is the grand mean (of all samples pooled together).

Within groups:

\[ \sum^k_{i = 1}\sum^n_{j = 1}(y_{ij} - \bar{y_i})^2 \]

where \(y_{ij}\) is the jth observation in the ith group, and \(\bar{y_i}\) is the mean of group i.

Total sum of squares:

\[ \sum^k_{i = 1}\sum^n_{j = 1}(y_{ij} - \bar{y})^2 \]

where \(y_{ij}\) is the jth observation in the ith group, and \(\bar{y}\) is the grand mean.

b. mean squares

Among groups:

\[ \frac{SS_{among \ group}}{k - 1} \]

Within groups:

\[ \frac{SS_{within \ group}}{n - k} \]

c. F-ratio/F-statistic

\[ \frac{MS_{among \ group}}{MS_{within \ group}} \]

Put another way, the F-ratio is the ratio of among group variance to within group variance. If the among group variance is larger than within group variance, the F-ratio is large, and therefore the probability of among group variance being equal to within group variance is small. Thus, you would reject the null hypothesis if the F-ratio is large.

d. η squared

\[ \eta^2 = \frac{SS_{among \ group}}{SS_{among \ group} + SS_{within \ group}} \] ### e. U statistic

\[ \begin{align} U_1 &= \Sigma R_1 - n_1(n_1 + 1)/2 = 17 - 5(5+1)/2 = 2 \\ U_2 &= \Sigma R_2 - n_2(n_2 + 1)/2 = 38 - 5(5+1)/2 = 23 \end{align} \]

2. Warm up: code for a figure

Code
# random sample of 10 rows from data frame
sample_n(chickwts, 10) %>% 
  arrange(feed)
   weight      feed
1     222    casein
2     390    casein
3     168 horsebean
4     203   linseed
5     315  meatmeal
6     380  meatmeal
7     250   soybean
8     295 sunflower
9     340 sunflower
10    339 sunflower
Code
ggplot(data = chickwts,           # data frame: chickwts
       aes(x = feed,              # x-axis: feed type
           y = weight,            # y-axis: chick weight
           fill = feed)) +        # fill by feed type  
  geom_boxplot() +                # creates a boxplot
  geom_jitter(height = 0,         # prevents jitter from moving points along y-axis
              width = 0.2) +      # narrows spread of jitter along x-axis
  theme(legend.position = "none") # removes legend

3. Analysis of variance

Central question: How does bill length differ between penguin species?

a. Exploratory data visualization

Code
penguins_jitter <- ggplot(data = penguins,
                          aes(x = species,
                              y = bill_length_mm,
                              color = species)) +
  geom_jitter(width = 0.2,
              height = 0,
              shape = 21) +
  scale_color_manual(values = c("#209c90", "#018ca9", "#27c839")) +
  labs(x = "Species",
       y = "Bill length (mm)") +
  theme(legend.position = "none")

penguins_jitter

b. histogram and qq plots

Code
hist <- ggplot(data = penguins,
       aes(x = bill_length_mm,
           fill = species)) +
  geom_histogram(bins = 14,
                 color = "black") +
  scale_fill_manual(values = c("#209c90", "#018ca9", "#27c839")) +
  scale_y_continuous(expand = c(0, 0)) +
  facet_wrap(~ species, scales = "free", ncol = 1) +
  labs(x = "Bill length (mm)",
       y = "Count") +
  theme(legend.position = "none",
        strip.background = element_rect(color = "white"),
        strip.text = element_text(size = 20))

qq <- ggplot(data = penguins,
       aes(sample = bill_length_mm)) +
  geom_qq_line() +
  geom_qq(aes(color = species)) +
  scale_color_manual(values = c("#209c90", "#018ca9", "#27c839")) +
  facet_wrap(~ species, scales = "free", ncol = 1) +
  labs(x = "Theoretical quantile",
       y = "Value") +
  theme(legend.position = "none",
        strip.background = element_rect(color = "white"),
        strip.text = element_text(size = 20))

hist + qq

c. Shapiro-Wilk normality test

General: Is the response variable normally distributed?
Example: Is bill length normally distributed?

Code
# first, wrangle
adelie <- penguins %>% 
  filter(species == "Adelie") %>% 
  pull(bill_length_mm)

chinstrap <- penguins %>% 
  filter(species == "Chinstrap") %>% 
  pull(bill_length_mm)

gentoo <- penguins %>% 
  filter(species == "Gentoo") %>% 
  pull(bill_length_mm)

# then, do the shapiro-wilk test
shapiro.test(adelie)

    Shapiro-Wilk normality test

data:  adelie
W = 0.99336, p-value = 0.7166
Code
shapiro.test(chinstrap)

    Shapiro-Wilk normality test

data:  chinstrap
W = 0.97525, p-value = 0.1941
Code
shapiro.test(gentoo)

    Shapiro-Wilk normality test

data:  gentoo
W = 0.97272, p-value = 0.01349

d. Levene test of variances

General: Are the group variances equal?
Example: Are the species variances equal?

Code
leveneTest(bill_length_mm ~ species,
           data = penguins)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2  2.2425 0.1078
      339               
Code
penguins |> 
  drop_na(bill_length_mm) |> 
  group_by(species) |> 
  summarize(variance = var(bill_length_mm, na.rm = TRUE),
            n = length(bill_length_mm)) 
# A tibble: 3 × 3
  species   variance     n
  <fct>        <dbl> <int>
1 Adelie        7.09   151
2 Chinstrap    11.2     68
3 Gentoo        9.50   123

e. analysis of variance

Code
# do the actual test
# model object stored as `penguins_anova`
penguins_anova <- aov(bill_length_mm ~ species,
                      data = penguins)

# output of the test
penguins_anova
Call:
   aov(formula = bill_length_mm ~ species, data = penguins)

Terms:
                 species Residuals
Sum of Squares  7194.317  2969.888
Deg. of Freedom        2       339

Residual standard error: 2.959853
Estimated effects may be unbalanced
2 observations deleted due to missingness
Code
# more information
summary(penguins_anova)
             Df Sum Sq Mean Sq F value Pr(>F)    
species       2   7194    3597   410.6 <2e-16 ***
Residuals   339   2970       9                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
2 observations deleted due to missingness

f. post hoc test

Which group comparisons are different?

Code
TukeyHSD(penguins_anova)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = bill_length_mm ~ species, data = penguins)

$species
                      diff       lwr        upr     p adj
Chinstrap-Adelie 10.042433  9.024859 11.0600064 0.0000000
Gentoo-Adelie     8.713487  7.867194  9.5597807 0.0000000
Gentoo-Chinstrap -1.328945 -2.381868 -0.2760231 0.0088993

g. effect size

Code
eta_squared(penguins_anova)
  species 
0.7078091 

h. example of writing

Without the stats: Our results suggest a difference in bill length between species, with a large effect of species. Species differed in bill length, and pairwise comparisons between species showed that all three species differed from each other. Generally, Adelie penguins tend to have shorter bills than Chinstrap and Gentoo penguins. Gentoo penguins tend to have shorter bills than Chinstrap penguins.

With the stats: Our results suggest a difference in bill length between species, with a large (\(\eta^2\) = 0.71) effect of species. Species differed in bill length (one-way ANOVA, F(2, 339) = 410.6, p < 0.001, \(\alpha\) = 0.05), and pairwise comparisons between species showed that all three species differed from each other. Generally, Adelie penguins tend to have 10.0 mm shorter bills than Chinstrap (Tukey HSD, p < 0.001, 95% confidence interval: [9.0, 11.1] mm) penguins and 8.7 mm shorter than Gentoo (Tukey HSD, p < 0.001, 95% confidence interval: [7.9, 9.6] mm) penguins. Gentoo penguins tend to have 1.3 mm shorter bills than Chinstrap penguins (Tukey HSD, p = 0.008, 95% confidence interval: [0.3, 2.4] mm).

i. a “finalized” figure

Code
ggplot(data = penguins,
       aes(x = species,
           y = bill_length_mm,
           color = species)) +
  geom_jitter(width = 0.2,
              height = 0,
              shape = 21,
              alpha = 0.4) +
  stat_summary(geom = "pointrange",
               fun.data = mean_cl_normal,
               size = 0.8,
               linewidth = 1) +
  scale_color_manual(values = c("#209c90", "#018ca9", "#27c839")) +
  labs(x = "Species",
       y = "Bill length (mm)") +
  theme(legend.position = "none")

4. Non parametric tests

a. Mann-Whitney U

Code
sharks <- cbind(sb = c(1.25, 1.21, 1.18, 1.14, 1.15), 
                sd = c(1.42, 1.44, 1.45, 1.51, 1.20)) %>% 
  as_tibble() %>% 
  pivot_longer(cols = sb:sd) %>% 
  rename(site = name,
         length_m = value) |> 
  mutate(site = case_match(
    site,
    "sb" ~ "Santa Barbara",
    "sd" ~ "San Diego"
  ),
  site = fct_relevel(as_factor(site), "Santa Barbara", "San Diego"))

ggplot(data = sharks,
       aes(x = site,
           y = length_m)) +
  geom_boxplot(aes(fill = site),
               outliers = FALSE,
               width = 0.5) +
  geom_jitter(
    width = 0.2,
    height = 0,
    shape = 21
  ) +
  labs(x = "Site", 
       y = "Length (m)") +
  scale_fill_manual(values = c(
    "San Diego" = "goldenrod",
    "Santa Barbara" = "turquoise"
  )) +
  theme(legend.position = "none")

Code
wilcox.test(
  length_m ~ site,
  data = sharks
)

    Wilcoxon rank sum exact test

data:  length_m by site
W = 2, p-value = 0.03175
alternative hypothesis: true location shift is not equal to 0
Code
cliffs_delta(
  length_m ~ site,
  data = sharks
)
r (rank biserial) |         95% CI
----------------------------------
-0.84             | [-0.96, -0.44]

b. Wilcoxon signed-rank

Code
# for a comparison of one group against a theoretical median
wilcox.test(Sample1, mu = theoretical)

# for a comparison of two groups
wilcox.test(value ~ sample,
            data = wilcox_df, 
            paired = TRUE)

c. Kruskal-Wallis

Code
sharks_monterey <- cbind(sb = c(1.25, 1.21, 1.18, 1.14, 1.15), 
                         sd = c(1.42, 1.44, 1.45, 1.51, 1.20),
                         mt = c(1.09, 1.08, 1.07, 1.07, 1.06)) %>% 
  as_tibble() %>% 
  pivot_longer(cols = sb:mt) %>% 
  rename(site = name,
         length_m = value) |> 
  mutate(site = case_match(
    site,
    "sb" ~ "Santa Barbara",
    "sd" ~ "San Diego",
    "mt" ~ "Monterey"
  ),
  site = fct_relevel(as_factor(site), "Monterey", "Santa Barbara", "San Diego"))

sharks_monterey |> 
  group_by(site) |> 
  summarize(median = median(length_m)) |> 
  ungroup()
# A tibble: 3 × 2
  site          median
  <fct>          <dbl>
1 Monterey        1.07
2 Santa Barbara   1.18
3 San Diego       1.44
Code
kruskal.test(
  length_m ~ site,
  data = sharks_monterey
)

    Kruskal-Wallis rank sum test

data:  length_m by site
Kruskal-Wallis chi-squared = 11.601, df = 2, p-value = 0.003026
Code
dunn_test(
  length_m ~ site,
  data = sharks_monterey,
  p.adjust.method = "bonferroni"
)
# A tibble: 3 × 9
  .y.      group1        group2           n1    n2 statistic        p   p.adj p.adj.signif
* <chr>    <chr>         <chr>         <int> <int>     <dbl>    <dbl>   <dbl> <chr>       
1 length_m Monterey      Santa Barbara     5     5      1.91 0.0560   0.168   ns          
2 length_m Monterey      San Diego         5     5      3.40 0.000681 0.00204 **          
3 length_m Santa Barbara San Diego         5     5      1.49 0.137    0.412   ns          
Code
dunn.test(
  x = sharks_monterey$length_m,
  g = sharks_monterey$site,
  method = "bonferroni"
)

kruskal_effsize(
  length_m ~ site,
  data = sharks_monterey
)
# A tibble: 1 × 5
  .y.          n effsize method  magnitude
* <chr>    <int>   <dbl> <chr>   <ord>    
1 length_m    15   0.800 eta2[H] large    
Code
ggplot(data = sharks_monterey,
       aes(x = site,
           y = length_m)) +
  geom_boxplot(aes(fill = site),
               outliers = FALSE,
               width = 0.5) +
  geom_jitter(
    width = 0.2,
    height = 0,
    shape = 21
  ) +
  labs(x = "Site", 
       y = "Length (m)") +
  scale_fill_manual(values = c(
    "San Diego" = "goldenrod",
    "Santa Barbara" = "turquoise",
    "Monterey" = "tomato4"
  )) +
  theme(legend.position = "none")

Code
kruskal_df <- cbind(Sample1 = round(rnorm(n = 5, mean = 4, sd = 1), 1), 
                    Sample2 = round(rnorm(n = 5, mean = 6, sd = 1), 1),
                    Sample3 = round(rnorm(n = 5, mean = 8, sd = 1), 1)) %>% 
  as_tibble() %>% 
  pivot_longer(cols = Sample1:Sample3) %>% 
  rename(sample = name) %>% 
  arrange(sample)

rank_by_hand <- kruskal_df %>% 
  arrange(value) %>% 
  rownames_to_column("ranks") %>% 
  mutate(ranks = as.numeric(ranks)) %>% 
  group_by(sample) %>% 
  reframe(sum_ranks = sum(ranks))

R1 <- rank_by_hand[1, 2]
R2 <- rank_by_hand[2, 2]
R3 <- rank_by_hand[3, 2]
n <- 15
n1 <- 5
n2 <- 5
n3 <- 5

rstatix::kruskal_test(value ~ sample,
             data = kruskal_df)
# A tibble: 1 × 6
  .y.       n statistic    df       p method        
* <chr> <int>     <dbl> <int>   <dbl> <chr>         
1 value    15      10.6     2 0.00489 Kruskal-Wallis
Code
kruskal.test(value ~ sample,
             data = kruskal_df)

    Kruskal-Wallis rank sum test

data:  value by sample
Kruskal-Wallis chi-squared = 10.64, df = 2, p-value = 0.004893
Code
#((12 * STATISTIC / (n * (n + 1)) - 3 * (n + 1)) / (1 - sum(TIES^3 - TIES) / (n^3 - n)))

(12/(n*(n+1)))*((R1^2)/n1 + (R2^2)/n2 + (R3^2)/n3) - 3*(n + 1)
  sum_ranks
1     10.64
Code
rstatix::kruskal_effsize(value ~ sample,
                         data = kruskal_df)
# A tibble: 1 × 5
  .y.       n effsize method  magnitude
* <chr> <int>   <dbl> <chr>   <ord>    
1 value    15   0.720 eta2[H] large    
Code
rstatix::dunn_test(value ~ sample,
                   data = kruskal_df)
# A tibble: 3 × 9
  .y.   group1  group2     n1    n2 statistic       p   p.adj p.adj.signif
* <chr> <chr>   <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
1 value Sample1 Sample2     5     5      1.84 0.0660  0.132   ns          
2 value Sample1 Sample3     5     5      3.25 0.00114 0.00343 **          
3 value Sample2 Sample3     5     5      1.41 0.157   0.157   ns          

Citation

BibTeX citation:
@online{bui2026,
  author = {Bui, An},
  title = {Week 5 Figures - {Lectures} 9 and 10},
  date = {2026-02-03},
  url = {https://winter-2026.envs-193ds.com/lecture/lecture_week-05.html},
  langid = {en}
}
For attribution, please cite this work as:
Bui, An. 2026. “Week 5 Figures - Lectures 9 and 10.” February 3, 2026. https://winter-2026.envs-193ds.com/lecture/lecture_week-05.html.