Winter 2026 - Coding workshop: Week 5

Workshop date: Friday 6 February

1. Summary

Packages

tidyverse
lterdatasampler
rstatix

Operations

New functions

make sure a variable is a factor using as_factor()
do an ANOVA using aov()
look for more information from model results using summary()
do post-hoc Tukey test using TukeyHSD()
calculate effect size for ANOVA using rstatix::eta_squared()

Review

read in data using read_csv()
chain functions together using |>
modify columns using mutate()
select columns using select()
rename columns using rename()
visualize data using ggplot()
create histograms using geom_histogram()
visualize QQ plots using geom_qq() and geom_qq_line()
create multi-panel plots using facet_wrap()
group data using group_by()
summarize data using summarize()

Data sources

The Plum Island Ecosystem fiddler crab data is from lterdatasampler (data info here). Crab carapace width is measured in mm.

2. Code

1. Packages

library(tidyverse) # general use
library(lterdatasampler) # data package
library(rstatix) # effect size calculations

data(pie_crab) # displaying data in Environment

2. Parametric tests

a. Cleaning and wrangling

Create a new object called pie_crab_clean.
Filter to only include the following sites: Cape Cod, Virginia Coastal Reserve LTER, and Zeke’s Island NERR.
Make sure site and name are set as factors.
Select the columns of interest: site, name, and code.
Rename the column name to site_name and code to site_code.

pie_crab_clean <- pie_crab |> # start with the pie_crab dataset
  filter(site %in% c("CC", "ZI", "VCR")) |> # filter for Cape Cod, Zeke's Island, Virginia Coastal
  mutate(name = as_factor(name), # make sure site name is being read in as a factor
         site = as_factor(site)) |> 
  select(site, name, size) |> # select columns of interest
  rename(site_code = site, # rename site to be site_code
         site_name = name) # rename name to be site_name

# display some rows from the data frame
# useful for showing a small part of the data frame
slice_sample(pie_crab_clean, # data frame
             n = 10) # number of rows to show

# A tibble: 10 × 3
   site_code site_name                      size
   <fct>     <fct>                         <dbl>
 1 VCR       Virginia Coastal Reserve LTER 15.1 
 2 VCR       Virginia Coastal Reserve LTER 19.2 
 3 ZI        Zeke's Island NERR             8.05
 4 ZI        Zeke's Island NERR             9.81
 5 ZI        Zeke's Island NERR             9.86
 6 VCR       Virginia Coastal Reserve LTER 20.8 
 7 ZI        Zeke's Island NERR            12.7 
 8 CC        Cape Cod                      14.4 
 9 ZI        Zeke's Island NERR            13.6 
10 ZI        Zeke's Island NERR            11.1

b. Quick summary

Create a new object called pie_crab_summary.
Calculate the mean, variance, and sample size. Display the object.

# creating a new object called pie_crab_summary
pie_crab_summary <- pie_crab_clean |> # starting with clean data frame
  group_by(site_name) |> # group by site
  summarize(mean = mean(size), # calculate mean size
            var = var(size), # calculate variance of size
            n = length(size)) |> # calculate number of observations per site (sample size)
  ungroup() # ungrouping

# display pie_crab_summary
pie_crab_summary

# A tibble: 3 × 4
  site_name                      mean   var     n
  <fct>                         <dbl> <dbl> <int>
1 Zeke's Island NERR             12.1  4.04    35
2 Virginia Coastal Reserve LTER  16.3  8.63    30
3 Cape Cod                       16.8  4.22    27

c. Exploring the data

Create a jitter plot with the mean crab carapace width for each site.

# base layer: ggplot
ggplot(pie_crab_clean, # use the clean data set
       aes(x = site_name, # x-axis
           y = size)) + # y-axis
  # first layer: jitter (each point is an individual crab)
  geom_jitter(height = 0, # don't jitter points vertically
              width = 0.15) + # narrower jitter (easier to see)
  # second layer: point representing mean
  stat_summary(geom = "point", # geometry being plotted
               fun = mean, # function (calculating the mean)
               color = "red", # color to make it easier to see
               size = 5) + # larger point size
  theme_minimal() + # cleaner plot theme
  labs(x = "Site name",
       y = "Carapace width (mm)")

Is there a difference in mean crab carapace width between the three sites?

Yes, and Zeke’s Island NERR crabs tend to be smaller than those from Cape Cod or Virginia Coastal Reserve LTER.

d. Check 1: normally distributed variable

Create a histogram of crab size.
Facet your histogram so that you have 3 panels, with one panel for each site.

# base layer: ggplot
ggplot(data = pie_crab_clean, 
       aes(x = size)) + # x-axis
  # first layer: histogram
  geom_histogram(bins = 6, # bins from Rice Rule (approximate)
                 fill = "orange",
                 color = "black") + 
  # faceting by site_name: creating 3 different panels
  facet_wrap(~ site_name) +
  theme_minimal() +
  labs(x = "Crab carapace width (mm)",
       y = "Frequency")

Create a QQ plot of crab size.
Facet your QQ plot so that you have 3 panels, with one panel for each site.

# base layer: ggplot
ggplot(data = pie_crab_clean, 
       aes(sample = size)) + # y-axis
  # first layer: QQ plot reference line
  geom_qq_line() + 
  # second layer: QQ plot points (actual observations)
  geom_qq(color = "orange") + 
  # faceting by site_name
  facet_wrap(~ site_name, 
             scales = "free") + # let axes vary between panels
  theme_minimal() +
  labs(x = "Theoretical (standard normal) quantiles",
       y = "Sample quantiles")

What are the outcomes of your visual checks?

Not perfect (crab carapace width from VCR LTER seems not normally distributed) but with large sample sizes for each group, this might not matter too much.

e. Check 2: equal variances

Do a gut check: is the ratio of the largest variance to the smallest variance less than 3?

8.63/4.04

[1] 2.136139

What are the outcomes of your variance check?

By the rule of thumb (ratio of largest variance to smallest variance < 3), the variances are close enough; the largest variance is about 2.1 times larger than the smallest variance.

f. ANOVA

# creating an object called crab_anova
crab_anova <- aov(size ~ site_name, # formula
                  data = pie_crab_clean) # data

# test object
crab_anova

Call:
   aov(formula = size ~ site_name, data = pie_crab_clean)

Terms:
                site_name Residuals
Sum of Squares   442.2073  497.4372
Deg. of Freedom         2        89

Residual standard error: 2.364145
Estimated effects may be unbalanced

# gives more information
summary(crab_anova)

            Df Sum Sq Mean Sq F value  Pr(>F)    
site_name    2  442.2  221.10   39.56 5.1e-13 ***
Residuals   89  497.4    5.59                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Summarize results: is there a difference in mean crab carapace width between the three sites?

There is a difference in mean crab carapace width between Cape Cod, Virginia Coastal Reserve LTER, and Zeke’s Island NERR (one-way ANOVA, F(2, 89) = 39.6, p < 0.001, $\alpha$ = 0.05).

g. Post-hoc: Tukey HSD

TukeyHSD(crab_anova)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = size ~ site_name, data = pie_crab_clean)

$site_name
                                                      diff       lwr      upr
Virginia Coastal Reserve LTER-Zeke's Island NERR 4.2719524  2.869912 5.673993
Cape Cod-Zeke's Island NERR                      4.7514709  3.308099 6.194843
Cape Cod-Virginia Coastal Reserve LTER           0.4795185 -1.015317 1.974354
                                                     p adj
Virginia Coastal Reserve LTER-Zeke's Island NERR 0.0000000
Cape Cod-Zeke's Island NERR                      0.0000000
Cape Cod-Virginia Coastal Reserve LTER           0.7255994

Which pairwise comparisons are actually different? Which ones are not different?

Zeke’s Island NERR and Cape Cod crabs are different, and Zeke’s Island NERR and Virginia Coastal Reserve LTER crabs are different. Virginial Coastal Reserve LTER and Cape Cod crabs are not different.

h. effect size

Using eta_squared() from rstatix

eta_squared(crab_anova)

site_name 
0.4706113

What is the magnitude of the differences between sites in crab size?

There is a large difference in crab size between sites.

i. Putting everything together

We found a large ($\eta^2$ = 0.47) difference between sites in mean crab carapace width (one-way ANOVA, F(2, 89) = 39.6, p < 0.001, $\alpha$ = 0.05). The smallest crabs were from Zeke’s Island NERR, which were on average 12.1 mm. Zeke’s Island crabs were 4.8 mm (95% CI: [3.3, 6.2] mm) smaller than crabs from Cape Cod (Tukey’s HSD: p < 0.001) and 4.3 mm (95% CI: [2.9, 5.7] mm) smaller than crabs from Virginia Coastal Reserve LTER (Tukey’s HSD: p < 0.001).

3. Captions

Figure 1.

Figure 1. Crab carapace width differs across sites. Small black points represent measurements of crab carapace widths at three sites: Zeke’s Island National Estuarine Research Reserve (NERR), Virginia Coastal Reserve Long-Term Ecological Research (LTER) site, and Cape Cod. Large red points represent mean crab carapace width. Data originally from: Johnson, D. 2019. Fiddler crab body size in salt marshes from Florida to Massachusetts, USA at PIE and VCR LTER and NOAA NERR sites during summer 2016. ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/4c27d2e778d3325d3830a5142e3839bb. Accessed via lterdatasampler package.

Figure 2.

Figure 2. Histograms of crab carapace width. Columns represent counts of individual crabs. Distributions of crab carapace width are represented in three panels: Zeke’s Island NERR (left), Virginia Coastal Reserve LTER (center), and Cape Cod (right). Data originally from: Johnson, D. 2019. Fiddler crab body size in salt marshes from Florida to Massachusetts, USA at PIE and VCR LTER and NOAA NERR sites during summer 2016. ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/4c27d2e778d3325d3830a5142e3839bb. Accessed via lterdatasampler package.

Alternative option if you have a figure made using the same dataset as a previous figure:

Figure 2. Histograms of crab carapace widths. Bars represent counts of crab sizes. Distributions are shown for three sites: Zeke’s Island NERR (left), Virginia Coastal Reserve (center), and Cape Cod (right). Data source and access as in Figure 1.

Figure 3.

Figure 3. Quantile-Quantile (QQ) plots of crab carapace width. Points represent individual observations of crab carapace width. Lines represent reference line relating theoretical quantiles (x-axis) and sample quantiles (y-axis). Panels represent different sites: Zeke’s Island NERR (left), Virginia Coastal Reserve LTER (center), and Cape Cod (right). Data originally from: Johnson, D. 2019. Fiddler crab body size in salt marshes from Florida to Massachusetts, USA at PIE and VCR LTER and NOAA NERR sites during summer 2016. ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/4c27d2e778d3325d3830a5142e3839bb. Accessed via lterdatasampler package.

Alternative option if you have a figure made using the same dataset as a previous figure:

Figure 3. Quantile-quantile (QQ) plots of crab carapace widths. Points represent individual crab carapace width measurements. Lines represent reference quantiles of sample (y-axis) and standard normal theoretical (x-axis) distributions. Panels, data source, and access as in Figure 2.

END OF WORKSHOP 5

Extra stuff

More applications of `stat_summary()`

stat_summary() in the ggplot2 package is a useful function to do quick summary visualizations within a plot.

In class, we used stat_summary() to plot mean carapace widths at each site in red (see the “Exploring the data” section above).

You can also use stat_summary() to visualize means and bars (representing standard deviation, standard error, confidence intervals, etc.).

For example, here is a plot using stat_summary() to visualize means and standard errors:

# base layer: ggplot
ggplot(pie_crab_clean, # use the clean data set
       aes(x = site_name, # x-axis
           y = size)) + # y-axis
  # first layer: jitter (each point is an individual crab)
  geom_jitter(height = 0, # don't jitter points vertically
              width = 0.15) + # narrower jitter (easier to see)
  # second layer: point representing mean
  stat_summary(geom = "pointrange", # geometry being plotted
               fun.data = mean_sdl, # function to calculate mean and sd (note new argument name)
               color = "red", # color to make it easier to see
               size = 1) + # larger point size
  theme_minimal() + # cleaner plot theme
  labs(x = "Site name",
       y = "Carapace width (mm)",
       caption = "Mean \U00B1 standard deviation")

And standard error:

# base layer: ggplot
ggplot(pie_crab_clean, # use the clean data set
       aes(x = site_name, # x-axis
           y = size)) + # y-axis
  # first layer: jitter (each point is an individual crab)
  geom_jitter(height = 0, # don't jitter points vertically
              width = 0.15) + # narrower jitter (easier to see)
  # second layer: point representing mean
  stat_summary(geom = "pointrange", # geometry being plotted
               fun.data = mean_se, # function to calculate mean and SE (note new argument name)
               color = "red", # color to make it easier to see
               size = 1) + # larger point size
  theme_minimal() + # cleaner plot theme
  labs(x = "Site name",
       y = "Carapace width (mm)",
       caption = "Mean \U00B1 standard error")

And a 95% confidence interval:

# base layer: ggplot
ggplot(pie_crab_clean, # use the clean data set
       aes(x = site_name, # x-axis
           y = size)) + # y-axis
  # first layer: jitter (each point is an individual crab)
  geom_jitter(height = 0, # don't jitter points vertically
              width = 0.15) + # narrower jitter (easier to see)
  # second layer: point representing mean
  stat_summary(geom = "pointrange", # geometry being plotted
               fun.data = mean_cl_normal, # function to calculate range for confidence interval (note new argument name)
               color = "red", # color to make it easier to see
               size = 1) + # larger point size
  theme_minimal() + # cleaner plot theme
  labs(x = "Site name",
       y = "Carapace width (mm)",
       caption = "Means with 95% confidence interval")

--- title: "Coding workshop: Week 5" subtitle: "Analysis of variance" categories: [tidyverse, lterdatasampler, rstatix, aov, summary, TukeyHSD, eta_squared, read_csv, pipe operators, '|>', select, rename, ggplot, geom_histogram, geom_qq, geom_qq_line, facet_wrap, group_by, summarize, as_factor, mutate] format: html: toc: true toc-depth: 8 --- [Workshop date: Friday 6 February]{style="color: #79ACBD; font-size: 24px;"} ## 1. Summary ### Packages - `tidyverse` - `lterdatasampler` - `rstatix` ### Operations #### New functions - make sure a variable is a factor using `as_factor()` - do an ANOVA using `aov()` - look for more information from model results using `summary()` - do post-hoc Tukey test using `TukeyHSD()` - calculate effect size for ANOVA using `rstatix::eta_squared()` #### Review - read in data using `read_csv()` - chain functions together using ` |> ` - modify columns using `mutate()` - select columns using `select()` - rename columns using `rename()` - visualize data using `ggplot()` - create histograms using `geom_histogram()` - visualize QQ plots using `geom_qq()` and `geom_qq_line()` - create multi-panel plots using `facet_wrap()` - group data using `group_by()` - summarize data using `summarize()` ### Data sources The Plum Island Ecosystem fiddler crab data is from `lterdatasampler` (data info [here](https://lter.github.io/lterdatasampler/articles/pie_crab_vignette.html)). Crab carapace width is measured in mm. ## 2. Code ### 1. Packages ```{r read-packages} #| eval: false library(tidyverse) # general use library(lterdatasampler) # data package library(rstatix) # effect size calculations data(pie_crab) # displaying data in Environment ``` ```{r data-for-website} #| echo: false #| message: false library(tidyverse) # general use library(lterdatasampler) # data package library(rstatix) # effect size calculations # dataset from lterdatasampler package data(pie_crab) ``` ### 2. Parametric tests #### a. Cleaning and wrangling - Create a new object called `pie_crab_clean`. - Filter to only include the following sites: Cape Cod, Virginia Coastal Reserve LTER, and Zeke's Island NERR. - Make sure `site` and `name` are set as factors. - Select the columns of interest: `site`, `name`, and `code`. - Rename the column `name` to `site_name` and `code` to `site_code`. ```{r crab-cleaning} pie_crab_clean <- pie_crab |> # start with the pie_crab dataset filter(site %in% c("CC", "ZI", "VCR")) |> # filter for Cape Cod, Zeke's Island, Virginia Coastal mutate(name = as_factor(name), # make sure site name is being read in as a factor site = as_factor(site)) |> select(site, name, size) |> # select columns of interest rename(site_code = site, # rename site to be site_code site_name = name) # rename name to be site_name # display some rows from the data frame # useful for showing a small part of the data frame slice_sample(pie_crab_clean, # data frame n = 10) # number of rows to show ``` #### b. Quick summary - Create a new object called `pie_crab_summary`. - Calculate the mean, variance, and sample size. Display the object. ```{r crab-summary} # creating a new object called pie_crab_summary pie_crab_summary <- pie_crab_clean |> # starting with clean data frame group_by(site_name) |> # group by site summarize(mean = mean(size), # calculate mean size var = var(size), # calculate variance of size n = length(size)) |> # calculate number of observations per site (sample size) ungroup() # ungrouping # display pie_crab_summary pie_crab_summary ``` #### c. Exploring the data Create a jitter plot with the mean crab carapace width for each site. ```{r crab-explore-viz} # base layer: ggplot ggplot(pie_crab_clean, # use the clean data set aes(x = site_name, # x-axis y = size)) + # y-axis # first layer: jitter (each point is an individual crab) geom_jitter(height = 0, # don't jitter points vertically width = 0.15) + # narrower jitter (easier to see) # second layer: point representing mean stat_summary(geom = "point", # geometry being plotted fun = mean, # function (calculating the mean) color = "red", # color to make it easier to see size = 5) + # larger point size theme_minimal() + # cleaner plot theme labs(x = "Site name", y = "Carapace width (mm)") ``` Is there a difference in mean crab carapace width between the three sites? **Yes, and Zeke's Island NERR crabs tend to be smaller than those from Cape Cod or Virginia Coastal Reserve LTER.** #### d. Check 1: normally distributed variable - Create a histogram of crab size. - Facet your histogram so that you have 3 panels, with one panel for each site. ```{r crab-histogram} # base layer: ggplot ggplot(data = pie_crab_clean, aes(x = size)) + # x-axis # first layer: histogram geom_histogram(bins = 6, # bins from Rice Rule (approximate) fill = "orange", color = "black") + # faceting by site_name: creating 3 different panels facet_wrap(~ site_name) + theme_minimal() + labs(x = "Crab carapace width (mm)", y = "Frequency") ``` - Create a QQ plot of crab size. - Facet your QQ plot so that you have 3 panels, with one panel for each site. ```{r crab-qq} # base layer: ggplot ggplot(data = pie_crab_clean, aes(sample = size)) + # y-axis # first layer: QQ plot reference line geom_qq_line() + # second layer: QQ plot points (actual observations) geom_qq(color = "orange") + # faceting by site_name facet_wrap(~ site_name, scales = "free") + # let axes vary between panels theme_minimal() + labs(x = "Theoretical (standard normal) quantiles", y = "Sample quantiles") ``` What are the outcomes of your visual checks? **Not perfect (crab carapace width from VCR LTER seems not normally distributed) but with large sample sizes for each group, this might not matter too much.** #### e. Check 2: equal variances Do a gut check: is the ratio of the largest variance to the smallest variance less than 3? ```{r crab-variances} 8.63/4.04 ``` What are the outcomes of your variance check? **By the rule of thumb (ratio of largest variance to smallest variance < 3), the variances are close enough; the largest variance is about 2.1 times larger than the smallest variance.** #### f. ANOVA ```{r crab-anova} # creating an object called crab_anova crab_anova <- aov(size ~ site_name, # formula data = pie_crab_clean) # data # test object crab_anova # gives more information summary(crab_anova) ``` Summarize results: is there a difference in mean crab carapace width between the three sites? **There is a difference in mean crab carapace width between Cape Cod, Virginia Coastal Reserve LTER, and Zeke's Island NERR (one-way ANOVA, F(2, 89) = 39.6, p < 0.001, $\alpha$ = 0.05).** #### g. Post-hoc: Tukey HSD ```{r crab-tukey} TukeyHSD(crab_anova) ``` Which pairwise comparisons are actually different? Which ones are not different? **Zeke's Island NERR and Cape Cod crabs are different, and Zeke's Island NERR and Virginia Coastal Reserve LTER crabs are different. Virginial Coastal Reserve LTER and Cape Cod crabs are not different.** #### h. effect size Using `eta_squared()` from `rstatix` ```{r crab-effectsize} eta_squared(crab_anova) ``` What is the magnitude of the differences between sites in crab size? **There is a large difference in crab size between sites.** #### i. Putting everything together We found a large ($\eta^2$ = 0.47) difference between sites in mean crab carapace width (one-way ANOVA, F(2, 89) = 39.6, p < 0.001, $\alpha$ = 0.05). The smallest crabs were from Zeke's Island NERR, which were on average 12.1 mm. Zeke's Island crabs were 4.8 mm (95% CI: [3.3, 6.2] mm) smaller than crabs from Cape Cod (Tukey's HSD: p < 0.001) and 4.3 mm (95% CI: [2.9, 5.7] mm) smaller than crabs from Virginia Coastal Reserve LTER (Tukey's HSD: p < 0.001). ### 3. Captions #### Figure 1. **Figure 1. Crab carapace width differs across sites.** Small black points represent measurements of crab carapace widths at three sites: Zeke's Island National Estuarine Research Reserve (NERR), Virginia Coastal Reserve Long-Term Ecological Research (LTER) site, and Cape Cod. Large red points represent mean crab carapace width. Data originally from: Johnson, D. 2019. Fiddler crab body size in salt marshes from Florida to Massachusetts, USA at PIE and VCR LTER and NOAA NERR sites during summer 2016. ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/4c27d2e778d3325d3830a5142e3839bb. Accessed via `lterdatasampler` package. #### Figure 2. **Figure 2. Histograms of crab carapace width.** Columns represent counts of individual crabs. Distributions of crab carapace width are represented in three panels: Zeke's Island NERR (left), Virginia Coastal Reserve LTER (center), and Cape Cod (right). Data originally from: Johnson, D. 2019. Fiddler crab body size in salt marshes from Florida to Massachusetts, USA at PIE and VCR LTER and NOAA NERR sites during summer 2016. ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/4c27d2e778d3325d3830a5142e3839bb. Accessed via `lterdatasampler` package. Alternative option if you have a figure made using the same dataset as a previous figure: **Figure 2. Histograms of crab carapace widths.** Bars represent counts of crab sizes. Distributions are shown for three sites: Zeke's Island NERR (left), Virginia Coastal Reserve (center), and Cape Cod (right). Data source and access as in Figure 1. #### Figure 3. **Figure 3. Quantile-Quantile (QQ) plots of crab carapace width.** Points represent individual observations of crab carapace width. Lines represent reference line relating theoretical quantiles (x-axis) and sample quantiles (y-axis). Panels represent different sites: Zeke's Island NERR (left), Virginia Coastal Reserve LTER (center), and Cape Cod (right). Data originally from: Johnson, D. 2019. Fiddler crab body size in salt marshes from Florida to Massachusetts, USA at PIE and VCR LTER and NOAA NERR sites during summer 2016. ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/4c27d2e778d3325d3830a5142e3839bb. Accessed via `lterdatasampler` package. Alternative option if you have a figure made using the same dataset as a previous figure: **Figure 3. Quantile-quantile (QQ) plots of crab carapace widths.** Points represent individual crab carapace width measurements. Lines represent reference quantiles of sample (y-axis) and standard normal theoretical (x-axis) distributions. Panels, data source, and access as in Figure 2. **END OF WORKSHOP 5** ## Extra stuff ### More applications of `stat_summary()` `stat_summary()` in the `ggplot2` package is a useful function to do quick summary visualizations within a plot. In class, we used `stat_summary()` to plot mean carapace widths at each site in red (see the "Exploring the data" section above). You can also use `stat_summary()` to visualize means and bars (representing standard deviation, standard error, confidence intervals, etc.). For example, here is a plot using `stat_summary()` to visualize means and standard errors: ```{r} #| label: mean and SD # base layer: ggplot ggplot(pie_crab_clean, # use the clean data set aes(x = site_name, # x-axis y = size)) + # y-axis # first layer: jitter (each point is an individual crab) geom_jitter(height = 0, # don't jitter points vertically width = 0.15) + # narrower jitter (easier to see) # second layer: point representing mean stat_summary(geom = "pointrange", # geometry being plotted fun.data = mean_sdl, # function to calculate mean and sd (note new argument name) color = "red", # color to make it easier to see size = 1) + # larger point size theme_minimal() + # cleaner plot theme labs(x = "Site name", y = "Carapace width (mm)", caption = "Mean \U00B1 standard deviation") ``` And standard error: ```{r} #| label: mean and SE # base layer: ggplot ggplot(pie_crab_clean, # use the clean data set aes(x = site_name, # x-axis y = size)) + # y-axis # first layer: jitter (each point is an individual crab) geom_jitter(height = 0, # don't jitter points vertically width = 0.15) + # narrower jitter (easier to see) # second layer: point representing mean stat_summary(geom = "pointrange", # geometry being plotted fun.data = mean_se, # function to calculate mean and SE (note new argument name) color = "red", # color to make it easier to see size = 1) + # larger point size theme_minimal() + # cleaner plot theme labs(x = "Site name", y = "Carapace width (mm)", caption = "Mean \U00B1 standard error") ``` And a 95% confidence interval: ```{r} #| label: mean and confidence interval # base layer: ggplot ggplot(pie_crab_clean, # use the clean data set aes(x = site_name, # x-axis y = size)) + # y-axis # first layer: jitter (each point is an individual crab) geom_jitter(height = 0, # don't jitter points vertically width = 0.15) + # narrower jitter (easier to see) # second layer: point representing mean stat_summary(geom = "pointrange", # geometry being plotted fun.data = mean_cl_normal, # function to calculate range for confidence interval (note new argument name) color = "red", # color to make it easier to see size = 1) + # larger point size theme_minimal() + # cleaner plot theme labs(x = "Site name", y = "Carapace width (mm)", caption = "Means with 95% confidence interval") ```