Due on Thursday the 12th (Week 6) at 11:59 PM

Description

In this midterm, you will demonstrate your ability to synthesize lecture concepts and technical skills from workshop. At this point, you have the conceptual ideas you need (for example, what is the appropriate test to use if you want to compare groups?) and the technical skills you need (for example, summarizing data, visualizing data). You also have the investigative skills you need (for example, reading for tasks, googling!). You will use all these components to complete the midterm.

This midterm is open note, open internet, open everything; feel free to also talk with classmates and friends.

Set up tasks

As usual, create a new folder in your ENVS-193DS folder for midterm materials (a logical folder name would be midterm). Create an Rproject. Download all materials from Canvas into your midterm folder.

Start a new Quarto or RMarkdown document for your midterm. Make sure you read in your packages and data only at the top of the document.

Read in and store the data for Problem 3 as an object called tussocks.

Read in and store the data for Problem 4 as an object called temp.

Problems

Problem 1. Understanding and critiquing written communication

Skills you will demonstrate

In this problem, you will be responsible for demonstrating your understanding of the different components of a statistical test. All numbers describe a different statistical concept; how well do you understand what those numbers represent?

Description

The purple sea urchin (Strongylocentrotus purpuratus) fishery is a major player in California’s seafood exports. Purple sea urchins are valued for their gonads, which are eaten as uni.

Climate change is expected to influence urchin gonad weight (measured in grams). You’re a resource manager who wants to know how urchin gonad weight could differs between heated conditions (that represent future climate scenarios) and ambient temperatures (which urchins experience now). You find a study in which researchers examined this effect. In their results section, they wrote:

There was a difference in urchin gonad weight between heated and ambient treatments (Student’s two-sample t-test, t(53) = -13.5, p < 0.001).

Components

a. Hypotheses

The researchers didn’t state their hypotheses explicitly anywhere in the paper. In one sentence each, write the null ($H_0$) and alternative ($H_A$) hypotheses in statistical terms only.

b. Test type

The researchers ran a specific type of t-test. What must have been true for them to use this type, as opposed to any other? Respond in 1-2 sentences only.

c. Test summary components

In parentheses, the researchers cite some information about their test:

(Student’s two-sample t-test, t(53) = -13.51, p < 0.001)

In 1-2 sentences each, explain the meaning of:

t
53
-13.51
p < 0.001

Be specific in your explanation. What is the term or name that describes the component, and what does that component represent in the specific context of this type of test, hypotheses, or variables?

d. Missing information

The researchers reported their findings, but you know that there’s more they could have reported. Identify 2 other components or statistics they could have included (note that there may be more than 2 additional components that could make sense).

For each of your pieces of missing information, explain why it would be relevant to either addressing the hypotheses or building a biological narrative. Be specific in your explanation, using variables, groups, etc.

Real-world research

The ideas behind this problem are drawn from a paper in which researchers examined the effects of temperature and diet on metrics describing urchin gonads (e.g. weight, quality, absorption). The citation is below:

Azad, A. K. et al. 2011. “Effects of diet and temperature on ingestion, absorption, assimilation, gonad yield, and gonad quality of the purple sea urchin (Strongylocentrotus purpuratus)”. Aquaculture 317:1-4. https://doi.org/10.1016/j.aquaculture.2011.03.019.

Problem 2. Interpretation and communication

Skills you will demonstrate

In this problem, you will be responsible for interpreting the results of a test with which you may be familiar, but haven’t seen. You will also demonstrate your ability to interpret code output and synthesize the statistics in writing to ground the stats in biology for a scientific audience.

Description

In California, climate change is expected to make drought and precipitation regimes more unpredictable. When long droughts are interrupted by extreme precipitation, this phenomenon is called hydroclimate volatility or precipitation whiplash.

Precipitation whiplash may be a problem for agricultural production in the state, as crops depend on water availability to be productive. However, producers have also found ways to adapt to long dry spells such that major fluctuations in climate and precipitation may not impact production very much.

In this problem, you will be presented with a test that you have not seen before, but that you have the skills to interpret. The dataset comes from Kern County, CA which is a region of major agricultural production. The researchers used data on crop production from 2022 (an especially dry year) and 2023 (an especially wet year) to explore the potential effects of precipitation whiplash on production.

Before completing this problem, read about the study here.

Real world research

This work was led by Dr. Ashley Larsen, research professor in Bren. The full citation for this work is:

Larsen, A. E. et al. 2025. “Weathering the storm: precipitation whiplash has limited effects on agricultural production and pesticide use in California.” Environmental Research Letters 20. https://doi.org/10.1088/1748-9326/adf131.

In the study, the researchers write:

The general distribution of crops remained similar for most major crops, with pistachio, processing tomatoes, barley, corn for forage/fodder, wheat for forage/fodder and potato increasing by more than 1000 ha and almond, grape and cotton decreasing by more than 1000 ha. The analysis of county crop statistics echoed these results, with a non-significant change in total value, yields, and harvested hectares and a significant increase in total production in 2023, based on a paired t-test of 28 crops (table S3).

Components

a. Hypotheses

In one sentence each, write the null ($H_0$) and alternative ($H_A$) hypotheses in statistical terms only to answer the following question:

Does mean crop value (measured in 1000s of US dollars) across 28 crops in Kern County, CA differ between 2022 and 2023?

b. Statistical summary

The researchers include the components of their t-test, but a) summarize their analyses in the supplement rather than the main text and b) include all these components in a table rather than in writing. Both these choices are common when reporting statistics, but you need to understand what they mean.

Find the supplementary table in which the researchers summarize the results of their tests.

Compare this table with the results of a paired t-test run on their dataset below:


    Paired t-test

data:  production$`2022` and production$`2023`
t = -1.5211, df = 27, p-value = 0.1399
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -51595337   7664551
sample estimates:
mean difference 
      -21965393

Identify two pieces of information that are the same between the table in the paper and the test output above. Include their values.

Additionally, in 2-5 sentences total, identify 2 pieces of information the test output provides that the researchers do not and describe what these values mean in the context of this question, hypotheses, or interpretation. Use specific terms to show you understand the variables, what is presented in the table, and what is presented in the results of this test.

c. Writing a new statement

In 1-2 sentences only, write a new statement with the interpretation and results of this test from the output in part b. Be sure to include all components of the test summary in the parentheses that would be important for understanding the structure of the test.

Where necessary, round values to 2 decimal points and/or express as < 0.001.

d. Statistical implications

Imagine that you are an agricultural consultant working for a blueberry farmer in Kern County who is worried about how climate change is going to influence the value of their berries.

What and how would you communicate to this farmer about the implications of the results in parts b and c? How might they be able to prepare for potential uncertainty in drought and precipitation regimes in the state? Express your thoughts in 3-5 sentences, using the context for this study as Larsen et al. write in their introduction and discussion.

Problem 3. Reproducing an analysis

Skills you will demonstrate

In environmental studies, open research means that researchers make their data and/or their code available for anyone to see. This means that anyone should be able to reproduce the analysis, even if they are not on the research team. In this problem, you will demonstrate your ability to read a paper to understand the context for a research study and its statistical analysis. You will then demonstrate your ability to take a data set and analyze it, using the researchers’ original analysis as a guide.

Description

You will reproduce the analysis in: Steketee, J. K. et al. 2022. “Small Herbivores with Big Impacts: Tundra Voles (Microtus Oeconomus) Alter Post-Fire Ecosystem Dynamics.” Ecology 103:7. https://doi.org/10.1002/ecy.3689.

Specifically, you will recreate the analysis that allowed the researchers to make this statement (in bold):

Although tussock density was lower at the burned site, tussocks were larger and Eriophorum more abundant (Table 1, Figure 2a; Klupar et al., 2021). Average tussock mound diameters were larger by 4 cm (Table 1; p < 0.001).

Read all parts of the paper before starting this problem.

Getting the data

This statement relies on the dataset stored here:

Rocha, A. 2021. Tussock height and diameter in moist acidic tussock tundra at the site of the 2007 Anaktuvuk River fire scar, and nearby unburned tundra measured in 2016 ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/1dccd3fdb3aa693f9c2b69a24f8306ed.

Download the data into your midterm directory. To understand the data structure, read the metadata (under Resources > View full metadata > Data Entities).

Components

a. Variables

In 1-2 sentences each, describe:

the response variable (with units and what type of variable it is)
the predictor variable (include each group)
how the researchers measured the response variable

For “how the researchers measured the response variable”, cite the specific section of the paper in which you found their description of their process (e.g. the methods section in paragraph 2).

b. Hypotheses

In one sentence each, write the null ($H_0$) and alternative ($H_A$) hypotheses in statistical terms only.

c. Cleaning and organizing data

Create an object called tussocks_clean. In no particular order, clean the data (you should have stored this as tussocks) by:

cleaning all the column names
replacing the values in the site column as follows:
- replace Sev with Burned
- replace Unb with Unburned
selecting only the site and average_diameter column

Use the pipe operator to string functions together. Remember to annotate every new function with a description of what you specifically are doing in that line of code.

Once you are done, display 10 rows from tussocks_clean using the slice_sample() function.

d. Summarizing and table displays

Create an object called tussocks_summary using the tussocks_clean data frame to calculate the means and standard errors for the response variable within the two groups.

Round all numbers to 1 decimal point.

Display only the site, mean, and standard error values in a table using one of the following packages to do so:

gt
flextable

Make sure the column names of the table are polished (no underscores and capitalized in sentence case).

Show all code for your calculations and making/polishing the table. The only output displayed from this section should be your final table.

Double check your table!

Render your document to make sure your table looks right. Do not assume that just because your code works, your table will render correctly.

e. Check your assumptions

Check your assumptions using one visual tool and one statistical tool only.

You must check both core assumptions for this type of test. You must be able to check both assumptions using one tool (either visual or statistical) each.

In 1-2 sentences each, summarize your thoughts about your assumption checks. Be specific by describing the variable and visual components of the plot you made and results of the test.

f. Do a test

Conduct the appropriate test that matches with your hypotheses.

Calculate the appropriate effect size using a function from the rstatix package.

Make sure your annotated code and outputs for both the test and the effect size are visible.

g. Written communication

Again, the authors expressed their results as follows:

Average tussock mound diameters were larger by 4 cm.

In 1-2 sentences only, update this interpretation with results of the test, effect size, sample means, standard error, and sample sizes. Include a comparison of means, as the authors do.

Be sure to include all components of the test summary (for example, degrees of freedom, distribution, test statistic, confidence interval, etc.) that are important for understanding its structure.

h. Making a new figure

The authors do not visually represent this variable in their paper. Make a new figure that displays the mean and standard error (as the authors do in figure 3b for a different variable) showing the underlying data. For full credit:

take out the gridlines and make the plot and panel background white
jitter observations horizontally but not vertically
use colors for each site that are different from ggplot() defaults
use shapes for each site that are different from ggplot() defaults
use transparent open shapes to represent the underlying data
take out the legend
include a plot title
relabel the x- and y-axes
include the units on the y-axis label
put your name as the subtitle

i. Write a caption

In text (not in your code for your figure), write a caption for the figure you made in part h. Include a data citation.

Problem 4. Cleaning, wranging, and visualization

Skills you will demonstrate

Figures are built on data; however, to make a figure, you need to understand the data structure and any cleaning, wrangling, or summarizing steps to create it. In this problem, you will demonstrate your ability to clean, wrangle, and/or summarize a data set to create a figure, using a final figure as a guide.

Description

In this problem, you will use a data set of ocean temperature from the Santa Barbara Coastal LTER from three different sites: Isla Vista reef, Naples Reef, and Santa Cruz Island (Diablo).

You will need to recreate the following figure:

The caption for the figure is as follows:

Figure 2. Mean daily bottom temperatures at Isla Vista, Naples, and Santa Cruz Island. Lines represent mean daily temperatures calculated from hourly recorded temperature with loggers affixed to the ocean floor at Isla Vista (red), Naples (yellow), and Santa Cruz (Diablo) (green). Data from 2024 are shown. Data source: Santa Barbara Coastal LTER. Accessed February 2026.

Components

a. Initial cleaning, wrangling, and summarizing

The following chunk of code creates and object called temp_clean. Copy and paste this code into your document.

This code is missing pipe operators at the end of every line section. Be sure to add them in as you go.

Where prompted in the code annotations, fill in your responses to describe what each function is doing, and how the data frame changes. Only write your responses in the annotations. See 8 and 9 for an example.

temp_clean <- temp
  
  # 1. what package is this function from?
  # insert response here
  # 2. What changes after this function and how?
  # insert response here
  clean_names() 
  
  # 3. what package is year() from?
  # insert response here
  # 4. what is this combination of mutate() and year() doing?
  # insert response here
  mutate(year = year(date_local)) 
  
  # 5. what package is filter() from?
  # insert response here
  # 6. what is being filtered in this functional call?
  # insert response here
  # 7. what sites and year(s) do you expect to be included 
  # in the resulting data frame?
  # insert response here
  filter(site %in% c("NAPL", "SCDI", "IVEE") &
           year == "2024") 
  
  # 8. what column is manipulated?
  # the site column
  # 9. what changes about the values in this column?
  # the site acronyms are changed to full names
  mutate(site = case_when(
    site == "IVEE" ~ "Isla Vista",
    site == "NAPL" ~ "Naples",
    site == "SCDI" ~ "Santa Cruz - Diablo"
  )) 
  
  # 10. what is being calculated? and at what scale 
  # (e.g. annual, monthly, daily)?
  # insert response here
  # 11. give an example
  # insert response here
  group_by(site, date_local) |> 
  summarize(mean_daily = mean(temp_c)) |> 
  ungroup()

Once you are done, display 5 rows from the data frame using slice_sample().

b. Make the figure

Recreate the figure. Specifically, recreate the:

x-axis and label
y-axis and label (with ° symbol)
title and title position
data source text and location
legend position (doesn’t have to be exact, but does have to be in the top left)
panel background (blank)
axis lines and ticks
geometry (hint: there is only one), and
different colors (you don’t have to use these exact ones - just use different colors for each site that are different from ggplot defaults)

c. Interpretation

In 1-3 sentences total, summarize:

when the highest temperatures are recorded
when the lowest temperatures are recorded
and what visual components of the plot demonstrate these points.

My figure doesn’t look right when I render my document.

You will have to set the code chunk options to make a larger figure. Click the gear button in the code chunk to set those options.

--- title: "Midterm" editor: source published-title: "Due date" date: 2026-02-12 date-modified: last-modified format: html: toc: true toc-depth: 5 --- [Due on Thursday the 12th (Week 6) at 11:59 PM]{style="color: #79ACBD; font-size: 24px;"} # Description In this midterm, you will demonstrate your ability to synthesize lecture concepts and technical skills from workshop. At this point, you have the _conceptual_ ideas you need (for example, what is the appropriate test to use if you want to compare groups?) and the _technical_ skills you need (for example, summarizing data, visualizing data). You also have the _investigative_ skills you need (for example, reading for tasks, googling!). You will use all these components to complete the midterm. This midterm is open note, open internet, open everything; feel free to also talk with classmates and friends. ```{r set-up} #| message: false #| echo: false #| warning: false library(tidyverse) library(here) library(readxl) library(rstatix) library(janitor) production <- read_csv(here("data", "CropReport.csv")) |> clean_names() |> filter(!(crop %in% c("Citrus, All", "Grapes, All"))) |> select(crop, year, tot_value) |> pivot_wider(id_cols = crop, names_from = year, values_from = tot_value) ``` # Set up tasks As usual, create a new folder in your `ENVS-193DS` folder for midterm materials (a logical folder name would be `midterm`). Create an Rproject. Download all materials from Canvas into your `midterm` folder. Start a new Quarto or RMarkdown document for your midterm. Make sure you read in your packages and data _only at the top of the document_. Read in and store the data for Problem 3 as an object called `tussocks`. Read in and store the data for Problem 4 as an object called `temp`. # Problems ## Problem 1. Understanding and critiquing written communication ### Skills you will demonstrate In this problem, you will be responsible for demonstrating your understanding of the different components of a statistical test. All numbers describe a different statistical concept; how well do you understand what those numbers represent? ### Description The purple sea urchin (_Strongylocentrotus purpuratus_) fishery is a major player in California's seafood exports. Purple sea urchins are valued for their gonads, which are eaten as uni. Climate change is expected to influence urchin gonad weight (measured in grams). You're a resource manager who wants to know how urchin gonad weight could differs between heated conditions (that represent future climate scenarios) and ambient temperatures (which urchins experience now). You find a study in which researchers examined this effect. In their results section, they wrote: > There was a difference in urchin gonad weight between heated and ambient treatments (Student's two-sample t-test, t(53) = -13.5, p < 0.001). ### Components #### a. Hypotheses The researchers didn't state their hypotheses explicitly anywhere in the paper. In one sentence _each_, write the null ($H_0$) and alternative ($H_A$) hypotheses in _statistical terms only_. #### b. Test type The researchers ran a specific type of t-test. What must have been true for them to use this type, as opposed to any other? Respond in 1-2 sentences _only_. #### c. Test summary components In parentheses, the researchers cite some information about their test: > (Student's two-sample t-test, t(53) = -13.51, p < 0.001) In 1-2 sentences _each_, explain the meaning of: - t - 53 - -13.51 - p < 0.001 **Be specific in your explanation.** What is the term or name that describes the component, and what does that component represent in the specific context of this type of test, hypotheses, or variables? #### d. Missing information The researchers reported their findings, but you know that there's more they could have reported. Identify 2 other components or statistics they could have included (note that there may be more than 2 additional components that could make sense). For each of your pieces of missing information, explain why it would be relevant to either addressing the hypotheses or building a biological narrative. **Be specific in your explanation**, using variables, groups, etc. :::{.callout-note title="Real-world research" collapse="true"} The ideas behind this problem are drawn from a paper in which researchers examined the effects of temperature and diet on metrics describing urchin gonads (e.g. weight, quality, absorption). The citation is below: Azad, A. K. et al. 2011. "Effects of diet and temperature on ingestion, absorption, assimilation, gonad yield, and gonad quality of the purple sea urchin (_Strongylocentrotus purpuratus_)". _Aquaculture_ 317:1-4. [https://doi.org/10.1016/j.aquaculture.2011.03.019](https://doi.org/10.1016/j.aquaculture.2011.03.019). ::: ## Problem 2. Interpretation and communication ### Skills you will demonstrate In this problem, you will be responsible for interpreting the results of a test with which you may be familiar, but haven’t seen. You will also demonstrate your ability to interpret code output and synthesize the statistics in writing to ground the stats in biology for a scientific audience. ### Description In California, climate change is expected to make drought and precipitation regimes more unpredictable. When long droughts are interrupted by extreme precipitation, this phenomenon is called hydroclimate volatility or precipitation whiplash. Precipitation whiplash may be a problem for agricultural production in the state, as crops depend on water availability to be productive. However, producers have also found ways to adapt to long dry spells such that major fluctuations in climate and precipitation may not impact production very much. In this problem, you will be presented with a test that you have not seen before, but that you have the skills to interpret. The dataset comes from Kern County, CA which is a region of major agricultural production. The researchers used data on crop production from 2022 (an especially dry year) and 2023 (an especially wet year) to explore the potential effects of precipitation whiplash on production. Before completing this problem, read about the study [here](https://doi.org/10.1088/1748-9326/adf131). :::{.callout-note title="Real world research" collapse="true"} This work was led by Dr. Ashley Larsen, research professor in Bren. The full citation for this work is: Larsen, A. E. et al. 2025. "Weathering the storm: precipitation whiplash has limited effects on agricultural production and pesticide use in California." _Environmental Research Letters_ 20. [https://doi.org/10.1088/1748-9326/adf131](https://doi.org/10.1088/1748-9326/adf131). ::: In the study, the researchers write: > The general distribution of crops remained similar for most major crops, with pistachio, processing tomatoes, barley, corn for forage/fodder, wheat for forage/fodder and potato increasing by more than 1000 ha and almond, grape and cotton decreasing by more than 1000 ha. **The analysis of county crop statistics echoed these results, with a non-significant change in total value, yields, and harvested hectares and a significant increase in total production in 2023, based on a paired t-test of 28 crops (table S3).** ### Components #### a. Hypotheses In one sentence _each_, write the null ($H_0$) and alternative ($H_A$) hypotheses in _statistical terms only_ to answer the following question: **Does mean crop value (measured in 1000s of US dollars) across 28 crops in Kern County, CA differ between 2022 and 2023?** #### b. Statistical summary The researchers include the components of their t-test, but a) summarize their analyses in the supplement rather than the main text and b) include all these components in a table rather than in writing. Both these choices are common when reporting statistics, but you need to understand what they mean. Find the supplementary table in which the researchers summarize the results of their tests. Compare this table with the results of a paired t-test run on their dataset below: ```{r} #| echo: false t.test(production$`2022`, production$`2023`, paired = TRUE) ``` Identify **two pieces of information that are the same between the table in the paper and the test output above**. Include their values. Additionally, in 2-5 sentences _total_, identify **2 pieces of information the test output provides that the researchers do not** and describe what these values mean in the context of this question, hypotheses, or interpretation. Use specific terms to show you understand the variables, what is presented in the table, and what is presented in the results of this test. #### c. Writing a new statement In 1-2 sentences _only_, write a new statement with the interpretation and results of this test from the output in part b. Be sure to include all components of the test summary in the parentheses that would be important for understanding the structure of the test. Where necessary, round values to 2 decimal points and/or express as < 0.001. #### d. Statistical implications Imagine that you are an agricultural consultant working for a blueberry farmer in Kern County who is worried about how climate change is going to influence the value of their berries. What and how would you communicate to this farmer about the implications of the results in parts b and c? How might they be able to prepare for potential uncertainty in drought and precipitation regimes in the state? Express your thoughts in 3-5 sentences, using the context for this study as Larsen et al. write in their introduction and discussion. ## Problem 3. Reproducing an analysis ### Skills you will demonstrate In environmental studies, open research means that researchers make their data and/or their code available for anyone to see. This means that anyone should be able to reproduce the analysis, even if they are not on the research team. In this problem, you will demonstrate your ability to read a paper to understand the context for a research study and its statistical analysis. You will then demonstrate your ability to take a data set and analyze it, using the researchers’ original analysis as a guide. ### Description You will reproduce the analysis in: Steketee, J. K. et al. 2022. “Small Herbivores with Big Impacts: Tundra Voles (Microtus Oeconomus) Alter Post-Fire Ecosystem Dynamics.” _Ecology_ 103:7. [https://doi.org/10.1002/ecy.3689](https://doi.org/10.1002/ecy.3689). Specifically, you will recreate the analysis that allowed the researchers to make this statement (in **bold**): > Although tussock density was lower at the burned site, tussocks were larger and Eriophorum more abundant (Table 1, Figure 2a; Klupar et al., 2021). **Average tussock mound diameters were larger by 4 cm** (Table 1; p < 0.001). **Read all parts of the paper before starting this problem.** ### Getting the data This statement relies on the dataset stored here: Rocha, A. 2021. Tussock height and diameter in moist acidic tussock tundra at the site of the 2007 Anaktuvuk River fire scar, and nearby unburned tundra measured in 2016 ver 1. Environmental Data Initiative. [https://doi.org/10.6073/pasta/1dccd3fdb3aa693f9c2b69a24f8306ed](https://doi.org/10.6073/pasta/1dccd3fdb3aa693f9c2b69a24f8306ed). Download the data into your midterm directory. To understand the data structure, read the metadata (under Resources > View full metadata > Data Entities). ### Components #### a. Variables In 1-2 sentences each, describe: - the response variable (with units and what type of variable it is) - the predictor variable (include each group) - how the researchers measured the response variable For "how the researchers measured the response variable", cite the specific section of the paper in which you found their description of their process (e.g. the methods section in paragraph 2). #### b. Hypotheses In one sentence _each_, write the null ($H_0$) and alternative ($H_A$) hypotheses in _statistical terms only_. #### c. Cleaning and organizing data Create an object called `tussocks_clean`. In no particular order, clean the data (you should have stored this as `tussocks`) by: - cleaning _all_ the column names - replacing the values in the `site` column as follows: - replace `Sev` with `Burned` - replace `Unb` with `Unburned` - selecting only the `site` and `average_diameter` column Use the pipe operator to string functions together. Remember to annotate every new function with a description of what _you specifically_ are doing in that line of code. Once you are done, display 10 rows from `tussocks_clean` using the `slice_sample()` function. #### d. Summarizing and table displays Create an object called `tussocks_summary` using the `tussocks_clean` data frame to calculate the means and standard errors for the response variable within the two groups. Round all numbers to 1 decimal point. Display **only the site, mean, and standard error** values in a **table** using one of the following packages to do so: - `gt` - `flextable` Make sure the column names of the table are **polished** (no underscores and capitalized in sentence case). Show _all_ code for your calculations and making/polishing the table. **The only output displayed from this section should be your final table.** :::{.callout-note title="Double check your table!" collapse="true"} Render your document to make sure your table looks right. **Do not assume that just because your code works, your table will render correctly.** ::: #### e. Check your assumptions Check your assumptions using **one visual tool** and **one statistical tool** only. You must check **both core assumptions for this type of test.** You must be able to check **both assumptions using one tool (either visual or statistical) each.** In 1-2 sentences each, summarize your thoughts about your assumption checks. Be specific by describing the variable and visual components of the plot you made and results of the test. #### f. Do a test Conduct the appropriate test that matches with your hypotheses. Calculate the appropriate effect size using a function from the `rstatix` package. Make sure your annotated code and outputs for both the test and the effect size are visible. #### g. Written communication Again, the authors expressed their results as follows: > Average tussock mound diameters were larger by 4 cm. In 1-2 sentences _only_, update this interpretation with results of the test, effect size, sample means, standard error, and sample sizes. Include a comparison of means, as the authors do. Be sure to include all components of the test summary (for example, degrees of freedom, distribution, test statistic, confidence interval, etc.) that are important for understanding its structure. #### h. Making a new figure The authors do not visually represent this variable in their paper. Make a new figure that displays the mean and standard error (as the authors do in figure 3b for a different variable) showing the underlying data. For full credit: - take out the gridlines and make the plot and panel background white - jitter observations horizontally but not vertically - use colors for each site that are different from `ggplot()` defaults - use shapes for each site that are different from `ggplot()` defaults - use transparent open shapes to represent the underlying data - take out the legend - include a plot title - relabel the x- and y-axes - include the units on the y-axis label - put your name as the subtitle #### i. Write a caption In text (not in your code for your figure), write a caption for the figure you made in part h. Include a data citation. ## Problem 4. Cleaning, wranging, and visualization ### Skills you will demonstrate Figures are built on data; however, to make a figure, you need to understand the data structure and any cleaning, wrangling, or summarizing steps to create it. In this problem, you will demonstrate your ability to clean, wrangle, and/or summarize a data set to create a figure, using a final figure as a guide. ### Description In this problem, you will use a data set of ocean temperature from the Santa Barbara Coastal LTER from three different sites: Isla Vista reef, Naples Reef, and Santa Cruz Island (Diablo). You will need to recreate the following figure: ![](/assignments/images/midterm/sbc-temps-fig.png) The caption for the figure is as follows: > **Figure 2. Mean daily bottom temperatures at Isla Vista, Naples, and Santa Cruz Island.** Lines represent mean daily temperatures calculated from hourly recorded temperature with loggers affixed to the ocean floor at Isla Vista (red), Naples (yellow), and Santa Cruz (Diablo) (green). Data from 2024 are shown. Data source: Santa Barbara Coastal LTER. Accessed February 2026. ### Components #### a. Initial cleaning, wrangling, and summarizing The following chunk of code creates and object called `temp_clean`. Copy and paste this code into your document. This code is missing pipe operators at the end of every line section. Be sure to add them in as you go. Where prompted in the code annotations, fill in your responses to describe what each function is doing, and how the data frame changes. _Only write your responses in the annotations._ See 8 and 9 for an example. ```{r} #| eval: false temp_clean <- temp # 1. what package is this function from? # insert response here # 2. What changes after this function and how? # insert response here clean_names() # 3. what package is year() from? # insert response here # 4. what is this combination of mutate() and year() doing? # insert response here mutate(year = year(date_local)) # 5. what package is filter() from? # insert response here # 6. what is being filtered in this functional call? # insert response here # 7. what sites and year(s) do you expect to be included # in the resulting data frame? # insert response here filter(site %in% c("NAPL", "SCDI", "IVEE") & year == "2024") # 8. what column is manipulated? # the site column # 9. what changes about the values in this column? # the site acronyms are changed to full names mutate(site = case_when( site == "IVEE" ~ "Isla Vista", site == "NAPL" ~ "Naples", site == "SCDI" ~ "Santa Cruz - Diablo" )) # 10. what is being calculated? and at what scale # (e.g. annual, monthly, daily)? # insert response here # 11. give an example # insert response here group_by(site, date_local) |> summarize(mean_daily = mean(temp_c)) |> ungroup() ``` Once you are done, display 5 rows from the data frame using `slice_sample()`. #### b. Make the figure Recreate the figure. Specifically, recreate the: - x-axis and label - y-axis and label (with ° symbol) - title and title position - data source text and location - legend position (doesn't have to be exact, but does have to be in the top left) - panel background (blank) - axis lines and ticks - geometry (hint: there is only one), and - different colors (you don't have to use these exact ones - just use different colors for each site that are different from `ggplot` defaults) #### c. Interpretation In 1-3 sentences _total_, summarize: - when the highest temperatures are recorded - when the lowest temperatures are recorded - and what visual components of the plot demonstrate these points. :::{.callout-note collapse="true" title="My figure doesn't look right when I render my document."} You will have to set the code chunk options to make a larger figure. Click the gear button in the code chunk to set those options. ::: # Checklist Your submission should: - [ ] include your name, the title, and the date - [ ] include all code with annotations - [ ] be uploaded to Canvas as a single PDF (not .docx) - [ ] be organized and readable (for example: no messages, warnings, etc., text is formatted correctly with subscripts where necessary, text and headers are clearly different, code annotations should not run off page) Additionally, your submission should only include the components listed below: - [ ] a set up chunk at the top of the document, where you have read in your packages and your data for Problem 1: - [ ] written responses for a-d for Problem 2: - [ ] written responses for a-d for Problem 3: - [ ] written responses for a-b - [ ] annotated code for c - [ ] annotated code and table for d - [ ] annotated code and outputs (visual and statistical) and written response for e - [ ] annotated code and outputs for f - [ ] written response for g - [ ] annotated code and output for h - [ ] written response for i for Problem 4: - [ ] responses in annotated code and `slice_sample()` output for a - [ ] annotated code and figure for b - [ ] written response for c Additionally, you should have: - [ ] submitted your [Generative AI Statement of Use](https://docs.google.com/forms/d/e/1FAIpQLScQOUQERT9c_HI8PQ3hMDw5WpaQmtTnUF82mXdZJMUoewGY6A/viewform) **by Thursday 12 February at 11:59 PM** Lastly, check out the **rubric on Canvas** to see the point breakdown in more detail. ## General formatting components For annotations, you should have meaningful comments for: - each line of visualization code and/or ggplot geom/theme call (not needed for each argument, though good to have) - each function in any piping operations - set up code to denote where packages and/or data are read in - each argument of a test call (e.g. `var.test()`, `t.test()`) For readability, you should make sure that: - all messages/warnings are hidden - all code is contained in code chunks (double check line breaks in comments once you render your document) - all text is where it’s supposed to be (all components like headers, main text, superscripts/subscripts, etc. show up correctly) - code includes carriage returns, spaces, etc. to make pipe operations and arguments clear - code includes consistent spacing and indents