Winter 2026 - Homework 2

Due on Thursday 29 January (Week 4) at 11:59 PM

Read the instructions carefully and double check that you have everything on the checklist.

Part 1. Tasks

Remember that you are not expected to turn anything in for tasks, but you should complete all the material.

Task 1. Set up your folders and Rproject.

a. Create a new folder for this homework assignment within your `ENVS-193DS` folder.

Within your ENVS-193DS folder, create a new folder for Homework 2. Name it whatever you want (a logical name could be homework-02).

b. Download the files from Canvas

Download the homework files from Canvas into your homework folder. This includes:

lobster-catch-data.csv

c. Create an Rproject for this homework assignment

Create an Rproj file within your homework-02 folder. If you need help with this, watch the “Creating an Rproject” video on Canvas.

d. At the top of your document, insert a code chunk. Write code to load in the packages you need.

You will probably need to use the tidyverse, and potentially janitor.

e. Store the data as an object called `lobsters`.

Refer to workshop code if you are lost.

Task 2. Enter your data for your personal data project.

a. Create a spreadsheet to enter your data.

Two good options are Google Sheets and Microsoft Excel. This will be the spreadsheet that you continually update with data for new observations, so store it in a logical place so that you can find it later.

b. Create the columns of your spreadsheet.

If you have organized your data sheet in “long” format, in which each row is an observation, then your spreadsheet columns will be the same as your data sheet columns. If not, that’s ok; just make sure you’re entering your data in a way that makes sense based on how you collected it.

c. Enter your data.

Double check your values!

d. Save your spreadsheet as a .csv file in your `homework-02` folder.

e. Read your data into R.

Include the code to do this at the top of the template. You will want to do this the same way you read any data into R: by creating a new object (you could call this my_data) and using the left arrow operator to store the data that you read in using read_csv().
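As a minimal sketch of the read-in step described above: the file name `my-data.csv` and the columns `date`, `category`, and `response` are hypothetical placeholders, not anything required by the assignment. The example writes a tiny file to a temporary folder only so that it can run on its own; in your project you would point `read_csv()` at the .csv file you saved.

```r
library(readr)

# write a tiny, hypothetical spreadsheet export to a temporary file
# (stand-in for the .csv you saved from Google Sheets or Excel)
csv_path <- file.path(tempdir(), "my-data.csv")
writeLines(
  c(
    "date,category,response",
    "2026-01-12,weekday,3.5",
    "2026-01-17,weekend,5.0",
    "2026-01-19,weekday,4.2"
  ),
  csv_path
)

# create a new object and use the left arrow operator to store the data
my_data <- read_csv(csv_path, show_col_types = FALSE)

# display the object to check that columns and values came in as expected
my_data
```

If a column comes in with an unexpected type (for example, numbers read as text), that usually points to something in the spreadsheet itself, such as a stray character in a cell.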

You are now ready to start your homework!

Part 2. Problems

Problem 1. Native bunchgrasses

Managers are interested in the recovery of a native bunchgrass species, California fescue (Festuca californica), in a local oak woodland. You are conducting surveys at different sites within the oak woodland, and you count the following numbers of individuals at 13 different sites:

\[ 4, 6, 4, 5, 5, 4, 2, 3, 6, 4, 2, 0, 1 \]

a. What kind of data did you collect, and why? Explain in 1-2 sentences.
b. What is a better description of the variability in bunchgrass count: standard deviation or standard error? Explain why in 1 sentence. Calculate the metric of your choice, showing your work. Round your final answer to 1 decimal place, and include the correct units.
c. What is a better description of the uncertainty in bunchgrass count: standard deviation or standard error? Explain why in 1 sentence. Calculate the metric of your choice, showing your work. Round your final answer to 1 decimal place, and include the correct units.
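For reference, the two metrics named above can be computed step by step. The sketch below uses four made-up counts (not the homework data) and hypothetical object names, and it only shows the arithmetic; which metric answers which question is for you to explain.

```r
# hypothetical counts for illustration only (not the bunchgrass data)
counts <- c(2, 4, 4, 6)

# number of observations and sample mean
n <- length(counts)
sample_mean <- sum(counts) / n

# sample standard deviation:
# square root of (sum of squared deviations from the mean) / (n - 1)
sample_sd <- sqrt(sum((counts - sample_mean)^2) / (n - 1))

# standard error: standard deviation divided by the square root of n
sample_se <- sample_sd / sqrt(n)

round(sample_sd, 1) # 1.6
round(sample_se, 1) # 0.8
```

The hand calculation of the standard deviation matches what the built-in `sd()` function returns for the same vector.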

Problem 2. Lobster weights

Marine protected areas (MPAs) off the coast of California designate regions of the ocean in which fishing is not allowed. As a result, many fished species have a spatial refuge to grow and reproduce.

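The California spiny lobster (Panulirus interruptus) is a major part of the state’s fisheries. In the commercial fishery, lobsters are caught in traps (example: https://www.nationalfisherman.com/west-coast-pacific/calif-commercial-fisherman-convicted-for-abandoning-traps) that can contain multiple individual lobsters.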

In this problem, you will visualize and analyze data to answer the question: is there a difference in mean trapped lobster weight (Panulirus interruptus) between an MPA and a non-MPA site?

Data format

In the form that the data is presented to you (prior to any cleaning and wrangling), each row represents the total weight of lobsters in a single trap.

a. Create a new object called lobsters_clean to clean and wrangle the data. In no particular order:

- clean the column names so that only lower case letters and underscores are used
- change the MPA/non-MPA designations to full words (i.e. change “MPA” to “Marine Protected Area” and “non-MPA” to “not protected”)
- select columns such that the only columns present in the data frame are 1) a column with the MPA designation and 2) the weight of lobsters in the trap in pounds

Once you are done with your cleaning and wrangling, display 10 random rows from the data frame using slice_sample().

Not sure how to use a new function?

Look at the help page (type ?slice_sample in the console and hit enter).

Do not subset your data!

If you are piping slice_sample() into your cleaning and wrangling code, you are essentially choosing 10 random observations with which to do everything else.

DO NOT DO THIS.

Use slice_sample() to display rows, as in: show 10 random rows from lobsters_clean, but use the entire lobsters_clean data frame for downstream visualization and analyses.
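The difference between displaying rows and subsetting can be sketched as follows. The data frame `traps` and its columns `zone` and `mass` are hypothetical stand-ins, not the homework data.

```r
library(dplyr)

# hypothetical data frame with 20 rows (stand-in for a cleaned data frame)
traps <- tibble(
  zone = rep(c("inside", "outside"), each = 10),
  mass = seq(1, 20)
)

# display 10 random rows; the result is printed but not stored anywhere
traps |>
  slice_sample(n = 10)

# the original object is unchanged and still has every row
nrow(traps)
```

Because the result of `slice_sample()` is never assigned back to `traps`, later steps that use `traps` still see the full data frame.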

b. Create a new object called lobster_summary to calculate the mean trapped lobster weight, the number of observations, and the standard deviation.

Display the lobster_summary data frame.
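One common way to build a grouped summary of this kind is with `group_by()` and `summarize()`. This is only a sketch on made-up data: `traps`, `zone`, `mass`, and the summary column names are hypothetical, and the approach from class may differ.

```r
library(dplyr)

# hypothetical data: three observations in each of two groups
traps <- tibble(
  zone = rep(c("inside", "outside"), each = 3),
  mass = c(2, 4, 6, 1, 2, 3)
)

trap_summary <- traps |>
  # calculate everything separately for each group
  group_by(zone) |>
  summarize(
    mean_mass = mean(mass), # mean of the response
    n_obs = n(),            # number of observations
    sd_mass = sd(mass)      # standard deviation of the response
  )

# display the summary: one row per group
trap_summary
```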

c. Create a plot showing mean trapped lobster weight and standard deviation, along with the underlying data (i.e. the raw observations) jittered horizontally. Use an approach from class. Additionally:
- color by site and change the colors from the default
- make sure each site has a different shape
- use a ggplot theme that is not the default
- make sure the legend is not showing
- label the x- and y-axes
d. In one sentence, write your hypotheses in statistical terms to answer the question: is there a difference in mean trapped weight of California spiny lobster between an MPA and a non-MPA site? Make sure you have a null and an alternative hypothesis.
e. Using the lobsters_clean object, make a QQ plot. Make sure there are two panels, one for each location. You do not need to label the x- and y-axes.
f. In one sentence, describe whether the variable of interest is normally distributed or not. Use visual components (e.g. shape) of the QQ plot you made to justify your characterization of the variable.
g. Check your variances using var.test(). Show the code and output.

In one sentence, describe whether the groups have equal variances or not.

h. Do a t-test using t.test(). Show the code and output.

t-test arguments

Double check your arguments to make sure you’re running the right test.
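To illustrate why the arguments matter, the sketch below runs `var.test()` and then `t.test()` twice on simulated data, changing only the `var.equal` argument. The data frame `traps`, its columns, and the simulated values are all hypothetical; which version is right for the homework depends on your own variance check.

```r
# simulate hypothetical data: 10 observations in each of two groups
set.seed(193)
traps <- data.frame(
  zone = rep(c("inside", "outside"), each = 10),
  mass = c(rnorm(10, mean = 5, sd = 1), rnorm(10, mean = 4, sd = 1))
)

# F test comparing the variances of the two groups
variance_check <- var.test(mass ~ zone, data = traps)

# Student's t-test: assumes the two groups have equal variances
student <- t.test(mass ~ zone, data = traps, var.equal = TRUE)

# Welch's t-test: does not assume equal variances (the default in t.test())
welch <- t.test(mass ~ zone, data = traps, var.equal = FALSE)

# degrees of freedom differ between the two versions of the test
student$parameter
welch$parameter
```

With 10 observations per group, the Student's version has 18 degrees of freedom, while the Welch version adjusts the degrees of freedom downward based on the sample variances.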

i. In one sentence each, describe:
- Why a t-test would have been appropriate for testing your hypothesis in part d
- How you evaluated normality and homogeneity of variance

j. Describe the results in 1-2 full sentences in your own words. Make sure to include the:
- Test you ran
- Number of observations for each location
- Significance level
- Degrees of freedom
- Test statistic
- p-value

Round any numbers with decimals to two decimal places.

Do not simply list each component!

You will be graded on how you synthesize information. See lecture notes for an example of how to summarize the results of a statistical test.

Problem 3. Personal data

By now, you have some observations on your data sheet for your personal data. Even though it’s early on in your data collection, it’s a good idea to practice good data management. For this problem, you’ll enter your data, read it into R, and create a visualization. If you get stuck at any step, you’ll know there’s something you need to fix.

a. Create a visualization with a categorical predictor variable on the x-axis and your response variable on the y-axis. In your visualization, be sure to:

- label your x- and y-axes
- include a title summarizing the main message of your figure
- use different colors from the ggplot() defaults
- use a different theme from the ggplot() default

b. Create a visualization with a continuous or discrete predictor variable on the x-axis and your response variable on the y-axis. In your visualization, be sure to:

- label your x- and y-axes
- include a title summarizing the main message of your figure
- use different colors from the ggplot() defaults
- use a different theme from the ggplot() default

c. In 2-5 sentences, describe what insights you can gain about your data from visualizations like these. Use specific components of the figure in your description of your insights. Once you collect more data and update these figures, would these insights change? Again, use specific components of the figure that would change.
d. In 2-5 sentences, describe the process of getting your data from your spreadsheet into R. Did you encounter any challenges? If so, why do you think those challenges arose, and how did you fix them? If not, why do you think your system for collecting your data worked?

Changing your data collection scheme

If you found that entering your data from your spreadsheet and getting it into the right format to be used in R was challenging, that’s ok! This happens a lot with data collection. Feel free to change your data sheet so that you’re collecting data in a way that makes your life easier as you’re reading your data into R and using it.

Problem 4. Statistical critique

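Check the Google sheet (https://docs.google.com/spreadsheets/d/1yW_hyhHrlwsb9u5Ul3_EfLYoOSS3nGv1IuSyFlbX78k/edit?usp=sharing) and choose a paper to use for your critique based on An’s recommendations. Answer the following questions about the paper in 1-2 sentences each: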

a. Why were you interested in this paper?
b. What questions/hypotheses are the authors addressing?
c. Which statistical test (from Homework 1) is included in this paper? What is the response variable? What is the predictor variable?
d. How does this test address the main question(s) presented by the authors? For example, how would the authors interpret a “significant” result?
e. Find the figure(s) and/or table(s) in the paper that are associated with the statistical test from part c. If there are multiple figures and tables relating to the statistical test, find the best one that demonstrates the relationship between the predictor and the response. Take a screenshot, and insert it into your document.

Make sure your screenshot is visible in your final document!

If your screenshot for part e is not visible, part f will be given “partial marks”.

My paper doesn’t include a figure or table that is associated with the statistical test.

First, check with An to make sure that’s actually the case.

If An verifies that indeed your paper does not include a figure or table that is related to the focal test, copy and paste the text that summarizes the test with a reference to the section (e.g. Results) and paragraph (e.g. paragraph 3) where that text can be found.

f. If you have a figure: describe the x- and y-axes, and what the figure is supposed to show (i.e. what is the main message of the figure)? If you have a table, what are the rows and columns, what is in each cell of the table, and what is the table supposed to demonstrate?

If you have text: what is your own interpretation of the summarized results text? In other words, what have you learned about the biological system specifically from the statistical results?

Assignment checklist

Your assignment should:

- include your name, the title, and the date
- include all code with annotations
- be organized and readable (for example: no messages, warnings, etc., text is formatted correctly with subscripts or mathematical notation where necessary, text and headers are clearly different)
- be uploaded to Canvas as a single docx or PDF

Your responses should include:

- work and written responses for Problem 1
- written responses, annotated code, and figure outputs for Problem 2
- annotated code, figure output, and written responses for Problem 3
- written responses and screenshot for Problem 4

Additionally, you should have:

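- submitted your Generative AI Statement of Use (https://docs.google.com/forms/d/e/1FAIpQLScQOUQERT9c_HI8PQ3hMDw5WpaQmtTnUF82mXdZJMUoewGY6A/viewform) by Thursday 29 January at 11:59 PM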

Lastly, check out the rubric on Canvas to see the point breakdown in more detail.

General formatting components

You will only receive full marks for annotations if you have meaningful comments for:

- each line of visualization code and/or ggplot geom/theme call (not needed for each argument, though good to have)
- each function in any piping operations
- set up code to denote where packages and/or data are read in
- each argument of a test call (e.g. var.test(), t.test())

You will only receive full marks for readability if:

- all messages/warnings are hidden
- all code is contained in code chunks (double check line breaks in comments once you render your document)
- all text is where it’s supposed to be (all components like headers, main text, superscripts/subscripts, etc. show up correctly)
- code includes carriage returns, spaces, etc. to make pipe operations and arguments clear
- code includes consistent spacing and indents

