Due on Thursday 29 January (Week 4) at 11:59 PM
Read the instructions carefully and double check that you have everything on the checklist.
Part 1. Tasks
Remember that you are not expected to turn anything in for tasks, but you should complete all the material.
Task 1. Set up your folders and Rproject.
a. Create a new folder for this homework assignment within your ENVS-193DS folder.
Within your ENVS-193DS folder, create a new folder for Homework 2. Name it whatever you want (a logical name could be homework-02).
b. Download the files from Canvas
Download the homework files from Canvas into your homework folder. This includes:
lobster-catch-data.csv
c. Create an Rproject for this homework assignment
Create an Rproj file within your homework-02 folder. If you need help with this, watch the “Creating an Rproject” video on Canvas.
d. At the top of your document, insert a code chunk. Write code to load in the packages you need.
You will probably need to use the tidyverse, and potentially janitor.
e. Store the data as an object called lobsters.
Refer to workshop code if you are lost.
Task 2. Enter your data for your personal data project.
a. Create a spreadsheet to enter your data.
Two good options include Google Sheets or Microsoft Excel. This will be the spreadsheet that you continually update with data for new observations, so store it in a logical place so that you can find it later.
b. Create the columns of your spreadsheet.
If you have organized your data sheet in “long” format, in which each row is an observation, then your spreadsheet columns will be the same as your data sheet columns. If not, that’s ok; just make sure you’re entering your data in a way that makes sense based on how you collected it.
c. Enter your data.
Double check your values!
d. Save your spreadsheet as a .csv file in your homework-02 folder.
e. Read your data into R.
Include the code to do this at the top of the template. You will want to do this the same way you read any data into R: by creating a new object (you could call this my_data) and using the left arrow operator to store the data you read in with read_csv().
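As a rough sketch of that pattern, the code below writes a tiny made-up CSV to a temporary file so that it can run anywhere, then reads it back in. The file path, the column names (day, cups_of_coffee), and the values are all placeholders rather than part of the assignment; in your own document you would point read_csv() at the .csv file you saved in your homework folder.

```r
library(tidyverse) # loads readr (for read_csv) along with the other core packages

# placeholder file: a temporary CSV with made-up columns and values,
# standing in for the spreadsheet you saved as a .csv
example_path <- tempfile(fileext = ".csv")
write_csv(
  tibble(
    day = c("Monday", "Tuesday"), # made-up observation labels
    cups_of_coffee = c(2, 1)      # made-up response values
  ),
  example_path
)

# create a new object with the left arrow operator and read in the data
my_data <- read_csv(example_path, show_col_types = FALSE)

# display the object to check that it was read in as expected
my_data
```

If the columns or values look different from your spreadsheet, that is usually a sign that something in the spreadsheet itself needs to be fixed before moving on.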
You are now ready to start your homework!
Part 2. Problems
Problem 1. Native bunchgrasses
Managers are interested in the recovery of a native bunchgrass species, California fescue (Festuca californica), in a local oak woodland. You are conducting surveys in different sites within the oak woodland, and you count the following numbers of individuals in 13 different sites:
\[ 4, 6, 4, 5, 5, 4, 2, 3, 6, 4, 2, 0, 1 \]
- What kind of data did you collect, and why? Explain in 1-2 sentences.
- What is a better description of the variability in bunchgrass count: standard deviation or standard error? Explain why in 1 sentence. Calculate the metric of your choice, showing your work. Round your final answer to 1 decimal place, and include the correct units.
- What is a better description of the uncertainty in bunchgrass count: standard deviation or standard error? Explain why in 1 sentence. Calculate the metric of your choice, showing your work. Round your final answer to 1 decimal place, and include the correct units.
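For reference when showing your work, these are the standard definitions of the two metrics. The symbols are introduced here and do not appear in the problem itself: \(x_i\) is the count at site \(i\), \(\bar{x}\) is the sample mean, \(n\) is the number of sites, \(s\) is the sample standard deviation, and \(SE\) is the standard error of the mean.
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}, \qquad SE = \frac{s}{\sqrt{n}} \]
Both quantities are in the same units as the original counts.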
Problem 2. Lobster weights
Marine protected areas (MPAs) off the coast of California designate regions of the ocean in which fishing is not allowed. As a result, many fished species have a spatial refuge to grow and reproduce.
The California spiny lobster (Panulirus interruptus) is a major part of the state’s fisheries. In the commercial fishery, lobsters are caught in traps (example here) that can contain multiple individual lobsters.
In this problem, you will visualize and analyze data to answer the question: is there a difference in mean trapped lobster weight (Panulirus interruptus) between an MPA and a non-MPA site?
In the form that the data is presented to you (prior to any cleaning and wrangling), each row represents the total weight of lobsters in a single trap.
- Create a new object called lobsters_clean to clean and wrangle the data. In no particular order:
- clean the column names so that only lower case letters and underscores are used
- change the MPA/non-MPA designations to be full words (i.e. change “MPA” to “Marine Protected Area” and “non-MPA” to “not protected”)
- select columns such that the only columns present in the data frame are 1) a column with the MPA designation and 2) the weight of lobsters in the trap in pounds
Once you are done with your cleaning and wrangling, display 10 random rows from the data frame using slice_sample().
Look at the help page (type ?slice_sample in the console and hit enter).
If you are piping slice_sample() into your cleaning and wrangling code, you are essentially choosing 10 random observations with which to do everything else.
DO NOT DO THIS.
Use slice_sample() to display rows, as in: show 10 random rows from lobsters_clean, but use the entire lobsters_clean data frame for downstream visualization and analyses.
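The sketch below shows one possible shape for this kind of workflow on a toy data frame. The object names (toy_raw, toy_clean), the raw column names, and the values are made up for illustration; the real column names in lobster-catch-data.csv may differ, so treat this as a pattern rather than the answer.

```r
library(tidyverse)
library(janitor) # for clean_names()

# toy stand-in for the raw data; column names and values are made up
toy_raw <- tibble(
  `Site Type` = rep(c("MPA", "non-MPA"), each = 10),
  `Weight Lbs` = seq(from = 1, to = 20, by = 1)
)

toy_clean <- toy_raw |>
  # column names become site_type and weight_lbs
  clean_names() |>
  # recode the abbreviations as full words
  mutate(site_type = case_when(
    site_type == "MPA" ~ "Marine Protected Area",
    site_type == "non-MPA" ~ "not protected"
  )) |>
  # keep only the designation and weight columns
  select(site_type, weight_lbs)

# display 10 random rows; this is NOT piped into the cleaning code above,
# so toy_clean still contains every row
slice_sample(toy_clean, n = 10)
```

Because slice_sample() is called on its own line and its result is not stored, toy_clean keeps all 20 rows for any later plots or tests.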
- Create a new object called lobster_summary to calculate the mean trapped lobster weight, the number of observations, and the standard deviation.
Display the lobster_summary data frame.
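A grouped summary like this can be sketched with group_by() and summarize(). The toy data frame, its column names (site, weight), and the summary column names below are assumptions for illustration only.

```r
library(tidyverse)

# toy data with made-up values: two sites, three observations each
toy <- tibble(
  site = rep(c("site A", "site B"), each = 3),
  weight = c(1, 2, 3, 2, 4, 6)
)

toy_summary <- toy |>
  # calculate everything separately for each site
  group_by(site) |>
  summarize(
    mean_weight = mean(weight), # mean of the observations
    n = length(weight),         # number of observations
    sd_weight = sd(weight)      # sample standard deviation
  )

# display the summary data frame
toy_summary
```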
Create a plot showing mean trapped lobster weight and standard deviation, along with the underlying data (i.e. the raw observations) jittered horizontally. Use an approach from class. Additionally:
- color by site and change the colors from the default
- make sure each site has a different shape
- use a ggplot theme that is not the default
- make sure the legend is not showing
- label the x- and y-axes
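One way such a plot could be put together is sketched below on toy data. This is not necessarily the approach from class: the choice of geom_pointrange() for the mean and standard deviation, the colors, the theme, and all object and column names are assumptions made for illustration.

```r
library(tidyverse)

# toy data with made-up values
toy <- tibble(
  site = rep(c("site A", "site B"), each = 5),
  weight = c(2, 3, 4, 3, 5, 5, 6, 8, 6, 7)
)

# mean and standard deviation for each site
toy_summary <- toy |>
  group_by(site) |>
  summarize(mean_weight = mean(weight), sd_weight = sd(weight))

toy_plot <- ggplot(data = toy,
                   aes(x = site, y = weight, color = site, shape = site)) +
  # raw observations, jittered horizontally only
  geom_jitter(width = 0.1, height = 0, alpha = 0.5) +
  # mean (point) and mean +/- 1 standard deviation (line) from the summary
  geom_pointrange(data = toy_summary,
                  aes(y = mean_weight,
                      ymin = mean_weight - sd_weight,
                      ymax = mean_weight + sd_weight),
                  size = 1) +
  # non-default colors, one per site
  scale_color_manual(values = c("site A" = "darkorange", "site B" = "steelblue")) +
  # axis labels
  labs(x = "Site", y = "Weight (toy units)") +
  # non-default theme, then hide the legend
  theme_bw() +
  theme(legend.position = "none")

toy_plot
```

Setting height = 0 in geom_jitter() keeps the points at their true y values, so only the horizontal position is randomized.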
In one sentence, write your hypotheses in statistical terms to answer the question: is there a difference in mean trapped weight of California spiny lobster between an MPA and a non-MPA site? Make sure you have a null and alternative hypothesis.
Using the lobsters_clean object, make a QQ plot. Make sure there are two panels for each location. You do not need to label the x- and y-axes.
In one sentence, describe whether the variable of interest is normally distributed or not. Use visual components (e.g. shape) of the QQ plot you made to justify your characterization of the variable.
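A paneled QQ plot can be sketched in ggplot2 with geom_qq(), geom_qq_line(), and facet_wrap(). The toy data frame and its column names below are made up; swap in your own object and columns.

```r
library(tidyverse)

# toy data with made-up values
toy <- tibble(
  site = rep(c("site A", "site B"), each = 5),
  weight = c(2, 3, 4, 3, 5, 5, 6, 8, 6, 7)
)

toy_qq <- ggplot(data = toy, aes(sample = weight)) +
  # reference line: points from a normal distribution fall close to it
  geom_qq_line(color = "grey50") +
  # sample quantiles plotted against theoretical normal quantiles
  geom_qq() +
  # one panel per site
  facet_wrap(~ site)

toy_qq
```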
Check your variances using var.test(). Show the code and output.
In one sentence, describe whether the groups have equal variances or not.
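As a minimal sketch, var.test() accepts a formula of the form response ~ group along with a data argument. The toy data frame below is made up, and the object names are placeholders.

```r
# toy data with made-up values
toy <- data.frame(
  site = rep(c("site A", "site B"), each = 5),
  weight = c(2, 3, 4, 3, 5, 5, 6, 9, 6, 7)
)

toy_var <- var.test(
  weight ~ site, # formula: response variable ~ grouping variable
  data = toy     # data frame containing both columns
)

# display the output (F statistic, degrees of freedom, p-value)
toy_var
```

The F statistic reported here is the ratio of the two sample variances, with the first group in the numerator.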
- Do a t-test using t.test(). Show the code and output.
Double check your arguments to make sure you’re running the right test.
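The sketch below shows the arguments worth double checking in a two-sample t.test() call, again on made-up toy data. Setting var.equal = TRUE here is only for illustration; which value is appropriate for your analysis depends on what your own variance check shows. If var.equal is left out, t.test() defaults to var.equal = FALSE, which runs a Welch's t-test.

```r
# toy data with made-up values
toy <- data.frame(
  site = rep(c("site A", "site B"), each = 5),
  weight = c(2, 3, 4, 3, 5, 5, 6, 9, 6, 7)
)

toy_t <- t.test(
  weight ~ site,   # formula: response variable ~ grouping variable
  data = toy,      # data frame containing both columns
  var.equal = TRUE # illustration only: TRUE = Student's, FALSE = Welch's
)

# display the output (t statistic, degrees of freedom, p-value)
toy_t
```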
In one sentence each, describe:
- Why a t-test would have been appropriate for testing your hypothesis in part d
- How you evaluated normality and homogeneity of variance
Describe the results in 1-2 full sentences in your own words. Make sure to include the:
- Test you ran
- Number of observations for each location
- Significance level
- Degrees of freedom
- Test statistic
- p-value
Round any numbers with decimals to two decimal places.
You will be graded on how you synthesize information. See lecture notes for an example of how to summarize the results of a statistical test.
Problem 3. Personal data
By now, you have some observations on your data sheet for your personal data. Even though it’s early on in your data collection, it’s a good idea to practice good data management. For this problem, you’ll enter your data, read it into R, and create a visualization. If you get stuck at any step, you’ll know there’s something you need to fix.
- Create a visualization with a categorical predictor variable on the x-axis and your response variable on the y-axis. In your visualization, be sure to:
- label your x- and y-axes
- include a title summarizing the main message of your figure
- use different colors from the ggplot() defaults
- use a different theme from the ggplot() default
- Create a visualization with a continuous or discrete predictor variable on the x-axis and your response variable on the y-axis. In your visualization, be sure to:
- label your x- and y-axes
- include a title summarizing the main message of your figure
- use different colors from the ggplot() defaults
- use a different theme from the ggplot() default
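For the continuous-predictor case, a scatter plot is one reasonable starting point. The sketch below uses a made-up personal data set; the variable names (hours_of_sleep, cups_of_coffee), the values, the color, and the theme are all placeholders to replace with your own choices.

```r
library(tidyverse)

# toy personal data with made-up values
toy_personal <- tibble(
  hours_of_sleep = c(6, 7, 8, 5, 9, 7),
  cups_of_coffee = c(3, 2, 1, 4, 1, 2)
)

toy_scatter <- ggplot(data = toy_personal,
                      aes(x = hours_of_sleep, y = cups_of_coffee)) +
  # one point per observation, in a non-default color
  geom_point(color = "darkorchid4", size = 3) +
  # axis labels and a title stating the main message
  labs(x = "Hours of sleep",
       y = "Cups of coffee",
       title = "Toy example: fewer cups of coffee after longer nights of sleep") +
  # non-default theme
  theme_classic()

toy_scatter
```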
In 2-5 sentences, describe what insights you can gain about your data from visualizations like these. Use specific components of the figure in your description of your insights. Once you collect more data and update these figures, would these insights change? Again, use specific components of the figure that would change.
In 2-5 sentences, describe the process of getting your data from your spreadsheet into R. Did you encounter any challenges? If so, why do you think those challenges arose, and how did you fix them? If not, why do you think your system for collecting your data worked?
If you found that entering your data from your spreadsheet and getting it into the right format to be used in R was challenging, that’s ok! This happens a lot with data collection. Feel free to change your data sheet so that you’re collecting data in a way that makes your life easier as you’re reading your data into R and using it.
Problem 4. Statistical critique
Check the Google sheet and choose a paper to use for your critique based on An’s recommendations. Answer the following questions about the paper in 1-2 sentences each:
Why were you interested in this paper?
What questions/hypotheses are the authors addressing?
Which statistical test (from Homework 1) is included in this paper? What is the response variable? What is the predictor variable?
How does this test address the main question(s) presented by the authors? For example, how would the authors interpret a “significant” result?
Find the figure(s) and/or table(s) in the paper that are associated with the statistical tests from question (c). If there are multiple figures and tables relating to the statistical test, find the best one that demonstrates the relationship between the predictor and the response. Take a screenshot, and insert it into your document.
If your screenshot for part e is not visible, part f will be given “partial marks”.
If you think your paper does not include a figure or table related to the focal test, first check with An to make sure that’s actually the case.
If An verifies that indeed your paper does not include a figure or table that is related to the focal test, copy and paste the text that summarizes the test with a reference to the section (e.g. Results) and paragraph (e.g. paragraph 3) where that text can be found.
- If you have a figure: describe the x- and y-axes, and what the figure is supposed to show (i.e. what is the main message of the figure)? If you have a table, what are the rows and columns, what is in each cell of the table, and what is the table supposed to demonstrate?
If you have text: what is your own interpretation of the summarized results text? In other words, what have you learned about the biological system specifically from the statistical results?
Assignment checklist
Your assignment should:
Your responses should include:
Additionally, you should have:
Lastly, check out the rubric on Canvas to see the point breakdown in more detail.
General formatting components
You will only receive full marks for annotations if you have meaningful comments for:
- each line of visualization code and/or ggplot geom/theme call (not needed for each argument, though good to have)
- each function in any piping operations
- set up code to denote where packages and/or data are read in
- each argument of a test call (e.g. var.test(), t.test())
You will only receive full marks for readability if:
- all messages/warnings are hidden
- all code is contained in code chunks (double check line breaks in comments once you render your document)
- all text is where it’s supposed to be (all components like headers, main text, superscripts/subscripts, etc. show up correctly)
- code includes carriage returns, spaces, etc. to make pipe operations and arguments clear
- code includes consistent spacing and indents