Lab 3 Answers

Princeton University

Author

Jason Geller, Ph.D.(he/him)

Published

September 27, 2023

Lab 3

The data for this lab was taken from @bainbridge2020. The paper can be accessed here:

https://www.nature.com/articles/s41562-020-00963-z.epdf?sharing_token=jMnyW68w1ly_jP-7uAQkSNRgN0jAjWel9jnR3ZoTv0MCXZ4r7jrUALP4MF0GLxgMsZKEaFOrpxew-I7taYTt0yYa41WX3rCZIACffMyBDz-5K7wJceE9lopzU0bffcHEIKaH5l_LxD5aPdg99A8STR1rqgOzBjKjgcxkg6zsKeA%3D

If you’re curious, you can explore all their raw data by going to the repository associated with the paper, here).

There are seven variables in the data and each variable is described below. The first six rows of the data frame are also displayed below.

Data Columns

Data Columns
participant_id - Unique identifier for each infant. age - Age of infant as continuous variable in months. age_cat - Age of each participant as discrete variable in months. trial_type - Trial type (lullaby vs. non-lullaby). They also had “preference” trials in the experiment. Those trials are not included in this dataset. trial_id - Trial identifier. Note that the number of trials varies across participants. For some participants there are data for 6 trials, while for others there are data for only 4. obs_num - For each trial, they measured heart rate roughly every .4 seconds. This variable tells you which observation in a trial you’re looking at (.4 seconds after the trial started would be coded as `1`, .8 seconds after the trial started would be coded as `2`, etc.). zhr_pt - This is the heart rate at a given observation, normalized relative to the previous trial.

participant_id - Unique identifier for each infant.
age - Age of infant as continuous variable in months.
age_cat - Age of each participant as discrete variable in months.
trial_type - Trial type (lullaby vs. non-lullaby). They also had “preference” trials in the experiment. Those trials are not included in this dataset.
trial_id - Trial identifier. Note that the number of trials varies across participants. For some participants there are data for 6 trials, while for others there are data for only 4.
obs_num - For each trial, they measured heart rate roughly every .4 seconds. This variable tells you which observation in a trial you’re looking at (.4 seconds after the trial started would be coded as 1, .8 seconds after the trial started would be coded as 2, etc.).
zhr_pt - This is the heart rate at a given observation, normalized relative to the previous trial.

Let’s start by trying to understand the structure of the dataset. Calculate the following:
- The age of the youngest (minimum) child in the dataset.
- The age of the oldest (maximum) child in the dataset.
- The total number of observations represented in the data.
Create a data frame with only the first observation from the first trial for each participant. Uses this data frame to answer the following questions:
- How many participants are present in the dataset?
- How many 7-month-olds are there?
- Arrange the dataframe from youngest to oldest. What’s the participant_id for the youngest infant?
- Arrange the dataframe from oldest to youngest. What’s the participant_id for the oldest infant?
What is the overall mean number of observations per individual trial? (hint: you’ll need to use both group_by and ungroup).
How many observations are there in the lullaby condition and the non-lullaby condition?
Next, let’s examine the dependent variable, heart rate. Create a new variable called hr_round that is the heart rate value rounded to the nearest hundredth (use the function round()).
Plot a histogram of hr_round. Be sure to add appropriate labels and a title to your plot. What distribution would you say this looks like?
The histogram generated from the bb2021 data for the hr_round variable appears to be fairly normal. For a more comprehensive view, let’s perform bootstrapping on this dataset 10,000 times and examine the resulting histogram.

Firstly, extract only the hr_round variable from the bb2021 data. Following this, use the rep_sample_n function from the moderndive package (ensure it is installed) to bootstrap the dataset. If you are unfamiliar with this function, it allows for easy bootstrapping without the need to write loops. Set the function to randomly sample 100 rows (size = 100) and repeat this process 10,000 times (reps=10000). This will result in a large data frame.

Upon completion, plot the histogram for all the bootstrapped values. How does it compare to the original?
Plot a density plot of hr_round.
Calculate the mean heart rate for each participant on each trial type. Save it to a new dataframe called participant_means.
Use participant_means to create a violin plot showing the distribution of heart rates in the lullaby and non-lullaby conditions. Your plot should be a simplified version of Figure 2a in the paper with (a) two violins, (b) each violin a different color, and (c) points showing the underlying data. Get rid of the legend.

Tip

The order that you add geoms to your plot matters here. It did not matter for simple class experiment.
Pick a theme for your figure. Also, add two different colors to it.
Now make a rain cloud plot of the figure. Look at vignette here for an idea how to create them using geom_rain: https://www.njudd.com/raincloud-ggrain/
Use participant_means and plot the data with a strip chart. Use stat_summary to plot the mean for each condition.

Tip

- position = position_jitter controls how far apart points are.
Create the plot below showing the mean heart rate by condition across trials. Next, change the point size such that it corresponds to the number of trials represented.

Make a beautiful, clear plot that answers a question you might be interested in from the paper. Make sure to include a descriptive title and it is publication ready.