Lab 9: Interactions 1
Princeton University
Lab 9
There is currently much debate (and hype) surrounding smartphones and their effects on well-being, especially with regard to children and teenagers. We’ll be looking at data from this recent study of English adolescents:
Przybylski, A. & Weinstein, N. (2017). A Large-Scale Test of the Goldilocks Hypothesis. Psychological Science, 28, 204–215.
This large-scale study found support for the “Goldilocks” hypothesis among adolescents: that there is a “just right” amount of screen time, such that any more or less than this amount is associated with lower well-being. It was a huge survey study: the data contain responses from over 120,000 participants!
Fortunately, the authors made the data from this study openly available, which allows us to dig deeper into their results. In this lab, we will look at whether the relationship between screen time and well-being is modulated by participants’ (self-reported) gender.
The dependent measure used in the study was the Warwick-Edinburgh Mental Well-Being Scale (WEMWBS). This is a 14-item scale with 5 response categories, summed together to form a single score ranging from 14-70.
At Przybylski & Weinstein’s page for this study on the Open Science Framework, you can find the participant survey which asks a large number of additional questions (see page 14 for the WEMWBS questions and pages 4-5 for the questions about screen time). Within the same page you can also find the raw data; however, for the purpose of this exercise, you will be using local pre-processed copies of the data (see data folder).
Przybylski and Weinstein looked at multiple measures of screen time, but we will be focusing on smartphone use. They found that decrements in well-being started to appear when respondents reported more than one hour of daily smartphone use. Our question: Does the negative association between hours of (smartphone) use and well-being (beyond the one-hour point) differ for boys and girls?
Note that in this analysis, we have:
a continuous\(^*\) DV, well-being;
a continuous\(^*\) predictor, screen time;
a categorical predictor, gender.
\(^*\)these variables are only quasi-continuous, inasmuch as only discrete values are possible. However, there are a sufficient number of discrete categories that we can treat them as effectively continuous.
We want to estimate two slopes relating screen time to well-being, one for girls and one for boys, and then statistically compare these slopes. So this problem seems simultaneously like a situation where you would run a regression (to estimate the slopes) but also one where you would need a t-test (to compare two groups).
Set-up
- Load in the wellbeing.csv, participant_info.csv, and screen_time.csv files from your data folder. Save them as wellbeing, pinfo, and screen.
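A minimal sketch of the loading step, assuming the three CSV files sit in a folder called data inside your working directory. The folder path and the helper name load_lab_data are assumptions made for illustration, not part of the lab materials.

```r
library(readr)

# Hypothetical helper: read the three lab files from `dir` into a named list.
load_lab_data <- function(dir = "data") {
  list(
    wellbeing = read_csv(file.path(dir, "wellbeing.csv"), show_col_types = FALSE),
    pinfo     = read_csv(file.path(dir, "participant_info.csv"), show_col_types = FALSE),
    screen    = read_csv(file.path(dir, "screen_time.csv"), show_col_types = FALSE)
  )
}

# Only runs when all three files are found in the assumed folder.
lab_files <- file.path("data", c("wellbeing.csv", "participant_info.csv", "screen_time.csv"))
if (all(file.exists(lab_files))) {
  dat       <- load_lab_data("data")
  wellbeing <- dat$wellbeing
  pinfo     <- dat$pinfo
  screen    <- dat$screen
}
```

Three separate read_csv() calls would work just as well; the helper only keeps the file names in one place.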
Look at the data
Take a look at the tibbles for pinfo, wellbeing, and screen. The wellbeing tibble has information from the WEMWBS questionnaire; screen has information about screen time use on weekends (variables ending with we) and weekdays (variables ending with wk) for four types of activities: using a computer (variables starting with Comph; Q10 on the survey), playing video games (variables starting with Comp; Q9 on the survey), using a smartphone (variables starting with Smart; Q11 on the survey) and watching TV (variables starting with Watch; Q8 on the survey). If you want more information about these variables, look at items 8-11 on pages 4-5 of the PDF version of the survey on the OSF website.
The variable corresponding to gender is located in the table named pinfo, and this variable is called male.
Individual participants in this dataset are identified by the variable named Serial. This variable will allow us to link information across the three tables.
Run summary() on the three data-sets. Are there any missing data points? If so, get rid of them.
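As a sketch of the missing-data check, the example below uses a small made-up table in place of pinfo (the values are invented for illustration). It assumes that "getting rid of" missing data means dropping any row with an NA, which is one reasonable reading.

```r
library(tibble)
library(tidyr)

# Toy stand-in for pinfo; the real table has many more rows and columns.
pinfo_toy <- tibble(
  Serial = c(1001, 1002, 1003, 1004),
  male   = c(0, 1, NA, 1)
)

summary(pinfo_toy)          # an "NA's" entry in the output flags missing values
colSums(is.na(pinfo_toy))   # number of missing values per column

# Keep only rows with no missing values in any column.
pinfo_complete <- drop_na(pinfo_toy)
```

The same three calls can be repeated for wellbeing and screen.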
Compute the well-being score for each respondent
The WEMWBS well-being score is simply the sum of all the items.
Write the code to create a new table called wemwbs, with two variables: Serial (the participant ID), and tot_wellbeing, the total WEMWBS score. You will need to use pivot_longer to do this.
Sanity check: Verify for yourself that the scores all fall in the 14-70 range. Przybylski and Weinstein reported a mean of 47.52 with a standard deviation of 9.55. Can you reproduce these values?
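One possible shape for this step, shown on a toy table with only 3 of the 14 items. The item column names (WBOptimf, WBUseful, WBRelax) and the scores are assumptions for illustration; the approach relies only on every non-Serial column being a WEMWBS item.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for wellbeing; column names and values are invented.
wellbeing_toy <- tibble(
  Serial   = c(1001, 1002),
  WBOptimf = c(4, 2),
  WBUseful = c(5, 3),
  WBRelax  = c(3, 1)
)

# Long format: one row per participant per item, then sum within participant.
wemwbs_toy <- wellbeing_toy %>%
  pivot_longer(-Serial, names_to = "item", values_to = "score") %>%
  group_by(Serial) %>%
  summarise(tot_wellbeing = sum(score), .groups = "drop")

# Sanity check: range, mean, and standard deviation of the totals.
wemwbs_toy %>%
  summarise(
    min  = min(tot_wellbeing),
    max  = max(tot_wellbeing),
    mean = mean(tot_wellbeing),
    sd   = sd(tot_wellbeing)
  )
```

On the full 14-item data, the minimum and maximum should fall within 14-70.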
Now visualise the distribution of tot_wellbeing in a histogram.
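A minimal histogram sketch using ggplot2. The scores here are simulated stand-ins for wemwbs (the simulation settings are assumptions), so the shape will not match the real data exactly.

```r
library(ggplot2)
library(tibble)

set.seed(9)
# Simulated totals standing in for wemwbs, clipped to the 14-70 scale range.
wemwbs_toy <- tibble(
  Serial        = 1:500,
  tot_wellbeing = pmin(70, pmax(14, round(rnorm(500, mean = 47.5, sd = 9.5))))
)

wb_hist <- ggplot(wemwbs_toy, aes(x = tot_wellbeing)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Total WEMWBS score", y = "Number of respondents")

wb_hist
```

A binwidth of 1 suits a scale that only takes whole-number values.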
Smartphone and well-being for boys and girls
For this analysis, we are going to collapse weekday and weekend use for smartphones.
Create a new table, smarttot, that has the mean number of hours per day of smartphone use for each participant, averaged over weekends/weekdays.
- You will need to filter the dataset to only include smartphone use and not other technologies.
- You will need to use pivot_longer.
- You will also need to group the results by the participant ID (i.e., Serial).
- The final data-set should have two variables: Serial (the participant) and tothours.
- You will need to use the data-set screen (loaded from screen_time.csv) to do this.
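The hints above can be sketched as follows on a toy version of the screen table. The exact column names (Smart_wk, Smart_we, Watch_wk, Watch_we) are assumed from the naming pattern described earlier, and the hours are invented.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for screen; column names and values are assumptions.
screen_toy <- tibble(
  Serial   = c(1001, 1002, 1003),
  Smart_wk = c(1, 3, 0.5),
  Smart_we = c(2, 5, 0.5),
  Watch_wk = c(2, 1, 4),
  Watch_we = c(3, 2, 4)
)

smarttot_toy <- screen_toy %>%
  pivot_longer(-Serial, names_to = "measure", values_to = "hours") %>%
  filter(startsWith(measure, "Smart")) %>%   # keep smartphone rows only
  group_by(Serial) %>%
  summarise(tothours = mean(hours), .groups = "drop")
```

Each participant contributes two smartphone rows (weekday and weekend), so mean(hours) is the average of those two values.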
Next, create a new tibble called smart_wb that only includes (filters) participants from smarttot who used a smartphone for more than one hour per day each week, and then combine (join or merge) this table with the information in wemwbs and pinfo.
An inner join only keeps observations from X (here smarttot) that have a matching key in Y (wemwbs and pinfo). So if a Serial value is absent from wemwbs or pinfo, the join will throw out that observation.
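A small sketch of the filter-then-join step with invented toy tables. Participant 1004 has no row in the toy wemwbs table, so the inner join drops that observation.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for smarttot, wemwbs, and pinfo; all values are invented.
smarttot_toy <- tibble(Serial = c(1001, 1002, 1003, 1004), tothours = c(1.5, 4, 0.5, 2))
wemwbs_toy   <- tibble(Serial = c(1001, 1002, 1003), tot_wellbeing = c(50, 41, 55))
pinfo_toy    <- tibble(Serial = c(1001, 1002, 1003, 1004), male = c(1, 0, 0, 1))

smart_wb_toy <- smarttot_toy %>%
  filter(tothours > 1) %>%                  # more than one hour per day
  inner_join(wemwbs_toy, by = "Serial") %>%
  inner_join(pinfo_toy, by = "Serial")
```

Participant 1003 is removed by the filter and 1004 by the first join, leaving two rows.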
Mean-centering variables
- As discussed in the lecture, when you have continuous variables in a regression, it is often sensible to transform them by mean-centering.
Use mutate to add two new variables to smart_wb: tothours_c, calculated as a mean-centered version of the tothours predictor; and male_dev, recoded as -.5 for female and .5 for male.
Finally, recode male as a factor, so that R knows not to treat it as a real number.
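A sketch of the centering and recoding on a toy table. It assumes that male is stored as 1 for male and 0 for female; check the coding in your own copy of pinfo before reusing this.

```r
library(dplyr)
library(tibble)

# Toy stand-in for smart_wb; values are invented.
smart_wb_toy <- tibble(
  Serial        = c(1001, 1002, 1003, 1004),
  tothours      = c(1.5, 4, 2, 2.5),
  tot_wellbeing = c(50, 41, 47, 52),
  male          = c(1, 0, 0, 1)   # assumed coding: 1 = male, 0 = female
)

smart_wb_toy <- smart_wb_toy %>%
  mutate(
    tothours_c = tothours - mean(tothours),       # mean-centered hours
    male_dev   = if_else(male == 1, 0.5, -0.5),   # deviation coding
    male       = factor(male, levels = c(0, 1), labels = c("female", "male"))
  )
```

male_dev is computed before male is converted, because mutate() evaluates its arguments in order.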
Visualise the relationship
Calculate mean well-being scores for each combination of male and tothours, and then create a scatterplot that includes separate regression lines for each gender.
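One way to build this plot, sketched on simulated data (the simulation parameters are arbitrary assumptions and are not estimates from the study).

```r
library(dplyr)
library(ggplot2)
library(tibble)

set.seed(9)
n <- 400
# Simulated stand-in for smart_wb.
sim <- tibble(
  male          = factor(sample(c("female", "male"), n, replace = TRUE)),
  tothours      = sample(seq(1.5, 7, by = 0.5), n, replace = TRUE),
  tot_wellbeing = 50 - 1 * tothours + ifelse(male == "male", 4, 0) + rnorm(n, sd = 9)
)

# Mean well-being for each combination of gender and hours.
cell_means <- sim %>%
  group_by(male, tothours) %>%
  summarise(mean_wellbeing = mean(tot_wellbeing), .groups = "drop")

wb_plot <- ggplot(cell_means, aes(x = tothours, y = mean_wellbeing, colour = male)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Smartphone use (hours per day)", y = "Mean well-being score")

wb_plot
```

Mapping colour to male makes geom_smooth() fit one line per gender.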
Running the regression
For the data in smart_wb, use the lm() function to calculate the multiple regression model. Make sure the table is formatted nicely!
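A sketch of the model call on simulated data (the generating values are arbitrary assumptions). The formula uses tothours_c * male_dev, which expands to both main effects plus their interaction.

```r
set.seed(9)
n <- 1000
# Simulated stand-in for smart_wb with centered and deviation-coded predictors.
smart_wb_sim <- data.frame(
  male_dev = sample(c(-0.5, 0.5), n, replace = TRUE),
  tothours = sample(seq(1.5, 7, by = 0.5), n, replace = TRUE)
)
smart_wb_sim$tothours_c <- smart_wb_sim$tothours - mean(smart_wb_sim$tothours)
smart_wb_sim$tot_wellbeing <- 47 - 0.8 * smart_wb_sim$tothours_c +
  5 * smart_wb_sim$male_dev +
  0.5 * smart_wb_sim$tothours_c * smart_wb_sim$male_dev +
  rnorm(n, sd = 9)

mod <- lm(tot_wellbeing ~ tothours_c * male_dev, data = smart_wb_sim)

summary(mod)                   # full model summary
round(coef(summary(mod)), 3)   # compact coefficient table
```

For a nicer table in a rendered document, passing broom::tidy(mod) to knitr::kable() is one common option.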
Follow-up with a simple effects analysis, if the interaction is significant
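One simple way to get the slope for each gender is to refit the model with gender recoded so that zero corresponds to the group of interest; the coefficient for tothours_c is then that group's simple slope. The sketch below uses simulated data, and the variable names girls0 and boys0 are hypothetical. Packages such as emmeans or interactions offer other routes to the same estimates.

```r
set.seed(9)
n <- 1000
# Simulated stand-in for smart_wb.
sim <- data.frame(
  male_dev = sample(c(-0.5, 0.5), n, replace = TRUE),
  tothours = sample(seq(1.5, 7, by = 0.5), n, replace = TRUE)
)
sim$tothours_c <- sim$tothours - mean(sim$tothours)
sim$tot_wellbeing <- 47 - 0.8 * sim$tothours_c + 5 * sim$male_dev +
  0.5 * sim$tothours_c * sim$male_dev + rnorm(n, sd = 9)

# Recode gender so that 0 marks the group whose slope we want.
sim$girls0 <- sim$male_dev + 0.5   # 0 = female, 1 = male
sim$boys0  <- sim$male_dev - 0.5   # 0 = male, -1 = female

mod_girls <- lm(tot_wellbeing ~ tothours_c * girls0, data = sim)
mod_boys  <- lm(tot_wellbeing ~ tothours_c * boys0, data = sim)

# Simple slopes of hours on well-being, with SE, t, and p.
coef(summary(mod_girls))["tothours_c", ]   # slope for girls
coef(summary(mod_boys))["tothours_c", ]    # slope for boys
```

With deviation coding, these slopes equal the tothours_c coefficient minus or plus half the interaction coefficient.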
Assumption checking
Now it’s time to test those pesky assumptions
The predictors have non-zero variance
The relationship between outcome and predictor is linear
The residuals should be normally distributed
Multicollinearity: predictor variables should not be too highly correlated
Check assumptions of your model, noting any deviations.
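A base-R sketch of checks for the four assumptions listed above, run on simulated data (so the diagnostics here will look cleaner than real data). The choice of checks is an assumption; car::vif() or performance::check_model() are alternatives if those packages are installed.

```r
set.seed(9)
n <- 1000
# Simulated stand-in for smart_wb.
sim <- data.frame(
  male_dev = sample(c(-0.5, 0.5), n, replace = TRUE),
  tothours = sample(seq(1.5, 7, by = 0.5), n, replace = TRUE)
)
sim$tothours_c <- sim$tothours - mean(sim$tothours)
sim$tot_wellbeing <- 47 - 0.8 * sim$tothours_c + 5 * sim$male_dev +
  0.5 * sim$tothours_c * sim$male_dev + rnorm(n, sd = 9)

mod <- lm(tot_wellbeing ~ tothours_c * male_dev, data = sim)

# 1. Predictors have non-zero variance.
pred_var <- sapply(sim[c("tothours_c", "male_dev")], var)
pred_var

# 2. Linearity: residuals vs fitted should show no systematic curve.
plot(mod, which = 1)

# 3. Normality of residuals: points should follow the Q-Q line.
plot(mod, which = 2)

# 4. Multicollinearity: correlation between the two predictors.
pred_cor <- cor(sim$tothours_c, sim$male_dev)
pred_cor
```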
Visualization
Use the interactions package or ggeffects to visualize the interaction effect.
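A sketch using ggeffects on simulated data (the data and parameter values are assumptions). ggpredict() computes model predictions across tothours_c for each level of male, and plot() turns them into a ggplot. The interactions package's interact_plot() is an alternative.

```r
library(ggeffects)

set.seed(9)
n <- 1000
# Simulated stand-in for smart_wb, with male stored as a factor.
sim <- data.frame(
  male     = factor(sample(c("female", "male"), n, replace = TRUE)),
  tothours = sample(seq(1.5, 7, by = 0.5), n, replace = TRUE)
)
sim$tothours_c <- sim$tothours - mean(sim$tothours)
is_male <- as.numeric(sim$male == "male")
sim$tot_wellbeing <- 45 - 1 * sim$tothours_c + 5 * is_male +
  0.5 * sim$tothours_c * is_male + rnorm(n, sd = 9)

mod <- lm(tot_wellbeing ~ tothours_c * male, data = sim)

# Predicted well-being across hours, separately for each gender.
pred <- ggpredict(mod, terms = c("tothours_c", "male"))

int_plot <- plot(pred)
int_plot
```

Non-parallel lines in the plot indicate that the slope of hours differs by gender.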
Write-up
Provide a write-up/summary of the results in APA style
All continuous predictors were mean-centered and deviation coding was used for categorical predictors. The results of the regression indicated that the model significantly predicted well-being (F(3, 71029) = 2450.89, p < .001, Adjusted \(R^2\) = 0.09, \(f^2\) = .63), accounting for 9% of the variance. Total screen time was a significant negative predictor of well-being scores (b = -0.77, p < .001). Gender was also a significant predictor (b = 5.14, p < .001), with girls having lower well-being scores than boys. Importantly, there was a significant interaction between screen time and gender (b = 0.45, p < .001): smartphone use was more negatively associated with well-being for girls than for boys.
Power
Finally, we’ll calculate power
Using a power calculator, calculate the minimum effect size we could reliably detect with 99% power, given our sample size and design.
- The smallest effect size we can observe with 99% power is .0003. We can observe small effects given the sample size.
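A sketch of this sensitivity analysis with the pwr package, assuming 3 model terms, roughly 71,029 error degrees of freedom, and an alpha of .05 (all three are assumptions you should replace with the values from your own model). Leaving f2 unspecified tells pwr.f2.test() to solve for the effect size.

```r
library(pwr)

u <- 3       # numerator df: two main effects plus the interaction
v <- 71029   # denominator (error) df; an assumed value, use your model's

# Solve for the smallest f2 detectable with 99% power.
sens <- pwr.f2.test(u = u, v = v, sig.level = 0.05, power = 0.99)

sens$f2
```

The returned f2 is Cohen's \(f^2\), so it is on the same scale as the effect size reported in the write-up.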