Statistical Inference: NHST
Princeton University
Applying NHST: Correlations
## Dataset
Mental Health and Drug Use:
- CESD = depression measure
- PIL total = measure of meaning in life
- AUDIT total = measure of alcohol use
- DAST total = measure of drug usage
## Dataset
- CESD = depression measure
- PIL total = measure of meaning in life
- What do you think the relationship looks like?
## Correlation (*r*)
- Quantifies the relationship between two variables:
  - Direction (positive or negative)
  - Strength
- +1 is a perfect positive correlation
- 0 is no linear correlation (this does not, on its own, imply independence)
- -1 is a perfect negative correlation
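The properties above can be checked on small made-up vectors. This is a minimal base R sketch; the vectors are illustrative assumptions, not values from the course dataset.

```{webr-r}
# Illustrative toy vector (not from the course dataset)
x <- 1:10

# Perfect positive linear relationship: r = 1 (up to rounding)
cor(x, 2 * x + 3)

# Perfect negative linear relationship: r = -1 (up to rounding)
cor(x, -0.5 * x)

# Noisy positive relationship: r falls between 0 and 1
set.seed(1)
y_noisy <- x + rnorm(10, sd = 3)
cor(x, y_noisy)

# y is completely determined by x, but the pattern is U-shaped,
# so the linear correlation is 0 (up to rounding)
x_sym <- -5:5
cor(x_sym, x_sym^2)
```

The last case shows why *r* = 0 only rules out a *linear* relationship, not every kind of dependence.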
## Correlations
```{r echo=FALSE, out.height="15%", out.width="70%", fig.cap="", fig.show='hold', fig.align='center'}
knitr::include_graphics('images/corr.png')
```
## Effect Size Heuristics
<br> <br>
- |*r*| \< 0.1 very small
- 0.1 ≤ |*r*| \< 0.3 small
- 0.3 ≤ |*r*| \< 0.5 moderate
- |*r*| ≥ 0.5 large
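The cut-offs above can be applied to a vector of coefficients with base R's `cut()`. `label_r()` is a hypothetical helper written for this sketch, not a function from any package; it works on the absolute value so that negative correlations are labelled by magnitude.

```{webr-r}
# Hypothetical helper (not from any package): label |r| using the
# heuristic cut-offs very small / small / moderate / large
label_r <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.1, 0.3, 0.5, 1),
      labels = c("very small", "small", "moderate", "large"),
      right = FALSE,          # intervals are [lower, upper)
      include.lowest = TRUE)  # so that |r| = 1 falls in "large"
}

label_r(c(0.05, -0.25, 0.3, -0.62))
```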
## Covariance and Correlation
- Pearson's *r*
<br> <br>
$$\text{cov}_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
$$r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$
- where $s_x$ and $s_y$ are the sample standard deviations of *x* and *y*
- Let's go to R!
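A minimal sketch of the two formulas above computed step by step in base R, then checked against the built-in `cov()` and `cor()`. The vectors `x` and `y` are made-up toy data, not the course dataset.

```{webr-r}
# Toy vectors for illustration (not the course dataset)
x <- c(2, 4, 5, 7, 9)
y <- c(10, 9, 7, 4, 3)
N <- length(x)

# Covariance: sum of cross-products of deviations, divided by N - 1
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (N - 1)

# Pearson's r: covariance scaled by both sample standard deviations
r <- cov_xy / (sd(x) * sd(y))

c(by_hand = cov_xy, built_in = cov(x, y))
c(by_hand = r, built_in = cor(x, y))
```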
## Statistical Test: Pearson's *r*
- $H_0$: $\rho = 0$ (no linear relationship in the population)
- $H_1$: $\rho \neq 0$
- $\alpha$ = .05
$$t_r = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}, \qquad df = N - 2$$
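A sketch of the test statistic and its two-tailed *p*-value computed by hand in base R, checked against `cor.test()`. The data are made-up toy values, not the course dataset.

```{webr-r}
# Toy data for illustration (not the course dataset)
x <- c(2, 4, 5, 7, 9, 11, 12, 15)
y <- c(30, 28, 29, 22, 20, 21, 15, 12)
N <- length(x)

r   <- cor(x, y)
t_r <- r * sqrt(N - 2) / sqrt(1 - r^2)  # test statistic, df = N - 2
p   <- 2 * pt(-abs(t_r), df = N - 2)    # two-tailed p-value

# Compare with R's built-in test
ct <- cor.test(x, y, method = "pearson")
c(t_by_hand = t_r, t_built_in = unname(ct$statistic))
c(p_by_hand = p, p_built_in = ct$p.value)

p < 0.05  # TRUE means we reject H0 at alpha = .05
```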
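```{webr-r}
library(correlation) # part of the easystats ecosystem

# Pearson correlation between meaning in life and depression;
# assumes `master` has been loaded earlier in the deck
cor_result <- cor_test(master, "PIL_total", "CESD_total")

# Display r, its CI, t, df and p as a table
cor_result |>
  knitr::kable()
```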
## Scatter plot
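## Non-parametric Correlation
- Spearman's rank correlation coefficient:

$$r_s = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N(N^2 - 1)}$$

- $d_i$ is the difference between the ranks of $x_i$ and $y_i$ (this shortcut formula assumes no tied ranks)
- Assesses how well the relationship between two variables can be described by a monotonic (consistently increasing or decreasing) function
- Computed on rank orders rather than raw values
- Ranges from -1 to +1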
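A minimal base R sketch of the shortcut formula above, compared with `cor(..., method = "spearman")` and with Pearson's *r* computed on the ranks. The toy data are made up and chosen to contain no ties, since the $d_i$ formula assumes none.

```{webr-r}
# Toy data with no tied values (not the course dataset)
x <- c(3.1, 1.2, 4.8, 2.5, 6.0, 5.3)
y <- c(20, 11, 35, 18, 40, 90)
N <- length(x)

# d_i: difference between the ranks of each (x_i, y_i) pair
d   <- rank(x) - rank(y)
r_s <- 1 - 6 * sum(d^2) / (N * (N^2 - 1))

# Same value from the built-in estimator and from Pearson's r on ranks
c(formula          = r_s,
  built_in         = cor(x, y, method = "spearman"),
  pearson_on_ranks = cor(rank(x), rank(y)))
```

Computing Pearson's *r* on the ranks is the general definition of Spearman's coefficient; it also handles ties, where `rank()` assigns average ranks and the shortcut formula no longer applies exactly.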