## Biostatistics: Pearson’s Correlation


5. CHAPTER 17, PROBLEM 5

In a study conducted in Italy, 10 patients with hypertriglyceridemia were placed on a low-fat, high-carbohydrate diet. Before the start of the diet, cholesterol and triglyceride measurements were recorded for each subject.

| Patient | Cholesterol level (mmol/l) | Triglyceride level (mmol/l) |
|---|---|---|
| 1 | 5.12 | 2.30 |
| 2 | 6.18 | 2.54 |
| 3 | 6.77 | 2.95 |
| 4 | 6.65 | 3.77 |
| 5 | 6.36 | 4.18 |
| 6 | 5.90 | 5.31 |
| 7 | 5.48 | 5.53 |
| 8 | 6.02 | 8.83 |
| 9 | 10.34 | 9.48 |
| 10 | 8.51 | 14.20 |

a) Construct a two-way scatter plot for these data.

– See attachment
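The attached scatter plot is not reproduced here. As a stand-in, the sketch below draws a crude character-grid scatter plot in plain Python. The function name `text_scatter` and its grid size are hypothetical choices made for illustration. In practice, a plotting library such as matplotlib (`pyplot.scatter`) would normally be used.

```python
# Pre-diet measurements for the 10 patients (mmol/l), copied from the table above.
cholesterol = [5.12, 6.18, 6.77, 6.65, 6.36, 5.90, 5.48, 6.02, 10.34, 8.51]   # X
triglyceride = [2.30, 2.54, 2.95, 3.77, 4.18, 5.31, 5.53, 8.83, 9.48, 14.20]  # Y


def text_scatter(xs, ys, width=40, height=15):
    """Hypothetical helper: return the rows of a character-grid scatter plot.

    X runs left to right and Y runs bottom to top; each point is drawn as '*'.
    Assumes xs (and ys) are not all equal, so the ranges are non-zero.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map each value linearly onto a column index and a row index.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is printed at the top
    # Left border as the Y axis, bottom line as the X axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


for line in text_scatter(cholesterol, triglyceride):
    print(line)
```

Each `*` marks one patient, with cholesterol on the horizontal axis and triglyceride on the vertical axis. The rough upward trend from left to right is visible even at this coarse resolution.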

b) Does there appear to be any evidence of a linear relationship between cholesterol and triglyceride levels prior to the diet?

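– Yes. The scatter plot suggests a positive linear relationship between cholesterol and triglyceride levels prior to the diet: patients with higher cholesterol tend to have higher triglyceride levels, although the points are fairly spread out.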

c) Compute r; the Pearson correlation coefficient.

HOW TO APPROACH A PEARSON CORRELATION PROBLEM (lec 17-5)

– Create a table of your data

– Variable 1? Cholesterol level

– Variable 1 = X. What is Xbar? (5.12+6.18+6.77+6.65+6.36+5.90+5.48+6.02+10.34+8.51)/10 = 6.733

– Variable 2? Triglyceride level

– Variable 2 = Y. What is Ybar? (2.30+2.54+2.95+3.77+4.18+5.31+5.53+8.83+9.48+14.20)/10 = 5.909

– n is the total number of (X, Y) pairs.

What is n? 10

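– Sx = sample SD of X (computed with n - 1 in the denominator). What is Sx? 1.56

– Sy = sample SD of Y (computed with n - 1 in the denominator). What is Sy? 3.818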
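The summary statistics above can be checked with Python's standard `statistics` module. This is a minimal sketch that assumes the data are entered exactly as they appear in the table. `statistics.stdev` uses the n - 1 divisor, which matches the sample SD used here.

```python
import statistics

# Pre-diet measurements (mmol/l), copied from the table in this problem.
cholesterol = [5.12, 6.18, 6.77, 6.65, 6.36, 5.90, 5.48, 6.02, 10.34, 8.51]   # X
triglyceride = [2.30, 2.54, 2.95, 3.77, 4.18, 5.31, 5.53, 8.83, 9.48, 14.20]  # Y

n = len(cholesterol)                    # number of (X, Y) pairs
x_bar = statistics.mean(cholesterol)    # Xbar
y_bar = statistics.mean(triglyceride)   # Ybar
s_x = statistics.stdev(cholesterol)     # Sx, sample SD with divisor n - 1
s_y = statistics.stdev(triglyceride)    # Sy, sample SD with divisor n - 1

print(f"n = {n}")
print(f"Xbar = {x_bar:.3f}, Ybar = {y_bar:.3f}")
print(f"Sx = {s_x:.3f}, Sy = {s_y:.3f}")
```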

CALCULATE R

– r = [Summation (Xi - Xbar)(Yi - Ybar)] / [(n - 1)(Sx*Sy)]

An equivalent form that is easier to use by hand:

– r = [Summation (Xi*Yi) - n*Xbar*Ybar] / [(n - 1)(Sx*Sy)]
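The shortcut form comes from expanding the numerator of the definition. The derivation below is standard algebra rather than material taken from the lecture notes. It relies on the identities ΣXi = n·Xbar and ΣYi = n·Ybar:

$$
\begin{aligned}
\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})
&= \sum_{i=1}^{n} X_iY_i - \bar{Y}\sum_{i=1}^{n} X_i - \bar{X}\sum_{i=1}^{n} Y_i + n\bar{X}\bar{Y} \\
&= \sum_{i=1}^{n} X_iY_i - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} \\
&= \sum_{i=1}^{n} X_iY_i - n\bar{X}\bar{Y},
\end{aligned}
\qquad\text{so}\qquad
r = \frac{\sum_{i=1}^{n} X_iY_i - n\bar{X}\bar{Y}}{(n-1)\,S_xS_y}.
$$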

– r = [(5.12*2.30) + (6.18*2.54) + (6.77*2.95) + (6.65*3.77) + (6.36*4.18) + (5.90*5.31) + (5.48*5.53) + (6.02*8.83) + (10.34*9.48) + (8.51*14.20) - 10(6.733)(5.909)] / [(9)(1.56)(3.818)]

= (432.755 - 397.853) / 53.604

= 34.902 / 53.604 = 0.651

r = 0.651 (using the unrounded values of Sx and Sy gives r ≈ 0.650)
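The following plain-Python sketch checks the arithmetic numerically. It evaluates both the definitional formula and the shortcut formula with unrounded SDs, then repeats the hand calculation with Sx and Sy rounded to 1.56 and 3.818. The variable names are illustrative only.

```python
import math

# Data from the table above (mmol/l).
x = [5.12, 6.18, 6.77, 6.65, 6.36, 5.90, 5.48, 6.02, 10.34, 8.51]   # cholesterol
y = [2.30, 2.54, 2.95, 3.77, 4.18, 5.31, 5.53, 8.83, 9.48, 14.20]  # triglyceride
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
# Sample standard deviations with divisor n - 1.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Definition: r = sum (Xi - Xbar)(Yi - Ybar) / [(n - 1) Sx Sy]
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r_definition = s_xy / ((n - 1) * s_x * s_y)

# Shortcut: r = [sum Xi*Yi - n Xbar Ybar] / [(n - 1) Sx Sy]
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
r_shortcut = (sum_xy - n * x_bar * y_bar) / ((n - 1) * s_x * s_y)

# Hand calculation with the SDs rounded as in the worked solution.
r_hand = (sum_xy - n * x_bar * y_bar) / ((n - 1) * 1.56 * 3.818)

print(f"sum XiYi        = {sum_xy:.3f}")
print(f"r (definition)  = {r_definition:.4f}")
print(f"r (shortcut)    = {r_shortcut:.4f}")
print(f"r (rounded SDs) = {r_hand:.3f}")
```

The two formulas agree to floating-point precision. The small gap between about 0.650 and 0.651 comes only from rounding Sx and Sy before dividing.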

d) At the 0.05 level of significance, test the null hypothesis that the population correlation ρ is equal to 0. What do you conclude?

HYPOTHESIS TEST

– Ho: ρ = 0 versus Ha: ρ ≠ 0.

SIGNIFICANCE LEVEL: α = 0.05.

ASSUMPTIONS IN ORDER TO USE THE T-TEST

– We can use a t-test if the following assumptions are met:

i) the pairs (Xi, Yi) are randomly selected from the population; and

ii) X and Y jointly follow a bivariate normal distribution (so, in particular, X and Y are each normally distributed).

CALCULATIONS

– The standard error of r is estimated by

sqrt[(1-r^2)/(n-2)]

– Use that to compute the test statistic

t = (r-0) / sqrt[(1-r^2)/(n-2)]

Another way to write t is this:

t = r*sqrt[(n-2)/(1-r^2)]

– Under H0, this statistic follows a t distribution with n - 2 degrees of freedom.
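The two algebraically equivalent forms of the test statistic can be sketched as small Python functions. The names `t_statistic` and `t_statistic_alt` are hypothetical. Plugging in r = 0.651 and n = 10 reproduces the hand calculation.

```python
import math


# Hypothetical helper names, used only for this illustration.
def t_statistic(r: float, n: int) -> float:
    """t = (r - 0) / sqrt[(1 - r^2)/(n - 2)] for testing H0: rho = 0."""
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    return (r - 0) / standard_error


def t_statistic_alt(r: float, n: int) -> float:
    """Equivalent form: t = r * sqrt[(n - 2)/(1 - r^2)]."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.651, 10
t = t_statistic(r, n)
print(f"t = {t:.3f} with df = {n - 2}")
```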

t = 0.651*sqrt[(10-2)/(1-0.651^2)]

t = 0.651*sqrt[8/0.576199] = 0.651*sqrt(13.884) = 0.651*3.726

t ≈ 2.43 (2.42 if the unrounded r ≈ 0.650 is used)

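df = n-2 = 8.

From a t table with 8 df, the two-sided critical value at α = 0.05 is 2.306, and t = 2.896 cuts off 0.01 in the upper tail. The computed t exceeds 2.306 but not 2.896, so

0.01 < p/2 < 0.025

0.02 < p < 0.05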
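The p-value can also be approximated numerically instead of being bracketed with a printed t table. This sketch uses Simpson's rule to integrate the Student t density. That technique is an assumption of this example, not the method used in the lecture, and a library such as SciPy would normally be used instead. The helper names are hypothetical.

```python
import math

# Sketch only: Simpson's rule stands in for a t table or a stats library.
# The helper names t_pdf and two_sided_p are hypothetical.


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(|T| >= |t|) by integrating the density from 0 to |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric about 0, and each half has area 0.5.
    return 2 * (0.5 - area_0_to_t)


r, n, alpha = 0.651, 10, 0.05
df = n - 2
t = r * math.sqrt(df / (1 - r ** 2))
p = two_sided_p(t, df)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}, reject H0: {p < alpha}")
```

As a sanity check, `two_sided_p(2.306, 8)` comes out at about 0.05, which matches the tabled critical value.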

Since p < 0.05, I reject H0: ρ = 0 and conclude that there is a statistically significant positive linear relationship between cholesterol level and triglyceride level prior to the diet.
