Sunday, February 19, 2017

Statistical Analysis of Portulaca Measurements – Part 2 – The Right Way




Statistical Analysis of Portulaca Measurements – Part 2 –Mann-Whitney U - The Right Way (1/28/2017) 
Portulaca amilis plants from the Portulaca experiment. Plants to the left grown in potting soil (P). Plants to the right grown in 50/50 potting soil and sand (M).
In 10/17/2016 I took measurements of my jars to end the phase I cared about keeping them alive. Jars were tagged for identification (first column), type of soil used for the jars (second column), number of plants in the jars were counted (third column), height of the tallest plant was measured for each jar (forth column), the number of seed capsules on plants within the jar was counted (fifth column), and the weight of the entire jar with plants was measured (sixth column).


Tag
Type
# of Plants
Tallest Plant
# of pods
Weight (nearest 5 g
TP2
P
3
22.5
27
335
AP1
P
5
21.5
27
335
LP1
P
10
22
33
335
EP2
P
1
24
22
330
RP2
P
2
22.5
17
330
CP1
P
1
22
19
335
LP2
P
6
22
37
335
FP2
P
8
23
56
330
AP2
P
6
21
35
335
YP1
P
5
22
30
335
CP2
P
3
24
30
330
UP2
P
2
22.5
29
330
RP1
P
1
19
16
330
YP2
P
8
20
24
330
TP1
P
1
14.5
3
330
FP1
P
4
20.5
34
330
UP1
P
1
21
14
330
BP2
P
4
21.5
25
330
BP1
P
9
23.5
40
330
BM2
M
5
19.5
20
360
TM2
M
3
17.5
8
360
AM2
M
4
18
18
360
UM2
M
4
18.5
27
360
YM2
M
5
18.5
25
360
FM1
M
6
17.5
25
360
FM2
M
4
18.5
15
360
WM1
M
1
19
6
360
LM1
M
4
18
22
360
RM1
M
1
18.5
8
360
AM1
M
6
18
18
360
LM2
M
7
19
20
360
RM2
M
2
19.5
9
360
TM1
M
2
17.5
12
360
UM1
M
1
19.5
9
360
YM1
M
7
16.5
27
360
BM1
M
4
16.5
17
360
 



For this review I will focus on Tallest Plant.
The R program was used to run statistical analysis on the different measurement in relation to the soil type P (potting soil) and M (50/50 mixture of potting soil and sand). To compare sets of measurement, the data must be entered into are as shown (I will try to keep all characters entered into R bold, all results returned from R in italics:

P=c(22.5,21.5,22,24,22.5,22,22,23,21,22,24,22.5,19,20,14.5,20.5,21,21.5,23.5)
M=c(19.5,17.5,18,18.5,18.5,17.5,18.5,19,18,18.5,18,19,19.5,17.5,19.5,16.5,16.5)

The first two lines set up P and M. I obtained these numbers from the table under the tallest plant column. These can be viewed by:

View(P)
View(M)

Yes capitalization matters. The sets of numbers must be tested to determine if the data is of equal variances and follows a normal distribution. To perform many tests, the data must follow these assumptions. These are called parametric tests. If the data is not of equal variances or normal distribution then assumptions are not met and nonparametric tests must be used for data comparison.

To test for equal variances:

Var.test(P,M)

The R program then returns the following results:

                F test to compare two variances
data:  P and M
F = 5.1679, num df = 18, denom df = 16,
p-value = 0.001825
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
  1.902069 13.645142
sample estimates:
ratio of variances
          5.167927

The P value is below .05 at .001825. The null hypothesis can be rejected to determine variances are NOT equal.

*Update Start* A thought occurred to me a day later, to check for equal variances I should use the Levene's test. To perform the test I had to install the "lawstat" package into R. I will note, I use R Studio to assist with processing R. In R Studio I click packages, then install. From there I type lawstat and then click install. Then I am able to put the data into the required format and run the application with the code:



sample1=c(22.5,21.5,22,24,22.5,22,22,23,21,22,24,22.5,19,20,14.5,20.5,21,21.5,23.5)

sample2=c(19.5,17.5,18,18.5,18.5,17.5,18.5,19,18,18.5,18,19,19.5,17.5,19.5,16.5,16.5)

y <- c(sample1, sample2)

group <- as.factor(c(rep(1, length(sample1)), rep(2, length(sample2))))

levene.test(y, group)




                modified robust Brown-Forsythe Levene-type
                test based on the absolute deviations from
                the median
data:  y
Test Statistic = 1.7375, p-value = 0.1963

The Levene's Test of variances indicates the groups are of equal variances. So which one should I use? I looked at a histogram of the data. Then I thought about what is variance. My decision is to use the Levene's Test because that is what I learned in class. Although this assumption was met, the normality was still in violation. The rest of this process would proceed without the Levene's vs var.test issue. *Update End*


The Shapiro test is used to determine normality. It must be performed on each group.

shapiro.test(P)
                Shapiro-Wilk normality test
data:  P
W = 0.81364, p-value = 0.001816

shapiro.test(M)
                Shapiro-Wilk normality test
data:  M
W = 0.93043, p-value = 0.2212

P has a p-value < 0.05 to determine the data does not have a normal distribution.
M has a p-value > 0.05 to determine the data does have a normal distribution.
Both my samples do not have a normal distribution, the data does not meet the assumption.

The suggested non-parametric statistical test for independent groups would be the Wilcoxon rank sum test (different from the Wilcoxon signed ranked test) aka Mann-Whitney U test. I followed the instructions how to perform this test using http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html.

For this test, I took both the Type and Tallest Plant columns of my table and put them side by side. I then filtered and sorted by ascending order the tallest plant column. Then I added additional columns.
Type
Tallest Plant
"=RANK"
"=RANK.AVG"
P Rank
M Rank
P
14.5
1
1
1

M
16.5
2
2.5

2.5
M
16.5
2
2.5

2.5
M
17.5
4
5

5
M
17.5
4
5

5
M
17.5
4
5

5
M
18
7
8

8
M
18
7
8

8
M
18
7
8

8
M
18.5
10
11.5

11.5
M
18.5
10
11.5

11.5
M
18.5
10
11.5

11.5
M
18.5
10
11.5

11.5
P
19
14
15
15

M
19
14
15

15
M
19
14
15

15
M
19.5
17
18

18
M
19.5
17
18

18
M
19.5
17
18

18
P
20
20
20
20

P
20.5
21
21
21

P
21
22
22.5
22.5

P
21
22
22.5
22.5

P
21.5
24
24.5
24.5

P
21.5
24
24.5
24.5

P
22
26
27.5
27.5

P
22
26
27.5
27.5

P
22
26
27.5
27.5

P
22
26
27.5
27.5

P
22.5
30
31
31

P
22.5
30
31
31

P
22.5
30
31
31

P
23
33
33
33

P
23.5
34
34
34

P
24
35
35.5
35.5

P
24
35
35.5
35.5

Sum
666
492
174
I tried the =RANK function. But this did not sort out ranks if multiple values were tied. I then used the =RANK.AVG function. This provided the average ranks. I then added a P and M rank column to take values of each group for each of their respected columns. For those P and M rank columns, I took the sum of the columns. This sum should equal: =n(n+1)/2 and they do equal out each to 666.

After ranks, I then determine the U value. The equations are as follows:


U1 = n1n2 + n1(n1+1)/2 – R1                where n1 = 19; n2 = 19; R1 = 492
U2 = n1n2 + n2(n2+1)/2 – R2                where n1 = 19; n2 = 17; R2 = 174

Results in

U1 = 19*17 + 19(19+1)/2 – 494
U2 = 19*17 + 17(17+1)/2 – 174

Results in

U1 = 323 + 19(20)/2 – 494
U2 = 323 + 17(18)/2 – 174

Results in

U1 = 323 + 380/2 – 494
U2 = 323 + 306/2 – 174

Results in

U1 = 323 + 190 – 494
U2 = 323 + 153 – 174

Results in

U1 = 19
U2 = 302

The test statistic for Mann-Whitney U test is the smaller of the two U values. The U = 19.

The website http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html provides an explanation. If there is no difference among the groups, a U value will be the n. If one group is always larger than another group, a U value will be 0. If they have some overlap, the result will be another number. Complex concept, but do check out the site to understand it. The link cited also explains: “Thus, smaller values of U support the research hypothesis, and larger values of U support the null hypothesis.”

My value of U is 19. This is a lot smaller than the other U value at 302. Also found in the citation http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html is a “Critical Values of the Mann-Whitney U (Two-Tailed Testing” table which is what I need to use. Following the column for 19, down to the row of 17, and using the value 0.05, my critical U value is 99. If my U value is smaller than or equal to 99, then I have a statistically significant difference; if my U value is larger than 99, then I accept the null hypothesis of no difference. The value 0.01 has a critical value of 81. My U is less than or equal to 81, so it has 99% confidence the group means are different.

My U of 19 is less than or equal to the critical value of 99 or 81. My two groups are statistically significantly different.

So how do I report this statistic in the text? I followed the suggestion of http://evc-cit.info/psych018/Reporting_Statistics.pdf. For Mann-Whitney an example reporting is:
“U = 67.5, p = .034, r = .38”

Where do I obtain the p and r values?

Although I performed the wrong computations previously in “Statistical Analysis of Portulaca Measurements – Part 1 – The Wrong Way,” it did provide me with a better understanding.

I went to http://astatsa.com/WilcoxonTest/ and entered my data. It returned similar results to the test I performed in R as described in “Statistical Analysis of Portulaca Measurements – Part 1 – The Wrong Way.” In the R program, the test statistic provided was W = 302. Interesting, this value is the larger of my U values, although if you report the U value, you report the smaller. Seems R reports the larger value and calls it W. The http://astatsa.com/WilcoxonTest/ also reports what the R program reports. It also reports how to perform the function in R. The site reported: 

# Copy-paste these lines into the R command prompt.

# Lines that begin with the # character are taken as comment lines by R.    

A <- c(22.5, 21.5, 22.0, 24.0, 22.5, 22.0, 22.0, 23.0, 21.0, 22.0, 24.0, 22.5, 19.0, 20.0, 14.5, 20.5, 21.0, 21.5, 23.5)

B <- c(19.5, 17.5, 18.0, 18.5, 18.5, 17.5, 18.5, 19.0, 18.0, 18.5, 18.0, 19.0, 19.5, 17.5, 19.5, 16.5, 16.5)

wilcox.test(A, B, paired = FALSE, alternative = "two.sided", mu = 0.0,
      exact = TRUE, correct = TRUE, conf.int = TRUE, conf.level = 0.95)

I will attempt this and I expect the same outcome.

That website also reports these results:
Test statistic W : 302

p-value : 0.000009
There are ties in the ranks, so the exact p-value cannot be computed. The p-value shown is inexact but fairly robust when the number of ties is small.
null hypothesis   μ 0 = μ0= 0.0
alternative hypothesis: two sided,   μ ^ ≠μ 0 
μ^≠μ0
  95% confidence interval : 2.5000 --------- 4.5000
sample estimate of difference in location   μ ^ 
μ^ : 3.50003353466
Some of the characters did not paste the same as on the website.

The R program reports the results as:

                Wilcoxon rank sum test with continuity
                correction
data:  A and B
W = 302, p-value = 8.632e-06
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 2.500018 4.499995
sample estimates:
difference in location
              3.500034

These are the same results I obtained from the R program using the Wilcoxon sum ranked test to evaluate the two groups. Note the p value for my initial R program test was 0.000008016 while the website http://astatsa.com/WilcoxonTest/ returned 0.000009.



From my previous work: This site provided me with qnorm() that calculates p-values to Z scores (http://stats.stackexchange.com/questions/101136/how-can-i-find-a-z-score-from-a-p-value).

qnorm(8.016e-06)

[1] -4.31401

Also from previous work: The r value is the effect size. Its equation:
I calculate this out in excel with: =-4.31401/(SQRT(17+19)) to get -0.719001667 = r.

To Report:
U = 19, p = 8.016e-06, r = 0.719001667 (reporting suggested by http://evc-cit.info/psych018/Reporting_Statistics.pdf).

Or

For potting soil mean average 21.5 cm (n=19) and 50/50 mixture potting soil and sand mean average at 18.23529412 (n=17) Mann-Whitney U test reported U=19, p = 8.016e-06 (reporting suggested by https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test).

What this means. If we had another plant we forgot to measure from each group, we could gamble that the P plant will be taller than the M plant with over 99% confidence (p = 8.016e-06).

To graph these results what should I report?
My determination: A graph should show the means, possibly the sample size, error bars, and a note of any statistically significance with a possible report of the p-value. Test and test statistic are bonus to include. Error bars should be standard error or 95% confidence intervals. To help create the graph I made another table:


P
M

22.5
19.5

21.5
17.5

22
18

24
18.5

22.5
18.5

22
17.5

22
18.5

23
19

21
18

22
18.5

24
18

22.5
19

19
19.5

20
17.5

14.5
19.5

20.5
16.5

21
16.5

21.5


23.5


19
17
Count
21.52632
18.23529
Average
2.130947
0.937377
StdEv



Potting
Mix of 50/50 Sand/Potting

21.52632
18.23529
Average
2.130947
0.937377
StdEv
0.488873
0.227347
StdError

Choosing standard error or 95% confidence intervals for error bars is a topic to explore on its own. The standard error and confidence intervals are related: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064100/figure/. This also provides more insight http://www.mas.ncl.ac.uk/~njnsm/medfac/docs/se&ci.pdf. I previously figured out how to calculate standard error and I think I know how to incorporate them for graphical representation. This provides insight to standard error bars https://egret.psychol.cam.ac.uk/statistics/local_copies_of_sources_Cardinal_and_Aitken_ANOVA/errorbars.htm. The R program reports the confidence intervals, but I do not know how to use them for graphical representation.

There is probably more to say and understand, but I get the idea that P plants are larger than M plants.

Conclusion:
Plants grown in potting soil were taller than those grown in a 50/50 mixture of potting soil and sand (potting soil average height 21.5 cm (n=19) and 50/50 mixture potting soil and sand average height at 18.2 cm (n=17); Mann-Whitney U test: U=19, p = 8.016e-06).
Figure. Plant height by soil type (values are significantly different: Mann-Whitney U test: U=19, p = 8.016e-06; error bars are ± one standard error).

How to read the conclusion:
Plants grown in potting soil were taller than those grown in a 50/50 mixture of potting soil and sand is straight forward. In parenthesis the average provides the average height of plants grown in each type of soil. The potting soil average height was 21.5 which is taller than 50/50 soil type average height was 18.2. The parenthesis with n= indicates how many units, in this case plants, were measured to obtain the average: there were 19 potting soil plants measured; there were 17, 50/50 plants measured. Mann-Whitney U test is the test that was used to determine if the two soil type plant measurements had a statistical difference. The U=19 is the number the test returns so others familiar with the statistic have an idea of what took place. The p=8.016e-06 is the number people look for to be less than 0.05 which would indicate a statistically significant difference. The graph has two bars indicating the two soil types as shown by the X axis label. The height of the bars represents the average height of plants as shown by the Y axis label. The numbers on top of the bars are a repeat or exact number for the height of the bar which could also be determined by looking at the Y axis. The n=# at the base of the bar is a repeat of the text indicating how many units, plants in this case, were measured to obtain the average. The error bars on top of the bars are indicated to be ± one standard error. I won't go into too much detail about standard error, but the bars provide an idea of... I guess precision of the measurements. From standard error-error bars, if the error bars were to overlap for each average, then there is no statistical significance. If the bars do not overlap, as shown in this conclusion, then there may be a statistically significant difference, but that isn't an absolute with standard error-error bars, so then you need to refer to the p-vale.

Thank you for reading!
Have a Great day!
Comment to discuss!

No comments:

Post a Comment