Statistical Analysis of Portulaca Measurements – Part 2 – Mann-Whitney U – The Right Way (1/28/2017)
Portulaca amilis plants from the Portulaca experiment. Plants to the left grown in potting soil (P). Plants to the right grown in 50/50 potting soil and sand (M).
On 10/17/2016 I took measurements of my jars to end the phase in which I cared about keeping the plants alive. Jars were tagged for identification (first column), the type of soil used in each jar was recorded (second column), the number of plants in each jar was counted (third column), the height of the tallest plant was measured for each jar (fourth column), the number of seed capsules on plants within the jar was counted (fifth column), and the weight of the entire jar with plants was measured (sixth column).
Tag | Type | # of Plants | Tallest Plant (cm) | # of pods | Weight (nearest 5 g)
TP2 | P | 3 | 22.5 | 27 | 335
AP1 | P | 5 | 21.5 | 27 | 335
LP1 | P | 10 | 22 | 33 | 335
EP2 | P | 1 | 24 | 22 | 330
RP2 | P | 2 | 22.5 | 17 | 330
CP1 | P | 1 | 22 | 19 | 335
LP2 | P | 6 | 22 | 37 | 335
FP2 | P | 8 | 23 | 56 | 330
AP2 | P | 6 | 21 | 35 | 335
YP1 | P | 5 | 22 | 30 | 335
CP2 | P | 3 | 24 | 30 | 330
UP2 | P | 2 | 22.5 | 29 | 330
RP1 | P | 1 | 19 | 16 | 330
YP2 | P | 8 | 20 | 24 | 330
TP1 | P | 1 | 14.5 | 3 | 330
FP1 | P | 4 | 20.5 | 34 | 330
UP1 | P | 1 | 21 | 14 | 330
BP2 | P | 4 | 21.5 | 25 | 330
BP1 | P | 9 | 23.5 | 40 | 330
BM2 | M | 5 | 19.5 | 20 | 360
TM2 | M | 3 | 17.5 | 8 | 360
AM2 | M | 4 | 18 | 18 | 360
UM2 | M | 4 | 18.5 | 27 | 360
YM2 | M | 5 | 18.5 | 25 | 360
FM1 | M | 6 | 17.5 | 25 | 360
FM2 | M | 4 | 18.5 | 15 | 360
WM1 | M | 1 | 19 | 6 | 360
LM1 | M | 4 | 18 | 22 | 360
RM1 | M | 1 | 18.5 | 8 | 360
AM1 | M | 6 | 18 | 18 | 360
LM2 | M | 7 | 19 | 20 | 360
RM2 | M | 2 | 19.5 | 9 | 360
TM1 | M | 2 | 17.5 | 12 | 360
UM1 | M | 1 | 19.5 | 9 | 360
YM1 | M | 7 | 16.5 | 27 | 360
BM1 | M | 4 | 16.5 | 17 | 360
For this review I will focus on the Tallest Plant measurement. The R program was used to run statistical analysis on the different measurements in relation to the soil type, P (potting soil) or M (50/50 mixture of potting soil and sand). To compare the two sets of measurements, the data must be entered into R as shown (I will try to keep all characters entered into R bold, and all results returned from R in italics):
P=c(22.5,21.5,22,24,22.5,22,22,23,21,22,24,22.5,19,20,14.5,20.5,21,21.5,23.5)
M=c(19.5,17.5,18,18.5,18.5,17.5,18.5,19,18,18.5,18,19,19.5,17.5,19.5,16.5,16.5)
The first two lines set
up P and M. I obtained these numbers from the table under the tallest plant
column. These can be viewed by:
View(P)
View(M)
Yes, capitalization matters. The two sets of numbers must be tested to determine whether the data have equal variances and follow a normal distribution. Many tests, called parametric tests, require the data to meet these assumptions. If the data do not have equal variances or a normal distribution, the assumptions are not met and nonparametric tests must be used for the comparison.
To test for equal
variances:
var.test(P,M)
The R program then
returns the following results:
F test to compare two variances
data: P and M
F = 5.1679, num df = 18, denom df = 16,
p-value = 0.001825
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
1.902069 13.645142
sample estimates:
ratio of variances
5.167927
The p-value, 0.001825, is below 0.05. The null hypothesis can be rejected, which means the variances are NOT equal.
*Update Start* A thought occurred to me a day later: to check for equal variances I should use Levene's test. To perform the test I had to install the "lawstat" package into R. I will note that I use RStudio to assist with running R. In RStudio I click Packages, then Install, type lawstat, and click Install. Then I can put the data into the required format and run the test with the code:
sample1=c(22.5,21.5,22,24,22.5,22,22,23,21,22,24,22.5,19,20,14.5,20.5,21,21.5,23.5)
sample2=c(19.5,17.5,18,18.5,18.5,17.5,18.5,19,18,18.5,18,19,19.5,17.5,19.5,16.5,16.5)
library(lawstat)   # load the package installed above
y <- c(sample1, sample2)
group <- as.factor(c(rep(1, length(sample1)), rep(2, length(sample2))))
levene.test(y, group)
modified robust Brown-Forsythe Levene-type test based on the absolute deviations from the median
data: y
Test Statistic = 1.7375, p-value = 0.1963
Levene's test indicates the groups have equal variances. So which test should I use? I looked at a histogram of the data and thought about what variance is. My decision is to use Levene's test because that is what I learned in class. Although this assumption was met, the normality assumption was still violated, so the rest of this process proceeds the same way regardless of the Levene's test vs. var.test question. *Update End*
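For reference, the histograms I mention looking at in the update can be drawn in base R with something like this (a quick sketch using the P and M vectors from above):
par(mfrow = c(1, 2))                                    # two plots side by side
hist(P, main = "P (potting soil)", xlab = "Tallest plant (cm)")
hist(M, main = "M (50/50 mix)", xlab = "Tallest plant (cm)")
par(mfrow = c(1, 1))                                    # reset the plotting layout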
The Shapiro test is
used to determine normality. It must be performed on each group.
shapiro.test(P)
Shapiro-Wilk normality test
data: P
W = 0.81364, p-value = 0.001816
shapiro.test(M)
Shapiro-Wilk normality test
data: M
W = 0.93043, p-value = 0.2212
P has a p-value < 0.05, which indicates its data do not follow a normal distribution.
M has a p-value > 0.05, which indicates its data do follow a normal distribution.
Because not both of my samples have a normal distribution, the data do not meet the normality assumption.
For this test, I took the Type and Tallest Plant columns of my table and put them side by side. I then sorted the Tallest Plant column in ascending order. Then I added additional columns for the ranks.
Type | Tallest Plant | =RANK | =RANK.AVG | P Rank | M Rank
P | 14.5 | 1 | 1 | 1 |
M | 16.5 | 2 | 2.5 | | 2.5
M | 16.5 | 2 | 2.5 | | 2.5
M | 17.5 | 4 | 5 | | 5
M | 17.5 | 4 | 5 | | 5
M | 17.5 | 4 | 5 | | 5
M | 18 | 7 | 8 | | 8
M | 18 | 7 | 8 | | 8
M | 18 | 7 | 8 | | 8
M | 18.5 | 10 | 11.5 | | 11.5
M | 18.5 | 10 | 11.5 | | 11.5
M | 18.5 | 10 | 11.5 | | 11.5
M | 18.5 | 10 | 11.5 | | 11.5
P | 19 | 14 | 15 | 15 |
M | 19 | 14 | 15 | | 15
M | 19 | 14 | 15 | | 15
M | 19.5 | 17 | 18 | | 18
M | 19.5 | 17 | 18 | | 18
M | 19.5 | 17 | 18 | | 18
P | 20 | 20 | 20 | 20 |
P | 20.5 | 21 | 21 | 21 |
P | 21 | 22 | 22.5 | 22.5 |
P | 21 | 22 | 22.5 | 22.5 |
P | 21.5 | 24 | 24.5 | 24.5 |
P | 21.5 | 24 | 24.5 | 24.5 |
P | 22 | 26 | 27.5 | 27.5 |
P | 22 | 26 | 27.5 | 27.5 |
P | 22 | 26 | 27.5 | 27.5 |
P | 22 | 26 | 27.5 | 27.5 |
P | 22.5 | 30 | 31 | 31 |
P | 22.5 | 30 | 31 | 31 |
P | 22.5 | 30 | 31 | 31 |
P | 23 | 33 | 33 | 33 |
P | 23.5 | 34 | 34 | 34 |
P | 24 | 35 | 35.5 | 35.5 |
P | 24 | 35 | 35.5 | 35.5 |
Sum | | | 666 | 492 | 174
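The Excel ranking can also be cross-checked in R. This is just a sketch using the P and M vectors defined earlier and R's rank() function, which averages tied ranks the same way =RANK.AVG does:
heights <- c(P, M)                                  # combine both groups
soil <- c(rep("P", length(P)), rep("M", length(M))) # group labels
avg.ranks <- rank(heights)                          # average ranks for ties
tapply(avg.ranks, soil, sum)                        # rank sums by group: M = 174, P = 492
sum(avg.ranks)                                      # total of all ranks = 666
These sums match the 492, 174, and 666 at the bottom of the table.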
After the ranks, I then determine the U values. The equations are as follows:
U1 = n1n2 + n1(n1+1)/2 – R1, where n1 = 19; n2 = 17; R1 = 492
U2 = n1n2 + n2(n2+1)/2 – R2, where n1 = 19; n2 = 17; R2 = 174
Results in
U1 = 19*17 + 19(19+1)/2 – 492
U2 = 19*17 + 17(17+1)/2 – 174
Results in
U1 = 323 + 19(20)/2 – 492
U2 = 323 + 17(18)/2 – 174
Results in
U1 = 323 + 380/2 – 492
U2 = 323 + 306/2 – 174
Results in
U1 = 323 + 190 – 492
U2 = 323 + 153 – 174
Results in
U1 = 21
U2 = 302
As a check, U1 + U2 should equal n1n2 = 323, and 21 + 302 = 323.
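The same arithmetic can also be run in R. This is just a sketch that plugs the rank sums from the table into the formulas above (it does not use any built-in test):
n1 <- 19; n2 <- 17                    # sample sizes for P and M
R1 <- 492; R2 <- 174                  # rank sums from the table above
U1 <- n1*n2 + n1*(n1 + 1)/2 - R1      # 21
U2 <- n1*n2 + n2*(n2 + 1)/2 - R2      # 302
min(U1, U2)                           # the Mann-Whitney U test statistic, 21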
The test statistic for the Mann-Whitney U test is the smaller of the two U values, so U = 21.
The website http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html provides an explanation. If there is no difference between the groups, the two U values will each be close to half of n1n2. If one group is always larger than the other, the smaller U value will be 0. If the groups overlap somewhat, U falls somewhere in between. It is a complex concept, but do check out the site to understand it. The link cited also explains: “Thus, smaller values of U support the research hypothesis, and larger values of U support the null hypothesis.”
My value of U is 21. This is a lot smaller than the other U value of 302. Also found in the citation http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html is a “Critical Values of the Mann-Whitney U (Two-Tailed Testing)” table, which is what I need to use. Following the column for 19 down to the row for 17, at the 0.05 level, my critical U value is 99. If my U value is smaller than or equal to 99, then I have a statistically significant difference; if my U value is larger than 99, then I accept the null hypothesis of no difference. The 0.01 level has a critical value of 81. My U is also less than or equal to 81, so I can be 99% confident the groups are different.
My U of 21 is less than or equal to both critical values, 99 and 81. My two groups are statistically significantly different.
So how do I report this
statistic in the text? I followed the suggestion of http://evc-cit.info/psych018/Reporting_Statistics.pdf.
For Mann-Whitney an example reporting is:
“U = 67.5, p = .034, r
= .38”
Where do I obtain the p
and r values?
Although I performed the
wrong computations previously in “Statistical Analysis of Portulaca
Measurements – Part 1 – The Wrong Way,” it did provide me with a better
understanding.
I went to http://astatsa.com/WilcoxonTest/
and entered my data. It returned similar results to the test I performed in R
as described in “Statistical Analysis of Portulaca Measurements – Part 1 – The
Wrong Way.” In the R program, the test statistic provided was W = 302. Interestingly, this is the larger of my two U values, even though when you report U you report the smaller one. R reports the U value calculated for the first group given to wilcox.test (here the larger one) and calls it W. The http://astatsa.com/WilcoxonTest/ site reports the same value the R program reports, and it also shows how to perform the test in R. The site reported:
# Copy-paste these lines into the R command prompt.
# Lines that begin with the # character are taken as comment lines by R.
A <- c(22.5, 21.5, 22.0, 24.0, 22.5, 22.0, 22.0, 23.0, 21.0, 22.0, 24.0, 22.5, 19.0, 20.0, 14.5, 20.5, 21.0, 21.5, 23.5)
B <- c(19.5, 17.5, 18.0, 18.5, 18.5, 17.5, 18.5, 19.0, 18.0, 18.5, 18.0, 19.0, 19.5, 17.5, 19.5, 16.5, 16.5)
wilcox.test(A, B, paired = FALSE, alternative = "two.sided", mu = 0.0, exact = TRUE, correct = TRUE, conf.int = TRUE, conf.level = 0.95)
I will attempt this and
I expect the same outcome.
That website also
reports these results:
Test statistic W: 302
p-value: 0.000009
There are ties in the ranks, so the exact p-value cannot be computed. The p-value shown is inexact but fairly robust when the number of ties is small.
null hypothesis: μ0 = 0.0
alternative hypothesis: two sided, μ̂ ≠ μ0
95% confidence interval: 2.5000 to 4.5000
sample estimate of difference in location μ̂: 3.50003353466
Some of the characters
did not paste the same as on the website.
The R program reports
the results as:
Wilcoxon rank sum test with continuity correction
data: A and B
W = 302, p-value = 8.632e-06
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
2.500018 4.499995
sample estimates:
difference in location
3.500034
These are the same results I obtained from the R program using the Wilcoxon rank sum test to evaluate the two groups. Note that the p-value for my initial R test was 0.000008016, while the website http://astatsa.com/WilcoxonTest/ returned 0.000009.
Again, what do I report in the text? I consulted https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test and http://evc-cit.info/psych018/Reporting_Statistics.pdf.
Following http://evc-cit.info/psych018/Reporting_Statistics.pdf I would report: U = 21
From my previous work: this site provided me with qnorm(), which converts p-values to Z scores (http://stats.stackexchange.com/questions/101136/how-can-i-find-a-z-score-from-a-p-value).
qnorm(8.016e-06)
[1] -4.31401
Also from previous work: the r value is the effect size. Its equation is r = Z / √N, where N = n1 + n2. I calculate this out in Excel with: =-4.31401/(SQRT(17+19)) to get -0.719001667 = r.
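The same conversion can be done entirely in R rather than Excel; a small sketch following the qnorm() step above:
z <- qnorm(8.016e-06)     # Z score from the p-value, about -4.31401
r <- z / sqrt(19 + 17)    # effect size r = Z / sqrt(N), about -0.719
abs(r)                    # report the magnitude, 0.719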
To report:
U = 21, p = 8.016e-06, r = 0.719001667 (reporting suggested by http://evc-cit.info/psych018/Reporting_Statistics.pdf).
Or:
For a potting soil mean height of 21.5 cm (n=19) and a 50/50 potting soil and sand mixture mean height of 18.2 cm (n=17), the Mann-Whitney U test reported U = 21, p = 8.016e-06 (reporting suggested by https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test).
What this means: if we had one more plant from each group that we had forgotten to measure, we could gamble that the P plant would be taller than the M plant, and we can be more than 99% confident that the two groups really differ (p = 8.016e-06).
To graph these results, what should I report? My determination: a graph should show the means, possibly the sample sizes, error bars, and a note of any statistical significance, with a possible report of the p-value. The test and test statistic are a bonus to include. Error bars should be standard error or 95% confidence intervals. To help create the graph I made another table:
P | M |
22.5 | 19.5 |
21.5 | 17.5 |
22 | 18 |
24 | 18.5 |
22.5 | 18.5 |
22 | 17.5 |
22 | 18.5 |
23 | 19 |
21 | 18 |
22 | 18.5 |
24 | 18 |
22.5 | 19 |
19 | 19.5 |
20 | 17.5 |
14.5 | 19.5 |
20.5 | 16.5 |
21 | 16.5 |
21.5 | |
23.5 | |
19 | 17 | Count
21.52632 | 18.23529 | Average
2.130947 | 0.937377 | StdEv
Potting | Mix of 50/50 Sand/Potting |
21.52632 | 18.23529 | Average
2.130947 | 0.937377 | StdEv
0.488873 | 0.227347 | StdError
Choosing standard error or 95% confidence intervals for error bars is a topic to explore on its own. The standard error and confidence intervals are related: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064100/figure/. This also provides more insight: http://www.mas.ncl.ac.uk/~njnsm/medfac/docs/se&ci.pdf. I previously figured out how to calculate standard error, and I think I know how to incorporate it into a graph. This page provides insight into standard error bars: https://egret.psychol.cam.ac.uk/statistics/local_copies_of_sources_Cardinal_and_Aitken_ANOVA/errorbars.htm. The R program reports confidence intervals, but I do not know how to use them for graphical representation.
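Here is a minimal base-R sketch of a bar chart with ±1 standard error bars built from the summary table above. This is just one way to do it, not the way the graph for this post was actually made; the axis labels, y-axis limit, and label placement are my own choices for illustration:
means <- c(Potting = 21.52632, "50/50 Mix" = 18.23529)  # average heights (cm)
se    <- c(0.488873, 0.227347)                          # standard errors from the table
bp <- barplot(means, ylim = c(0, 25),
              xlab = "Soil type", ylab = "Average tallest plant height (cm)")
arrows(bp, means - se, bp, means + se,
       angle = 90, code = 3, length = 0.05)             # draw +/- 1 standard error bars
text(bp, means + se + 1, labels = round(means, 1))      # print the averages above the bars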
There is probably more
to say and understand, but I get the idea that P plants are larger than M
plants.
Conclusion:
Plants grown in potting soil were taller than those grown in a 50/50 mixture of potting soil and sand (potting soil average height 21.5 cm, n=19; 50/50 mixture of potting soil and sand average height 18.2 cm, n=17; Mann-Whitney U test: U=21, p = 8.016e-06).
How to read the conclusion:
"Plants grown in potting soil were taller than those grown in a 50/50 mixture of potting soil and sand" is straightforward. In the parentheses, the averages give the average height of the plants grown in each type of soil: the potting soil average height of 21.5 cm is taller than the 50/50 soil average height of 18.2 cm. The n= values indicate how many units, in this case plants, were measured to obtain each average: 19 potting soil plants were measured and 17 of the 50/50 plants were measured. The Mann-Whitney U test is the test that was used to determine whether the two soil types' plant measurements had a statistically significant difference. U=21 is the number the test returns, so others familiar with the statistic have an idea of what took place. The p=8.016e-06 is the number people look at; a value less than 0.05 indicates a statistically significant difference.

The graph has two bars indicating the two soil types, as shown by the X axis label. The height of each bar represents the average height of the plants, as shown by the Y axis label. The numbers on top of the bars repeat the exact value for the height of each bar, which could also be read from the Y axis. The n=# at the base of each bar repeats the text indicating how many plants were measured to obtain the average. The error bars on top of the bars are ± one standard error. I won't go into too much detail about standard error, but the bars give an idea of the precision of the measurements. With standard-error error bars, if the bars for the two averages were to overlap, there is no statistically significant difference. If the bars do not overlap, as shown in this conclusion, there may be a statistically significant difference, but that isn't an absolute with standard-error error bars, so you then need to refer to the p-value.
Thank you for reading!
Have a Great day!
Comment to discuss!