TeMarChi: Statistical Analysis of Portulaca Measurements – Part 1

Statistical Analysis of Portulaca Measurements – Part 1 – The Wrong Way

Note: this work contains errors. I confused the Wilcoxon rank sum test with the Wilcoxon signed-rank test. What? I agree, but I think I figured it out. This post is basically my notes on how I worked out how to perform the test. I recommend skipping this post and going to
Statistical Analysis of Portulaca Measurements – Part 2 - Mann Whitney U

On 10/17/2016 I took measurements of my jars to end the phase in which I cared about keeping them alive. Jars were tagged for identification (first column) and labeled by soil type (second column). The number of plants in each jar was counted (third column), the height of the tallest plant in each jar was measured (fourth column), the number of seed capsules on the plants within the jar was counted (fifth column), and the weight of the entire jar was measured (sixth column).

Tag	Type	# of Plants	Tallest Plant	# of pods	Weight (nearest 5 g)
TP2	P	3	22.5	27	335
AP1	P	5	21.5	27	335
LP1	P	10	22	33	335
EP2	P	1	24	22	330
RP2	P	2	22.5	17	330
CP1	P	1	22	19	335
LP2	P	6	22	37	335
FP2	P	8	23	56	330
AP2	P	6	21	35	335
YP1	P	5	22	30	335
CP2	P	3	24	30	330
UP2	P	2	22.5	29	330
RP1	P	1	19	16	330
YP2	P	8	20	24	330
TP1	P	1	14.5	3	330
FP1	P	4	20.5	34	330
UP1	P	1	21	14	330
BP2	P	4	21.5	25	330
BP1	P	9	23.5	40	330
BM2	M	5	19.5	20	360
TM2	M	3	17.5	8	360
AM2	M	4	18	18	360
UM2	M	4	18.5	27	360
YM2	M	5	18.5	25	360
FM1	M	6	17.5	25	360
FM2	M	4	18.5	15	360
WM1	M	1	19	6	360
LM1	M	4	18	22	360
RM1	M	1	18.5	8	360
AM1	M	6	18	18	360
LM2	M	7	19	20	360
RM2	M	2	19.5	9	360
TM1	M	2	17.5	12	360
UM1	M	1	19.5	9	360
YM1	M	7	16.5	27	360
BM1	M	4	16.5	17	360

The R program was used to run statistical analysis on the different measurements in relation to the soil type: P (potting soil) and M (50/50 mixture of potting soil and sand). To compare sets of measurements, the data must be entered as shown (I will try to keep all characters entered into R bold and all results returned from R in italics):

P=c(22.5,21.5,22,24,22.5,22,22,23,21,22,24,22.5,19,20,14.5,20.5,21,21.5,23.5)

M=c(19.5,17.5,18,18.5,18.5,17.5,18.5,19,18,18.5,18,19,19.5,17.5,19.5,16.5,16.5)

The first two lines set up P and M. I obtained these numbers from the table under the tallest plant column. These can be viewed by:

View(P)

View(M)
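Typing the heights in by hand invites transcription slips. As a minimal sketch of an alternative, the two vectors can be pulled out of a small data frame by soil type. The data frame name `jars` and its column names `type` and `tallest` are my own labels for illustration; the values are the Type and Tallest Plant columns from the table above.

```r
# Hypothetical data frame built from the Type and Tallest Plant columns.
jars <- data.frame(
  type = c(rep("P", 19), rep("M", 17)),
  tallest = c(22.5, 21.5, 22, 24, 22.5, 22, 22, 23, 21, 22, 24, 22.5,
              19, 20, 14.5, 20.5, 21, 21.5, 23.5,
              19.5, 17.5, 18, 18.5, 18.5, 17.5, 18.5, 19, 18, 18.5,
              18, 19, 19.5, 17.5, 19.5, 16.5, 16.5)
)

# Split the heights by soil type instead of retyping them.
P <- jars$tallest[jars$type == "P"]
M <- jars$tallest[jars$type == "M"]

length(P)  # number of potting-soil jars
length(M)  # number of sand-mix jars
```

Keeping the soil type next to each height also makes it harder to drop or duplicate a jar.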

Yes, capitalization matters. The sets of numbers must be tested to determine whether the data have equal variances and follow a normal distribution. Many tests, called parametric tests, assume both. If the data do not have equal variances or are not normally distributed, then the assumptions are not met and nonparametric tests must be used for data comparison.

To test for equal variances:

var.test(P,M)

The R program then returns the following results:

F test to compare two variances

data: P and M

F = 5.1679, num df = 18, denom df = 16,

p-value = 0.001825

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

1.902069 13.645142

sample estimates:

ratio of variances

5.167927

The p-value is 0.001825, which is below 0.05. The null hypothesis of equal variances can be rejected: the variances are NOT equal.
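To see where the F statistic comes from, here is a rough sketch that rebuilds it by hand. It assumes the usual definition (ratio of the two sample variances, with a two-sided p-value taken from the F distribution); the variable names are my own.

```r
P <- c(22.5, 21.5, 22, 24, 22.5, 22, 22, 23, 21, 22, 24, 22.5,
       19, 20, 14.5, 20.5, 21, 21.5, 23.5)
M <- c(19.5, 17.5, 18, 18.5, 18.5, 17.5, 18.5, 19, 18, 18.5,
       18, 19, 19.5, 17.5, 19.5, 16.5, 16.5)

# The F statistic is the ratio of the two sample variances.
f_ratio <- var(P) / var(M)

# Degrees of freedom are the group sizes minus one.
df_p <- length(P) - 1
df_m <- length(M) - 1

# Two-sided p-value: double the smaller tail of the F distribution.
upper_tail <- pf(f_ratio, df_p, df_m, lower.tail = FALSE)
p_value <- 2 * min(upper_tail, 1 - upper_tail)

f_ratio
p_value
```

A ratio near 1 would mean similar spread in both groups; here the P heights vary roughly five times as much as the M heights.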

The Shapiro test is used to determine normality. It must be performed on each group.

shapiro.test(P)

Shapiro-Wilk normality test

data: P

W = 0.81364, p-value = 0.001816

shapiro.test(M)

Shapiro-Wilk normality test

data: M

W = 0.93043, p-value = 0.2212

The p-value for P is 0.001816, which is below 0.05. The P data is not of normal distribution.

The p-value for M is 0.2212, which is above 0.05. The M data is consistent with a normal distribution.

I assume the data does not meet assumptions since one set of the data is not normally distributed.
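The two Shapiro tests can also be run in one step. This is only a sketch; the names `groups`, `shapiro_p` and `not_normal` are my own, and 0.05 is the cutoff used throughout this post.

```r
P <- c(22.5, 21.5, 22, 24, 22.5, 22, 22, 23, 21, 22, 24, 22.5,
       19, 20, 14.5, 20.5, 21, 21.5, 23.5)
M <- c(19.5, 17.5, 18, 18.5, 18.5, 17.5, 18.5, 19, 18, 18.5,
       18, 19, 19.5, 17.5, 19.5, 16.5, 16.5)

# Run the Shapiro-Wilk test on each group and keep only the p-values.
groups <- list(P = P, M = M)
shapiro_p <- sapply(groups, function(x) shapiro.test(x)$p.value)

# TRUE means normality is rejected at the 0.05 level for that group.
not_normal <- shapiro_p < 0.05

shapiro_p
not_normal
```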

With neither the normality nor the equal variances assumption met, I must use a nonparametric test. In this case it is the Wilcoxon rank sum test. I used the R program:

wilcox.test(P,M, correct=FALSE)

It returns:

        Wilcoxon rank sum test

data:  P and M

W = 302, p-value = 8.016e-06

alternative hypothesis: true location shift is not equal to 0

My two groups are different with such a small p-value. The p-value is 0.000008016, which is below 0.05. I can reject the null hypothesis: these groups of samples came from different populations.
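The W statistic is less mysterious when computed by hand. The sketch below assumes the standard rank sum recipe: rank all 36 heights together, add up the ranks belonging to P, and subtract the smallest value that sum could possibly take. Variable names are my own.

```r
P <- c(22.5, 21.5, 22, 24, 22.5, 22, 22, 23, 21, 22, 24, 22.5,
       19, 20, 14.5, 20.5, 21, 21.5, 23.5)
M <- c(19.5, 17.5, 18, 18.5, 18.5, 17.5, 18.5, 19, 18, 18.5,
       18, 19, 19.5, 17.5, 19.5, 16.5, 16.5)

# Rank every height in the pooled sample; tied heights share an average rank.
ranks <- rank(c(P, M))

# Sum the ranks that belong to the P group (the first 19 entries).
rank_sum_p <- sum(ranks[seq_along(P)])

# Subtract the minimum possible rank sum for a group of this size.
n_p <- length(P)
W <- rank_sum_p - n_p * (n_p + 1) / 2

W
```

W can range from 0 to 19 * 17 = 323 here, so a value this close to the maximum means nearly every P jar outgrew nearly every M jar.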

Different in what way? To see what is different I will compare the center of each group, P and M. R can display the groups as a box plot (which draws the median and quartiles of each group, not the mean) with:

boxplot(P,M)

And it spits out this graph:

In Excel I created a table and graph:

P	M
22.5	19.5
21.5	17.5
22	18
24	18.5
22.5	18.5
22	17.5
22	18.5
23	19
21	18
22	18.5
24	18
22.5	19
19	19.5
20	17.5
14.5	19.5
20.5	16.5
21	16.5
21.5
23.5
19	17	Count
21.52632	18.23529412	Average
2.130947	0.937377443	StdEv

Potting	Mix of 50/50 Sand/Potting
21.52632	18.23529412	Average
2.130947	0.937377443	StdEv
0.488873	0.227347424	StdError

This provides a visual of the difference between the two soil types in the tallest plant measured per jar. The error bars show standard error. I was using standard deviation, but standard error should be used. This is a great video to describe standard error: https://www.youtube.com/watch?v=BwYj69LAQOI.
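The Excel summary rows can be reproduced in R. This sketch assumes standard error means the sample standard deviation divided by the square root of the group size; `std_error` is a helper I made up, not a built-in R function.

```r
P <- c(22.5, 21.5, 22, 24, 22.5, 22, 22, 23, 21, 22, 24, 22.5,
       19, 20, 14.5, 20.5, 21, 21.5, 23.5)
M <- c(19.5, 17.5, 18, 18.5, 18.5, 17.5, 18.5, 19, 18, 18.5,
       18, 19, 19.5, 17.5, 19.5, 16.5, 16.5)

# Hypothetical helper: sample standard deviation over sqrt(n).
std_error <- function(x) sd(x) / sqrt(length(x))

# One row per statistic, one column per soil type, like the Excel table.
summary_table <- rbind(
  Average  = c(P = mean(P), M = mean(M)),
  StdEv    = c(P = sd(P), M = sd(M)),
  StdError = c(P = std_error(P), M = std_error(M))
)

summary_table
```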

But how would I write up these results? What should I include?

Here is an example I found:

“How to report

You can report the results of an Wilcoxon test as follows: The medians of Group A and Group B were 2.0 and 4.5, respectively. An Wilcoxon Signed-rank test shows that there is a significant effect of Group (W = 1, Z = -2.39, p < 0.05, r = 0.53)” (http://yatani.jp/teaching/doku.php?id=hcistats:wilcoxonsigned).

Here is another example I found:

http://evc-cit.info/psych018/Reporting_Statistics.pdf

I have the W and p-values. Now I need to find out how to obtain the Z and r values.

This site pointed me to qnorm(), which converts p-values to Z scores (http://stats.stackexchange.com/questions/101136/how-can-i-find-a-z-score-from-a-p-value).

qnorm(8.016e-06)

[1] -4.31401

Further reading seemed to explain that an absolute Z value greater than 3 is a sign that the p-value is very small. Mine is at 0.000008016 (http://stats.stackexchange.com/questions/97745/how-to-deal-with-z-score-greater-than-3).
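One caution: qnorm() treats its input as a one-tailed probability, while the p-value above is two-sided, so the usual convention is to halve it first. As a cross-check, the sketch below rebuilds Z directly from W. It assumes the normal approximation with a tie correction and no continuity correction, which is my understanding of what wilcox.test() does with correct=FALSE when the data contain ties. Variable names are my own.

```r
P <- c(22.5, 21.5, 22, 24, 22.5, 22, 22, 23, 21, 22, 24, 22.5,
       19, 20, 14.5, 20.5, 21, 21.5, 23.5)
M <- c(19.5, 17.5, 18, 18.5, 18.5, 17.5, 18.5, 19, 18, 18.5,
       18, 19, 19.5, 17.5, 19.5, 16.5, 16.5)

n_p <- length(P)
n_m <- length(M)
n <- n_p + n_m

# W from the pooled ranks of the P group.
ranks <- rank(c(P, M))
W <- sum(ranks[seq_along(P)]) - n_p * (n_p + 1) / 2

# Standard deviation of W under the null, shrunk slightly for tied values.
tie_sizes <- table(c(P, M))
sigma <- sqrt((n_p * n_m / 12) *
              ((n + 1) - sum(tie_sizes^3 - tie_sizes) / (n * (n - 1))))

# Z is how far W sits from its expected value, in standard deviations.
z <- (W - n_p * n_m / 2) / sigma
p_two_sided <- 2 * pnorm(-abs(z))

# Going back from the two-sided p-value: halve it before calling qnorm().
z_from_p <- qnorm(p_two_sided / 2)

z
p_two_sided
z_from_p
```

The sign of Z only records which group was listed first, so the absolute value is what matters for reporting.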

The r value is the effect size. It uses the equation r = Z / sqrt(N), where N is the total number of observations in both groups (17 + 19 = 36).

I calculated this in Excel with =-4.31401/(SQRT(17+19)) to get r = -0.719001667.
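The same arithmetic can stay in R instead of moving to Excel. This is a sketch that reproduces the calculation exactly as done in this post, so it inherits whatever is wrong with the Z value used here; the variable names are my own.

```r
# Values taken from the wilcox.test output and the group sizes above.
p_value <- 8.016e-06
n_total <- 17 + 19

# Z as obtained in this post, then the effect size r = Z / sqrt(N).
z <- qnorm(p_value)
r <- z / sqrt(n_total)

z
r
```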

Another statistic to report is the confidence interval. Fortunately, I think I can obtain it by adding conf.int=TRUE to my wilcox.test input, as shown along with the results (http://www.stat.umn.edu/geyer/5102/examp/wilcox.html).

wilcox.test(P,M, correct=FALSE,conf.int = TRUE)

Wilcoxon rank sum test

data: P and M

W = 302, p-value = 8.016e-06

alternative hypothesis: true location shift is not equal to 0

95 percent confidence interval:

2.500069 4.499919

sample estimates:

difference in location

3.500034

The confidence interval runs from a lower bound of 2.500069 to an upper bound of 4.499919. But I'm not sure how to use it at the time of this post.
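A hedged note on reading that output: the "difference in location" is an estimate of the typical gap between a P height and an M height, specifically the median of all pairwise differences, not the difference of the two medians. The interval is then a range of plausible values for that gap, and since it excludes 0 it agrees with the small p-value. The sketch below computes the median of pairwise differences by hand; the variable names are my own.

```r
P <- c(22.5, 21.5, 22, 24, 22.5, 22, 22, 23, 21, 22, 24, 22.5,
       19, 20, 14.5, 20.5, 21, 21.5, 23.5)
M <- c(19.5, 17.5, 18, 18.5, 18.5, 17.5, 18.5, 19, 18, 18.5,
       18, 19, 19.5, 17.5, 19.5, 16.5, 16.5)

# Every P height minus every M height: a 19 x 17 table of differences.
pairwise_diff <- outer(P, M, "-")

# The median of those differences is the location shift estimate.
shift_estimate <- median(pairwise_diff)

shift_estimate
```

R reports 3.500034 rather than a round number because, with tied values, it finds the estimate by a numerical search instead of sorting the differences.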

It is always good to review the assumptions of a test: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test. The test I performed was the Wilcoxon rank sum test, aka Mann-Whitney U test, aka Mann-Whitney-Wilcoxon, etc. Although it almost tricked me into a full restart, this test should not be confused with a Wilcoxon signed-rank test which is different due to assumptions (https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test). The Wilcoxon rank sum test uses independent samples as I have. The Wilcoxon signed-rank test uses dependent samples. Complicated, but looking at the assumptions helped clarify this.
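In R the two tests share one function and differ only in the paired argument, which is part of why they are easy to mix up. The sketch below shows that the signed-rank version cannot even run on my data, because paired samples need one M value for every P value. The names `paired_attempt` and `rank_sum` are my own.

```r
P <- c(22.5, 21.5, 22, 24, 22.5, 22, 22, 23, 21, 22, 24, 22.5,
       19, 20, 14.5, 20.5, 21, 21.5, 23.5)
M <- c(19.5, 17.5, 18, 18.5, 18.5, 17.5, 18.5, 19, 18, 18.5,
       18, 19, 19.5, 17.5, 19.5, 16.5, 16.5)

# Signed-rank test (dependent samples): fails, since 19 and 17 values
# cannot be matched up in pairs. The error message is captured as text.
paired_attempt <- tryCatch(
  wilcox.test(P, M, paired = TRUE),
  error = function(e) conditionMessage(e)
)
paired_attempt

# Rank sum test (independent samples): the default, paired = FALSE.
# suppressWarnings() hides the note about ties preventing an exact p-value.
rank_sum <- suppressWarnings(wilcox.test(P, M, correct = FALSE))
rank_sum$method
```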

This site appears to perform the test by hand and I should try it out: http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html. I also confused the two Wilcoxon tests when deciding what to report. The reporting example with the Z value is for the signed-rank test, which is the test I didn't do. I need to review the Mann Whitney U.

Overall I confused two different tests. I decided to post this to show the work involved. It does have some correct information which may be useful in the future. Check out Statistical Analysis of Portulaca Measurements – Part 2 - Mann Whitney U.

Thank you for reading! 1/28/2017 - Quickly reviewed, but not too much.

Comment if you would like.

Have a Great day!

TeMarChi

Saturday, February 18, 2017
