Sunday, March 11, 2018

Data First Analysis - Number of Samples Needed



Analysis 1 of samples:
After a number of measurements are taken, here are the steps for an initial analysis of that population in Excel or any spreadsheet.

Measurements – some number, count, quantity, etc.
Example: How many chickens are in the chicken pen? 12.

Population – the group you took the samples from.
Example: How many chicken pens are there? 10.

1. Put your samples into a table.
2. Below the table, put the number of samples sampled. I will call this the Count. =count()
3. Below that cell, put the total of all samples added together. I will call this the Sum. =sum()
4. Below that cell, put the average of the samples. This can be done by dividing the Sum by the Count. I will call this the Average. =average()
5. Below that cell put the standard deviation of the samples. I won't explain how to calculate standard deviation here, but I might add it later. I tested Excel and did the process by hand to make sure Excel could be trusted. I will call this the Standard Deviation. =stdev()
6. Below that cell, put the standard error of the samples. I will explain how to calculate standard error here. Rather than trust a formula in Excel, I would enter the calculation yourself: divide the Standard Deviation by the square root of the Count. I will call this the Standard Error. For example, with the Standard Deviation in cell L23 and the Count in cell L20: =L23/(SQRT(L20))





7. Using the average, make a bar graph.
8. In Excel, in chart tools, select Add Chart Element, Error bars, More Error Bars Options. From the options that appear, select custom, then click Specify Value. Use the options to select the StdError for both the Positive Error Value and the Negative Error Value.
9. The chart title, the axes, etc. can be edited.
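
For anyone who prefers a script over a spreadsheet, the calculations in steps 2 through 6 can be sketched in Python. This is only a minimal sketch: the sample values are made up for illustration (they are not from any of my experiments), and the variable names are my own choice. Note that statistics.stdev() computes the sample standard deviation, which is what =stdev() does in Excel.

```python
import math
import statistics

# Hypothetical measurements (made up for illustration):
# the number of chickens counted in each of 10 pens.
samples = [12, 10, 14, 11, 13, 12, 9, 15, 12, 12]

count = len(samples)                    # step 2: =count()
total = sum(samples)                    # step 3: =sum()
average = total / count                 # step 4: Sum divided by Count, same as =average()
std_dev = statistics.stdev(samples)     # step 5: sample standard deviation, like =stdev()
std_error = std_dev / math.sqrt(count)  # step 6: Standard Deviation / SQRT(Count)

print(f"Count:              {count}")
print(f"Sum:                {total}")
print(f"Average:            {average:.2f}")
print(f"Standard Deviation: {std_dev:.3f}")
print(f"Standard Error:     {std_error:.3f}")
```

The Average is the height of the bar in step 7, and the Standard Error is the value used for both the positive and negative error bars in step 8.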



This can be viewed as a building block for statistical tests. If you have multiple populations, and therefore many similar bars, you can use a statistics program to see whether one bar is statistically different from another. This is useful for answering questions or testing a hypothesis (an educated guess about the outcome of some event).

This concept is how I have analyzed measurements from other posts. See "Statistical Analysis of Portulaca Measurements – Part 2 – The Right Way" for a review.

How many samples should you take from a population?
From this post, "2017 Planting Experiment - Amount to Plant - Baptisia Portion", I have this quote: "The result of 8 is a good suggestion. 'In some disciplines, group sizes of eight are almost universal and some referees may go so far as to reject a paper which deviates from this norm,' (http://www.ebd.csic.es/documents/240051/0/Sample_size_and_power_analysis.pdf)." Eight doesn't have to be the minimum, yet it appears to be a base standard in the scientific community.

Before going into too much detail, I will give my answer:
8 is the minimum rule.
10 is a good amount.
15 is the max.
If you need more than 15 then you need to work on standardizing your measurements/methods better.

Resource availability and other factors may determine the number of samples you can take. More samples can help shrink the error bars, fine tuning them when the difference among populations isn't apparent in the statistical analysis. If you have funding for 10 samples and apply the minimum of 8 rule, then you are in an alright situation. If you have funding for only 6 samples, maybe you should look into acquiring more funding.

"More samples are better" seems to be a good concept, but in some situations the resource cost is too high, and sometimes the cost is an ethical one. I will try to explain this with a simple, made-up scenario; real experiments do, and should, justify the number of samples needed and used. Suppose mice are needed as samples. You may only need 8 mice to die for science in order to get your result. Raising mice is costly, and the needless slaughter of 800 mice when 8 would give you the same result is cruel. Some people object to using mice at all, while others don't care much about mice. But what if the test subjects are terminally ill human patients and the experimental treatment could lead to death? Using 800 people when 8 would provide the answer is wrong. People aren't just tested on like that; such experiments require more justification, explanation, and preparatory testing to determine the number. I hope you can see that wasting resources is not good, even if the costs are low.

I wanted to be able to answer these questions with confidence:
How long are portulaca seed pods?
How wide are portulaca seed pods?
How many seeds are in portulaca seed pods?
What is the seed/volume of portulaca seed pods?

I wanted to determine how many samples I needed for the measurements I was taking in my Portulaca experiment. I measured the size of the seed pods and counted the number of seeds within each pod. The pods and seeds are very small, and each portulaca plant has a lot of them. Taking the measurements for one sample is tedious work, yet I had the potential to take a lot of samples, so I investigated how many I should take. This was a hard case for such a test, and I could have tested the idea under easier circumstances, but I did not shy away from the tedious work because I wanted the challenge. There are probably other studies that go into detail about this, but I learned a lot by performing the examination on my own.

The top 3 graphs include all measurements. The number of seeds averaged far higher than the other measurements, so the other measurements look like blurred lines near the bottom. To tell those bottom lines apart, I removed the seed line in the bottom 3 graphs so the other lines can be viewed more closely.

The left graphs show the averages. Note that each number on the x-axis has to be multiplied by 5 to get the number of samples. In total I took measurements from 45 seed pods.

In the middle graphs, I looked at the Standard Deviation when additional samples were added, at 5 sample intervals.

In the right graphs, I looked at the Standard Error when additional samples were added, at 5 sample intervals.
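
The numbers behind graphs like these can be sketched in Python. This is a rough sketch under my own assumptions, not the spreadsheet I actually used: the function name cumulative_stats and the seed counts are made up for illustration. It recalculates the average, Standard Deviation, and Standard Error each time another 5 samples are added.

```python
import math
import statistics

def cumulative_stats(values, step=5):
    """Return a list of (n, mean, sd, se) for the first n values,
    where n grows in intervals of `step` (5, 10, 15, ...)."""
    rows = []
    for n in range(step, len(values) + 1, step):
        subset = values[:n]              # all samples taken so far
        mean = statistics.mean(subset)
        sd = statistics.stdev(subset)    # sample standard deviation
        se = sd / math.sqrt(n)           # standard error of the mean
        rows.append((n, mean, sd, se))
    return rows

# Hypothetical seed counts per pod (made up, not my measured data).
seed_counts = [52, 61, 48, 70, 55, 58, 63, 49, 66, 57, 60, 54, 59, 62, 56]

for n, mean, sd, se in cumulative_stats(seed_counts):
    print(f"n={n:2d}  average={mean:6.2f}  SD={sd:5.2f}  SE={se:5.2f}")
```

Each printed row corresponds to one point on the x-axis of the left (average), middle (SD), and right (SE) graphs.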

Looking back at the left graphs, the averages, some of the measurement types showed a drastic increase between the 5 and 10 sample marks compared to the rest of the line. At 10 samples, the lines pretty much stay level regardless of the additional 35 samples. So basically 10 samples is justified. I didn't fine tune the examination to test 8 samples, per the 8 rule; I just didn't.

I examined the Standard Deviation and the Standard Error to determine the "noise", or movement/variation, of the line. I believe examining the Standard Error is inappropriate for this purpose: it is calculated from the Standard Deviation, so it doesn't look at the "noise" I am talking about. The Standard Deviation examination seemed more valuable.
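
One way to see why the Standard Error is a poor measure of this "noise" is to write out the formula from step 6, where s is the Standard Deviation and n is the Count (the symbols are my own shorthand). This is just a worked illustration that assumes the Standard Deviation stays about the same as samples are added:

```latex
\[
  \mathrm{SE} = \frac{s}{\sqrt{n}}
  \qquad\Longrightarrow\qquad
  \frac{\mathrm{SE}_{n=45}}{\mathrm{SE}_{n=5}}
  \approx \frac{s/\sqrt{45}}{s/\sqrt{5}}
  = \sqrt{\frac{5}{45}}
  = \frac{1}{3}
\]
```

Under that assumption, the Standard Error at 45 samples is about one third of its value at 5 samples simply because n grew, even if the measurements are no less noisy.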

I will try to re-explain what I am doing. I want to keep adding samples until the line for the average levels out into a flat line. The flatter the line, the better. Rather than just eyeballing the graph to judge the levelness of the average line, I looked at the Standard Deviation to determine whether it remained constant. Again, the flatter the line, the better. Note that the y-axis scale is small. The Standard Deviation graph does show some fluctuation at that small scale, yet it isn't really that much in my opinion. The Standard Deviation (SD) line fluctuates because it is sensitive to each additional sample that differs from the previous ones. From 5 to 10 samples, many of the SD lines showed a decrease. From 10 to 15 samples, many of the SD lines showed an increase. From 15 to 20 samples, many of the SD lines showed another decrease. Yet all these decreases and increases are still small. This was my way of viewing the movement of the line, or at least that is how I am taking it. It could be the wrong way to look at small-scale fluctuations in the line, so I am open to discussion from others.
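
If you would rather not eyeball the graph at all, one possible way to put a number on "levels out" is sketched below in Python. To be clear, this is not the method I used above: the function name first_level_point, the 5% tolerance, and the data are all assumptions made up for illustration. It looks for the first sample count after which the running average never changes by more than the tolerance from one 5-sample checkpoint to the next. It also assumes the averages are not zero.

```python
import statistics

def first_level_point(values, step=5, tolerance=0.05):
    """Return the smallest checkpoint n (a multiple of `step`) after which
    the running average changes by less than `tolerance` (as a fraction)
    between every later pair of checkpoints. Return None if it never does."""
    checkpoints = list(range(step, len(values) + 1, step))
    means = [statistics.mean(values[:n]) for n in checkpoints]
    for i in range(len(checkpoints) - 1):
        # Relative change between each pair of consecutive checkpoints from i onward.
        later_changes = [
            abs(means[j + 1] - means[j]) / abs(means[j])
            for j in range(i, len(means) - 1)
        ]
        if all(change < tolerance for change in later_changes):
            return checkpoints[i]
    return None

# Hypothetical measurements (made up): the first 5 run low, then things settle.
measurements = [2, 3, 2, 3, 2, 8, 7, 8, 7, 8, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
print(first_level_point(measurements))
```

The tolerance is a judgment call, just like eyeballing the graph is, so it should be chosen and justified for the measurement at hand.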

Overall, the average lines for the measurements basically level out at 10 samples. I decided to use 15 samples because I had the resources to do so, but it was probably an unjustified waste. I am able to say, "I went with 15 samples just to make sure, but 10 would have been good enough." I say this in case people ask, "Why did you use 15 samples?" The community seems to prefer hearing that you used more samples (more is seen as better and acceptable) over seeing that you used too few, which is definitely unacceptable. Having decided on 15 samples, I now have justification for why I shouldn't do more. Taking 45 samples of each unit would be a waste; 15 is acceptable. That cuts my time and resources by two-thirds compared to performing all 45. Testing this process helped show me that more isn't better when it is just a waste.

If the line doesn't level out by 15 samples, then the measurement methods were poorly selected. There are probably other factors that need to be accounted for, and a different measurement should be used. 15 samples is probably overkill already. If you need more, then you should do so with justification and explanation, or change your methods so that 8-15 does work. 10 could be a good number of samples when the results will be used as percentages.

I feel I need to explain further. Let's say your study is like my Planting Studies. Why didn't I just plant 15 seeds? In that study I am answering other questions under other circumstances. I tested seed germination over time and wanted enough samples to show germination over weeks. 15 seeds over 8 weeks, taken all together, could be used as 1 sample or, as I mentioned above, 1 population. Each seed would not be considered its own population. Each of the 15 seeds would not get its own bar graph. All 15 would be included in one bar graph... so to speak.

I'd like to see discussion from others on their opinions. This question has no doubt been asked before, and it would be interesting to have others discuss what they have found.

Thank you for reading!

Have a Great day!

Comment if you would like!
