Sunday, June 18, 2017

Descriptive Morphological Statistics

I measured a lot of seed pods of different sizes and graphed the measurements:



Of these, there are 3 different groups to look at, but that is a lot of dots. There are methods that look at averages, but I thought that if I added a circle of 0.1 (later I use 5%) around each dot and then combined those circles by group, I would get an area for each group. What percent of area overlap would I get between the different groups?

To explain better:



Above I drew some random points that could be graphed on a chart with x,y values. Looking at them, pink and black overlap, but green doesn’t overlap with either. How much do pink and black overlap?



I then drew a line around the points. I did this by hand so it is not exact, but that is ok for my example.



I then shaded the regions in. Where the pink and black overlapped, I shaded blue.
It would be possible to calculate the area of each shaded region.



Above I made up some random areas based on how they looked, but they weren’t measured exactly.



I then added the x and y axes for discussion. Note the overlap in measurements. I will go on a tangent here: so which is bigger? Well, black is closest to zero on both axes, so it must be small, although it does overlap a little with the pink pods. I want to say pink is larger because more of its blob sits higher than the rest, but we can just say pink is wider than the rest. Green gets close to as long as pink and is never short in length compared to the others, but it isn’t as wide as the pink. To get sizes, the length and width would have to be multiplied, resulting in a single value. Then, using those values, you can compare the sizes to the other groups. Whoa, complicated, and I explained it poorly. This isn’t what I was originally asking, so I leave the tangent here. If the blob is closer to the 0,0 point of the chart, then it is smaller, but keep in mind that something 1 wide and 100 long (1 x 100 = 100) is larger than something 2 wide and 10 long (2 x 10 = 20).

Back on course. Let us say I measure a random width and length of a pod. I then plot that length and width on my graph as a red dot.



Based on my chart, there is no question our random red dot sample is of the pink variety. Here we assume that the individual we measured was a normal one, not a mutant or an abnormally different individual. Another tangent, as an example: say I measured the height of all the children in a 3rd grade classroom. For a visual, think of them as green. Then I measured the basketball player Yao Ming, who is over 7 feet tall, as the red dot. I don’t know the height of 3rd graders, but I know they are all much shorter. Both groups are humans, but you couldn’t fairly compare him to the class on such a chart. That didn’t explain it too well, but it was to get you thinking.

Back on course, if we assume we have normal individuals from the group, then we can be confident our sample red dot is a pinky.

Notice pink is a larger area. That means pink covers a wider range of measurements than the other areas, so it has more variation in sizes. Green has less area, so it has less variation among individuals.



Above I have a new measurement, the red dot now falls within the blue area. We can conclude our sample is possibly a black or a pink. I tried to avoid the table, but now we may need it.



Looking at the table, our red dot falls within the 12.5% chance area to be a pink and within the 25% chance area to be a black. But we aren’t going to know or ever find out which it is; again, I am just reporting what you could do with this type of analysis.



Above, now my red dot doesn’t fall in any of the shaded areas. This means it is not a normal black, pink, or green, it has 0% chance to be any of them. It could be something new. Could be another color. Could be a mutant of the other colors that is just abnormally different in size. We don’t know, it is a mystery! I made it up, so it doesn’t matter. We are ok.

I’m sure there are some statistical tests that explain the same stuff number-wise, but the graphics helped me explain it. Now back to my original graph with many points.



I can’t draw a circle perfect enough around all those points. Maybe I can use a program that examines area. I will try ArcGIS, a mapping program, since my points are basically x,y coordinates on a map. For the real world, a coordinate system comes with its own distance units. Instead of trying to add a coordinate system, I just decided to try adding the data as it was. To add the data, copy it into a new Excel document and save it. Then open ArcGIS (10.3, I think) and add the data by selecting the Excel file and the sheet within it. Then, from the table of contents, right click and display the data. A warning will pop up saying many of the options for the resulting points won’t be functional, which I will fix in a little bit. Here is what it generated.



Not bad at all, it seems it worked. I right clicked the resulting points in the table of contents and exported the data to a personal file geodatabase. I then added that exported data to the map and removed the original layer that came from the Excel data. This restored the functions the program warned me about.

I then changed some symbols and zoomed in to the area.



Above looks great. If I wanted to draw circles around the colors, I could, but now the program can do that for us. Just looking at the points, it seems blue has the largest area, so the largest possible variation. Green seems to be the smallest. I guess blue can get to be the biggest, and red seems to be able to be the smallest [I accurately determine size later]. All have overlap. My next step will be to calculate the areas and overlap percentages of the points.

After some trial and error, I found my measurements were too small for the program to compare distances or to place a buffer. I multiplied my measurements by 100, and then I was able to put a 15 meter buffer around the dots.



Why did I choose 15 m? My average size for all measurements, regardless of soil and plant type, was 3.05 x 2.38 (length x width). I then multiplied these numbers by 0.05 to get 5% of each: about 0.15 x 0.12. I decided to go with the larger one, the length, so 0.15 became the buffer distance as 5% of the average. Since I multiplied all my measurements by 100 for plotting purposes, I did the same for my buffer distance: 0.15 x 100 = 15 m. I don’t know how statistically strong this is, but I am looking more for descriptive details anyway.
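The buffer arithmetic can be written out as a short Python sketch. The averages, the 5%, and the x100 scaling come from the paragraph above; the variable names are just ones made up for this illustration.

```python
# Average pod length and width, as reported above.
avg_length = 3.05
avg_width = 2.38

fraction = 0.05     # 5% of the average measurement
plot_scale = 100    # all measurements were multiplied by 100 for plotting

# 5% of each average dimension, in the original measurement units.
length_buffer = avg_length * fraction   # about 0.15
width_buffer = avg_width * fraction     # about 0.12

# Take the larger of the two, then apply the same x100 scaling as the points.
buffer_distance = round(max(length_buffer, width_buffer) * plot_scale)

print(round(length_buffer, 2), round(width_buffer, 2), buffer_distance)
```

Changing `fraction` is an easy way to see how a wider or narrower buffer would change the distance before redoing the buffers in the mapping program.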



Then removing the dots:



Looking at the buffers, most form a nice connected blob. Specks are found within a blob, indicating we didn’t have complete coverage within that blob. Blobs not connected to the main blob are a sort of outlier, but that is probably due to a lack of measurements covering the in-between. If I measured more pods from the true population of seed pods, they should eventually join those to the main blob. Non-connected blobs and inside specks should be fine for what I am doing.

To calculate the area of the blobs in ArcGIS, I opened the attribute table of each buffer and added a new field (a new column). Then, in an edit session, I calculated the geometry for that field, choosing area.



Now I have more accurately determined the size of each blob, and I can confirm what I thought: green is the smallest and blue is the largest. The variation in measurements corresponds to the size of the blob area. Blue has a larger variation range for sizes than the others, and so forth.

Using the Intersect geoprocessing tool, I can get the overlap between each pair of blobs, then calculate the area of that overlap the same way as before.
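For anyone without a mapping program, the buffer-then-measure idea can be roughed out in plain Python. This is only a sketch under my own assumptions: it swaps the real buffer polygons for a grid of 1 x 1 cells, counts every cell whose center lies within the buffer distance of any point, and uses made-up points instead of the pod data. `buffered_cells` and `blob_area` are hypothetical helper names, not anything from ArcGIS.

```python
import math

def buffered_cells(points, radius, cell=1.0):
    """Return the set of grid cells whose centers lie within `radius` of any point."""
    cells = set()
    reach = int(radius / cell) + 1  # how many cells to scan on each side of a point
    for px, py in points:
        cx, cy = int(px // cell), int(py // cell)
        for i in range(cx - reach, cx + reach + 1):
            for j in range(cy - reach, cy + reach + 1):
                x, y = (i + 0.5) * cell, (j + 0.5) * cell  # cell center
                if (x - px) ** 2 + (y - py) ** 2 <= radius ** 2:
                    cells.add((i, j))
    return cells

def blob_area(cells, cell=1.0):
    """Approximate area of a blob as (number of cells) x (area of one cell)."""
    return len(cells) * cell * cell

# Made-up points (already multiplied by 100), NOT the real measurements.
points = [(300, 240), (310, 245), (420, 300)]
cells = buffered_cells(points, radius=15)

one_circle = math.pi * 15 ** 2
print("blob area:", blob_area(cells))
print("three separate circles would be:", round(3 * one_circle, 1))
```

Because the first two points sit close together, their circles merge and the blob area comes out smaller than three separate circles. A smaller `cell` gives a closer estimate at the cost of more cells to count.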



More tables:



This table tells us the area of overlap between two colors.



Table reduced:




The table just tells us the percent by which the colors overlap.
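As a rough illustration of how an overlap area turns into a percent, here is a minimal Python sketch. I am assuming the percent is the overlap area divided by the total area of the color whose section of the table you are reading; the areas below are made up and are not the values from my tables.

```python
def percent_overlap(overlap_area, group_area):
    """Percent of one group's total area that is shared with another group."""
    return 100.0 * overlap_area / group_area

# Made-up areas for two hypothetical groups and their shared region.
area_a = 2000.0
area_b = 500.0
shared = 400.0

print("A overlapped by B:", percent_overlap(shared, area_a), "%")
print("B overlapped by A:", percent_overlap(shared, area_b), "%")
```

The same shared area is a small percent of the big group and a large percent of the small group, which is why each pair of colors can show up with two different percents.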

Now how about areas where nothing is overlapping? To accomplish this, I used the geoprocessing Clip function twice, then Merge, then Dissolve. It became complex. Example: I clipped red from the blue to give me the red x blue area. I then clipped green from the blue to give me the green x blue area. I then merged these two clips and dissolved the resulting merge. Then I added a new field and calculated the geometry to get the area with overlap. I then took the total area of the blue and subtracted the area with overlap to obtain the area with no overlap. I repeated this process for the other colors. Yes, it was that complex.

Now how about areas where all 3 overlap each other? First I merged all three buffers, then dissolved the resulting shape. With this merged shape, I then clipped one at a time with each color buffer. Each clip cut the shape down further, until I was left with only the part where all 3 overlap.
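The clip, merge, and dissolve steps behave a lot like set operations, so the logic can be sketched with Python sets. This is an analogy under my own assumptions, not what ArcGIS does internally: each blob is a set of 1 x 1 grid cells, a clip is an intersection (`&`), and a merge plus dissolve is a union (`|`). The three rectangles are made-up stand-ins for the real buffers.

```python
def rect(x0, x1, y0, y1):
    """Made-up rectangular blob as a set of 1 x 1 grid cells."""
    return {(i, j) for i in range(x0, x1) for j in range(y0, y1)}

red = rect(0, 6, 0, 6)
blue = rect(3, 12, 3, 12)
green = rect(4, 8, 4, 8)

# Area of blue with no overlap: clip twice, merge + dissolve, then subtract.
red_x_blue = blue & red
green_x_blue = blue & green
blue_with_overlap = red_x_blue | green_x_blue
blue_none = blue - blue_with_overlap

# Area where all 3 overlap: merge everything, then clip by each color in turn.
all_three = red | blue | green
for color in (red, blue, green):
    all_three = all_three & color

print("blue total:", len(blue))
print("blue with overlap:", len(blue_with_overlap))
print("blue none:", len(blue_none))
print("all three:", len(all_three))
print("percent of blue with no overlap:", round(100 * len(blue_none) / len(blue), 1))
```

Taking the union before subtracting matters because the red x blue and green x blue pieces can overlap each other; adding their areas separately would count that shared part twice.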


So what did we accomplish? What was the point of all this? We generated 2 tools: 1. a variation range and 2. an estimate of possibilities.

  


What are the tools and how do I use them?

Example: If I took another measurement of a seed pod, and it was of the green variety, what are the chances I could be confusing its size with a blue variety?

Answer: I look at the table. I go to the bottom green section. I find the Green Blue overlap. I read 83%.

Example: What are the chances that I would measure a seed pod and find it was of the Blue variety with no chance to be confused with any other variety?

Answer: I look at the table. I go to the blue section. I find the Blue None overlap. I read 55%.

Example: If I took a measurement of a seed pod, I could then compare using the x and y axis graph by plotting that point. Wherever the point falls, I will be able to explain it could be of a certain variety if it falls within its outline. See the following where I took 4 measurements to compare to the graph; measurements are labeled A, B, C, and D.

 

Well A could be any variety because it is within the buffer of all 3 colors. B has to be blue only. I have no clue what C could be. D could either be red or blue.
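Checking which buffers a new measurement falls inside can also be sketched in Python. Falling inside a color's buffer is the same as being within the buffer distance of at least one of that color's points, so that is what the check does. The points and the four samples below are made up to mimic the A, B, C, D cases; they are not my real measurements, and `possible_varieties` is a hypothetical helper name.

```python
import math

def possible_varieties(sample, groups, radius):
    """Return the sorted names of groups with at least one point within `radius` of `sample`."""
    return sorted(
        name
        for name, pts in groups.items()
        if any(math.dist(sample, p) <= radius for p in pts)
    )

# Made-up points per color (already multiplied by 100).
groups = {
    "red": [(100, 100), (120, 110)],
    "blue": [(110, 105), (300, 250)],
    "green": [(115, 100)],
}

# Made-up samples, labeled like the ones on the graph.
samples = {"A": (112, 104), "B": (305, 255), "C": (500, 500), "D": (100, 112)}

for label, point in samples.items():
    print(label, possible_varieties(point, groups, radius=15))
```

An empty list, as for C here, means the sample is outside every buffer, so under the assumption of normal individuals it does not match any of the measured varieties.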

The boss told me to get only blue variety seed pods. I look at what we have and they all look the same. On the graph, the x-axis is the length. It seems that if I pick only seed pods of length 4 or greater, I will only get blue variety seed pods. Note: I wrote the 4 on the graph myself, but I could generate a graph that has tick marks for measurements on the axes.

It is important to remember the assumption that the sample measurements come from seed pods that are normal.

Thank you for viewing. Comment if you would like. Have a Great day!
