Hello,
I have rock-size data for two independent locations, A and B.
The data list rock sizes from smallest to largest (yellow column).
The red column shows the percentage of rocks under that size (out of 100%).
The green column shows the cumulative percentage (out of 100%).
**A normality test indicates that the rock data (yellow columns) are not normally distributed.**
**When I apply a log transform to the data, they become normally distributed and show a straight (linear) line when plotted on a graph.**
*I want to compare rock sizes between the two locations.*
**What is the best way to compare rock sizes (yellow columns) given that these values are thresholds, not actual rock sizes?**

 Accepted Answer

Star Strider on 20 Jan 2020

If both samples have the same distribution (regardless of what that distribution is) and you are comparing two samples, the ranksum test is likely the most appropriate.
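
A minimal sketch of that call, assuming the two samples are stored as vectors and the Statistics and Machine Learning Toolbox is available. The names `sizeA` and `sizeB` and their values are hypothetical placeholders, not the data from the question:

```matlab
% Hypothetical placeholder samples; substitute the real measurements.
sizeA = [2.1 3.4 4.0 5.6 7.2 9.8 12.5];
sizeB = [3.0 4.8 6.1 8.3 10.4 14.9 18.2];

% Two-sided Wilcoxon rank sum test (equivalent to the Mann-Whitney U test).
% p is the p-value; h is 1 if the null hypothesis of equal medians is
% rejected at the default 5% significance level, and 0 otherwise.
[p, h] = ranksum(sizeA, sizeB);

fprintf('p = %.4f, h = %d\n', p, h);
```

The test works on ranks, so it does not require the samples to be normally distributed.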

4 Comments

Hi Star,
Thank you.
My confusion comes from the fact that the data are not ACTUAL rock sizes, but some kind of threshold.
Is it possible to find the ACTUAL rock size from this data?
My pleasure!
The actual rock size data seem to have been discarded. What is left is essentially a histogram, with the bins being the yellow columns and the relative counts being the red columns.
One option is to create a single vector of threshold rock sizes (yellow column) covering both sites, using the linspace function. Then use interp1 with that common vector and the red column from each site separately to estimate the relative frequencies of rocks at those common sizes for each site. Do not extrapolate, so that the rock sizes that are not shared by both sites will be NaN for site A. You can then use isnan to eliminate the NaN values from the site A interpolation.
Then use the interpolation results with ranksum. This is likely as close as you can get with derived data.
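
The steps above could be sketched roughly as follows. All variable names and numbers here are hypothetical placeholders standing in for the yellow and red columns, the choice of 50 grid points is arbitrary, and the sketch drops NaN values from either site rather than only site A, which is an assumption about the intent:

```matlab
% Hypothetical placeholder data; substitute the real columns.
threshA = [1 2 4 8 16 32];        % yellow column, site A (size thresholds)
pctA    = [5 15 30 25 15 10];     % red column, site A (percent per bin)
threshB = [2 4 8 16 32 64];       % yellow column, site B
pctB    = [10 20 30 20 15 5];     % red column, site B

% Common vector of threshold sizes spanning both sites.
lo = min([threshA threshB]);
hi = max([threshA threshB]);
commonThresh = linspace(lo, hi, 50);

% Linear interpolation; by default interp1 returns NaN for query points
% outside the range of the sample points, so nothing is extrapolated.
pctAi = interp1(threshA, pctA, commonThresh);
pctBi = interp1(threshB, pctB, commonThresh);

% Keep only the sizes covered by both sites.
shared = ~isnan(pctAi) & ~isnan(pctBi);
pctAi = pctAi(shared);
pctBi = pctBi(shared);

% Rank sum test on the interpolated relative frequencies.
[p, h] = ranksum(pctAi, pctBi);
fprintf('p = %.4f, h = %d\n', p, h);
```

Because interp1 requires distinct sample points, any repeated threshold values in the yellow column would need to be merged before interpolating.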
Thank you Star.
Is it possible to just run ranksum on the yellow columns (the bins) for both data sets? Or would this violate a rule?
My pleasure.
It would likely be more appropriate to use it on the red columns, since (as I understand it) those are the relative frequencies of the sizes.
