My goodness, it's getting complicated! I thought the simple answer was: you just don't have enough data to ensure that a random sample includes all the data points you need. The n-1 helps expand the estimate toward the "real" standard deviation.
Tal Galili

You ask them "why this?

Watch this, it precisely answers your question.

Michael Lew

In essence, the correction is n-1 rather than n-2 etc. because the n-1 correction gives results that are very close to what we need. More exact corrections are shown here: en.

What if it overestimates?
Dror Atariah

Why is it that the total variance of the population would be the sum of the variance of the sample around the sample mean and the variance of the sample mean itself? How come we sum the variances?
See here for intuition and proof.

I have to teach the students with the n-1 correction, so dividing by n alone is not an option. As others have written before me, mentioning the connection to the second moment is not an option either. Mentioning that the mean has already been estimated, which leaves us with less "data" for the SD, is important, though. Regarding the bias of the SD: I remembered encountering it, so thanks for driving that point home.
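The decomposition asked about earlier in the thread can be checked numerically. This is only a sketch and not taken from any of the answers: the data values and the "true" mean `mu` are made up, and `check_decomposition` is a hypothetical helper name. It checks the algebraic identity sum((x - mu)^2) = sum((x - xbar)^2) + n * (xbar - mu)^2, which is one way to see why the spread around the sample mean falls short of the spread around the true mean.

```python
import math

def check_decomposition(data, mu):
    """Return the three pieces of the sum-of-squares identity."""
    n = len(data)
    xbar = sum(data) / n                              # sample mean
    about_mu = sum((x - mu) ** 2 for x in data)       # spread around the true mean
    about_xbar = sum((x - xbar) ** 2 for x in data)   # spread around the sample mean
    mean_term = n * (xbar - mu) ** 2                  # cost of not knowing mu exactly
    return about_mu, about_xbar, mean_term

# Made-up data and an assumed true mean, for illustration only.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
mu = 5.0

about_mu, about_xbar, mean_term = check_decomposition(data, mu)
print(about_mu, about_xbar + mean_term)  # the two numbers agree up to rounding
```

Because the last term is never negative, the sum of squares around the sample mean can never exceed the sum around the true mean.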
In other words, I interpreted "intuitive" in your question to mean intuitive to you.

Thank you for the vote of confidence :)

The loss of a degree of freedom from estimating the expectation is one explanation that I was thinking of using in class. But combining it with some of the other answers given in this thread will be useful to me and, I hope, to others in the future.
You know non-mathers like us can't tell.

I did say gradually.

Mooncrater

Any way to sum up the intuition, or is that not likely to be possible?

I'm not sure it's really practical to use this approach with your students unless you adopt it for the entire course, though.

Mark L. Stone

I am unhappy to see the downvotes and can only guess that they are responding to the last sentence, which could easily be seen as attacking the O.

Richard Hansen
The first symbol stands for the actual value of the average of all the data. The latter stands for an estimate of the average of all the data.
Estimate of the average? I have a subtle distinction to make. We are used to thinking that the statistical mean is just a fancy word for "average", but there is a subtle difference. The average (or should I say "an" average) is one estimate of the mean. If I take another collection of data points from the whole set of them (if I sample the population), then I get another estimate of the mean. One may ask, "How good is this estimate?" If you take one data point to compute the average (kind of a silly average, since there is only one), then you have no idea how good the average is.
But if you have the luxury of taking a bunch of data points, then you have some information about how close the average might be to the mean. I'm not being very statistical here, but it seems like a good guess that the true mean would lie somewhere between the smallest data point and the largest. Let's be a bit more precise. The uncertainty in the average of n data points shrinks in proportion to the square root of n. This is kind of an important result. If you wish to improve the statistical accuracy of your estimate of the mean by, for example, a factor of two, then you need to average four points together.
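The square-root behaviour described here can be seen in a quick simulation. This is a rough sketch under assumptions of my own: the data are drawn from a normal distribution with mean 0 and SD 1, the seed and trial count are arbitrary, and `spread_of_averages` is a made-up helper name.

```python
import random
import statistics

def spread_of_averages(n, trials=5000, seed=0):
    """Standard deviation of many averages, each taken over n simulated points."""
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        points = [rng.gauss(0.0, 1.0) for _ in range(n)]
        averages.append(statistics.fmean(points))
    # With thousands of trials, dividing by n or n-1 here makes no practical difference.
    return statistics.pstdev(averages)

spread_1 = spread_of_averages(1)      # averages of a single point
spread_4 = spread_of_averages(4)      # averages of four points
spread_100 = spread_of_averages(100)  # averages of one hundred points

print(spread_1 / spread_4)    # roughly 2
print(spread_1 / spread_100)  # roughly 10
```

The ratios will wobble a little from run to run if you change the seed, but they should stay near 2 and 10.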
If you want to improve your estimate by a factor of ten, you will need to average 100 data points.

Difference between sample and population standard deviation

Finally, I can state a little more precisely how to decide which formula is correct. It all comes down to how you arrived at your estimate of the mean.
If you have the actual mean, then you use the population standard deviation, and divide by n. If you come up with an estimate of the mean based on averaging the data, then you should use the sample standard deviation, and divide by n-1.

Why n-1? The derivation of that particular number is a bit involved, so I won't explain it.
I would of course explain it if I understood it, but it's just too complicated for me. I can, however, motivate the correction a bit. Let's say you came upon a magic lamp, and got the traditional three wishes. I would guess that most people would use the first couple of wishes on money, power, and sex, or some combination thereof. Or maybe something dumb like good health.
But I am sure most of my readers would opt for something different, like the ability to use a number other than x-bar (the average) in the formula for the sample standard deviation. You might pick the average, or you might pick a number just a bit smaller, or maybe a lot larger. If you tried a gazillion different numbers, you might find something interesting.
That summation thing in the numerator? It is the smallest when you happen to pick the average for x-bar. Any other number, including the true mean, gives a sum at least as large, so dividing the sum about x-bar by n tends to underestimate the spread. Dividing by n-1 compensates for that.

The SD computed this way (with n-1 in the denominator) is your best guess for the value of the SD in the overall population. If you simply want to quantify the variation in a particular set of data, and don't plan to extrapolate to make wider conclusions, then you can compute the SD using n in the denominator. The resulting SD is the SD of those particular values. It makes no sense to compute the SD this way if you want to estimate the SD of the population from which those points were drawn.
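Both points, that the sum of squares is smallest at the average and that dividing by n comes out too low, can be tried out with a short simulation. This is a sketch under my own assumptions: the small data set is made up, the samples are drawn from a normal distribution whose true variance is 1, and the helper names are hypothetical. It compares variances rather than SDs, because that is where the n-1 correction is exact; the square root of the corrected variance still comes out slightly low on average.

```python
import random

def sum_sq(data, center):
    """Sum of squared deviations of the data from an arbitrary center."""
    return sum((x - center) ** 2 for x in data)

def average_variance_estimates(n=5, trials=20000, seed=1):
    """Average the divide-by-n and divide-by-(n-1) variance estimates over many samples."""
    rng = random.Random(seed)
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance is 1
        xbar = sum(sample) / n
        ss = sum_sq(sample, xbar)
        total_n += ss / n
        total_n_minus_1 += ss / (n - 1)
    return total_n / trials, total_n_minus_1 / trials

# Part 1: try many candidate centers; the sample average gives the smallest sum.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
xbar = sum(data) / len(data)
candidates = [xbar + d / 10 for d in range(-20, 21)]
best = min(candidates, key=lambda c: sum_sq(data, c))
print(best == xbar)

# Part 2: with samples of 5, dividing by n averages near 0.8, dividing by n-1 near 1.0.
biased, corrected = average_variance_estimates()
print(biased, corrected)
```

With n = 5 the divide-by-n estimate is expected to average (n-1)/n = 0.8 of the true variance, which is exactly the shortfall the n-1 denominator removes.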
It only makes sense to use n in the denominator when there is no sampling from a population and there is no desire to make general conclusions. The goal of science is always to generalize, so the equation with n in the denominator should not be used. The only example I can think of where it might make sense is in quantifying the variation among exam scores. But it would be much better to show a scatterplot of every score, or a frequency distribution histogram.
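For anyone who does want to compute both versions, Python's standard `statistics` module provides them as `pstdev` (n in the denominator) and `stdev` (n-1 in the denominator). The exam scores below are invented for illustration.

```python
import math
import statistics

# Hypothetical exam scores for one class (made-up numbers).
scores = [62, 64, 64, 64, 65, 65, 67, 69]

# n in the denominator: describes the spread of exactly these scores.
sd_n = statistics.pstdev(scores)

# n-1 in the denominator: estimates the SD of a larger population
# from which these scores are treated as a sample.
sd_n_minus_1 = statistics.stdev(scores)

print(sd_n, sd_n_minus_1)
```

For these eight scores the divide-by-n SD is 2 and the divide-by-(n-1) SD is about 2.14. The gap shrinks as the number of values grows.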