- In the last video we talked about different ways to
- represent the central tendency or the average of a data set.
- What we're going to do in this video is to expand that a
- little bit to understand how spread apart
- the data is as well.
- So let's just think about this a little bit.
- Let's say I have negative 10, 0, 10, 20 and 30.
- Let's say that's one data set right there.
- And let's say the other data set is 8, 9, 10, 11 and 12.
- Now let's calculate the arithmetic mean for both of
- these data sets.
- So let's calculate the mean.
- And when you go further on in statistics, you're going to
- understand the difference between a
- population and a sample.
- We're assuming that this is the entire
- population of our data.
- So we're going to be dealing with the population mean.
- We're going to be dealing with, as you see, the
- population measures of dispersion.
- I know these are all fancy words.
- In the future, you're not going to have all of the data.
- You're just going to have some samples of it, and you're
- going to try to estimate things for the entire
- population.
- So I don't want you to worry too much about that just now.
- But if you are going to go further in statistics, I just
- want to make that clarification.
- Now, the population mean, or the arithmetic mean of this
- data set right here, it is negative 10 plus 0 plus 10
- plus 20 plus 30 over-- we have five data points-- over 5.
- And what is this equal to?
- That negative 10 cancels out with that 10, 20 plus 30 is 50
- divided by 5, it's equal to 10.
- Now, what's the mean of this data set?
- 8 plus 9 plus 10 plus 11 plus 12, all of that over 5.
- And the way we could think about it, 8 plus 12 is 20, 9
- plus 11 is another 20, so that's 40, and then
- we have a 50 there.
- Add another 10.
- So this, once again, is going to be 50 over 5.
- So these have the exact same population mean.
- Or if you don't want to worry about the word population or
- sample and all of that, both of these data sets have the
- exact same arithmetic mean.
- When you take the sum of these numbers and divide by 5, you
- get 10, and when you take the sum of these numbers and divide by 5,
- you get 10 as well.
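As a quick check of the arithmetic above, here is a minimal Python sketch. The names `first`, `second`, and `population_mean` are made up for illustration and do not come from the video.

```python
# The two data sets from the video, treated as whole populations.
first = [-10, 0, 10, 20, 30]
second = [8, 9, 10, 11, 12]

def population_mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

mean_first = population_mean(first)    # 50 / 5
mean_second = population_mean(second)  # 50 / 5
print(mean_first, mean_second)  # 10.0 10.0
```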
- But clearly, these sets of numbers are different.
- You know, if you just looked at this number, you'd say, oh,
- maybe these sets are very similar to each other.
- But when you look at these two data sets, one thing might pop
- out at you.
- All of these numbers are very close to 10.
- I mean, the furthest number here is two away from 10.
- 12 is only two away from 10.
- Here, these numbers are further away from 10.
- Even the closer ones are still 10 away and these guys are 20
- away from 10.
- So this right here, this data set right here is more
- dispersed, right?
- These guys are further away from our mean than these guys
- are from this mean.
- So let's think about different ways we can measure
- dispersion, or how far away we are from
- the center, on average.
- Now one way, this is kind of the
- simplest way, is the range.
- And you won't see it used too often, but it's kind of a very
- simple way of understanding how far is the spread between
- the largest and the smallest number.
- You literally take the largest number, which is 30 in our
- example, and from that, you subtract the smallest number.
- So 30 minus negative 10, which is equal to 40, which tells us
- that the difference between the largest and the smallest
- number is 40, so we have a range of 40 for this data set.
- Here, the range is the largest number, 12, minus the smallest
- number, which is 8, which is equal to 4.
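The two range calculations can be reproduced with a short Python sketch. The helper name `value_range` is hypothetical, chosen here only to avoid clashing with Python's built-in `range`.

```python
first = [-10, 0, 10, 20, 30]
second = [8, 9, 10, 11, 12]

def value_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)

range_first = value_range(first)    # 30 - (-10)
range_second = value_range(second)  # 12 - 8
print(range_first, range_second)  # 40 4
```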
- So here range is actually a pretty good measure of
- dispersion.
- We say, OK, both of these guys have a mean of 10.
- But when I look at the range, this guy has a much larger
- range, so that tells me this is a more dispersed set.
- But range is not always going to tell you the whole picture.
- You might have two data sets with the exact same range
- where still, based on how things are bunched up, it
- could still have very different distributions of
- where the numbers lie.
- Now, the one that you'll see used most often
- is called the variance.
- Actually, we're going to see the standard
- deviation in this video.
- That's probably what's used most often, but it has a very
- close relationship to the variance.
- So the symbol for the variance-- and we're going to
- deal with the population variance.
- Once again, we're assuming that this is all of the data
- for our whole population, that we're not just sampling,
- taking a subset, of the data.
- So the variance, its symbol is literally this sigma, this
- Greek letter, squared.
- That is the symbol for variance.
- And we'll see that the sigma letter actually is the symbol
- for standard deviation.
- And that is for a reason.
- But anyway, the definition of a variance is you literally
- take each of these data points, find the difference
- between those data points and your mean, square them, and
- then take the average of those squares.
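Written as a formula, the definition just described looks roughly like this. The symbols $N$ (the number of data points), $x_i$ (the $i$-th data point), and $\mu$ (the population mean) are notation assumed here; the video itself only introduces $\sigma^2$.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```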
- I know that sounds very complicated, but when I
- actually calculate it, you're going to see it's not too bad.
- So remember, the mean here is 10.
- So I take the first data point.
- Let me do it over here.
- Let me scroll down a little bit.
- So I take the first data point.
- Negative 10.
- From that, I'm going to subtract our mean and I'm
- going to square that.
- So I just found the difference from that first data point to
- the mean and squared it.
- And that's essentially to make it positive.
- Plus the second data point, 0 minus 10, minus the mean--
- this is the mean; this is that 10 right there-- squared plus
- 10 minus 10 squared-- that's the middle 10 right there--
- plus 20 minus 10-- that's the 20-- squared
- plus 30 minus 10 squared.
- So this is the squared differences between each
- number and the mean.
- This is the mean right there.
- I'm finding the difference between every data point and
- the mean, squaring them, summing them up, and then
- dividing by that number of data points.
- So I'm taking the average of these numbers,
- of the squared distances.
- So when you say it kind of verbally, it sounds very
- complicated.
- But you're taking each number.
- Find the difference between that and the mean, square it,
- take the average of those.
- So I have 1, 2, 3, 4, 5, divided by 5.
- So what is this going to be equal to?
- Negative 10 minus 10 is negative 20.
- Negative 20 squared is 400.
- 0 minus 10 is negative 10 squared is 100, so plus 100.
- 10 minus 10 squared, that's just 0 squared, which is 0.
- Plus 20 minus 10 is 10 squared, is 100.
- Plus 30 minus 10, which is 20, squared is 400.
- All of that over 5.
- And what do we have here?
- 400 plus 100 is 500, plus another 500 is 1000.
- It's equal to 1000/5, which is equal to 200.
- So in this situation, our variance is going to be 200.
- That's our measure of dispersion there.
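The same steps can be followed one at a time in Python. This is only a sketch of the calculation above, with illustrative variable names.

```python
first = [-10, 0, 10, 20, 30]

# Step 1: the mean of the data set.
mean = sum(first) / len(first)  # 10.0

# Step 2: the squared difference between each data point and the mean.
squared_diffs = [(x - mean) ** 2 for x in first]  # [400.0, 100.0, 0.0, 100.0, 400.0]

# Step 3: the average of those squared differences.
variance_first = sum(squared_diffs) / len(squared_diffs)  # 1000 / 5
print(variance_first)  # 200.0
```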
- And let's compare it to this data set over here.
- Let's compare it to the variance of this
- less-dispersed data set.
- So let me scroll over a little bit so we have some real
- estate, although I'm running out.
- Maybe I could scroll up here.
- There you go.
- Let me calculate the variance of this data set.
- So we already know its mean.
- So the variance of this data set is going to be equal to 8
- minus 10 squared plus 9 minus 10 squared plus 10 minus 10
- squared plus 11 minus 10-- let me scroll up a little bit--
- squared plus 12 minus 10 squared.
- Remember, that 10 is just the mean that we calculated.
- You have to calculate the mean first. Divided by-- we have 1,
- 2, 3, 4, 5 squared differences.
- So this is going to be equal to-- 8 minus 10 is negative 2
- squared, is positive 4.
- 9 minus 10 is negative 1 squared, is positive 1.
- 10 minus 10 is 0 squared.
- You still get 0.
- 11 minus 10 is 1.
- Square it, you get 1.
- 12 minus 10 is 2.
- Square it, you get 4.
- And what is this equal to?
- All of that over 5.
- This is 10/5.
- So this is going to be--all right, this is 10/5, which is
- equal to 2.
- So the variance here-- let me make sure I got that right.
- Yes, we have 10/5.
- So the variance of this less-dispersed data set is a
- lot smaller.
- The variance of this data set right here is only 2.
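Both results can also be cross-checked with Python's standard library, whose `statistics.pvariance` function computes the population variance. The video does the work by hand; using this function is just one convenient way to confirm the numbers.

```python
from statistics import pvariance

first = [-10, 0, 10, 20, 30]
second = [8, 9, 10, 11, 12]

# pvariance treats the data as the entire population (it divides by N).
variance_first = pvariance(first)
variance_second = pvariance(second)
print(variance_first, variance_second)
```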
- So that gave you a sense.
- That tells you, look, this is definitely a less-dispersed
- data set than that there.
- Now, the problem with the variance is you're taking
- these numbers, you're taking the difference between them
- and the mean, then you're squaring it.
- It kind of gives you a bit of an arbitrary number, and if
- you're dealing with units, let's say
- if these are distances.
- So this is negative 10 meters, 0 meters, 10 meters, this is 8
- meters, so on and so forth, then when you square it, you
- get your variance in terms of meters squared.
- It's kind of an odd set of units.
- So what people like to do is talk in terms of standard
- deviation, which is just the square root of the variance,
- or the square root of sigma squared.
- And the symbol for the standard
- deviation is just sigma.
- So now that we've figured out the variance, it's very easy
- to figure out the standard deviation of both of these
- characters.
- The standard deviation of this first one up here, of this
- first data set, is going to be the square root of 200.
- The square root of 200 is what?
- The square root of 2 times 100.
- This is equal to 10 square roots of 2.
- That's that first data set.
- Now the standard deviation of the second data set is just
- going to be the square root of its variance, which is just 2.
- So the second data set has 1/10 the standard deviation of
- this first data set.
- This is 10 roots of 2, this is just the root of 2.
- So this is 10 times the standard deviation.
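Here is a small Python sketch of the two square roots and their ratio. It starts from the variances already worked out (200 and 2); the variable names are illustrative.

```python
import math

variance_first = 200
variance_second = 2

# The standard deviation is the square root of the variance.
sigma_first = math.sqrt(variance_first)    # 10 * sqrt(2), about 14.14
sigma_second = math.sqrt(variance_second)  # sqrt(2), about 1.414

# How many times larger the first standard deviation is.
ratio = sigma_first / sigma_second
print(round(sigma_first, 3), round(sigma_second, 3), round(ratio, 3))
```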
- And this, hopefully, will make a little bit more sense.
- Let's think about it.
- This has 10 times the standard deviation of this.
- And let's remember how we calculated it.
- Variance, we just took each data point, how far it was
- away from the mean, squared that, took
- the average of those.
- Then we took the square root, really just to make the units
- look nice, but the end result is we said that that first
- data set has 10 times the standard deviation of the
- second data set.
- So let's look at the two data sets.
- This has 10 times the standard deviation, which makes sense
- intuitively, right?
- I mean, they both have a 10 in here, but look at each of these guys:
- 9 is only one away from the 10, while 0 is 10 away
- from the 10, 10 less.
- 8 is only two away.
- This guy is 20 away.
- So it's 10 times, on average, further away.
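The point-by-point comparison can be made explicit with a short Python sketch. It pairs the data points by position (negative 10 with 8, 0 with 9, and so on), which is an assumption about how the two sets line up.

```python
first = [-10, 0, 10, 20, 30]
second = [8, 9, 10, 11, 12]
mean = 10  # both data sets have this mean

# Distance of each data point from the mean.
dev_first = [abs(x - mean) for x in first]    # [20, 10, 0, 10, 20]
dev_second = [abs(x - mean) for x in second]  # [2, 1, 0, 1, 2]

# Every distance in the first set is 10 times its counterpart in the second.
scaled = [10 * d for d in dev_second]
print(dev_first == scaled)  # True
```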
- So the standard deviation, at least in my mind, is giving a
- much better sense of how far away, on average, we
- are from the mean.
- Anyway, hopefully, you found that useful.