When to use mean vs median
I'm new to data science and statistics, so this might seem like a beginner question.
I'm working on a dataset that records a user's Twitter follower gain per day. I want to measure the average growth the user had over a period of time, which I did by taking the mean of the daily growth. But someone suggested that I use the median instead.
Can anyone explain in which use cases we should use the mean and in which the median?
statistics descriptive-statistics
asked 2 days ago
Mukul Jain
5 Answers
The arithmetic mean is denoted $\bar{x}$:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
where each $x_i$ is a single observation and $n$ is the number of observations. The arithmetic mean measures the average value of a given set of numbers.
In contrast, the median is the value that falls directly in the middle of your dataset once it is sorted. The median is especially useful when the data span a wide range or contain an outlier (a very high or very low number compared to the rest), which would skew the mean.
For example, salaries are usually discussed using medians. This is due to the large disparity between the majority of people and the very few people with a lot of money (those few being the outliers). Looking at the individual at the 50th percentile therefore gives a more representative value than the mean in this circumstance.
Grades, on the other hand, are usually described using the mean (average), because most students should be near the average and few will be far below or far above it.
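To make the salary example concrete, here is a minimal sketch using Python's standard `statistics` module. The salary figures are made up purely for illustration; only the comparison between the two numbers matters.

```python
from statistics import mean, median

# Hypothetical yearly salaries: nine typical earners and one extreme outlier.
salaries = [40_000, 42_000, 45_000, 47_000, 50_000,
            52_000, 55_000, 58_000, 60_000, 5_000_000]

mean_salary = mean(salaries)      # pulled far upward by the single outlier
median_salary = median(salaries)  # middle of the sorted values

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
```

With this invented data the mean lands well above what nine of the ten people earn, while the median stays among the typical values.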
edited yesterday
answered 2 days ago
JahKnows
That's a great answer. So, if I understand correctly, I can plot my data: if the values are spread continuously, we can use the mean, and if they're more clustered (some high and some low), then the median would be better, right?
– Mukul Jain, 2 days ago
@MukulJain, yes, it depends on the distribution of the data, as you mentioned. Plotting is always my go-to way to get a sense of my data. It makes it easy to spot anomalies and get a sense of the spread.
– JahKnows, yesterday
I think you could explain this better using the term "outlier".
– MilkyWay90, yesterday
@MilkyWay90, feel free to edit and make this into a community post.
– JahKnows, yesterday
So if the data has lots of outliers, it is better to use the median, right? Outliers can be identified using the z-score (values with a z-score above 3 or below -3).
– Mukul Jain, yesterday
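The z-score rule mentioned in the last comment can be sketched as follows. This is only an illustration: the function name `zscore_outliers`, the threshold of 3 and the follower numbers are assumptions, not something from the thread. Note also that the z-score is itself built on the mean and standard deviation, which extreme values inflate, so treat the result as a rough first check.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs((v - mu) / sigma) > threshold]

# Hypothetical daily follower gains with one suspicious spike.
daily_gains = [10, 12, 11, 9, 10, 13, 11, 10, 12, 9,
               11, 10, 12, 10, 11, 9, 10, 12, 11, 500]

flagged = zscore_outliers(daily_gains)
print(flagged)
```

If values get flagged like this, that is a hint that the median may describe a typical day better than the mean.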
It depends on what question you are trying to answer. You are looking at the rate of change of a time series, and it sounds like you are trying to show how that changed over time. The mean gives the reader one intuitive insight: they can trivially estimate the number of followers gained by day $d$ since the start by multiplying $d$ by the mean rate of change.
The downside of this single metric is that it doesn't illustrate something that is very common in series such as this: the rate of change is not fixed over time. One reasonable way to give readers an idea of whether the rate of change is static is to also give them the median. If they know the minimum of the series (presumably zero in your case), the current value, the mean and the median, they can in many cases get a feel for how close to linear the increase has been.
There is a great cautionary tale in Anscombe's quartet: four completely different datasets that share nearly identical summary statistics. Basically, it always comes back to what you are trying to answer. Are you trying to find users who are likely to become prominent soon? Users who are steadily accruing followers year by year? One-hit wonders? Botnets?
As you've probably guessed, this means it's not possible to universally call either the mean or the median better than the other.
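As a rough illustration of the point about non-constant rates of change, the sketch below compares two invented accounts that gain the same number of followers in total. The account names and numbers are hypothetical.

```python
from statistics import mean, median

# Hypothetical daily follower gains for two accounts over ten days.
steady = [100] * 10                        # constant rate of change
viral = [5, 5, 5, 5, 5, 5, 5, 5, 5, 955]   # flat, then one late spike

for name, gains in (("steady", steady), ("viral", viral)):
    d = len(gains)
    estimated_total = d * mean(gains)  # followers gained by day d
    print(name, "total:", sum(gains), "estimate:", estimated_total,
          "mean:", mean(gains), "median:", median(gains))
```

Both accounts share the same mean, so the mean alone recovers the total but says nothing about how it was reached. The gap between mean and median for the second account is what signals that its growth was far from linear.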
edited yesterday
answered yesterday
l0b0
Simply put, if your data is corrupted by noise, or contains erroneous Twitter follower counts as in your case, taking the mean as your metric could be detrimental, since any model that relies on it will perform badly. In that case, the median of the values is much less affected by the outliers in the data. Hope it helps.
answered yesterday
karthikeyan
The median is often more robust to extreme values than the mean. Try to think of it as a minimization task: the median is the value that minimizes the absolute loss, while the mean is the value that minimizes the squared loss.
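The loss-minimization view can be checked numerically. The following is a small brute-force sketch on a made-up sample, not a proof: it searches integer candidates $c$ and picks the one minimizing $\sum_i (x_i - c)^2$ and the one minimizing $\sum_i |x_i - c|$.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]  # hypothetical sample with one extreme value

def squared_loss(c):
    return sum((x - c) ** 2 for x in data)

def absolute_loss(c):
    return sum(abs(x - c) for x in data)

# Brute-force search over integer candidates between min and max.
candidates = range(min(data), max(data) + 1)
best_squared = min(candidates, key=squared_loss)
best_absolute = min(candidates, key=absolute_loss)

print(best_squared, mean(data))     # squared loss is minimized at the mean
print(best_absolute, median(data))  # absolute loss is minimized at the median
```

Squaring makes the single large deviation dominate, which is why the mean gets dragged toward the extreme value while the median does not.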
answered yesterday
nan hu
I find myself explaining this a lot, and the example I use is the famous Bill Gates version. Bill Gates is in your data science class. Your instructor asks: what is the average income or net worth of this class? Bill Gates sheepishly obliges and tells you what his income is. Now when you say the average income of your group is a zillion dollars, that is technically correct but does not describe reality: Bill Gates is an outlier skewing everything.

So you line up all the people in your group in ascending or descending order, and whatever the person in the middle is making is your median. In this example, everybody but Bill Gates is likely to be within spitting distance of that median, and Bill Gates will be the only one above the mean.

Now say buddy Bill Gates is hiring a money manager based on the returns they have produced so far. Should he look at their average returns over a 10-year period, their median return, or a combination of the two? Did they outperform the market each year? Some years? How does portfolio size factor in?

In the case of Twitter followers, Obama would have a different growth pattern compared to someone with, say, 500K-1MM followers. As @l0b0 alludes to in his excellent answer, it all depends. Are you measuring follower growth or the rate of change of follower growth? What question are you trying to answer, and what strategy or product are you trying to develop? You pick mean or median accordingly.

Getting the mean and median is always the easy part. It's always better to never, ever have the average of 2.1 kids; have a whole number of kids. But what can you say about population growth rates if the mean number of kids is 2.1 and the median is 1 or 2? Or if the median is 3 or more? Is growth accelerating or decelerating? What is the mode doing? Compute all the basics first, and then ask why you are using mean versus median.
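To make the Bill Gates point concrete, here is a minimal Python sketch using only the standard library. All the numbers are made up for illustration: `class_incomes`, `outlier_income`, and `kids_per_household` are hypothetical values, not real data, and the helper names are my own.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes (USD) for a small class; values are invented.
class_incomes = [42_000, 48_000, 51_000, 55_000, 60_000]

# Hypothetical outlier standing in for Bill Gates; not a real figure.
outlier_income = 1_000_000_000


def mean_and_median(values):
    """Return (mean, median) of a list of numbers."""
    return mean(values), median(values)


without_outlier = mean_and_median(class_incomes)
with_outlier = mean_and_median(class_incomes + [outlier_income])

# How many people in the class (outlier included) earn more than the mean?
above_mean = [x for x in class_incomes + [outlier_income] if x > with_outlier[0]]

print("without outlier: mean=%.0f median=%.0f" % without_outlier)
print("with outlier:    mean=%.0f median=%.0f" % with_outlier)
print("people above the mean:", len(above_mean))

# "Compute all the basics first": mean, median and mode together.
# Hypothetical number of kids in ten households, chosen so the mean is 2.1.
kids_per_household = [0, 0, 1, 1, 2, 2, 2, 3, 3, 7]
kids_summary = (
    mean(kids_per_household),
    median(kids_per_household),
    mode(kids_per_household),
)
print("kids: mean=%.1f median=%.1f mode=%d" % kids_summary)
```

With these made-up numbers, adding the one outlier barely moves the median but pushes the mean far above what everyone else earns, and only the outlier sits above it. The kids example shows a mean of 2.1 sitting next to a median and mode of 2, which is the kind of side-by-side look the answer suggests taking before picking one statistic.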
answered yesterday
armipunk
Mukul Jain is a new contributor. Be nice, and check out our Code of Conduct.