When to use mean vs median


I'm new to data science and statistics, so this might seem like a beginner question.

I'm working on a dataset that records a user's Twitter follower gain per day. I want to measure the average growth the user had over a period of time, which I did by taking the mean of the daily growth. But someone suggested that I use the median instead.

Can anyone explain in which use cases we should use the mean and when we should use the median?

asked 2 days ago

Mukul Jain


statistics descriptive-statistics


5 Answers

The arithmetic mean is denoted $\bar{x}$:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where each $x_i$ represents a single observation. The arithmetic mean measures the average value of a given set of numbers.

In contrast, the median is the value that falls directly in the middle of your dataset once it is sorted. The median is especially useful when the data span a wide range or contain an outlier (a very high or low value compared to the rest) that would skew the mean.

For example, salaries are usually discussed using medians. This is due to the large disparity between the majority of people and the very few people with a lot of money (those few being the outliers). Thus, looking at the individual at the 50th percentile gives a more representative value than the mean in this circumstance.
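The salary effect is easy to see with a few made-up numbers. The sketch below uses Python's standard `statistics` module; the salary figures are hypothetical and chosen only to illustrate how one very large value moves the mean but barely affects the median.

```python
from statistics import mean, median

# Hypothetical annual salaries: nine typical earners plus one very high earner.
salaries = [42_000, 45_000, 47_000, 50_000, 52_000,
            55_000, 58_000, 60_000, 65_000, 5_000_000]

mean_salary = mean(salaries)      # pulled upward by the single large value
median_salary = median(salaries)  # average of the two middle values

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
```

Here the mean lands far above what nine of the ten people earn, while the median stays among the typical salaries.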

By contrast, grades are usually described using the mean (average) because most students should be near the average, and few will be far below or far above it.

edited yesterday

answered 2 days ago

JahKnows

That's a great answer. So, if I think of it like this: I can plot my data, and if the values are spread continuously we can use the mean, and if they're more clustered (some high and some low) then the median would be better, right?
– Mukul Jain
2 days ago

@MukulJain, yes, it depends on the distribution of the data, as you mentioned. Plotting is always my go-to way to get a sense of my data. It makes it easy to spot anomalies and get a sense of the spread.
– JahKnows
yesterday

I think you could explain this better using the term "outlier".
– MilkyWay90
yesterday

@MilkyWay90, feel free to edit and make this into a community post.
– JahKnows
yesterday

So, if the data has lots of outliers, it is better to use the median, right? Outliers can be identified using the z-score (z > 3 or z < -3).
– Mukul Jain
yesterday
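The z-score rule of thumb raised in the comments can be sketched as follows. This is a minimal illustration, not a recommended procedure: the `gains` values are hypothetical, the cutoff of 3 is the conventional choice mentioned above, and the population standard deviation (`pstdev`) is an assumption made for simplicity.

```python
from statistics import mean, pstdev

# Hypothetical daily follower gains: 20 ordinary days and one extreme day.
gains = [10, 12, 9, 11, 10, 13, 8, 11, 10, 12,
         9, 10, 11, 12, 10, 9, 11, 10, 12, 10, 300]

mu = mean(gains)
sigma = pstdev(gains)  # population standard deviation

# Flag every value whose z-score is larger than 3 in absolute value.
outliers = [x for x in gains if abs((x - mu) / sigma) > 3]

print(outliers)
```

One caveat: the z-score is itself built from the mean and standard deviation, which outliers inflate. With the population standard deviation, no value in a sample of size $n$ can have $|z|$ larger than $\sqrt{n-1}$, so in a sample of 10 or fewer points this rule can never flag anything.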

It depends on what question you are trying to answer. You are looking at the rate of change of a time series, and it sounds like you are trying to show how it changed over time. The mean gives the reader one intuitive insight: they can trivially estimate the number of followers gained $d$ days after the start by multiplying $d$ by the mean rate of change.

The downside of this single metric is that it hides something very common in series like this: the rate of change is not fixed over time. One reasonable way to give readers an idea of whether the rate of change is static is to also report the median. If they know the minimum of the series (presumably zero in your case), the current value, the mean and the median, they can in many cases get a feel for how close to linear the increase has been.
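As a rough sketch of both points, the snippet below uses hypothetical daily gains with one spike and an assumed starting count (`start_followers`). It shows that the mean reproduces the total growth, while a large gap between mean and median hints that the growth was not close to linear.

```python
from statistics import mean, median

# Hypothetical daily follower gains over ten days; day 7 is a sudden spike.
daily_gains = [12, 9, 11, 10, 13, 8, 250, 11, 10, 12]
start_followers = 1_000  # assumed starting count, for illustration only

mean_gain = mean(daily_gains)
median_gain = median(daily_gains)

# Multiplying the number of days by the mean gain recovers the total growth.
d = len(daily_gains)
estimated_final = start_followers + d * mean_gain
actual_final = start_followers + sum(daily_gains)

print(f"mean gain per day:   {mean_gain}")
print(f"median gain per day: {median_gain}")
print(f"estimated vs actual: {estimated_final:.0f} vs {actual_final}")
```

If the two numbers were close, a steady day-by-day increase would be a plausible reading; here the mean is several times the median, which points to a burst rather than steady growth.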

There is a great cautionary tale in Anscombe's quartet: four completely different datasets that share nearly identical summary statistics. Basically, it always comes back to the question you are trying to answer. Are you trying to find users who are likely to become prominent soon? Users who are steadily accruing followers year by year? One-hit wonders? Botnets?

As you've probably guessed, this means it's not possible to universally call mean or median "better" than the other.

edited yesterday

answered yesterday

l0b0


Simply put, if your data is corrupted with noise, say erroneous Twitter follower counts as in your case, taking the mean as a metric could be detrimental, because a model built on it may perform badly. In this case, the median is much less sensitive to the outliers in the data. Hope it helps.

answered yesterday

karthikeyan


The median is often more robust to extreme values than the mean. Try to think of it as a minimization task: the median is the value that minimizes the absolute loss, while the mean is the value that minimizes the squared loss.
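In symbols, the mean is $\arg\min_c \sum_i (x_i - c)^2$ and the median is $\arg\min_c \sum_i |x_i - c|$. The sketch below checks this numerically with a brute-force grid search; the sample `data` and the grid of `candidates` are hypothetical choices made for illustration, not an efficient way to compute either statistic.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]  # hypothetical sample with one extreme value


def squared_loss(c):
    """Sum of squared distances from c to every data point."""
    return sum((x - c) ** 2 for x in data)


def absolute_loss(c):
    """Sum of absolute distances from c to every data point."""
    return sum(abs(x - c) for x in data)


# Candidate centers on a grid: 0.0, 0.1, ..., 100.0
candidates = [i / 10 for i in range(0, 1001)]

best_squared = min(candidates, key=squared_loss)
best_absolute = min(candidates, key=absolute_loss)

print(best_squared, mean(data))
print(best_absolute, median(data))
```

Because squaring magnifies large deviations, the extreme value drags the squared-loss minimizer toward itself, whereas the absolute loss weighs every point's distance linearly.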

answered yesterday

nan hu


I find myself explaining this a lot, and the example I use is the famous Bill Gates version. Bill Gates is in your data science class. Your instructor asks: what is the average income or net worth of this class? Bill Gates sheepishly obliges and tells you what his income is. Now when you say the average income of your group is a zillion dollars, that is technically correct but does not describe the reality: Bill Gates is an outlier skewing everything.

So you line up all the people in your group in ascending or descending order, and whatever the person in the middle is making is your median. In this example, everybody but Bill Gates is likely to be within spitting distance of that median, and Bill Gates will be the only one making anything close to the mean.

Now say our buddy Bill Gates is hiring a money manager based on the returns the candidates have produced so far. Should he look at their average returns over a 10-year period, their median return, or a combination of the two? Did they outperform the market each year? Some years? How does portfolio size factor in?

In the case of Twitter followers, Obama would have a different growth pattern compared to someone with, say, 500K-1MM followers. As @l0b0 alludes to in his excellent answer, it all depends. Are you measuring follower growth or the rate of change of follower growth? What is the question you are trying to answer, or the strategy or product you are trying to develop? You pick mean or median accordingly.

Getting the mean and median is always the easy part. It's always better to never ever have the average of 2.1 kids. Have a whole number of kids. But what can you say about population growth rates if the mean number of kids is 2.1 and the median is 1 or 2? Or if the median is 3 or more? Is growth accelerating or decelerating? What is the mode doing? Compute all the basics first, and then ask why you are using mean versus median.
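"Compute all the basics first" could look like the following minimal sketch. The `children` counts are hypothetical and were picked so that the mean comes out at 2.1; only functions from Python's standard `statistics` module are used.

```python
from statistics import mean, median, mode

# Hypothetical number of children in ten families.
children = [0, 1, 1, 2, 2, 2, 2, 3, 3, 5]

summary = {
    "mean": mean(children),      # can be a fraction nobody actually has
    "median": median(children),  # middle of the sorted values
    "mode": mode(children),      # most common value
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Seeing the three numbers side by side is what prompts the follow-up questions above, for instance why the mean sits above the median and which one matches the question being asked.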

answered yesterday

armipunk
