When to use mean vs median


I'm new to data science and statistics, so this might seem like a beginner question.

I'm working on a dataset that records a user's Twitter follower gain per day. I want to measure the user's average growth over a period of time, which I did by taking the mean of the daily growth. However, someone suggested that I use the median instead.

Can anyone explain in which use cases we should use the mean and in which the median?

Tags: statistics, descriptive-statistics

– Mukul Jain (new contributor), asked 2 days ago




































          5 Answers












The arithmetic mean is denoted $\bar{x}$ and defined as

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where each $x_i$ represents an individual observation. The arithmetic mean measures the average value of a given set of numbers.

In contrast, the median is the value that falls directly in the middle of your dataset once it is sorted. The median is especially useful when the data covers a wide range or contains an outlier (a very high or very low number compared to the rest), which would skew the mean.

For example, salaries are usually discussed using medians. This is due to the large disparity between the majority of people and the very few people with a lot of money (those few being the outliers). Looking at the individual at the 50th percentile therefore gives a more representative value than the mean in this circumstance.

By contrast, grades are usually described using the mean (average), because most students should be near the average and few will be far below or far above it.

– JahKnows, answered 2 days ago (edited yesterday)
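To make the effect of a single outlier concrete, here is a minimal Python sketch using only the standard library. The follower numbers are made up for illustration (they are not from the question), with one hypothetical "viral" day at the end.

```python
from statistics import mean, median

# Hypothetical daily follower gains; the last day is a viral spike (outlier).
daily_gains = [12, 15, 11, 14, 13, 16, 500]

# Same data without the spike, for comparison.
without_spike = daily_gains[:-1]

mean_with = mean(daily_gains)
median_with = median(daily_gains)
mean_without = mean(without_spike)
median_without = median(without_spike)

print(f"with spike:    mean={mean_with:.1f}  median={median_with:.1f}")
print(f"without spike: mean={mean_without:.1f}  median={median_without:.1f}")
```

In this toy data the spike pulls the mean far away from every ordinary day, while the median barely moves.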








• That's a great answer. So, if I understand correctly, I can plot my data and see whether its values are continuous, in which case we can use the mean; if they are more clustered (some high and some low), the median would be better, right? – Mukul Jain, 2 days ago

• @MukulJain, yes, it depends on the distribution of the data, as you mentioned. Plotting is always my go-to way to get a sense of my data. It makes it easy to spot anomalies and to see the spread. – JahKnows, yesterday

• I think you could explain this better using the term "outlier". – MilkyWay90, yesterday

• @MilkyWay90, feel free to edit and make this into a community post. – JahKnows, yesterday

• So if the data has lots of outliers, it is better to use the median, right? Outliers can be identified using the z-score (z > 3 or z < -3). – Mukul Jain, yesterday
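The z-score rule mentioned in the last comment can be sketched as follows. This is only an illustration under stated assumptions: the helper name `zscore_outliers`, the threshold of 3, the choice of the population standard deviation, and the sample data are all hypothetical. Note also that the z-score is itself built from the mean and standard deviation, so very extreme values can partly mask themselves in small samples.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs((v - mu) / sigma) > threshold]

# Hypothetical daily follower gains: twenty ordinary days and one spike.
daily_gains = [10, 12, 11, 13, 12, 14, 11, 10, 13, 12,
               11, 12, 13, 14, 10, 11, 12, 13, 12, 11, 300]

outliers = zscore_outliers(daily_gains)
print(outliers)
```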


















It depends on what question you are trying to answer. You are looking at the rate of change of a time series, and it sounds like you are trying to show how that changed over time. The mean gives the reader one intuitive insight: they can trivially estimate the number of followers $d$ days after the start by multiplying $d$ by the mean rate of change.

The downside of this single metric is that it hides something that is very common in series such as this: the rate of change is not fixed over time. One reasonable way to give readers an idea of whether the rate of change is static is to report the median as well. If they know the minimum of the series (presumably zero in your case), the current value, the mean, and the median, they can in many cases get a feel for how close to linear the increase has been.

There is a great cautionary tale in Anscombe's quartet: four very different datasets that share nearly identical summary statistics. Basically, it always comes back to what you are trying to answer. Are you trying to find users who are likely to become prominent soon? Users who are steadily accruing followers year by year? One-hit wonders? Botnets?

As you've probably guessed, this means it's not possible to universally call the mean or the median "better" than the other.

– l0b0 (new contributor)
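As a rough illustration of the point above, the sketch below compares two made-up accounts with the same mean daily gain but very different medians. The data, the names `steady` and `bursty`, and the helper `estimated_followers` are hypothetical; the helper assumes the account starts at zero followers unless told otherwise.

```python
from statistics import mean, median

def estimated_followers(mean_rate, days, start=0):
    """Linear estimate: starting count plus days times the mean daily gain."""
    return start + days * mean_rate

# Two hypothetical accounts, ten days each, both ending at 100 followers.
steady = [10] * 10        # gains 10 followers every day
bursty = [1] * 9 + [91]   # almost flat, then one big day

for name, gains in (("steady", steady), ("bursty", bursty)):
    print(name,
          "mean:", mean(gains),
          "median:", median(gains),
          "estimate after 10 days:", estimated_followers(mean(gains), 10))
```

Both accounts have the same mean, so the linear estimate at day 10 is identical. The gap between mean and median for the second account is the hint that its growth was far from linear.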




















Simply put, if your data is corrupted by noise, or by erroneous follower counts as in your case, using the mean as a metric can be detrimental, because any model built on it will perform badly. In that case, taking the median of the values instead is robust to the outliers in the data. Hope it helps.




















The median is often more robust to extreme values than the mean. Try to think of it as a minimization task: the median minimizes the absolute loss, while the mean minimizes the squared loss.

– nan hu (new contributor)
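The minimization view can be checked numerically. The sketch below is a brute-force search over integer candidates on a small made-up dataset; the data and function names are chosen for illustration only, and a grid search is used purely for clarity rather than as a recommended method.

```python
from statistics import mean, median

# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 100]

def squared_loss(c, values):
    """Sum of squared distances from candidate c to every value."""
    return sum((v - c) ** 2 for v in values)

def absolute_loss(c, values):
    """Sum of absolute distances from candidate c to every value."""
    return sum(abs(v - c) for v in values)

# Try every integer from 0 to 100 as the single summary value.
candidates = range(0, 101)
best_squared = min(candidates, key=lambda c: squared_loss(c, data))
best_absolute = min(candidates, key=lambda c: absolute_loss(c, data))

print("squared-loss minimizer: ", best_squared, "| mean:  ", mean(data))
print("absolute-loss minimizer:", best_absolute, "| median:", median(data))
```

Because squaring magnifies large deviations, the extreme value drags the squared-loss minimizer toward it, whereas the absolute loss only counts how far away it is, once.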




















I find myself explaining this a lot, and the example I use is the famous Bill Gates version. Bill Gates is in your data science class. Your instructor asks: what is the average income or net worth of this class? Bill Gates sheepishly obliges and tells you his income. If you now say that the average income of your group is a zillion dollars, you are technically correct, but the figure does not describe reality: Bill Gates is an outlier skewing everything.

So you line up all the people in your group in ascending or descending order, and whatever the person in the middle is making is your median. In this example, everybody but Bill Gates is likely to be within spitting distance of that median, while the mean lands far above what anyone except Bill Gates makes.

Now say Bill Gates is hiring a money manager based on the returns each candidate has produced so far. Should he look at their mean return over a 10-year period, their median return, or a combination of the two? Did they outperform the market every year, or only in some years? How does portfolio size factor in? In the case of Twitter followers, Obama would show different growth than someone with, say, 500K-1MM followers. As @l0b0 alludes to in his excellent answer, it all depends. Are you measuring follower growth or the rate of change of follower growth? What question are you trying to answer, and what strategy or product are you trying to develop? Pick the mean or the median accordingly.

Getting the mean and the median is always the easy part. Nobody actually has 2.1 kids; people have a whole number of kids. But what can you say about population growth rates if the mean number of kids is 2.1 and the median is 1 or 2? Or if the median is 3 or more? Is growth accelerating or decelerating? What is the mode doing? Compute all the basics first, and then ask why you are using the mean versus the median.

– armipunk (new contributor)

























































                  11












                  $begingroup$

                  It depends what question you are trying to answer. You are looking at the rate of change of a time series, and it sounds like you are trying to show how that changed over time. The mean gives the reader one intuitive insight: they can trivially estimate the number of followers at any date $d$ days since the start by multiplying by the mean rate of change.



                  The downside of this single metric is that it hides something very common in series like this one: the rate of change is not constant over time. One reasonable way to give readers an idea of whether the rate of change is static is to also report the median. If they know the minimum of the series (presumably zero in your case), the current value, the mean and the median, they can in many cases get a feel for how close to linear the increase has been.
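To illustrate, here is a minimal sketch with two hypothetical follower series that end at the same value, so their mean daily change is identical, while the median tells them apart. The series and the helper `daily_changes` are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical cumulative follower counts, one sample per day.
steady = [0, 5, 10, 15, 20, 25]  # grows by the same amount every day
bursty = [0, 1, 2, 3, 4, 25]     # nearly flat, then one sudden jump


def daily_changes(counts):
    """Day-over-day differences of a cumulative series."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


for name, counts in [("steady", steady), ("bursty", bursty)]:
    changes = daily_changes(counts)
    # Both series have a mean daily change of 5;
    # the median is 5 for the steady one and 1 for the bursty one.
    print(name, "mean:", mean(changes), "median:", median(changes))
```

A median far from the mean hints that growth was not linear, although a median close to the mean does not prove that it was.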



                  There is a great cautionary tale in Anscombe's quartet: four very different datasets that share nearly identical summary statistics (means, variances, correlation and regression line). Basically, it always comes back to what you are trying to answer. Are you trying to find users who are likely to become prominent soon? Users who are steadily accruing followers year by year? One-hit wonders? Botnets?



                  As you've probably guessed, this means it's not possible to universally call mean or median "better" than the other.












                  $endgroup$

















                  – l0b0
                  answered yesterday, edited yesterday

























                          2












                          $begingroup$

                          Simply put, if your data is corrupted by noise, for example erroneous Twitter follower counts as in your case, using the mean as a metric can be detrimental: a few bad values can distort it, and a model built on it may perform badly. In that case the median is the better choice, because it is largely unaffected by outliers. Hope it helps.
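A small sketch of that effect, assuming hypothetical follower counts in which one record was stored with three extra zeros (the values are invented for illustration):

```python
from statistics import mean, median

# Hypothetical follower counts; in the second list the last record is corrupted.
clean = [120, 135, 150, 160, 175]
corrupted = [120, 135, 150, 160, 175000]

print(mean(clean), median(clean))          # 148 150
print(mean(corrupted), median(corrupted))  # 35113 150
```

One bad record moves the mean by more than two orders of magnitude, while the median does not move at all.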















                          $endgroup$

















                          – karthikeyan
                          answered yesterday





















                                  1












                                  $begingroup$

                                  The median is often more robust to extreme values than the mean. Think of it as a minimization task: the median minimizes the sum of absolute deviations (absolute loss), while the mean minimizes the sum of squared deviations (squared loss).
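The correspondence can be checked numerically. The sketch below uses a hypothetical sample and a brute-force search over integer candidates; the search is only for illustration and is not how either statistic is normally computed.

```python
from statistics import mean, median

# Hypothetical sample with one extreme value.
data = [1, 2, 3, 4, 100]


def squared_loss(c):
    """Sum of squared deviations of the data from candidate c."""
    return sum((x - c) ** 2 for x in data)


def absolute_loss(c):
    """Sum of absolute deviations of the data from candidate c."""
    return sum(abs(x - c) for x in data)


# Brute-force search over the integer candidates 0..100.
candidates = range(0, 101)
best_squared = min(candidates, key=squared_loss)
best_absolute = min(candidates, key=absolute_loss)

print(best_squared, mean(data))     # 22 22
print(best_absolute, median(data))  # 3 3
```

Squaring makes the single extreme value dominate the loss, which is why it drags the mean towards itself; under absolute loss it counts no more than any other point on the same side of the candidate.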












                                  $endgroup$

















                                  – nan hu
                                  answered yesterday

























                                          1












                                          $begingroup$

                                          I find myself explaining this a lot, and the example I use is the famous Bill Gates one. Bill Gates is in your data science class. Your instructor asks: what is the average income or net worth of this class? Bill Gates sheepishly obliges and tells you his income. If you now say that the average income of your group is a zillion dollars, that is technically correct but does not describe reality: Bill Gates is an outlier skewing everything.



                                          So you line up everyone in your group in ascending or descending order of income; whatever the person in the middle makes is your median. In this example, everybody but Bill Gates is likely to be within spitting distance of the median, and Bill Gates will be the only one making as much as the mean.
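Here is the classroom example as a minimal sketch. The incomes are invented round numbers, with the last entry standing in for Bill Gates.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars; the last entry plays Bill Gates.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000, 1_000_000_000]

avg = mean(incomes)    # roughly 166.7 million
mid = median(incomes)  # 52500.0, the midpoint of the two middle incomes

# Count how many people in the class earn at least the mean.
above_mean = sum(1 for income in incomes if income >= avg)

print(avg, mid, above_mean)
```

Only one person in this class earns at least the mean, whereas every other income lies within 12,500 dollars of the median.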



                                          Now say Bill Gates is hiring a money manager and wants to judge candidates on the returns they have produced so far. Should he look at their average return over a 10-year period, their median return, or a combination of the two? Did they outperform the market every year, or only in some years? How does portfolio size factor in?

                                          In the case of Twitter followers, Obama would show different growth from someone with, say, 500K-1MM followers. As @l0b0 alludes to in his excellent answer, it all depends. Are you measuring follower growth, or the rate of change of follower growth? What question are you trying to answer, and what strategy or product are you trying to develop? Pick the mean or the median accordingly.

                                          Getting the mean and the median is always the easy part. Nobody actually has 2.1 kids; families have a whole number of them. But what can you say about population growth if the mean number of kids is 2.1 and the median is 1 or 2? What if the median is 3 or more? Is growth accelerating or decelerating? What is the mode doing? Compute all the basics first, and then ask why you are using the mean rather than the median.












                                          $endgroup$

















                                          – armipunk
                                          answered yesterday























