# Finding the Mode of Grouped Data

The mode of a list of data values is simply the most common value (or values … if any). When data is grouped (binned) as in a histogram, we normally talk only about the modal class (the class, or group, with the greatest frequency), because we don’t know the individual values. But some sources teach a formula for finding (actually just estimating) the mode. We’ve had a number of questions about that formula.

## The formula

I had never heard of such a formula until 2007, when a question was asked about applying it in a special case. That answer wasn’t archived, but when we got another question about it a year later, it was time to publish what I had figured out. Here is the 2008 question, from Saptarshi:

Different Formulas for Calculating Mode

I am a M.B.A student. Our teacher tells a formula to find out mode, that is Z=L1+(F1-F0)/(2F1-F0-F2)*i

where: L1 = lower limit of modal class
F1 = modal class frequency.
F2 = just after the modal class frequency.
F0 = just previous the modal class frequency.
i = class interval.
Z = the mode value.

But I saw in most of cases the highest frequency is the mode. They don't use that formula. (I saw that when searching about mode in google). So why we need that formula? Can you please explain me.

I think he is saying that whereas he was taught this formula for the mode, most sources he found online do as I have usually seen, identifying only the class with the greatest frequency as the mode (actually the modal class). So, why was he taught this formula, and what does it mean?

The formula, which I now find more easily around the Web than I could back then, takes several forms. His, in more readable format, is $$Z = L_1 + \frac{F_1 - F_0}{2F_1 - F_0 - F_2}\cdot i.$$ The form we had previously been asked about was a little different: $$Z = L_1 + \frac{d_1}{d_1 + d_2}\cdot i,$$ where $$d_1$$ and $$d_2$$ are the differences between the frequency of the modal class and those of its nearest neighbors.

I started this answer by stating what the formula does, and showing the two formulas to be equivalent:

This formula gives a linear interpolation to estimate the actual value of the mode from grouped data; otherwise, all you really know is the modal class (which is sufficient for many purposes).

Your formula can be written differently if we take

d1 = F1 - F0  (difference between modal class and previous class)
d2 = F1 - F2  (difference between modal class and next class)

Then d1 + d2 = (F1 - F0) + (F1 - F2) = 2F1 - F0 - F2, so the formula is

Z = L1 + d1/(d1 + d2) * i

I have never found an explanation of the formula in a mathematical source that explains its proper derivation and the conditions under which it is valid; there are several sites that explain it after-the-fact as I will do below, but most sources I find are at a basic level where they just state the formula and tell students to use it. Most, in fact, just state that it gives the mode, whereas, as stated above, it is really only a guess — an estimate of what the actual mode might be, based on the shape of the histogram. We don’t really know how the data are distributed within any of the classes, so it is impossible to know the actual mode; it may not even be in the modal class. On the other hand, the actual mode may just reflect that some random data points happen to be identical; a number based on the overall shape may really be more meaningful! So this is a valid concept, at least in some situations.

One source that gives this formula with a proper description is

Math Is Fun: Mean, Median and Mode from Grouped Frequencies

which says, under “Estimating the Mode from Grouped Data”,

We can easily find the modal group (the group with the highest frequency), which is 61 – 65.

We can say “the modal group is 61 – 65”.

But the actual Mode may not even be in that group! Or there may be more than one mode. Without the raw data we don’t really know.

But, we can estimate the Mode using the following formula:
Estimated Mode = $$L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$$
where:

• $$L$$ is the lower class boundary of the modal group
• $$f_{m-1}$$ is the frequency of the group before the modal group
• $$f_m$$ is the frequency of the modal group
• $$f_{m+1}$$ is the frequency of the group after the modal group
• $$w$$ is the group width

I gave some references, and then quoted what I had said in answering the 2007 question:

The formula these sites give, with definitions of the variables, is (using the second site's version):

When data are already grouped in a frequency
distribution, we can assume that the mode is
located in the class with the most items. In
order to determine a single value for the
mode from this modal class, we use

mode = LBMo + [d1 /(d1+d2)] (Width)

where

LBMo = lower boundary of the modal class
Width = width of the modal class interval
d1 = frequency of the modal class minus
the frequency of the class directly below it
d2 = frequency of the modal class minus
the frequency of the class directly above it

Note that d1 and d2 relate to the classes on the left and on the right in the histogram.  If there is no class on the left, then you can imagine a class with frequency zero.  Then the formula applies easily.

Note that this source rightly said the formula only gives “a single value for the mode”, not “the actual value of the mode”.

The purpose of this formula is to identify one value within the modal class that seems likely to be the peak of the curve if you smoothed out the histogram.  It does this by taking the value within the interval whose distance from the class on either side is proportional to how much less the frequency is on either side.   You can see this by rewriting the formula:

    mode - L1      d1
    --------- = -------
      Width     d1 + d2

That is, the distance from the lower bound (left end) of the modal class, as a fraction of the width of the modal class, is the ratio of the left difference to the sum of the differences.

In thinking about this relationship, I saw a graphical meaning to the formula (which I now see on various other sites; I’m sure I’m not the first to have seen it):

There is a simple geometrical way you could find this point.  Just draw lines from the top corners of the modal bar to the near corners of the neighboring bars, and the mode estimate lies at the intersection:

              +---------+
              |  \    / |d2
            d1|     X   |
              |   / :   +---------+
              | /   :   |         |
    +---------+     :   |         |
    |         |     :   |         |
    |         |     :   |         |
    |         |     :   |         |
    |         |     :   |         |
    +---------+-----:---+---------+
             L1   mode
              |<------->|
                 width

This puts the estimated mode closer to the higher neighboring bar, which makes sense. (I’ll have more to say about that below.) If you’re not sure how this relates to the proportion I wrote, look for a pair of similar triangles …
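For readers who would rather check the picture numerically than hunt for the similar triangles, here is a minimal sketch. It assumes the two diagonals run from the top corners of the modal bar to the near top corners of its neighbors, as described above. The function names and the sample frequencies (4, 9, 7 on a class starting at 20 with width 10) are made up for illustration and do not come from any of the questions.

```python
import math

def formula_mode(lower, width, f_prev, f_modal, f_next):
    """Mode estimate from the interpolation formula L1 + d1/(d1 + d2) * width."""
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    return lower + d1 / (d1 + d2) * width

def crossing_point(lower, width, f_prev, f_modal, f_next):
    """x-coordinate where the two diagonals of the picture intersect."""
    # Falling diagonal: (lower, f_modal) down to (lower + width, f_next)
    slope_fall = (f_next - f_modal) / width
    # Rising diagonal: (lower, f_prev) up to (lower + width, f_modal)
    slope_rise = (f_modal - f_prev) / width
    # Solve f_modal + slope_fall * t = f_prev + slope_rise * t for t = x - lower
    return lower + (f_modal - f_prev) / (slope_rise - slope_fall)

# Hypothetical class: lower bound 20, width 10, frequencies 4, 9, 7
args = (20, 10, 4, 9, 7)
print(formula_mode(*args), crossing_point(*args))
print(math.isclose(formula_mode(*args), crossing_point(*args)))
```

The two functions agree because the difference of the slopes is (d1 + d2)/width, which is the same proportion the formula encodes.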

I closed with an example (again, quoted from my response a year earlier, and using that writer’s notation, where “85<91” meant the six numbers starting at 85, and less than 91):

For an example, take these classes:

85<91          10
91<97           8
97<103          3
103<109         8
109<115         0
115<121         7

The modal class is 85<91.

LBMo = 85
width = 6
d1 = 10 - 0 = 10 (since the frequency on the left is 0)
d2 = 10 - 8 = 2  (since the frequency on the right is 8)

mode = LBMo + [d1 /(d1+d2)] (Width)
= 85 + (10/12)(6)
= 85 + 5
= 90

This is 5 from the left and 1 from the right, a ratio of 5:1, while the differences in frequency are 10:2.

## The “mode” depends on the classes

Mode's Fickle Formula?

The formula for mode is not telling me the actual mode. In fact, after grouping data, I have found many situations where the mode changes.

For example, given these data:

1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4

The mode is 1.

But after grouping data, as below, the mode becomes approximately 3.3:

CLASS       FREQUENCY
1-3            5
3-5            6

Why does the mode of data change like this?

It appears that Gaurav had not been taught that the formula gives only a guess at the mode, and can’t be expected to give the actual mode, since it doesn’t have access to the actual data. But the question provided a good opportunity to examine more closely what the formula actually does. I replied:

The formula you have presumably been given for the mode of grouped data does not necessarily give the actual mode. Rather, it gives you a guess that is considered reasonable under some conditions.

When you group data, you lose information, so you should expect not to be able to recover detail using any formula.

In other words, the mode didn't change; you just guessed the mode from insufficient data.

I don't actually know of any theoretical basis for the formula that would make it reasonable to expect it to be correct for some particular kind of data (e.g., approximately normal). But given the questions that we math doctors routinely see about this subject, it appears that it is commonly taught without explaining what the formula really is: an approximation, at best.

I gave a link to the answer above, to make sure we were talking about the same formula. Then I showed how the actual data provided (in the form of a “dot plot”) compare to the histogram:

Note that your data are not normally distributed, so it is not at all surprising that the formula would not work. Also, the actual data (*'s) and the grouped data (bars) look quite different:

              +---+
          +---+   |
          |*  |   |
          |*  |* *|
          |*  |* *|
          |* *|* *|
        --+---+---+--
           1 2 3 4

Looking at that, we see that the mode of the actual data is not even in the modal class; this is because the data are not smoothly distributed, so the grouping changes its character. (My guess is that the formula is considered valid, as I suggested, for normally distributed data; it would be at least reasonable for a smooth and symmetrical distribution.)

We should check his work with the formula. Using the formula in the first form I showed above, $$Z = L_1 + \frac{F_1 - F_0}{2F_1 - F_0 - F_2}\cdot i,$$ we have $$L_1 = 3, F_0 = 5, F_1 = 6, F_2 = 0, i = 2$$ so $$Z = 3 + \frac{6 - 5}{2\cdot 6 - 5 - 0}\cdot 2 = 3 + \frac{1}{7}\cdot 2 \approx 3.29.$$ Here I took $$L_1$$ to be 3, the lower class limit as stated in the first form I quoted above, rather than 2.5, the lower class boundary, as in most versions I have found, in order to get his answer. I think the latter is the proper definition of the variable; I hadn’t noticed this discrepancy until now.
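To see the two numbers side by side, here is a small sketch that computes the mode of the raw list and the grouped estimate from the same data. It assumes the classes "1-3" and "3-5" are half-open (a value of 3 belongs to the second class), which is the reading that reproduces the frequencies 5 and 6 in the question; the variable names are illustrative.

```python
from collections import Counter

data = [1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4]

# Mode of the raw data: the most frequent individual value
raw_mode, raw_count = Counter(data).most_common(1)[0]

# Grouped frequencies, assuming half-open classes [1, 3) and [3, 5)
classes = [(1, 3), (3, 5)]
freqs = [sum(lo <= x < hi for x in data) for lo, hi in classes]

# Formula applied to the modal class 3-5, with an imagined empty class after it
lower, width = 3, 2
f_prev, f_modal, f_next = freqs[0], freqs[1], 0
d1 = f_modal - f_prev
d2 = f_modal - f_next
grouped_estimate = lower + d1 * width / (d1 + d2)

print(raw_mode, raw_count)   # value 1, which occurs 4 times
print(freqs)                 # [5, 6]
print(round(grouped_estimate, 2))
```

Nothing in the second half of the script ever sees the individual values, which is why it cannot be expected to recover the raw mode.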

Why would the mode of grouped data depend on the frequency of pre- and post-modal classes?

This is essentially asking for a deeper explanation of the formula. I replied:

The page I referred you to explains the formula as well as I can. The basic idea is that if you have data that looks like a normal distribution (one symmetrical hump), but group the data, the classes on either side would be asymmetrical if the actual mode is not centered in the modal class; so looking at the adjacent classes can help estimate where the mode would be within the class.

Here are two examples:

         symmetrical             asymmetrical

              |                         |
            +-*-+                    +--*+
            *   *                    |*  |*
           *|   |*                   *   +-*-+
        +-*-+   +-*-+               *|   |  *|
        *   |   |   *            +*--+   |   +*--+
    +*--+   |   |   +--*+    +-*-+   |   |   |   *
    +---+---+---+---+---+    +---+---+---+---+---+

The symmetrical histogram should have its mode in the middle of the modal class. The histogram on the right -- with a higher bar on the right of the modal class -- should have its mode closer to the higher side. The formula does this in the simplest possible way.

I have made a histogram by binning the standard normal distribution in various ways, and found that the formula does give the mode quite accurately in that case. When I did the same for a triangular distribution, it was less accurate.
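An experiment of this kind is easy to repeat. The following is a minimal sketch, not a reconstruction of the original check: it assumes exact class probabilities taken from the standard normal CDF (via `math.erf`) in place of sampled frequencies, and the function names, bin widths, and offsets are arbitrary illustrative choices. Since the true mode of the standard normal is 0, the printed estimate is also the error.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binned_normal_mode_estimate(width, offset):
    """Bin the standard normal into equal classes and apply the mode formula.

    Class edges sit at offset + k * width; the "frequencies" are exact
    class probabilities (an assumption of this sketch, not sampled data).
    """
    edges = [offset + k * width for k in range(-40, 41)]
    freqs = [normal_cdf(b) - normal_cdf(a) for a, b in zip(edges, edges[1:])]
    m = max(range(len(freqs)), key=lambda k: freqs[k])  # modal class index
    d1 = freqs[m] - freqs[m - 1]
    d2 = freqs[m] - freqs[m + 1]
    return edges[m] + d1 / (d1 + d2) * width

for width, offset in [(1.0, 0.3), (1.0, 0.25), (0.5, 0.1)]:
    print(width, offset, binned_normal_mode_estimate(width, offset))
```

Trying wider classes (say, width 2) in the same loop shows the estimate drifting further from 0, so the accuracy depends on the binning as well as on the shape of the distribution.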

## What if there are two modal classes?

Here is a question from 2016:

Breaking the Mode

How do you find the mode of this grouped data?

data    freq
-------------
10-14      5
15-19     12
20-24     12
25-29     10
30-34      4

I know the mode formula:

Mo = L + (d1/(d1 + d2))*width

I calculated its parts like this:

L = 14.5

d1 = 12 - 5
= 7

d2 = 12 - 10
= 2

width = 24.5 - 14.5
= 10

But I'm confused about the last two. Should d2 = 12 - 12 = 0? Should the width be 5?

From there, I went on to determine

mode = 14.5 + 7/(7 + 2)*10
= 14.5 + 7.8
= 22.3

Is my work true?

The answer seems reasonable (it is at least within a modal class). But does the formula work when the “modal class” is double-wide?

First, we have to keep in mind that we don’t even know what it would mean for an answer to be correct, since we don’t know the actual data! But I answered:

The formula you are using does not really tell you "the mode"; it just makes a reasonable estimate of where the mode might be if the underlying distribution is, say, approximately normal. Since it is not exact in the first place, it probably doesn't matter much how you apply it in special cases. If you have been taught the formula without any further explanation, then you can't be expected to follow any particular rules for this case.

I have never found a source for this formula that explains its theoretical basis, or the conditions under which it should be used, or how it applies in unusual cases (which should be an inference from the theory, if there were one). I've explained what I can guess from the formula, and from what sources I do find, here:

Different Formulas for Calculating Mode
http://mathforum.org/library/drmath/view/72977.html

This explanation for it assumes generally that each class has the same width, so it doesn't quite apply when the "modal class" has twice the width of the others, which is the way you are treating it.

I made a suggestion, to rework the classes so they all have the same width, which is that of the double modal class:

I would probably rework the data so that there are fewer (equal width) classes, and just one modal class:

data    freq
------------
5- 9      0     [added implied empty class]
10-14      5
15-19     12
20-24     12
25-29     10
30-34      4

data    freq
------------
5-14      5     [combined classes in pairs]
15-24     24
25-34     14

The formula applies directly now:

L = 14.5

d1 = 24 - 5
= 19

d2 = 24 - 14
= 10

width = 24.5 - 14.5
= 10

Mo = L + (d1/(d1 + d2))*width
= 14.5 + (19/(19 + 10))*10
= 21.05

Again, that seems to fit a little better with the derivation, but I don't think it makes much difference, since there is really no "correct" mode anyway! Your answer is not necessarily a bad one.
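The two approaches can be compared with a short sketch. The helper name `estimate_mode` is hypothetical, and pairing the classes from the left (after adding the implied empty class 5-9) is just the choice made in the tables above; pairing them differently would give a different estimate.

```python
def estimate_mode(lower, width, f_prev, f_modal, f_next):
    """Mode estimate L + d1/(d1 + d2) * width for one modal class."""
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    return lower + d1 * width / (d1 + d2)

# Frequencies for classes 5-9, 10-14, ..., 30-34 (the first class is the implied empty one)
freqs = [0, 5, 12, 12, 10, 4]

# Combine adjacent classes in pairs: 5-14, 15-24, 25-34
paired = [freqs[k] + freqs[k + 1] for k in range(0, len(freqs), 2)]

merged = estimate_mode(14.5, 10, paired[0], paired[1], paired[2])
double_wide = estimate_mode(14.5, 10, 5, 12, 10)  # the asker's version

print(paired)                 # [5, 24, 14]
print(round(merged, 2))       # 21.05
print(round(double_wide, 1))  # 22.3
```

The two estimates differ by a little over one unit, which gives some sense of how much the handling of this special case matters.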

If anyone reading this knows an original source for the formula that gives a solid foundation for it, rather than just an ad-hoc linear interpolation, I would love to know.

### 33 thoughts on “Finding the Mode of Grouped Data”

1. I was wondering how to find the mode if the modal class is the extreme first or the extreme last, because in these cases we do not have the frequencies of both its neighbours. I read in Class X and found the formula in my maths book in the ‘STATISTICS’ chapter.

1. As I said in the post above, “If there is no class on the left, then you can imagine a class with frequency zero. Then the formula applies easily.” The same applies when there is no class on the right.

This was, in fact, the context in which I first came across this formula; in the quoted answer, Different Formulas for Calculating Mode, I introduced my reference to an earlier discussion by saying, “I was asked about this formula a year ago, with specific reference to the case where the modal class is the first class. I had not seen the formula previously, but could see how it arose”.

Many students seem to have the same question!

1. In this case, although the actual data may have a mode, you have no information with which to even guess at it, so you can’t identify a mode.

The situation is essentially what is described in these two answers at Ask Dr. Math, though they do not refer to grouped data:

These point out that you can either say that there is no mode, or that everything is a mode; but either way, you can’t identify “the” mode.

1. Chengetai chiwanza

What do you do if the mode you find is lower than the lower boundary of the modal class?

2. In all of the following cases:
1. Beginning class interval has highest frequency
2. Last class interval has highest frequency
3. Two or more classes have the same maximum frequency, i.e. bi-modal or multimodal
Correct way of finding mode is by Grouping method. Here we create 6 columns for frequency, including the 1st column as the original frequency. Col2 is the addition of 2 frequencies; col3 is obtained by adding 2 frequencies leaving the 1st. Col4 by adding 3 frequencies, col5 by adding 3 frequencies leaving the 1st, col6 by adding 3 frequencies leaving the first 2. Now starting from col1 we take the maximum frequency from each column. Then we write the numbers/classes against these frequencies which contributed to this maximum frequency. Now the number/class which occurs the maximum number of times is the mode/modal class. For ungrouped data that number is the mode; for grouped data it is the modal class, and then we apply the formula. Please refer to S.C. Gupta and V.K. Kapoor, mode section, for an example of this.

1. Thanks for the reference. I don’t have access to Indian books, but have long been aware that many questions on this topic come from India, and have been interested in seeing how it is taught there. I was able to find this book, Fundamentals of Mathematical Statistics: a Modern Approach, available online, and read what is said in section 2.7. (The edition I found may not be identical with yours, however.)

Your answer, unfortunately, is not directly relevant to the question, as it is a method for finding something they call a mode in the case of a discrete (ungrouped) frequency distribution, not for grouped data. And even in that case, it amounts to redefining “mode”, and appears to be merely a way to be able to identify some “most frequent” value in cases where there really is no mode. The fact that it is called “the method of grouping” can easily lead to confusion with grouped distributions.

The authors then move on to continuous frequency distributions, giving the formula that is the focus of this post. I have long wanted to find a proper derivation of the formula, so when I saw the heading “Derivation of the mode formula (2-7)”, I was hopeful. But that turns out to be just an explanation of the formula, similar to my own, based on an unjustified assumption that the intersection of the two lines in the picture should be taken as the location of the mode. I am still hoping someone will refer me to an advanced treatment of the formula that will prove that it is the best approximation of the actual mode under suitable conditions, such as normality.

I see no mention in the book of how to apply the formula in the cases that have been asked about (grouped data with no one modal class, or no neighboring class on one side), so that question, too, is still open. Applying the “method of grouping” to grouped data could, in fact, lead to nonsense, as the “modal class” you find might not be greater than those on either side, making the formula impossible to apply. The formula only makes sense when applied to a class that is actually greater than its neighbors.

1. I would certainly hope that anyone who teaches a course in statistics would know the subject.

It happens, however, that I am not a statistician; where I come from, mathematics and statistics are thought of as distinct fields. That’s why I don’t teach even an introductory statistics course, even though I did take one graduate-level course in the subject. I would want to be thoroughly comfortable with the subject well beyond what is taught in the specific course.

On the other hand, this particular material interests me because it is so often taught without giving any mathematical justification, and I think that looking at it from a mathematical perspective is important.

2. If in a given grouped data there is no preceding class, then what is the value of f(zero)? f(one) is the frequency of the modal class,
f(two) is the frequency of the class succeeding the modal class.
Then f(zero) = ?

1. I think your question is essentially the same as the first comment above (from October 3, 2019). See my answer there.

If the modal class is the first class, then take the preceding class as having frequency zero, f_{m-1} = 0.

2. Danayn Servillon

Is it possible for the actual mode of the distribution to be found outside the modal class?

1. Yes. If the actual data within the modal class happen to be distributed among the values in that class, while the data within a less common class happen to all have the same value, for example, this could happen.

3. Christian Baqueros

Hi Sir, how do I compute the mode of this data having two equal highest number of frequency:
X f
10 – 13 4
14 – 17 3
18 – 21 5
22 – 25 11
26 – 29 3
30 – 33 6
34 – 37 11
38 – 41 7
n = 50

Please reply ASAP. Thanks. Btw, I have the data of all scores of the students. Is it necessary to use the formula for grouped data when you can solve it through the ungrouped data?

1. This is clearly bimodal, so it would be wrong to give a single mode. You can apply the formula separately to each modal class to obtain two modes; but since the formula seems to be intended to be used for approximately normal distributions, the result would not necessarily be significant.

But if you have the actual data, then absolutely you should use that rather than grouped data for most purposes, as grouping loses information. Any formula for mode, median, etc. of grouped data is meant only as an approximation to the actual statistics, which are obtained from the original data. (Grouped data can be used for histograms and the like; but even then, a change in grouping can greatly change your results.)

4. christian baqueros

X f
10 – 13 4
14 – 17 3
18 – 21 5
22 – 25 11
26 – 29 3
30 – 33 6
34 – 37 11
38 – 41 7

25 34 12 23 33
21 25 10 18 40
36 29 22 15 36
22 33 11 28 38
16 25 35 36 40
29 10 28 38 35
30 18 40 36 16
34 36 35 33 19
33 38 38 22 25
25 24 19 30 36

1. Looking closely at the actual data (sorting it, and trying various histograms in addition to the grouping you show), I see that the actual mode is 36 (occurs 6 times), but 25 occurs 5 times, and 33 4 times. In your grouping, the modal classes are 22-25 and 34-37, which include the two most common individual values, but not the third. If I vary the grouping, it is clear that the data are quite erratic (which is typical of fairly small populations); at width 5, it no longer looks bimodal. In my mind, the histogram itself is far more meaningful than any one or two numbers you could give.
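With the raw scores in hand, the counts quoted in this reply are easy to reproduce. A minimal sketch using the fifty values posted above (the variable names are illustrative):

```python
from collections import Counter

# The fifty scores posted in the comment above, row by row
scores = [
    25, 34, 12, 23, 33,
    21, 25, 10, 18, 40,
    36, 29, 22, 15, 36,
    22, 33, 11, 28, 38,
    16, 25, 35, 36, 40,
    29, 10, 28, 38, 35,
    30, 18, 40, 36, 16,
    34, 36, 35, 33, 19,
    33, 38, 38, 22, 25,
    25, 24, 19, 30, 36,
]

counts = Counter(scores)
mode_value, mode_count = counts.most_common(1)[0]

print(len(scores))             # 50
print(mode_value, mode_count)  # 36 6
print(counts[25], counts[33])  # 5 4
```

No grouping formula is involved here; the mode is read directly from the tallies.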

5. How to find the rough mode of this grouped data?
=====================================================
Class Intervals   Class Midpoint     f      < F
-----------------------------------------------------
110 – 116             113            2      170
103 – 109             106            2      168
 96 – 102              99            1      166
 89 – 95               92            3      165
 82 – 88               85            1      162
 75 – 81               78            0      161
 68 – 74               71            2      161
 61 – 67               64            7      159
 54 – 60               57            7      152
 47 – 53               50           11      145
 40 – 46               43           30      134
 33 – 39               36           30      104
 26 – 32               29           30       74
 19 – 25               22           23       44
 12 – 18               15           16       21
  5 – 11                8            5        5

1. Since 3 successive classes have the same frequency, as you have presumably observed, the formula yields an indeterminate 33 + 0/0*7.

I’d put the estimated mode at the midpoint of the middle class, namely at 36, since classes further out will not have much effect.

But one could do as I did in the post with two consecutive equal classes, and combine groups so that the three equal classes form one class; then the three classes we use have frequencies 44, 90, 25, and the formula gives about 34.7, a little less than the midpoint, as I would expect.

6. Class Interval Frequency
0 78
1-12 26
13-24 120
25-36 127
37-48 74
>49 16
How to find mean, median, mode, standard deviation & IQR for the above data ?

1. Ultimately, I don’t think you can really do it.

First, you have to understand that any numbers you get from a grouped distribution like this will be only an approximation; but in particular, the open-ended final class is very problematic for the mean (and therefore the standard deviation), as explained in Grouped Data: Open-ended Classes? Otherwise, you can see Mean and Standard Deviation of Grouped Data.

For the median, you can use the estimation formula discussed in Finding the Median of Grouped Data. You can extend that method to find the quartiles, and hence the IQR. If you need help doing that, I suggest writing to us at https://www.themathdoctors.org/ask/, where we can give it more attention (and perhaps eventually make the answer a blog post).

For the mode, despite the fact that all classes are not the same width, you should be able to use the formula in this post. Have you tried? On the other hand, since the first class contains only one value and has such a high frequency, I think there is very little doubt that the actual mode is 0.

7. what happens if you have a table with grouped and ungrouped data? for example
1 29
2 45
3 48
4 93
5 28
6 21
7-10 12
11-15 5

1. I would describe this simply as having variable-sized classes, most of which have width 1. Since the wider classes at the end have smaller frequencies than the others, they have no effect on the mode. The mode is just the single value with the highest frequency, namely 4.

1. See the series of comments on June 9. The quick answer is, yes, you can, but there is no guarantee that the numbers you get are actual modes. (In fact, the same can always be said, since the formula is only an estimate.) The more a distribution deviates from normal, the less meaningful the formula becomes.

8. I don’t really understand how this works:

    mode - L1      d1
    --------- = -------
      Width     d1 + d2

Being “proportional” makes sense but where exactly does this even come from? You mentioned linear interpolation but I’m failing to see how that is applied here. Can you please explain a bit more? (I know that it only gives an estimate, I’m just trying to figure out where it even comes from)

1. Keep in mind that I was not deriving the formula from some theoretical basis, but trying to understand a formula I had been given.

The equation (mode – L1)/Width = d1/(d1 + d2) came from rearranging the formula I was analyzing, namely mode = L1 + d1/(d1 + d2) * Width. I did this because I saw that it would make the equation look like a proportion.

Then I looked at why they (that is, whoever invented this formula) would want that particular proportion, and saw that the left-hand side, (mode – L1)/Width, is the fraction of the width of a class at which the mode is, and the right-hand side, d1/(d1 + d2), is another ratio of part to whole.

So the formula finds a number, “mode”, that is as far across the class as the left-hand difference is of the total difference. That is, by definition, a “linear interpolation”: a certain fraction of the way between two known points. Why do that? That’s what I can’t prove to be “correct”, only say that it feels reasonable, as I explained later in terms of histograms. You want the mode to be closer to the side where the histogram drops less, and a linear interpolation based on the ratio of the drops accomplishes that.

Another way to look at the proportion is to see that the ratio of the distance of the mode from the left to the right side of the class equals the ratio of the drop on the left to the drop on the right. That’s what the picture of the two lines crossing on a histogram represents.

9. I still don’t understand how the ratios can be equal to each other.
Know that this is the only web site I have found that treats these things more formally!

1. Which ratios are you referring to? And are you saying you don’t believe it is possible that they are equal, or that you don’t understand how it is known?

If you are referring to the ratios mentioned in my last response, it is not proved at all! It is just a reasonable guess.
