Skip to content

229. Majority Element II

On LeetCode ->

Reformulated question

Find all values that appear more than \(\lfloor n/3 \rfloor\) times in an array.

  • There can be at most 2 such values.
  • Aim for \(O(n)\) time and \(O(1)\) extra space.

Example:

nums = [1,2,2,3,2,1,1]
n = 7, floor(n/3) = 2
answer = [1,2]   # both appear 3 times

Key trick

Use the Boyer-Moore voting idea with 2 candidates.

  • More than \(n/3\) means at most 2 winners can exist.
  • First pass:
    • keep 2 candidates and 2 counters
    • cancel triples of distinct values
  • Second pass:
    • verify the candidates really occur more than \(\lfloor n/3 \rfloor\)

Trap

  • Forgetting the verification pass.
    • The first pass gives only possible candidates.
  • Using elif in the wrong order and breaking candidate replacement logic.
  • Forgetting that answers can have size 0, 1, or 2.
  • Returning duplicates when both candidate slots end up equal.

Why is this question interesting?

It looks like counting, but the follow-up forces a stronger invariant.

  • It tests whether you can generalize majority voting from \(n/2\) to \(n/3\).
  • It is a good example of "few possible winners, so track only a few candidates".

Solve the problem with idiomatic python

class Solution:
    def majorityElement(self, nums: list[int]) -> list[int]:
        # At most two values can appear more than n//3 times.
        cand1 = cand2 = None
        count1 = count2 = 0

        # First pass: find possible candidates.
        for x in nums:
            if x == cand1:
                count1 += 1
            elif x == cand2:
                count2 += 1
            elif count1 == 0:
                cand1, count1 = x, 1
            elif count2 == 0:
                cand2, count2 = x, 1
            else:
                count1 -= 1
                count2 -= 1

        # Second pass: verify.
        count1 = count2 = 0
        for x in nums:
            if x == cand1:
                count1 += 1
            elif x == cand2:
                count2 += 1

        ans = []
        limit = len(nums) // 3
        if count1 > limit:
            ans.append(cand1)
        if count2 > limit:
            ans.append(cand2)
        return ans

Pytest test

import pytest

@pytest.mark.parametrize(
    ("nums", "expected"),
    [
        ([3, 2, 3], [3]),
        ([1], [1]),
        ([1, 2], [1, 2]),
        ([1, 1, 1, 3, 3, 2, 2, 2], [1, 2]),
        ([2, 2], [2]),
        ([1, 2, 3, 4], []),
        ([0, 0, 0], [0]),
        ([4, 4, 4, 4, 2, 2, 2, 3, 3], [4]),
        ([-1, -1, -1, 0, 1, 2, 3], [-1]),
    ],
)
def test_majority_element(nums, expected):
    assert sorted(Solution().majorityElement(nums)) == sorted(expected)

Comment my solution

Your solution is correct for many cases, but it does not meet the follow-up space bound.

  • It uses a hash map, so space is \(O(m)\) where \(m\) is the number of distinct values.
  • This problem’s trick is that you never need to track all counts, only 2 candidates.
  • Your early append logic is fine for the map approach, but the interview-optimal answer is Boyer-Moore plus verification.
# pass test but it's not O(1) in space
class Solution:
    def majorityElement(self, nums: list[int]) -> list[int]:
        k = len(nums) // 3
        counts = {}
        ans = []
        for num in nums:
            if num in counts:
                if counts[num] <= k:
                    counts[num] += 1
                    if counts[num] > k:
                        ans.append(num)
            else:
                counts[num] = 1
                if k == 0:
                    ans.append(num)
        return ans

Extra

In my mind I was confusing time linearity and 1 pass. I also didn't understand why the first loop picks the right candidates. So I asked AI for helped.


One passe and time linearity

Yes - 2 passes is still linear time.

  • If one pass costs about n, and another pass costs about n, total is:
n + n = 2n
  • In Big-O, constant factors are ignored, so:
O(2n) = O(n)
  • More generally:
    • 3 full passes over the array is still O(n)
    • 100n is still O(n)
    • n + 5 is still O(n)

It stops being linear when the work grows faster than proportional to n.

  • Examples:
    • one loop over array: O(n)
    • two separate loops over array: O(n)
    • nested loop over array inside array: O(n^2)
    • divide array in half recursively with total work proportional to all elements once per level: maybe O(n log n)

For interviews, "linear time" usually means:

  • the total number of operations is bounded by c * n + b for constants c, b
  • so a fixed number of passes is completely fine

Why the first pass works

The key fact is:

  • If an element appears more than n/3 times, there can be at most 2 such elements.

Because:

  • 3 different values each appearing more than n/3 times would total more than n, impossible.

So it is enough to track 2 candidates.

Intuition of the cancellation

When you see a new number that matches neither candidate, and both counters are positive, you do:

count1 -= 1
count2 -= 1

You can think of that as removing:

  • one occurrence of candidate 1
  • one occurrence of candidate 2
  • the current new value

So you cancel a group of 3 distinct values.

That is the whole trick.

Why this does not lose a real answer

Suppose some value x appears more than n/3 times.

  • Every cancellation removes at most one x.
  • But each cancellation also removes 2 non-x elements.
  • So x cannot be completely wiped out by these triple cancellations, because there are not enough non-x elements to pair against all of its occurrences.

A cleaner way to say it:

  • Repeatedly delete triples of distinct values.
  • Deleting such triples does not change which values occur more than one third of the time.
  • After all possible deletions, at most 2 values can remain.
  • Those remaining possible winners are exactly what the first pass tracks.

Why the candidates from pass 1 are only "possible"

The first pass guarantees:

  • every true answer must be among cand1 and cand2

It does not guarantee:

  • both candidates are valid answers

Example:

[1,2,3,4]

You may end with some candidates, but no number appears more than 4//3 = 1 times.

So the second pass is required to count and verify.

Conditions that make this correct

These are the important ones:

  • Threshold is > n/3
  • Therefore there are at most 2 valid answers
  • The algorithm maintains at most 2 candidate slots
  • When a third different value appears and both slots are occupied, cancel all three
  • A true majority-over-n/3 cannot be fully eliminated by this process
  • Therefore every true answer survives as one of the final candidates
  • Final verification confirms which candidates really pass the threshold

Short mental model

Use this mental picture:

  • for > n/2, track 1 candidate
  • for > n/3, track 2 candidates
  • for > n/k, track k-1 candidates

Because there can be at most k-1 such frequent elements.

Tiny example

nums = [1,2,1,3,1,2,2]
threshold = floor(7/3) = 2
answers should be [1,2]

Cancellation view:

  • remove (1,2,3) once
  • left with [1,1,2,2]

Now the only possible winners are clearly 1 and 2.

That is what the first pass is simulating, without actually deleting elements.