229. Majority Element II

Reformulated question¶

Find all values that appear more than \(\lfloor n/3 \rfloor\) times in an array.

There can be at most 2 such values.
Aim for \(O(n)\) time and \(O(1)\) extra space.

Example:

nums = [1,2,2,3,2,1,1]
n = 7, floor(n/3) = 2
answer = [1,2]   # both appear 3 times

Key trick¶

Use the Boyer-Moore voting idea with 2 candidates.

More than \(n/3\) means at most 2 winners can exist.
First pass:
- keep 2 candidates and 2 counters
- cancel triples of distinct values
Second pass:
- verify the candidates really occur more than \(\lfloor n/3 \rfloor\)

Trap¶

Forgetting the verification pass.
- The first pass gives only possible candidates.
Using elif in the wrong order and breaking candidate replacement logic.
Forgetting that answers can have size 0, 1, or 2.
Returning duplicates when both candidate slots end up equal.

Why is this question interesting?¶

It looks like counting, but the follow-up forces a stronger invariant.

It tests whether you can generalize majority voting from \(n/2\) to \(n/3\).
It is a good example of "few possible winners, so track only a few candidates".

Solve the problem with idiomatic python¶

class Solution:
    def majorityElement(self, nums: list[int]) -> list[int]:
        # At most two values can appear more than n//3 times.
        cand1 = cand2 = None
        count1 = count2 = 0

        # First pass: find possible candidates.
        for x in nums:
            if x == cand1:
                count1 += 1
            elif x == cand2:
                count2 += 1
            elif count1 == 0:
                cand1, count1 = x, 1
            elif count2 == 0:
                cand2, count2 = x, 1
            else:
                count1 -= 1
                count2 -= 1

        # Second pass: verify.
        count1 = count2 = 0
        for x in nums:
            if x == cand1:
                count1 += 1
            elif x == cand2:
                count2 += 1

        ans = []
        limit = len(nums) // 3
        if count1 > limit:
            ans.append(cand1)
        if count2 > limit:
            ans.append(cand2)
        return ans

Pytest test¶

import pytest

@pytest.mark.parametrize(
    ("nums", "expected"),
    [
        ([3, 2, 3], [3]),
        ([1], [1]),
        ([1, 2], [1, 2]),
        ([1, 1, 1, 3, 3, 2, 2, 2], [1, 2]),
        ([2, 2], [2]),
        ([1, 2, 3, 4], []),
        ([0, 0, 0], [0]),
        ([4, 4, 4, 4, 2, 2, 2, 3, 3], [4]),
        ([-1, -1, -1, 0, 1, 2, 3], [-1]),
    ],
)
def test_majority_element(nums, expected):
    assert sorted(Solution().majorityElement(nums)) == sorted(expected)

Comment my solution¶

Your solution is correct for many cases, but it does not meet the follow-up space bound.

It uses a hash map, so space is \(O(m)\) where \(m\) is the number of distinct values.
This problem’s trick is that you never need to track all counts, only 2 candidates.
Your early append logic is fine for the map approach, but the interview-optimal answer is Boyer-Moore plus verification.

# pass test but it's not O(1) in space
class Solution:
    def majorityElement(self, nums: list[int]) -> list[int]:
        k = len(nums) // 3
        counts = {}
        ans = []
        for num in nums:
            if num in counts:
                if counts[num] <= k:
                    counts[num] += 1
                    if counts[num] > k:
                        ans.append(num)
            else:
                counts[num] = 1
                if k == 0:
                    ans.append(num)
        return ans

Extra¶

In my mind I was confusing time linearity and 1 pass. I also didn't understand why the first loop picks the right candidates. So I asked AI for helped.

One passe and time linearity¶

Yes - 2 passes is still linear time.

If one pass costs about n, and another pass costs about n, total is:

n + n = 2n

In Big-O, constant factors are ignored, so:

O(2n) = O(n)

More generally:
- 3 full passes over the array is still O(n)
- 100n is still O(n)
- n + 5 is still O(n)

It stops being linear when the work grows faster than proportional to n.

Examples:
- one loop over array: O(n)
- two separate loops over array: O(n)
- nested loop over array inside array: O(n^2)
- divide array in half recursively with total work proportional to all elements once per level: maybe O(n log n)

For interviews, "linear time" usually means:

the total number of operations is bounded by c * n + b for constants c, b
so a fixed number of passes is completely fine

Why the first pass works¶

The key fact is:

If an element appears more than n/3 times, there can be at most 2 such elements.

Because:

3 different values each appearing more than n/3 times would total more than n, impossible.

So it is enough to track 2 candidates.

Intuition of the cancellation¶

When you see a new number that matches neither candidate, and both counters are positive, you do:

count1 -= 1
count2 -= 1

You can think of that as removing:

one occurrence of candidate 1
one occurrence of candidate 2
the current new value

So you cancel a group of 3 distinct values.

That is the whole trick.

Why this does not lose a real answer¶

Suppose some value x appears more than n/3 times.

Every cancellation removes at most one x.
But each cancellation also removes 2 non-x elements.
So x cannot be completely wiped out by these triple cancellations, because there are not enough non-x elements to pair against all of its occurrences.

A cleaner way to say it:

Repeatedly delete triples of distinct values.
Deleting such triples does not change which values occur more than one third of the time.
After all possible deletions, at most 2 values can remain.
Those remaining possible winners are exactly what the first pass tracks.

Why the candidates from pass 1 are only "possible"¶

The first pass guarantees:

every true answer must be among cand1 and cand2

It does not guarantee:

both candidates are valid answers

Example:

[1,2,3,4]

You may end with some candidates, but no number appears more than 4//3 = 1 times.

So the second pass is required to count and verify.

Conditions that make this correct¶

These are the important ones:

Threshold is > n/3
Therefore there are at most 2 valid answers
The algorithm maintains at most 2 candidate slots
When a third different value appears and both slots are occupied, cancel all three
A true majority-over-n/3 cannot be fully eliminated by this process
Therefore every true answer survives as one of the final candidates
Final verification confirms which candidates really pass the threshold

Short mental model¶

Use this mental picture:

for > n/2, track 1 candidate
for > n/3, track 2 candidates
for > n/k, track k-1 candidates

Because there can be at most k-1 such frequent elements.

Tiny example¶

nums = [1,2,1,3,1,2,2]
threshold = floor(7/3) = 2
answers should be [1,2]

Cancellation view:

remove (1,2,3) once
left with [1,1,2,2]

Now the only possible winners are clearly 1 and 2.

That is what the first pass is simulating, without actually deleting elements.