49. Group Anagrams

Reformulated question¶

Group strings that are anagrams of each other.

Input: list[str]
Output: list[list[str]]
Rule: words belong to the same group if their letters and counts match; output order does not matter.

Example:

in:  ["eat","tea","tan","ate","nat","bat"]
out: [["eat","tea","ate"],["tan","nat"],["bat"]]

Key trick¶

Use a canonical key for each word so all anagrams map to the same bucket.

Simple key:
- sort the letters
Example:
- "eat" -> "aet"
- "tea" -> "aet"

Then group with a dictionary from key to list of words.

Trap¶

Comparing output directly without normalizing order.
Forgetting that:
- group order does not matter
- word order inside each group does not matter
Using sorting per word is fine here, but not noticing there is also a linear-per-word frequency-count key.

Why this question is interesting¶

It tests a very common interview pattern:

turn each item into a canonical representation
use hashing to group efficiently

It also checks whether you separate the problem's real requirement from irrelevant output ordering.

Solution with idiomatic Python¶

class Solution:
    def groupAnagrams(self, strs: list[str]) -> list[list[str]]:
        # Map canonical form -> all words with that form.
        groups: dict[str, list[str]] = {}

        for s in strs:
            # All anagrams have the same sorted-letter representation.
            key = "".join(sorted(s))

            if key not in groups:
                groups[key] = []

            groups[key].append(s)

        return list(groups.values())

Alternative with defaultdict:

from collections import defaultdict


class Solution:
    def groupAnagrams(self, strs: list[str]) -> list[list[str]]:
        groups = defaultdict(list)

        for s in strs:
            key = "".join(sorted(s))
            groups[key].append(s)

        return list(groups.values())

Pytest test¶

import pytest


def normalize(groups: list[list[str]]) -> list[list[str]]:
    return sorted([sorted(group) for group in groups])


@pytest.mark.parametrize(
    ("strs", "expected"),
    [
        (
            ["eat", "tea", "tan", "ate", "nat", "bat"],
            [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]],
        ),
        (
            [""],
            [[""]],
        ),
        (
            ["a"],
            [["a"]],
        ),
        (
            ["", ""],
            [["", ""]],
        ),
        (
            ["ab", "ba", "abc", "cab", "bac", "foo"],
            [["ab", "ba"], ["abc", "cab", "bac"], ["foo"]],
        ),
        (
            ["zzz", "zz", "z", "zz"],
            [["zzz"], ["zz", "zz"], ["z"]],
        ),
    ],
)
def test_groupAnagrams(strs, expected):
    assert normalize(Solution().groupAnagrams(strs)) == normalize(expected)

Comment on your solution¶

Your solution is good and interview-valid.

What is good:
- correct idea
- clear helper for canonical key
- good use of a dictionary
Small simplifications:
- inline anagram_id if used once
- return list(groups.values()) is enough
Complexity:
- \(O(n \cdot k \log k)\) time, where \(k\) is average word length
- \(O(n \cdot k)\) extra space

## Solution
class Solution:
    def groupAnagrams(self, strs: list[str]) -> list[list[str]]:
        def anagram_id(s):
            return "".join(sorted(s))
        groups = {}
        for s in strs:
            ana_id = anagram_id(s)
            if ana_id in groups:
                groups[ana_id].append(s)
            else:
                groups[ana_id] = [s]
        return [list(anagrams) for anagrams in groups.values()]

Solution().groupAnagrams(["eat","tea","tan","ate","nat","bat"]) # [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Solution().groupAnagrams(["", ""]) # [['', '']]

Slightly cleaned-up version of your approach:

class Solution:
    def groupAnagrams(self, strs: list[str]) -> list[list[str]]:
        groups: dict[str, list[str]] = {}

        for s in strs:
            key = "".join(sorted(s))
            groups.setdefault(key, []).append(s)

        return list(groups.values())