929. Unique Email Addresses

Reformulated question

Given email strings, normalize each one by:

  • Removing . from the local part
  • Dropping the first + and everything after it in the local part
  • Leaving the domain part unchanged

Return how many distinct normalized emails remain.

Compact example:

["a.b+c@x.com", "ab@x.com", "a@x.com"] -> 2
normalize:
- "a.b+c@x.com" -> "ab@x.com"
- "ab@x.com"    -> "ab@x.com"
- "a@x.com"     -> "a@x.com"

Key trick

Normalize only the local part, then insert the canonical email into a set.

Trap

  • Applying . or + rules to the domain part
  • Cutting the local part at a later + instead of the first one
  • Building the canonical address incorrectly around @
  • Using repeated string concatenation in a loop when cleaner slicing works
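The first two traps can be made concrete with a small sketch. The helper names `normalize` and `normalize_wrong` are hypothetical and only serve the illustration; both assume each address contains exactly one @.

```python
def normalize(email: str) -> str:
    """Canonical form: the rules apply to the local part only."""
    local, domain = email.split("@")
    # Cut at the FIRST '+', then drop dots from what remains.
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def normalize_wrong(email: str) -> str:
    """Deliberately buggy: strips dots from the whole address."""
    local, domain = email.split("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}".replace(".", "")


emails = ["ab@x.com", "ab@xc.om"]  # two different domains
print(len({normalize(e) for e in emails}))        # 2
print(len({normalize_wrong(e) for e in emails}))  # 1, the domains were merged
print(normalize("a+b+c@x.com"))                   # a@x.com
```

The buggy variant collapses two distinct domains into one, which is exactly the kind of error that a small test set may not catch.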

Why is this question interesting?

It is a small parsing problem that tests whether you can:

  • Read rules precisely
  • Separate local and domain logic
  • Turn normalization into deduplication with a set

Solve the problem with idiomatic Python

class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        seen = set()

        for email in emails:
            # Split once into local and domain.
            local, domain = email.split("@")

            # Ignore everything after the first '+'.
            local = local.split("+", 1)[0]

            # Dots in local do not matter.
            local = local.replace(".", "")

            # Domain stays unchanged.
            seen.add(f"{local}@{domain}")

        return len(seen)

The same idea written as a single set comprehension:

class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        return len(
            {
                f"{local.split('+', 1)[0].replace('.', '')}@{domain}"
                for local, domain in (email.split("@") for email in emails)
            }
        )

Pytest test

import pytest

# Assumes the Solution class above is in scope (same module or imported).

@pytest.mark.parametrize(
    ("emails", "expected"),
    [
        (
            [
                "test.email+alex@leetcode.com",
                "test.e.mail+bob.cathy@leetcode.com",
                "testemail+david@lee.tcode.com",
            ],
            2,
        ),
        (
            [
                "a@leetcode.com",
                "b@leetcode.com",
                "c@leetcode.com",
            ],
            3,
        ),
        (
            [
                "a.b@x.com",
                "ab@x.com",
            ],
            1,
        ),
        (
            [
                "ab+one@x.com",
                "ab+two@x.com",
                "ab@x.com",
            ],
            1,
        ),
        (
            [
                "a.b+c@x.com",
                "ab+d@x.com",
                "a+b@x.com",
            ],
            2,
        ),
        (
            [
                "x@y.com",
            ],
            1,
        ),
    ],
)
def test_num_unique_emails(emails, expected):
    assert Solution().numUniqueEmails(emails) == expected

Comment my solution

Your solution has the right idea and returns the correct count, but it is more manual than needed and it builds a malformed canonical address.

  • Good:

    • Correctly stops at +
    • Correctly ignores .
    • Correctly keeps the domain unchanged
    • Uses a set
  • Main bug:

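    • domain = email[i:] makes the result contain two @ characters because email[i:] already starts with @, and then you add another one. The count is still right, since every address gets the same extra @, but the canonical string itself is wrong.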
  • Example of the bug:

    email = "a@x.com"
    local = "a"
    domain = "@x.com"
    local + "@" + domain = "a@@x.com"
    

  • Simpler approach:

    • Split once at @
    • Normalize only local
    • Rebuild with one @
  • Also cleaner:

    • Prefer split("+", 1) and replace(".", "") over char-by-char parsing here
## Solution
from typing import List

class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        canonical_emails = set()
        for email in emails:
            i = 0
            local = ""
            while email[i] != "@":
                ch = email[i]
                if ch == "+":
                    while email[i] != "@":
                        i += 1
                    break
                elif ch != ".":
                    local += ch
                    i += 1
                else:
                    i += 1

            domain = email[i:]
            canonical_emails.add(local + "@" + domain)
        return len(canonical_emails)
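
For comparison, here is a minimally repaired version of the solution above. It is a sketch that keeps the char-by-char structure and changes only the domain slice, so the canonical address has a single @. Like the original, it assumes every email contains an @.

```python
from typing import List


class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        canonical_emails = set()
        for email in emails:
            i = 0
            local = ""
            while email[i] != "@":
                ch = email[i]
                if ch == "+":
                    # Skip the rest of the local part.
                    while email[i] != "@":
                        i += 1
                    break
                if ch != ".":
                    local += ch
                i += 1

            # email[i] is "@", so the domain starts one position later.
            domain = email[i + 1:]
            canonical_emails.add(local + "@" + domain)
        return len(canonical_emails)
```

With this change, "a@x.com" is stored as "a@x.com" rather than "a@@x.com".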

Extra

Are you sure that using split and replace rather than char-by-char parsing is the right interview answer? It seems they are testing exactly this kind of "simple one-pass parsing". What do you think?


Yes - for this problem, split + replace is usually a very good interview answer.

Short version

Interviewers usually test:

  • whether you understood the rules exactly
  • whether you can separate local vs domain logic
  • whether you know to use a set for deduplication
  • whether you can write clean, correct code fast

They are not usually testing whether you can manually parse every string character-by-character unless:

  • they explicitly ask for one-pass parsing
  • the language utilities would hide important complexity
  • the input is huge and constant-factor optimization matters

Why split/replace is often better

It shows good engineering judgment:

  • use built-ins for simple string transformations
  • make the code shorter
  • reduce bug surface
  • make intent obvious

Your normalization logic becomes almost identical to the English statement:

  • split at @
  • cut local at first +
  • remove . from local
  • keep domain
  • add to set

That is excellent interview code.

What you may be missing

A lot of candidates assume "manual parsing = more algorithmic = better."

That is not always true.

In interviews, "better" usually means:

  • correct
  • easy to explain
  • easy to verify
  • hard to break
  • written quickly

For this question, built-ins help on all five.

Is char-by-char parsing ever better here?

Yes, but mostly in these situations:

  • the interviewer says "do it in one pass"
  • they ask for no extra intermediate strings
  • they want you to discuss constant factors
  • they are pushing beyond the easy version

Then a manual parser is a nice follow-up.
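
If that follow-up comes up, one possible shape is sketched below. The function name `normalize_one_pass` is hypothetical, and the sketch assumes each email contains exactly one @. It reads every character once and collects the kept characters in a list, which avoids repeated string concatenation.

```python
def normalize_one_pass(email: str) -> str:
    chars = []
    in_domain = False  # True once '@' has been seen
    skipping = False   # True after the first '+' in the local part
    for ch in email:
        if in_domain:
            chars.append(ch)       # domain is copied unchanged
        elif ch == "@":
            in_domain = True
            chars.append(ch)
        elif skipping or ch == ".":
            continue               # ignored part of the local name
        elif ch == "+":
            skipping = True
        else:
            chars.append(ch)
    return "".join(chars)


print(normalize_one_pass("a.b+c.d@x.y.com"))  # ab@x.y.com
```

The @ check comes before the skip check so that the skip state ends as soon as the domain begins.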

Complexity nuance

Both approaches are effectively linear in the total input size.

With built-ins:

  • split("@")
  • split("+", 1)
  • replace(".", "")

Each scans part of the string, so it is still \(O(n)\) per email, where \(n\) is the length of that email.

With manual parsing:

  • one pass over the email
  • also \(O(n)\) per email

So asymptotically, there is no improvement.

The real difference is:

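  • manual parsing can avoid temporary strings and, in lower-level languages, may reduce constant factors; in CPython the C-implemented built-ins are usually faster than a Python-level loop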
  • built-ins improve readability and reliability

For LeetCode-sized constraints, readability wins.

The real trap in interviews

Sometimes candidates over-optimize too early.

If you jump straight to manual parsing, you risk:

  • index bugs
  • off-by-one bugs
  • mishandling @
  • mishandling first +
  • making code harder to explain

You actually had exactly this kind of bug with the domain reconstruction.

That is a strong signal that the cleaner version is safer.

Best interview strategy

A strong answer is often:

  • first give the simple built-in solution
  • state complexity clearly
  • mention that a manual one-pass parser is possible if they want fewer intermediate strings

Something like:

I would normalize each email by splitting local and domain, truncating local at the first plus, removing dots from local, and inserting the canonical form into a set. This is linear in total input size. If you want, I can also write a one-pass parser version, but I think the built-in version is clearer and equally optimal asymptotically.

That sounds mature.

When built-ins are a bad look

They are a worse answer if:

  • you use many chained operations without explaining cost
  • you rely on regex for a simple problem
  • you use obscure tricks that hurt readability
  • you ignore an interviewer hint toward parsing

So:

  • split + replace = good
  • regex golf = bad
  • unreadable comprehension-only solution = risky
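
For contrast, this is roughly what a regex version could look like. It is an illustrative sketch only: the pattern and the name `normalize_with_regex` are assumptions, not part of any reference solution, and it assumes exactly one @ per email.

```python
import re

# Hypothetical pattern: match any '.' or the first '+' plus everything after it.
_LOCAL_NOISE = re.compile(r"\.|\+.*")


def normalize_with_regex(email: str) -> str:
    local, domain = email.split("@")
    return f"{_LOCAL_NOISE.sub('', local)}@{domain}"


print(normalize_with_regex("a.b+c.d@x.com"))  # ab@x.com
```

It works, but the reader has to reason about alternation and greedy matching to see why, whereas split("+", 1) and replace(".", "") state the two rules directly.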

My recommendation

Use the built-in version as your main interview answer.

Then be ready to say:

  • "This is \(O(\text{total characters})\)."
  • "A one-pass parser is possible, but I prefer this for clarity unless you want the lower-level version."

That is usually the best balance of correctness, clarity, and interview judgment.