929. Unique Email Addresses

Reformulated question¶

Given email strings, normalize each one by:

- Removing `.` from the local part
- Ignoring everything after the first `+` in the local part
- Leaving the domain part unchanged

Return how many distinct normalized emails remain.

Compact example:

["a.b+c@x.com", "ab@x.com", "a@x.com"] -> 2
normalize:
- "a.b+c@x.com" -> "ab@x.com"
- "ab@x.com"    -> "ab@x.com"
- "a@x.com"     -> "a@x.com"

Key trick¶

Normalize only the local part, then insert the canonical email into a set.

Trap¶

- Applying the `.` or `+` rules to the domain part
- Truncating the local part at a later `+` instead of the first one
- Rebuilding the canonical address incorrectly around `@`
- Using repeated string concatenation in a loop when cleaner slicing works
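
The first two traps can be checked with a small sketch. The helper names `normalize` and `normalize_wrong_domain` are hypothetical and exist only for this illustration; the second one is deliberately wrong.

```python
def normalize(email: str) -> str:
    """Apply the '.' and '+' rules to the local part only."""
    local, domain = email.split("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def normalize_wrong_domain(email: str) -> str:
    """Deliberately wrong: strips dots from the whole address."""
    local, domain = email.split("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}".replace(".", "")


emails = ["testemail@leetcode.com", "testemail@lee.tcode.com"]

# Correct: the two domains differ, so both addresses are kept.
correct_count = len({normalize(e) for e in emails})

# Wrong: removing dots from the domain merges two distinct addresses.
wrong_count = len({normalize_wrong_domain(e) for e in emails})

# Cutting at the first '+' versus the last '+' gives different local parts.
first_plus = "a+b+c".split("+", 1)[0]
last_plus = "a+b+c".rsplit("+", 1)[0]
```

Here `correct_count` and `wrong_count` differ, and `first_plus` is shorter than `last_plus`, which is why both rules need to be read literally.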

Why is this question interesting?¶

It is a small parsing problem that tests whether you can:

- Read rules precisely
- Separate local and domain logic
- Turn normalization into deduplication with a set

Solve the problem with idiomatic python¶

class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        seen = set()

        for email in emails:
            # Split once into local and domain.
            local, domain = email.split("@")

            # Ignore everything after the first '+'.
            local = local.split("+", 1)[0]

            # Dots in local do not matter.
            local = local.replace(".", "")

            # Domain stays unchanged.
            seen.add(f"{local}@{domain}")

        return len(seen)

If you want the same idea in a one-liner style:

class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        return len(
            {
                f"{local.split('+', 1)[0].replace('.', '')}@{domain}"
                for local, domain in (email.split("@") for email in emails)
            }
        )

Pytest test¶

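import pytest

# Assumes the Solution class from the previous section is defined in,
# or imported into, this test module.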

@pytest.mark.parametrize(
    ("emails", "expected"),
    [
        (
            [
                "test.email+alex@leetcode.com",
                "test.e.mail+bob.cathy@leetcode.com",
                "testemail+david@lee.tcode.com",
            ],
            2,
        ),
        (
            [
                "a@leetcode.com",
                "b@leetcode.com",
                "c@leetcode.com",
            ],
            3,
        ),
        (
            [
                "a.b@x.com",
                "ab@x.com",
            ],
            1,
        ),
        (
            [
                "ab+one@x.com",
                "ab+two@x.com",
                "ab@x.com",
            ],
            1,
        ),
        (
            [
                "a.b+c@x.com",
                "ab+d@x.com",
                "a+b@x.com",
            ],
            2,
        ),
        (
            [
                "x@y.com",
            ],
            1,
        ),
    ],
)
def test_num_unique_emails(emails, expected):
    assert Solution().numUniqueEmails(emails) == expected

Comment my solution¶

Your solution has the right idea and returns the correct count, but it is more manual than needed and it builds a malformed canonical string.

Good:
- Correctly stops at +
- Correctly ignores .
- Correctly keeps the domain unchanged
- Uses a set
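Main bug:
- domain = email[i:] makes the canonical string contain two @ characters, because email[i:] already starts with @ and you then add another one. The count still comes out right only because every email gets the same extra @; the stored addresses themselves are malformed.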

Example of the bug:

email = "a@x.com"
local = "a"
domain = "@x.com"
canonical = local + "@" + domain  # "a@@x.com"
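
A minimal sketch of the slicing difference, using `str.index` as a stand-in for the loop index `i` from the solution. The variable names are illustrative only.

```python
email = "a.b+c@x.com"
i = email.index("@")  # the position where the manual loop stops

with_at = email[i:]         # still includes the "@"
without_at = email[i + 1:]  # the domain alone

# Rebuilding with an explicit "@" doubles it in the first case only.
buggy = "ab" + "@" + with_at
fixed = "ab" + "@" + without_at
```

Either slicing from `i + 1` or dropping the explicit `"@"` in the concatenation removes the duplicate.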

Simpler approach:
- Split once at @
- Normalize only local
- Rebuild with one @
Also cleaner:
- Prefer split("+", 1) and replace(".", "") over char-by-char parsing here

## Solution
from typing import List


class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        canonical_emails = set()
        for email in emails:
            i = 0
            local = ""
            while email[i] != "@":
                ch = email[i]
                if ch == "+":
                    while email[i] != "@":
                        i += 1
                    break
                elif ch != ".":
                    local += ch
                    i += 1
                else:
                    i += 1

            domain = email[i:]
            canonical_emails.add(local + "@" + domain)
        return len(canonical_emails)

Extra¶

Are you sure that using split and replace rather than char-by-char parsing is the right interview answer? It seems they are testing exactly this kind of simple one-pass parsing. What do you think?

Yes - for this problem, split + replace is usually a very good interview answer.

Short version¶

Interviewers usually test:

- whether you understood the rules exactly
- whether you can separate local vs domain logic
- whether you know to use a set for deduplication
- whether you can write clean, correct code fast

They are not usually testing whether you can manually parse every string character-by-character unless:

- they explicitly ask for one-pass parsing
- the language utilities would hide important complexity
- the input is huge and constant-factor optimization matters

Why `split`/`replace` is often better¶

It shows good engineering judgment:

- use built-ins for simple string transformations
- make the code shorter
- reduce bug surface
- make intent obvious

Your normalization logic becomes almost identical to the English statement:

- split at `@`
- cut local at first `+`
- remove `.` from local
- keep domain
- add to set

That is excellent interview code.

What you may be missing¶

A lot of candidates assume "manual parsing = more algorithmic = better."

That is not always true.

In interviews, "better" usually means:

- correct
- easy to explain
- easy to verify
- hard to break
- written quickly

For this question, built-ins help on all five.

Is char-by-char parsing ever better here?¶

Yes, but mostly in these situations:

- the interviewer says "do it in one pass"
- they ask for no extra intermediate strings
- they want you to discuss constant factors
- they are pushing beyond the easy version

Then a manual parser is a nice follow-up.
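
If that follow-up comes, a one-pass version could look like the sketch below. The function names `normalize_one_pass` and `num_unique_emails` are hypothetical, and the sketch assumes each email contains exactly one `@`. It collects characters in a list and joins once, so it still allocates the final string, but it avoids the intermediate strings that `split` and `replace` create.

```python
def normalize_one_pass(email: str) -> str:
    """Normalize an email with a single left-to-right scan."""
    chars: list[str] = []
    in_local = True    # True until the "@" is reached
    skipping = False   # True once a "+" was seen in the local part

    for ch in email:
        if in_local:
            if ch == "@":
                in_local = False
                chars.append(ch)
            elif ch == "+":
                skipping = True
            elif ch != "." and not skipping:
                chars.append(ch)
        else:
            # Domain characters are copied unchanged.
            chars.append(ch)

    return "".join(chars)


def num_unique_emails(emails: list[str]) -> int:
    """Count distinct addresses after normalization."""
    return len({normalize_one_pass(email) for email in emails})
```

The two boolean flags replace index arithmetic, which sidesteps the off-by-one risk around `@`.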

Complexity nuance¶

Both approaches are effectively linear in the total input size.

With built-ins:

- `split("@")`
- `split("+", 1)`
- `replace(".", "")`

Each scans part of the string, so it is still \(O(n)\) per email.

With manual parsing:

- one pass over the email
- also \(O(n)\) per email

So asymptotically, there is no improvement.

The real difference is:

- manual parsing may reduce constant factors and temporary strings
- built-ins improve readability and reliability

For LeetCode-sized constraints, readability wins.
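
If constant factors ever do matter, it is safer to measure than to guess. The sketch below times both styles with the standard `timeit` module; the function names and the sample input are assumptions made for illustration. In CPython the built-in string methods run in C, so the manual loop is not guaranteed to win, and the numbers will vary by machine and interpreter.

```python
import timeit


def normalize_builtin(email: str) -> str:
    """split/replace version."""
    local, domain = email.split("@")
    return f"{local.split('+', 1)[0].replace('.', '')}@{domain}"


def normalize_manual(email: str) -> str:
    """Character-by-character version of the same rules."""
    out = []
    i = 0
    while email[i] != "@":
        ch = email[i]
        if ch == "+":
            # Jump straight to the "@"; the rest of the local part is ignored.
            i = email.index("@", i)
            break
        if ch != ".":
            out.append(ch)
        i += 1
    out.append(email[i:])  # "@" plus the unchanged domain
    return "".join(out)


sample = ["test.e.mail+bob.cathy@leetcode.com"] * 1_000

builtin_seconds = timeit.timeit(
    lambda: [normalize_builtin(e) for e in sample], number=20
)
manual_seconds = timeit.timeit(
    lambda: [normalize_manual(e) for e in sample], number=20
)

print(f"built-ins: {builtin_seconds:.4f}s, manual: {manual_seconds:.4f}s")
```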

The real trap in interviews¶

Sometimes candidates over-optimize too early.

If you jump straight to manual parsing, you risk:

- index bugs
- off-by-one bugs
- mishandling `@`
- mishandling first `+`
- making code harder to explain

You actually had exactly this kind of bug with the domain reconstruction.

That is a strong signal that the cleaner version is safer.

Best interview strategy¶

A strong answer is often:

- first give the simple built-in solution
- state complexity clearly
- mention that a manual one-pass parser is possible if they want fewer intermediate strings

Something like:

I would normalize each email by splitting local and domain, truncating local at the first plus, removing dots from local, and inserting the canonical form into a set. This is linear in total input size. If you want, I can also write a one-pass parser version, but I think the built-in version is clearer and equally optimal asymptotically.

That sounds mature.

When built-ins are a bad look¶

They are a worse answer if:

- you use many chained operations without explaining cost
- you rely on regex for a simple problem
- you use obscure tricks that hurt readability
- you ignore an interviewer hint toward parsing

So:

- split + replace = good
- regex golf = bad
- unreadable comprehension-only solution = risky
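
For contrast, here is a sketch of what a regex-only version might look like next to the split-based one. Both function names are hypothetical and the pattern is one possible formulation, not a recommended one. It produces the same result on the inputs tried here, but the reader has to decode two lookaheads to confirm that the domain is left alone.

```python
import re


def normalize_regex(email: str) -> str:
    # Removes "+..." up to the "@", and any "." that still has an "@" after it.
    return re.sub(r"\+[^@]*(?=@)|\.(?=[^@]*@)", "", email)


def normalize_split(email: str) -> str:
    # Same rules, stated the way the problem states them.
    local, domain = email.split("@")
    return f"{local.split('+', 1)[0].replace('.', '')}@{domain}"


example = "test.e.mail+bob.cathy@lee.tcode.com"
regex_result = normalize_regex(example)
split_result = normalize_split(example)
```

The split-based helper can be explained in one sentence per line, which is the property an interviewer is usually looking for.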

My recommendation¶

Use the built-in version as your main interview answer.

Then be ready to say:

- "This is \(O(\text{total characters})\)."
- "A one-pass parser is possible, but I prefer this for clarity unless you want the lower-level version."

That is usually the best balance of correctness, clarity, and interview judgment.