929. Unique Email Addresses
On LeetCode ->
Reformulated question¶
Given email strings, normalize each one by:
- Removing `.` from the local part
- Ignoring the first `+` in the local part and everything after it
- Leaving the domain part unchanged
Return how many distinct normalized emails remain.
Compact example:
["a.b+c@x.com", "ab@x.com", "a@x.com"] -> 2
normalize:
- "a.b+c@x.com" -> "ab@x.com"
- "ab@x.com" -> "ab@x.com"
- "a@x.com" -> "a@x.com"
Key trick¶
Normalize only the local part, then insert the canonical email into a set.
Trap¶
- Applying `.` or `+` rules to the domain part
- Cutting at a later `+` instead of the first one
- Building the canonical address incorrectly around `@`
- Using repeated string concatenation in a loop when cleaner slicing works
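The first trap is easy to reproduce. Below is a minimal sketch, assuming two hypothetical helpers: `normalize` follows the rules, while `normalize_wrong` is deliberately buggy and strips dots from the whole address. Neither name comes from the problem statement.

```python
def normalize(email: str) -> str:
    """Canonical form: the rules apply to the local part only."""
    local, domain = email.split("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def normalize_wrong(email: str) -> str:
    """Deliberately buggy: also strips dots from the domain."""
    local, domain = email.split("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}".replace(".", "")


# Two different domains, so these are two different addresses.
emails = ["a@lee.tcode.com", "a@leetcode.com"]

correct = {normalize(e) for e in emails}
wrong = {normalize_wrong(e) for e in emails}

print(len(correct))  # 2
print(len(wrong))    # 1, both collapse to "a@leetcodecom"
```

The buggy version undercounts because it merges addresses that only differ by a dot in the domain.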
Why is this question interesting?¶
It is a small parsing problem that tests whether you can:
- Read rules precisely
- Separate local and domain logic
- Turn normalization into deduplication with a `set`
Solve the problem with idiomatic Python¶
class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        seen = set()
        for email in emails:
            # Split once into local and domain.
            local, domain = email.split("@")
            # Ignore everything after the first '+'.
            local = local.split("+", 1)[0]
            # Dots in local do not matter.
            local = local.replace(".", "")
            # Domain stays unchanged.
            seen.add(f"{local}@{domain}")
        return len(seen)
If you want the same idea as a one-liner style:
class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        return len(
            {
                f"{local.split('+', 1)[0].replace('.', '')}@{domain}"
                for local, domain in (email.split("@") for email in emails)
            }
        )
Pytest test¶
import pytest


@pytest.mark.parametrize(
    ("emails", "expected"),
    [
        (
            [
                "test.email+alex@leetcode.com",
                "test.e.mail+bob.cathy@leetcode.com",
                "testemail+david@lee.tcode.com",
            ],
            2,
        ),
        (
            [
                "a@leetcode.com",
                "b@leetcode.com",
                "c@leetcode.com",
            ],
            3,
        ),
        (
            [
                "a.b@x.com",
                "ab@x.com",
            ],
            1,
        ),
        (
            [
                "ab+one@x.com",
                "ab+two@x.com",
                "ab@x.com",
            ],
            1,
        ),
        (
            [
                "a.b+c@x.com",
                "ab+d@x.com",
                "a+b@x.com",
            ],
            2,
        ),
        (
            [
                "x@y.com",
            ],
            1,
        ),
    ],
)
def test_num_unique_emails(emails, expected):
    assert Solution().numUniqueEmails(emails) == expected
Comment my solution¶
Your solution is correct in idea and passes the rules, but it is more manual than needed.
- Good:
  - Correctly stops at `+`
  - Correctly ignores `.`
  - Correctly keeps the domain unchanged
  - Uses a `set`
- Main bug: `domain = email[i:]` makes the result contain two `@` characters, because `email[i:]` already starts with `@` and then you add another one. The count still comes out right, since every entry gets the same extra `@`, but the stored addresses are malformed.
- Example of the bug: for `a.b+c@x.com`, the loop stops with `local` equal to `ab` and `domain` equal to `@x.com`, so the set stores `ab@@x.com`.
- Simpler approach:
  - Split once at `@`
  - Normalize only `local`
  - Rebuild with one `@`
- Also cleaner:
  - Prefer `split("+", 1)` and `replace(".", "")` over char-by-char parsing here
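The double `@` is easy to see in isolation. A minimal sketch, using `str.index` to stand in for the position where the manual loop stops (the sample address is made up for illustration):

```python
email = "a.b+c@x.com"
i = email.index("@")        # the index the manual loop ends on

with_at = email[i:]         # slice starts AT the '@'    -> "@x.com"
without_at = email[i + 1:]  # slice starts AFTER the '@' -> "x.com"

# "ab" is the normalized local part of this address.
buggy = "ab" + "@" + with_at
fixed = "ab" + "@" + without_at

print(buggy)  # ab@@x.com
print(fixed)  # ab@x.com
```

Either slicing from `i + 1` or dropping the explicit `"@"` in the concatenation removes the duplicate.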
## Solution
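from typing import List


class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        canonical_emails = set()
        for email in emails:
            i = 0
            local = ""
            # Walk the local part one character at a time.
            while email[i] != "@":
                ch = email[i]
                if ch == "+":
                    # Skip ahead to the '@' and stop parsing the local part.
                    while email[i] != "@":
                        i += 1
                    break
                elif ch != ".":
                    local += ch
                    i += 1
                else:
                    i += 1
            # email[i:] still starts with '@', so the line below adds a second one.
            domain = email[i:]
            canonical_emails.add(local + "@" + domain)
        return len(canonical_emails)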
Extra¶
Are you sure that using split and replace rather than doing
char-by-char parsing is the right interview answer? It seems
they are testing exactly this "simple one-pass parsing". What do you
think?
Yes - for this problem, split + replace is usually a very good interview answer.
Short version¶
Interviewers usually test:
- whether you understood the rules exactly
- whether you can separate local vs domain logic
- whether you know to use a `set` for deduplication
- whether you can write clean, correct code fast
They are not usually testing whether you can manually parse every string character-by-character unless:
- they explicitly ask for one-pass parsing
- the language utilities would hide important complexity
- the input is huge and constant-factor optimization matters
Why split/replace is often better¶
It shows good engineering judgment:
- use built-ins for simple string transformations
- make the code shorter
- reduce bug surface
- make intent obvious
Your normalization logic becomes almost identical to the English statement:
- split at `@`
- cut local at first `+`
- remove `.` from local
- keep domain
- add to set
That is excellent interview code.
What you may be missing¶
A lot of candidates assume "manual parsing = more algorithmic = better."
That is not always true.
In interviews, "better" usually means:
- correct
- easy to explain
- easy to verify
- hard to break
- written quickly
For this question, built-ins help on all five.
Is char-by-char parsing ever better here?¶
Yes, but mostly in these situations:
- the interviewer says "do it in one pass"
- they ask for no extra intermediate strings
- they want you to discuss constant factors
- they are pushing beyond the easy version
Then a manual parser is a nice follow-up.
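If the interviewer does ask for that follow-up, a one-pass version might look like the sketch below. The names `normalize_one_pass` and `num_unique_emails` are hypothetical, and the code assumes each address contains exactly one `@`, as the problem guarantees. It collects characters in a list and joins once instead of growing a string inside the loop; it still allocates that list and one slice for the domain.

```python
def normalize_one_pass(email: str) -> str:
    """Single left-to-right scan over the address."""
    kept = []
    skipping = False  # becomes True once the first '+' is seen
    for i, ch in enumerate(email):
        if ch == "@":
            # Copy '@' and the whole domain unchanged, then stop.
            kept.append(email[i:])
            break
        if skipping or ch == ".":
            continue
        if ch == "+":
            skipping = True
            continue
        kept.append(ch)
    return "".join(kept)


def num_unique_emails(emails: list[str]) -> int:
    return len({normalize_one_pass(e) for e in emails})


print(normalize_one_pass("test.e.mail+bob.cathy@leetcode.com"))
# testemail@leetcode.com
```

Checking for `@` before anything else keeps the domain logic separate from the local-part rules, which is where the manual version is easiest to get wrong.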
Complexity nuance¶
Both approaches are effectively linear in the total input size.
With built-ins:
- `split("@")`
- `split("+", 1)`
- `replace(".", "")`
Each scans part of the string, so it is still \(O(n)\) per email.
With manual parsing:
- one pass over the email
- also \(O(n)\) per email
So asymptotically, there is no improvement.
The real difference is:
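- manual parsing may reduce temporary strings and, in a compiled language, constant factors (in CPython, the C-implemented string methods are often faster than a Python-level loop)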
- built-ins improve readability and reliability
For LeetCode-sized constraints, readability wins.
The real trap in interviews¶
Sometimes candidates over-optimize too early.
If you jump straight to manual parsing, you risk:
- index bugs
- off-by-one bugs
- mishandling `@`
- mishandling first `+`
- making code harder to explain
You actually had exactly this kind of bug with the domain reconstruction.
That is a strong signal that the cleaner version is safer.
Best interview strategy¶
A strong answer is often:
- first give the simple built-in solution
- state complexity clearly
- mention that a manual one-pass parser is possible if they want fewer intermediate strings
Something like:
I would normalize each email by splitting local and domain, truncating local at the first plus, removing dots from local, and inserting the canonical form into a set. This is linear in total input size. If you want, I can also write a one-pass parser version, but I think the built-in version is clearer and equally optimal asymptotically.
That sounds mature.
When built-ins are a bad look¶
They are a worse answer if:
- you use many chained operations without explaining cost
- you rely on regex for a simple problem
- you use obscure tricks that hurt readability
- you ignore an interviewer hint toward parsing
So:
- `split` + `replace` = good
- regex golf = bad
- unreadable comprehension-only solution = risky
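To make the contrast concrete, here is a sketch of what a regex version could look like next to the `split` + `replace` version. Both helper names are hypothetical, and the pattern is only one possible formulation. It agrees with the plain version on the samples below, but it takes noticeably longer to convince yourself that it is right.

```python
import re

# First alternative: a '+' and everything up to (not including) the '@'.
# Second alternative: any '.' that still has an '@' somewhere after it.
_LOCAL_NOISE = re.compile(r"\+.*(?=@)|\.(?=.*@)")


def normalize_regex(email: str) -> str:
    return _LOCAL_NOISE.sub("", email)


def normalize_plain(email: str) -> str:
    local, domain = email.split("@")
    return f"{local.split('+', 1)[0].replace('.', '')}@{domain}"


samples = [
    "test.email+alex@leetcode.com",
    "testemail+david@lee.tcode.com",
    "a+b+c@x.com",
    "a.b+c.d@x.y.com",
]

for s in samples:
    # Both helpers should agree on every sample.
    assert normalize_regex(s) == normalize_plain(s), s
```

The plain version reads like the problem statement; the regex needs two lookaheads explained before anyone can review it.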
My recommendation¶
Use the built-in version as your main interview answer.
Then be ready to say:
- "This is \(O(\text{total characters})\)."
- "A one-pass parser is possible, but I prefer this for clarity unless you want the lower-level version."
That is usually the best balance of correctness, clarity, and interview judgment.