230. Kth Smallest Element in a BST

On LeetCode ->

Reformulated question¶

Given a BST root and an integer k, return the kth smallest value in the tree, where k is 1-indexed.

Example: root=[5,3,6,2,4,null,null,1], k=3 -> 3
Why: inorder traversal of a BST visits values in sorted order: [1,2,3,4,5,6]

Key trick¶

Use inorder traversal.

In a BST, left -> node -> right yields values in increasing order.
So the answer is simply the kth visited node.
You can stop early instead of traversing the whole tree.

Trap¶

Common mistakes:

Forgetting k is 1-indexed.
Doing a full traversal into a list when early stop is enough.
Using generic tree logic and missing the BST property.
Recursive inorder may be fine here, but iterative avoids recursion depth issues.

Why this question is interesting¶

It tests whether you connect a BST property to traversal order.

Easy brute force: collect and sort.
Better answer: inorder with early stop in \(O(h + k)\) time and \(O(h)\) space, where \(h\) is tree height.
Follow-up naturally leads to augmented BSTs with subtree sizes.

Solution with idiomatic Python¶

from typing import Optional


class Solution:
    def kthSmallest(self, root: Optional[TreeNode], k: int) -> int:
        # Iterative inorder traversal:
        # in a BST, inorder visits nodes in sorted order.
        stack = []
        node = root

        while stack or node:
            # Go as left as possible first.
            while node:
                stack.append(node)
                node = node.left

            node = stack.pop()
            k -= 1

            # The kth visited node is the kth smallest.
            if k == 0:
                return node.val

            # Then explore the right subtree.
            node = node.right

Pytest test¶

import pytest

from utils import tree_build
from solution import Solution


@pytest.mark.parametrize(
    ("values", "k", "expected"),
    [
        ([3, 1, 4, None, 2], 1, 1),
        ([3, 1, 4, None, 2], 2, 2),
        ([3, 1, 4, None, 2], 3, 3),
        ([5, 3, 6, 2, 4, None, None, 1], 3, 3),
        ([2, 1, 3], 1, 1),
        ([2, 1, 3], 2, 2),
        ([2, 1, 3], 3, 3),
        ([1], 1, 1),
        ([5, 3, 7, 2, 4, 6, 8], 7, 8),
    ],
)
def test_kthSmallest(values, k, expected):
    root = tree_build(values)
    assert Solution().kthSmallest(root, k) == expected

Comment on my solution¶

Your solution is correct and already good.

Strong point:
- It does iterative inorder and stops early.
What can be simplified:
- Use the standard while stack or node pattern.
- Decrement k directly instead of keeping a separate count.
- No need to special-case the initial left descent and right child handling.
Your version rewritten in the more common interview style is shorter and easier to verify.

## Solution

class Solution:
    def kthSmallest(self, root: Optional[TreeNode], k: int) -> int:
        # inorder walk of tree.  The kth in that order is the kth smallest.
        node = root
        stack = []
        while node:
            stack.append(node)
            node = node.left
        count = 0
        while stack:
            node = stack.pop()
            val = node.val
            count +=1
            if count == k:
                return val
            if node.right:
                node = node.right
                while node:
                    stack.append(node)
                    node = node.left

Extra¶

Order-statistics tree (BST augmented with subtree sizes)¶

An augmented BST is a normal BST where each node stores extra metadata.

For this follow-up, the useful metadata is:

size
- The number of nodes in the subtree rooted at that node.
- Including the node itself.

Example:

        5(size=8)
       /         \
   3(size=4)    6(size=3)
   /     \           \
2(size=2) 4(1)      7(size=2)
 /                     \
1(1)                   8(1)

At node `3`:

- left subtree has size `2`
- so:
    - the 1st and 2nd smallest are in the left subtree
    - the 3rd smallest is `3` itself
    - anything larger is in the right subtree

#### Why use subtree sizes in general

They let you answer order-statistic queries efficiently:

- find the `k`th smallest
- find the rank of a value
    - "how many values are `< x`?"
- predecessor / successor with rank logic

Without subtree sizes:

- `kthSmallest` is usually done by inorder traversal
- that costs $O(h + k)$, worst case $O(n)$ each query

With subtree sizes:

- `kthSmallest` becomes $O(h)$
- because at each node you know exactly which side to go

That matters when:

- the tree changes often
- and you ask `kthSmallest` often

#### Why the follow-up naturally leads to it

The original solution is:

- do inorder
- stop at the `k`th node

Good for one query.

Bad if you repeatedly do:

- insert
- delete
- `kthSmallest`
- `kthSmallest`
- `kthSmallest`

Each query may walk many nodes again.

So the natural idea is:

- keep extra info updated during insert/delete
- then queries become faster

The exact extra info needed for "kth smallest" is subtree size.

This structure is commonly called:

- an order-statistics tree
- or a BST augmented with subtree sizes

#### How `kthSmallest` works with sizes

At a node:

- let `left_size = size(node.left)`

Then:

- if `k <= left_size`
    - answer is in the left subtree
- if `k == left_size + 1`
    - current node is the answer
- otherwise
    - answer is in the right subtree
    - new `k` becomes `k - left_size - 1`

So you move down one path only.

#### Small example

Suppose:

```text
        5(size=6)
       /         \
   3(size=3)    7(size=2)
   /     \       /
 2(1)    4(1)  6(1)

Find k=4:

at 5, left_size=3
k == left_size + 1
answer is 5

Find k=5:

at 5, left_size=3
go right with k = 5 - 3 - 1 = 1
at 7, left_size=1
k <= left_size, go left
at 6, left_size=0, k=1
answer is 6

Important caveat¶

A plain BST can become skewed.

So complexities are:

\(O(h)\), not always \(O(\log n)\)

If you want guaranteed \(O(\log n)\):

use a balanced BST
like AVL / Red-Black Tree / Treap
and maintain subtree sizes too

For interview follow-up, saying this is enough.

Implementation for the follow-up¶

Below is a concise implementation of a BST augmented with subtree sizes.

It supports:

insert
delete
kth_smallest

Average-case complexity on a regular BST:

insert: \(O(h)\)
delete: \(O(h)\)
kth smallest: \(O(h)\)

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1


def _size(node: Optional[Node]) -> int:
    return node.size if node else 0


def _recalc(node: Optional[Node]) -> None:
    if node:
        node.size = 1 + _size(node.left) + _size(node.right)


class OrderStatisticBST:
    def __init__(self):
        self.root: Optional[Node] = None

    def insert(self, val: int) -> None:
        self.root = self._insert(self.root, val)

    def _insert(self, node: Optional[Node], val: int) -> Node:
        if node is None:
            return Node(val)

        if val < node.val:
            node.left = self._insert(node.left, val)
        else:
            node.right = self._insert(node.right, val)

        _recalc(node)
        return node

    def delete(self, val: int) -> None:
        self.root = self._delete(self.root, val)

    def _delete(self, node: Optional[Node], val: int) -> Optional[Node]:
        if node is None:
            return None

        if val < node.val:
            node.left = self._delete(node.left, val)
        elif val > node.val:
            node.right = self._delete(node.right, val)
        else:
            if node.left is None:
                return node.right
            if node.right is None:
                return node.left

            # Replace with inorder successor.
            succ = self._min_node(node.right)
            node.val = succ.val
            node.right = self._delete(node.right, succ.val)

        _recalc(node)
        return node

    def _min_node(self, node: Node) -> Node:
        while node.left:
            node = node.left
        return node

    def kth_smallest(self, k: int) -> int:
        if k < 1 or k > _size(self.root):
            raise IndexError("k out of range")

        node = self.root
        while node:
            left_size = _size(node.left)

            if k <= left_size:
                node = node.left
            elif k == left_size + 1:
                return node.val
            else:
                k -= left_size + 1
                node = node.right

        raise RuntimeError("unreachable")

Example use¶

bst = OrderStatisticBST()

for x in [5, 3, 6, 2, 4, 1]:
    bst.insert(x)

print(bst.kth_smallest(1))  # 1
print(bst.kth_smallest(3))  # 3
print(bst.kth_smallest(6))  # 6

bst.delete(3)

print(bst.kth_smallest(3))  # 4

If you want the LeetCode follow-up answer in interview style¶

A concise good answer is:

If the BST changes often and we query kthSmallest frequently, augment each node with the size of its subtree.
Update that size after insert/delete.
Then kthSmallest is no longer an inorder scan; at each node, compare k with the left subtree size and move left or right.
Query becomes \(O(h)\) instead of re-traversing many nodes each time.
If we also keep the tree balanced, it becomes \(O(\log n)\) per update and query.

One subtle thing you may be missing¶

This is not "store the sorted list somewhere".

It is:

keep local counts on each node
so global rank queries can be answered by walking one root-to-leaf path

That is the key idea of augmentation in data structures in general:

store a little extra information
maintain it on updates
use it to answer expensive queries much faster

Follow-up on `OrderStatisticBST` above implementation¶

It seems your implementation of OrderStatisticBST allows different
nodes with the same value.  Am I right?  Is it OK?  Why did you make
that choice?  Isn't more common to allow only different values in BST?
What are the use cases for both?  Be concise.

Yes.

This implementation allows duplicates.

Because insertion uses:

if val < node.val:
    node.left = ...
else:
    node.right = ...

So equal values go to the right subtree.

Is it OK?¶

Yes, as long as the BST rule is defined consistently.

Two common variants are:

strict BST
- left < node < right
- no duplicates
duplicate-friendly BST
- left < node, right >= node
- or left <= node, right > node

Both are valid if all operations follow the same rule.

Why I made that choice¶

Mostly for simplicity.

It makes insert easy.
It keeps the structure general.
It matches many practical "multiset" style uses.

For the follow-up idea, duplicates do not change the core point:

maintain subtree sizes
use them to answer rank / kth queries

Isn't it more common to allow only distinct values?¶

For interview BST problems:

often yes
because examples usually assume unique keys unless duplicates are mentioned

For real data structures:

both are common
it depends on whether the structure is a set or a multiset

Use cases for distinct-only BST¶

You want set/map semantics.

membership
insert if absent
delete exact key
ordered dictionary
unique IDs, usernames, primary keys

Examples:

set
dict-like ordered map keyed by unique key

In that case, on insert:

if value already exists, do nothing
or update payload instead of creating a new node

Use cases for duplicate-friendly BST¶

You want multiset semantics.

repeated values are meaningful
frequency matters
rank queries over repeated values matter

Examples:

scores with ties
event timestamps
inventory counts by repeated key
streaming median with repeated numbers

A more common way to support duplicates¶

Instead of creating many equal-valued nodes, store a count per node.

Each node keeps:

val
count
- number of times this value appears
size
- total number of elements in subtree, including duplicates

That is usually cleaner because:

tree shape is better than chaining equal keys to one side
delete is simpler for duplicates
rank/kth logic stays natural

Example¶

Instead of:

use:

5(count=3)

What I would do in an interview¶

If duplicates are unspecified:

I would ask whether values are unique.

If I must choose silently:

for a LeetCode BST problem, I would usually assume unique values
for a reusable order-statistics tree, I would mention both options
and prefer count per node if duplicates matter

Distinct-only change¶

If you want unique keys only, change insert to:

if val < node.val:
    node.left = ...
elif val > node.val:
    node.right = ...
else:
    return node

Bottom line¶

Yes, my version allows duplicates.
That is valid, not a bug.
For production-like design, duplicates are often better handled with a per-node count.
For interview BST questions, unique keys are often the default unless stated otherwise.