Skip to content

542. 01 Matrix

On LeetCode ->

Reformulated question

Given a binary matrix, return a matrix where each cell contains its shortest Manhattan distance to any 0.

Example:

input:
[[0,1,1],
 [1,1,1]]

output:
[[0,1,2],
 [1,2,3]]

meaning:
cell (1,2) is 3 steps from nearest 0

Key trick

Use multi-source BFS:

  • Start the queue with all 0 cells.
  • Expand outward one layer at a time.
  • The first time a 1 is reached is its shortest distance.

Trap

Common mistakes:

  • Running BFS/DFS from every 1, which is too slow: \(O((mn)^2)\).
  • Using recursion with cycles between neighboring 1s.
  • Building the matrix with:
d = [[-1] * n] * m

which aliases rows. - Bad bounds like i > m instead of i >= m. - Returning wrong values for out-of-bounds cells.

Why this question is interesting

It looks like DP, but the clean solution is really a graph shortest path from many sources at once.

It tests whether you can recognize when reversing the search makes the problem easy.

Solve the problem with idiomatic python

from collections import deque

class Solution:
    def updateMatrix(self, mat: list[list[int]]) -> list[list[int]]:
        m, n = len(mat), len(mat[0])

        # -1 means "not processed yet"
        dist = [[-1 for _ in range(n)] for _ in range(m)]
        q = deque()

        # All zeros are BFS sources with distance 0.
        for i in range(m):
            for j in range(n):
                if mat[i][j] == 0:
                    dist[i][j] = 0
                    q.append((i, j))

        # Standard BFS from all zeros at once.
        while q:
            i, j = q.popleft()

            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= ni < m and 0 <= nj < n and dist[ni][nj] == -1:
                    dist[ni][nj] = dist[i][j] + 1
                    q.append((ni, nj))

        return dist

A valid alternative is the 2-pass DP solution:

  • Top-left to bottom-right.
  • Bottom-right to top-left.

It also runs in \(O(mn)\), but BFS is usually the best interview answer here.

Pytest test

import pytest

@pytest.mark.parametrize(
    ("mat", "expected"),
    [
        (
            [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
            [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
        ),
        (
            [[0, 0, 0], [0, 1, 0], [1, 1, 1]],
            [[0, 0, 0], [0, 1, 0], [1, 2, 1]],
        ),
        (
            [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
            [[2, 1, 2], [1, 0, 1], [2, 1, 2]],
        ),
        (
            [[0]],
            [[0]],
        ),
        (
            [[1, 0, 1, 1, 0]],
            [[1, 0, 1, 1, 0]],
        ),
        (
            [[1], [1], [0], [1]],
            [[2], [1], [0], [1]],
        ),
    ],
)
def test_update_matrix(mat, expected):
    assert Solution().updateMatrix(mat) == expected

Comment my solution

Main issues:

  • The recurrence is not safe here because neighbors depend on each other cyclically.
  • Recursive DFS does not guarantee shortest distance unless you do much more work.
  • d = [[-1]*n]*m creates shared rows.
  • Bounds are wrong:
    • i > m should be i >= m
    • j > n should be j >= n
  • Returning 1 for out-of-bounds is incorrect.
  • acc should not be stored into d[i][j] when hitting a 0; distance to nearest 0 from a 0 is always 0.
  • This approach can recurse deeply and revisit states many times.

Better mental model:

  • Treat each cell as a graph node.
  • Push all 0s first.
  • BFS computes shortest distance to the nearest 0 in one pass.
# - I think it's a dynamic programming problem with the following recursion:
#
#    d(i,j) = 0 if mat(i,j) == 0
#           = 1 + min(d(i-1,j), d(i+1,j), d(i,j-1), d(i, j +1))
#
# - the recursion is garanted to stop because at least one 0 in the matrix
# - we have to do memoization to not recurse deeply when we already computed the value
# WRONG
class Solution:
    def updateMatrix(self, mat: list[list[int]]) -> list[list[int]]:
        m = len(mat) # line number
        n = len(mat[0]) # column number
        d = [[-1]*n]*m # distance matrix no value computed yet
        def dist(acc,i,j):
            if i < 0 or i > m or j < 0 or j > n:
                return 1 # we just discard them
            if d[i][j] != -1:
                return d[i][j]
            if mat[i][j] == 0:
                d[i][j] = acc
                return acc
            return min(dist(acc + 1,i-1,j), dist(acc + 1, i+1,j), dist(acc + 1,i,j-1), dist(acc + 1, i, j+1))
        for i in range(m):
            for j in range(n):
                d[i][j] = dist(0, i, j)
        return d

Extra

While I completely understood the solution, I needed to know more about DFS and graph. I felt I was missing context and vocabulary. So I asked AI for help.


What you're missing is mostly graph vocabulary.

A matrix like this can be seen as a graph:

  • Each cell is a node.
  • Two cells are connected by an edge if they are neighbors up/down/left/right.
  • Every edge has weight 1 because moving one step costs 1.

So the problem becomes:

  • For every node, find the shortest path length to any node whose value is 0.

That is a classic shortest path in an unweighted graph problem.

Why BFS?

BFS means Breadth-First Search.

Its key property is:

  • It explores nodes in order of increasing distance from the start.

That is exactly why BFS solves shortest path in an unweighted graph.

People often first see BFS on trees, but it is really for graphs in general.

A tree is just a special kind of graph.

So here:

  • The matrix is not “just a matrix”.
  • It is an implicit graph.
  • “Implicit” means you do not build adjacency lists explicitly.
  • You compute neighbors on the fly with:
(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)

Why not DFS?

DFS goes deep along one path first.

That does not guarantee shortest distance.

Example:

0 - a - b - c
 \
  x

If DFS explores through a, b, c first, it may find a longer route before the short one.

BFS avoids this because it explores by layers:

  • distance 0
  • then distance 1
  • then distance 2
  • etc.

What are the “sources” here?

Usually BFS starts from one node.

Here we start from all zeros at once.

This is called multi-source BFS.

Meaning:

  • Put every 0 in the queue initially.
  • Give them distance 0.
  • Then expand outward.

This is equivalent to asking:

  • If “waves” start simultaneously from every 0,
  • when does each 1 get hit first?

That first hit time is its answer.

Why does it reach every cell?

Because the graph of the grid is connected enough through neighbor moves inside each region, and the problem guarantees at least one 0.

More directly:

  • Every cell can move step by step toward some 0.
  • Since at least one 0 exists, every cell has some finite Manhattan path to a 0.
  • BFS from all 0s spreads through all valid neighbors.
  • So eventually every cell is visited.

In a rectangular grid with 4-direction moves:

  • from any cell (i, j) to any other cell (x, y),
  • you can move vertically |i - x| times,
  • and horizontally |j - y| times.

So the grid graph is connected.

That is enough to guarantee all cells are reachable from the set of zero cells.

Why is the first visit the shortest distance?

This is the core BFS fact.

Think in layers.

  • Initially, queue contains all cells at distance 0 from a zero:
    • exactly the zero cells.
  • When you pop a cell with distance d, all unvisited neighbors get distance d + 1.
  • Therefore queue contents are discovered in nondecreasing distance order.

So if a cell is first discovered with distance k:

  • there is a path of length k to a zero,
  • and no shorter path could exist,
  • because otherwise BFS would have discovered it earlier from layer < k.

That is the proof idea.

A compact version:

  • BFS visits nodes by increasing path length.
  • The first time a node is visited is the shortest path to it.
  • Starting from all 0s means shortest path to the nearest 0.

The useful recognition pattern

When you see:

  • grid / matrix
  • move in 4 directions
  • each move costs 1
  • shortest number of steps
  • nearest something
  • spread from many starting points

you should think:

  • graph
  • unweighted shortest path
  • BFS
  • maybe multi-source BFS

Vocabulary cheatsheet

Useful words to recognize and say in interview:

  • Grid as graph
    • cells are nodes, neighbor moves are edges
  • Implicit graph
    • neighbors are generated, not stored
  • Unweighted graph
    • every move costs the same
  • Shortest path
    • minimum number of edges/steps
  • BFS layer
    • all nodes at the same distance
  • Multi-source BFS
    • BFS starting from several nodes at once
  • First visit is optimal
    • in BFS, first discovery gives shortest distance in unweighted graphs

Tiny mental model

Imagine dropping ink on every 0 at the same time:

  • after 0 seconds, only zeros are colored
  • after 1 second, all cells at distance 1 are colored
  • after 2 seconds, all cells at distance 2 are colored

The time when a cell gets colored is exactly its answer.

That is BFS.