Trie (Prefix Tree + Bitwise Trie)
Trie (Prefix Tree)
What is a Trie?
A Trie (pronounced "try", from retrieval) is a tree-shaped data structure designed for efficient storage and retrieval of strings over a fixed alphabet. Unlike a binary search tree where each node stores a complete key, in a trie each node represents a single character, and the key is reconstructed by walking from the root down to a marked node.
Core properties:
- Every path from the root to a marked node spells out a stored string.
- Common prefixes share the same path -- this is the trie's superpower.
- The root represents the empty string
"". - The depth of a node equals the length of the prefix it represents.
Intuition. Imagine a phone book organized not alphabetically page-by-page, but character-by-character. To look up "apple", you first go to the "a" section, then within that to "ap", then "app", then "appl", then "apple". If many words share the prefix "app" (apply, application, apparatus...), they all share the same first three steps. A trie encodes exactly this structure.
Visual Structure
Storing: ["apple", "app", "apt", "bat", "bad"]
Observations:
- "app" and "apple" share the path root -> a -> p -> p. The node at "app" is marked as end-of-word, and the path continues to "apple".
- "apt" branches off from "ap" (the second 'p' goes one way, 't' goes another).
- "bat" and "bad" share root -> b -> a, then diverge.
How many nodes? In the worst case (no shared prefixes), inserting N words of average length L creates O(N * L) nodes. With heavy prefix sharing, far fewer.
Node Structure
Fixed-size children array (lowercase English letters)
struct TrieNode {
TrieNode* children[26]; // one slot per letter a-z
bool isEnd; // true if a complete word ends here
int prefixCount; // how many words pass through this node
int endCount; // how many words end exactly here (for duplicates)
TrieNode() {
for (int i = 0; i < 26; i++) children[i] = nullptr;
isEnd = false;
prefixCount = 0;
endCount = 0;
}
};
Why 26? For lowercase English letters 'a' through 'z', we map character c to index
c - 'a'. If the alphabet is larger (uppercase + lowercase + digits), increase the array size
or use a hash map (discussed later).
Operations with Full Visual Walkthroughs
Insert
Goal: Insert the word into the trie, creating new nodes as needed.
Algorithm:
- Start at root.
- For each character in the word, compute its index.
- If the child at that index is
nullptr, create a new node. - Move to the child, increment its
prefixCount. - After the last character, mark
isEnd = true.
Walkthrough: Inserting "apple" into an empty trie.
Step 0: Start at root
(root)
Step 1: Process 'a' (index 0)
No child at index 0 -> create node
(root)
\
a [prefixCount=1]
Step 2: Process 'p' (index 15)
No child at index 15 of node 'a' -> create node
(root)
\
a
\
p [prefixCount=1]
Step 3: Process 'p' (index 15)
No child at index 15 of node 'p' -> create node
(root)
\
a
\
p
\
p [prefixCount=1]
Step 4: Process 'l' (index 11)
No child at index 11 -> create node
(root)
\
a
\
p
\
p
\
l [prefixCount=1]
Step 5: Process 'e' (index 4)
No child at index 4 -> create node, mark isEnd=true
(root)
\
a
\
p
\
p
\
l
\
e* [prefixCount=1, isEnd=true]
Now insert "app":
Follow existing path: root -> a -> p -> p
All three nodes already exist! Just increment prefixCount along the way.
At the second 'p', mark isEnd = true.
(root)
\
a [prefixCount=2]
\
p [prefixCount=2]
\
p* [prefixCount=2, isEnd=true] <-- newly marked
\
l [prefixCount=1]
\
e* [prefixCount=1]
Key insight: "app" doesn't create any new nodes -- it reuses the existing prefix path.
Search
Goal: Return true if the exact word exists in the trie (not just as a prefix).
Algorithm:
- Start at root.
- For each character, try to follow the corresponding child.
- If any child is
nullptr, returnfalse. - After processing all characters, return
isEndof the current node.
Walkthrough: Search for "app" -- FOUND
Trie state (contains "apple" and "app"):
root -> a -> p -> p* -> l -> e*
Step 1: 'a' -> follow child[0] -> node 'a' (exists)
Step 2: 'p' -> follow child[15] -> node 'p' (exists)
Step 3: 'p' -> follow child[15] -> node 'p*' (exists)
End of word. Check isEnd: TRUE -> return true
Walkthrough: Search for "ap" -- NOT FOUND
Step 1: 'a' -> follow child[0] -> node 'a' (exists)
Step 2: 'p' -> follow child[15] -> node 'p' (exists)
End of word. Check isEnd: FALSE -> return false
"ap" exists as a PREFIX in the trie, but no word ends there.
Walkthrough: Search for "axe" -- NOT FOUND
Step 1: 'a' -> follow child[0] -> node 'a' (exists)
Step 2: 'x' -> follow child[23] -> nullptr -> return false immediately
StartsWith (Prefix Search)
Goal: Return true if ANY stored word starts with the given prefix.
Algorithm: Same as search, but skip the isEnd check at the end. If we can follow
the entire prefix without hitting nullptr, return true.
Walkthrough: startsWith("ap") -> TRUE
Step 1: 'a' -> node 'a' (exists)
Step 2: 'p' -> node 'p' (exists)
Reached end of prefix without hitting nullptr -> return true
(Words "app" and "apple" both start with "ap")
Walkthrough: startsWith("ba") on trie with only "apple"/"app" -> FALSE
Step 1: 'b' -> child[1] is nullptr -> return false
Delete
Goal: Remove a word from the trie. Must be careful not to remove nodes that serve as prefixes for other words.
Strategy:
- First, verify the word exists (search for it).
- Walk down the word. At each node, decrement
prefixCount. - At the end node, set
isEnd = false, decrementendCount. - On the way back up (using recursion), if a node's
prefixCountdrops to 0, delete it and set the parent's child pointer tonullptr.
Walkthrough: Delete "apple" from trie containing {"apple", "app"}
Before:
root -> a(2) -> p(2) -> p*(2) -> l(1) -> e*(1)
(numbers in parentheses are prefixCounts)
Processing 'a': prefixCount 2 -> 1 (still > 0, keep node)
Processing 'p': prefixCount 2 -> 1 (still > 0, keep node)
Processing 'p': prefixCount 2 -> 1 (still > 0, keep node, isEnd stays true)
Processing 'l': prefixCount 1 -> 0 (now 0! delete this node)
Processing 'e': prefixCount 1 -> 0 (now 0! delete this node)
After:
root -> a(1) -> p(1) -> p*(1)
"app" is still intact. Only the "le" suffix was removed.
Walkthrough: Now delete "app"
Before:
root -> a(1) -> p(1) -> p*(1)
Processing 'a': prefixCount 1 -> 0 (delete!)
Processing 'p': prefixCount 1 -> 0 (delete!)
Processing 'p': prefixCount 1 -> 0 (delete!, unmark isEnd)
After:
root (empty trie)
All nodes cleaned up since no words remain.
Complete Implementation
#include <bits/stdc++.h>
using namespace std;
class Trie {
struct TrieNode {
TrieNode* children[26];
bool isEnd;
int prefixCount;
TrieNode() : isEnd(false), prefixCount(0) {
memset(children, 0, sizeof(children));
}
};
TrieNode* root;
// Helper: recursively delete a word, cleaning up unused nodes
bool deleteHelper(TrieNode* node, const string& word, int depth) {
if (depth == (int)word.size()) {
if (!node->isEnd) return false; // word not found
node->isEnd = false;
// If node has no children, it can be deleted
return isEmpty(node);
}
int idx = word[depth] - 'a';
if (!node->children[idx]) return false; // word not found
node->children[idx]->prefixCount--;
bool shouldDelete = deleteHelper(node->children[idx], word, depth + 1);
if (shouldDelete) {
delete node->children[idx];
node->children[idx] = nullptr;
// Check if current node is also deletable
return !node->isEnd && isEmpty(node);
}
return false;
}
bool isEmpty(TrieNode* node) {
for (int i = 0; i < 26; i++)
if (node->children[i]) return false;
return true;
}
public:
Trie() { root = new TrieNode(); }
// --- Insert a word ---
void insert(const string& word) {
TrieNode* curr = root;
for (char c : word) {
int idx = c - 'a';
if (!curr->children[idx])
curr->children[idx] = new TrieNode();
curr = curr->children[idx];
curr->prefixCount++;
}
curr->isEnd = true;
}
// --- Search for exact word ---
bool search(const string& word) {
TrieNode* curr = root;
for (char c : word) {
int idx = c - 'a';
if (!curr->children[idx]) return false;
curr = curr->children[idx];
}
return curr->isEnd;
}
// --- Check if any word starts with given prefix ---
bool startsWith(const string& prefix) {
TrieNode* curr = root;
for (char c : prefix) {
int idx = c - 'a';
if (!curr->children[idx]) return false;
curr = curr->children[idx];
}
return true;
}
// --- Count words with given prefix ---
int countWordsWithPrefix(const string& prefix) {
TrieNode* curr = root;
for (char c : prefix) {
int idx = c - 'a';
if (!curr->children[idx]) return 0;
curr = curr->children[idx];
}
return curr->prefixCount;
}
// --- Delete a word ---
void erase(const string& word) {
deleteHelper(root, word, 0);
}
};
int main() {
Trie trie;
trie.insert("apple");
trie.insert("app");
trie.insert("apt");
trie.insert("bat");
trie.insert("bad");
cout << trie.search("app") << endl; // 1 (true)
cout << trie.search("ap") << endl; // 0 (false)
cout << trie.startsWith("ap") << endl; // 1 (true)
cout << trie.countWordsWithPrefix("ap") << endl; // 3 (apple, app, apt)
cout << trie.countWordsWithPrefix("ba") << endl; // 2 (bat, bad)
trie.erase("apple");
cout << trie.search("apple") << endl; // 0 (false)
cout << trie.search("app") << endl; // 1 (true, still exists)
return 0;
}
Memory Optimization: Using HashMap for Children
When the alphabet is very large (e.g., Unicode characters, or mixed case + digits + symbols), a fixed-size array wastes enormous memory. Use a hash map instead:
struct TrieNode {
unordered_map<char, TrieNode*> children;
bool isEnd = false;
int prefixCount = 0;
};
Trade-offs:
In competitive programming, the fixed array is almost always preferred for speed.
Array-Based Trie (Memory Pool)
For competitive programming, avoid new/delete overhead with a flat array:
const int MAXNODES = 1000005; // pre-allocate
int trie[MAXNODES][26];
int cnt[MAXNODES]; // prefixCount
bool isEnd[MAXNODES];
int tot = 0; // next available node index
int newNode() {
tot++;
memset(trie[tot], 0, sizeof(trie[tot]));
cnt[tot] = 0;
isEnd[tot] = false;
return tot;
}
void insert(const string& word) {
int cur = 0; // root = 0
for (char c : word) {
int ch = c - 'a';
if (!trie[cur][ch])
trie[cur][ch] = newNode();
cur = trie[cur][ch];
cnt[cur]++;
}
isEnd[cur] = true;
}
bool search(const string& word) {
int cur = 0;
for (char c : word) {
int ch = c - 'a';
if (!trie[cur][ch]) return false;
cur = trie[cur][ch];
}
return isEnd[cur];
}
This avoids dynamic allocation entirely and is much faster in practice.
Bitwise Trie (XOR Trie)
A powerful variant where we store integers bit-by-bit, from the most significant bit (MSB) down to the least significant bit (LSB). Each node has exactly 2 children (bit 0 or bit 1).
Why?
The classic application: given an array of integers, find two elements whose XOR is maximum.
Key insight: To maximize XOR, at each bit position (from MSB to LSB), we want to go in the opposite direction of the query number's bit. If the query bit is 1, we want to follow the 0 branch (so the XOR bit is 1), and vice versa.
Visual Example
Insert: 5 (101), 2 (010), 7 (111), 1 (001) -- using 3 bits for clarity:
Max XOR query for 2 (010):
Bit 2 = 0 -> want 1 -> go right to '1'
Bit 1 = 1 -> want 0 -> go left to '0' (XOR bit = 1)
Bit 0 = 0 -> want 1 -> go right to '1' (XOR bit = 1)
Reached 5 (101). XOR = 010 ^ 101 = 111 = 7. Maximum!
Implementation
class BitTrie {
struct Node {
int children[2];
int count; // how many numbers pass through this node
Node() : count(0) { children[0] = children[1] = -1; }
};
vector<Node> nodes;
int BITS; // number of bits to consider (e.g., 30 for values up to ~10^9)
public:
BitTrie(int bits = 30) : BITS(bits) {
nodes.push_back(Node()); // root = node 0
}
void insert(int num) {
int curr = 0;
for (int i = BITS; i >= 0; i--) {
int bit = (num >> i) & 1;
if (nodes[curr].children[bit] == -1) {
nodes[curr].children[bit] = nodes.size();
nodes.push_back(Node());
}
curr = nodes[curr].children[bit];
nodes[curr].count++;
}
}
void erase(int num) {
int curr = 0;
for (int i = BITS; i >= 0; i--) {
int bit = (num >> i) & 1;
curr = nodes[curr].children[bit];
nodes[curr].count--;
}
}
// Find the maximum XOR achievable between `num` and any inserted number
int maxXor(int num) {
int curr = 0, result = 0;
for (int i = BITS; i >= 0; i--) {
int bit = (num >> i) & 1;
int want = 1 - bit; // opposite bit maximizes XOR
if (nodes[curr].children[want] != -1 &&
nodes[nodes[curr].children[want]].count > 0) {
result |= (1 << i);
curr = nodes[curr].children[want];
} else {
curr = nodes[curr].children[bit];
}
}
return result;
}
};
Complexity Analysis
Where:
- L = length of the word
- P = length of the prefix
- N = number of words
- SIGMA = alphabet size (26 for lowercase English)
Space analysis in detail:
- Each node uses
SIGMApointers + a few fields. - Maximum number of nodes = sum of lengths of all inserted words + 1 (for root).
- In practice, prefix sharing significantly reduces node count.
For the Bitwise Trie:
- Each number uses O(B) nodes where B = number of bits (typically 30 or 31).
- Maximum nodes = O(N * B).
Classic Problems
1. Implement Trie (LeetCode 208)
Straightforward implementation of insert, search, startsWith.
class Trie {
struct Node {
Node* ch[26] = {};
bool end = false;
};
Node* root = new Node();
public:
void insert(string word) {
Node* cur = root;
for (char c : word) {
if (!cur->ch[c-'a']) cur->ch[c-'a'] = new Node();
cur = cur->ch[c-'a'];
}
cur->end = true;
}
bool search(string word) {
Node* cur = root;
for (char c : word) {
if (!cur->ch[c-'a']) return false;
cur = cur->ch[c-'a'];
}
return cur->end;
}
bool startsWith(string prefix) {
Node* cur = root;
for (char c : prefix) {
if (!cur->ch[c-'a']) return false;
cur = cur->ch[c-'a'];
}
return true;
}
};
2. Word Search II (LeetCode 212) -- Trie + DFS Backtracking
Problem: Given a 2D board of characters and a list of words, find all words that can be formed by connecting adjacent cells (up/down/left/right), using each cell at most once per word.
Key insight: Build a trie from the word list. Then DFS on the board, following trie edges. This prunes the search: if the current path doesn't match any trie prefix, stop immediately.
class Solution {
struct TrieNode {
TrieNode* ch[26] = {};
string* word = nullptr; // points to the actual word if this node is end
int cnt = 0; // number of words in subtree (for pruning)
};
TrieNode* root = new TrieNode();
int dx[4] = {0, 0, 1, -1};
int dy[4] = {1, -1, 0, 0};
void addWord(TrieNode* node, string& w) {
for (char c : w) {
if (!node->ch[c-'a']) node->ch[c-'a'] = new TrieNode();
node = node->ch[c-'a'];
node->cnt++;
}
node->word = &w;
}
void dfs(vector<vector<char>>& board, int i, int j, TrieNode* node,
vector<string>& result) {
char c = board[i][j];
if (c == '#' || !node->ch[c - 'a'] || node->ch[c - 'a']->cnt == 0) return;
node = node->ch[c - 'a'];
if (node->word) {
result.push_back(*node->word);
node->word = nullptr;
// Decrement counts up the trie to prune found words
// (simplified: just decrement cnt later)
}
board[i][j] = '#'; // mark visited
for (int d = 0; d < 4; d++) {
int ni = i + dx[d], nj = j + dy[d];
if (ni >= 0 && ni < (int)board.size() &&
nj >= 0 && nj < (int)board[0].size()) {
dfs(board, ni, nj, node, result);
}
}
board[i][j] = c; // unmark
}
public:
vector<string> findWords(vector<vector<char>>& board, vector<string>& words) {
for (auto& w : words) addWord(root, w);
vector<string> result;
for (int i = 0; i < (int)board.size(); i++)
for (int j = 0; j < (int)board[0].size(); j++)
dfs(board, i, j, root, result);
return result;
}
};
3. Maximum XOR of Two Numbers in an Array (LeetCode 421)
Problem: Given an array of integers, find the maximum XOR of any two elements.
Approach: Insert all numbers into a bitwise trie. For each number, query the trie for the maximum XOR partner.
class Solution {
public:
int findMaximumXOR(vector<int>& nums) {
// Build bit trie
int maxBit = 30;
vector<array<int,2>> trie(1, {-1, -1}); // node 0 = root
auto insert = [&](int num) {
int cur = 0;
for (int i = maxBit; i >= 0; i--) {
int bit = (num >> i) & 1;
if (trie[cur][bit] == -1) {
trie[cur][bit] = trie.size();
trie.push_back({-1, -1});
}
cur = trie[cur][bit];
}
};
auto query = [&](int num) -> int {
int cur = 0, res = 0;
for (int i = maxBit; i >= 0; i--) {
int bit = (num >> i) & 1;
int want = 1 - bit;
if (trie[cur][want] != -1) {
res |= (1 << i);
cur = trie[cur][want];
} else {
cur = trie[cur][bit];
}
}
return res;
};
for (int x : nums) insert(x);
int ans = 0;
for (int x : nums) ans = max(ans, query(x));
return ans;
}
};
Time: O(N * B) where B = 31 bits. Space: O(N * B).
4. Design Add and Search Words Data Structure (LeetCode 211) -- Trie with Wildcard
Problem: Support addWord(word) and search(word) where word may contain . which
matches any single character.
Approach: Standard trie for addWord. For search, when we encounter ., try all 26
children recursively.
class WordDictionary {
struct Node {
Node* ch[26] = {};
bool end = false;
};
Node* root = new Node();
bool dfs(Node* node, const string& word, int i) {
if (i == (int)word.size()) return node->end;
if (word[i] == '.') {
for (int c = 0; c < 26; c++)
if (node->ch[c] && dfs(node->ch[c], word, i + 1))
return true;
return false;
}
int c = word[i] - 'a';
return node->ch[c] && dfs(node->ch[c], word, i + 1);
}
public:
void addWord(string word) {
Node* cur = root;
for (char c : word) {
if (!cur->ch[c-'a']) cur->ch[c-'a'] = new Node();
cur = cur->ch[c-'a'];
}
cur->end = true;
}
bool search(string word) {
return dfs(root, word, 0);
}
};
Time: addWord is O(L). search is O(L) without wildcards, but O(26^D * L) in the worst
case where D is the number of . characters (exponential, but pruned heavily in practice).
5. Replace Words (LeetCode 648)
Problem: Given a dictionary of root words and a sentence, replace every word in the sentence with the shortest root that is a prefix of it.
Approach: Build a trie from the dictionary. For each word in the sentence, walk the trie;
as soon as we hit an isEnd node, that's the shortest root.
class Solution {
public:
string replaceWords(vector<string>& dictionary, string sentence) {
// Build trie
struct Node {
Node* ch[26] = {};
bool end = false;
};
Node* root = new Node();
for (auto& w : dictionary) {
Node* cur = root;
for (char c : w) {
if (!cur->ch[c-'a']) cur->ch[c-'a'] = new Node();
cur = cur->ch[c-'a'];
}
cur->end = true;
}
// Process sentence
istringstream iss(sentence);
string word, result;
bool first = true;
while (iss >> word) {
if (!first) result += ' ';
first = false;
// Find shortest prefix in trie
Node* cur = root;
bool found = false;
for (int i = 0; i < (int)word.size(); i++) {
int c = word[i] - 'a';
if (!cur->ch[c]) break;
cur = cur->ch[c];
if (cur->end) {
result += word.substr(0, i + 1);
found = true;
break;
}
}
if (!found) result += word;
}
return result;
}
};
6. Longest Word in Dictionary (LeetCode 720)
Problem: Find the longest word in the dictionary such that every prefix of the word is also in the dictionary. If tie, return lexicographically smallest.
Approach: Insert all words into trie. BFS/DFS from root, only following children that
are marked isEnd. The deepest such path is the answer.
class Solution {
public:
string longestWord(vector<string>& words) {
struct Node {
Node* ch[26] = {};
bool end = false;
string word = "";
};
Node* root = new Node();
for (auto& w : words) {
Node* cur = root;
for (char c : w) {
if (!cur->ch[c-'a']) cur->ch[c-'a'] = new Node();
cur = cur->ch[c-'a'];
}
cur->end = true;
cur->word = w;
}
string ans = "";
// BFS: only visit children marked as isEnd
queue<Node*> q;
q.push(root);
while (!q.empty()) {
Node* node = q.front(); q.pop();
for (int i = 25; i >= 0; i--) { // reverse order for lex smallest
if (node->ch[i] && node->ch[i]->end) {
if ((int)node->ch[i]->word.size() > (int)ans.size())
ans = node->ch[i]->word;
q.push(node->ch[i]);
}
}
}
return ans;
}
};
Advanced Applications
Counting Distinct Substrings
Insert all suffixes of a string into the trie. Each node in the trie corresponds to a distinct substring. Count of distinct substrings = total number of nodes (excluding root).
For a string of length n, this takes O(n^2) time and space. For better complexity, use a suffix array + LCP array or suffix automaton.
Autocomplete System
Augment each trie node with a priority queue or frequency map of the top-k words passing through it. On each character typed, traverse the trie and return suggestions.
Compressed Trie (Patricia Trie / Radix Tree)
Compress chains of single-child nodes into one edge labeled with a string.
When to Use a Trie
| Use Case | Why Trie? |
|---|---|
| Prefix-based queries (autocomplete, spell-check) | O(L) prefix lookup, shares common prefixes |
| Word dictionary with prefix lookup | Better than hash set when prefix operations needed |
| XOR-related problems | Bitwise trie enables greedy bit-by-bit optimization |
| Counting distinct substrings | Each node = one distinct substring |
| Longest common prefix of multiple strings | LCA in the trie |
| Word break / dictionary matching | DFS on trie during string scan |
When NOT to use a trie:
- Simple "does this exact word exist?" queries -> use
unordered_set(simpler, often faster). - When memory is extremely tight -> trie uses much more memory than a hash set.
- When the alphabet size is huge and prefix operations aren't needed.
Summary
The trie is a fundamental structure that trades space for prefix-query efficiency. Its key strength is that operations on a string of length L take O(L) time regardless of how many strings are stored -- they don't even need to look at other strings. The bitwise variant transforms integer XOR problems into tree traversals. Mastering the trie unlocks an entire class of string and bit-manipulation problems.