Serialization & Deserialization
Table of Contents
Converting data structures to flat strings and reconstructing them perfectly — the technique behind LeetCode's tree format, network protocols, and every JSON.stringify call you've ever made.
What Is Serialization?
Serialization converts a data structure or object into a sequence of bits (typically a string) so it can be stored in a file, sent over a network, or passed between processes.
Deserialization reverses the process — reconstructing the original data structure from the flat representation.
The core challenge: the serialized format must capture enough information to reconstruct the original structure unambiguously. A serialized binary tree needs to encode not just the values but also the shape — which nodes are missing, which child is left vs. right.
Original Structure Serialize → "1,2,#,#,3,#,#"
1
/ \ ← Deserialize
2 3
Key properties of a good serialization scheme:
| Property | Why it matters |
|---|---|
| Lossless | Must reconstruct the exact original structure |
| Unambiguous | One string maps to exactly one structure |
| Compact | Minimize space overhead |
| Streamable | Can begin decoding before receiving the entire string |
String Serialization — Length-Prefix Encoding
The Problem
Encode a list of strings into a single string, then decode it back. The strings can contain any character — including your delimiter.
LeetCode 271: Encode and Decode Strings
Why Delimiters Alone Fail
If you join with ,, how do you decode "a,b,c"? Was the input ["a", "b", "c"] or ["a,b", "c"] or ["a,b,c"]? Ambiguous.
The Fix: Length-Prefix Encoding
Prepend each string with its length followed by a fixed separator (like #). The length tells you exactly how many characters to read — no ambiguity regardless of content.
Input: ["hello", "world", "a,b"]
Encode: "5#hello5#world3#a,b"
↑ ↑
len=5, read 5 len=3, read 3 chars including the comma
Implementation
class Codec {
public:
// Encodes a list of strings to a single string
string encode(vector<string>& strs) {
string encoded;
for (const string& s : strs) {
encoded += to_string(s.size()) + '#' + s;
}
return encoded;
}
// Decodes a single string to a list of strings
vector<string> decode(string s) {
vector<string> result;
int i = 0;
while (i < s.size()) {
int j = s.find('#', i); // find the separator
int len = stoi(s.substr(i, j - i)); // extract length
result.push_back(s.substr(j + 1, len)); // extract string
i = j + 1 + len; // advance past this string
}
return result;
}
};
Complexity
| Time | Space | |
|---|---|---|
| Encode | O(n) | O(n) |
| Decode | O(n) | O(n) |
Where n is total characters across all strings.
Why This Works
The length field is self-delimiting — you know where the payload ends without scanning for a separator character. This makes it safe for arbitrary content, including binary data and strings that contain # itself.
Array Serialization — Delimiter Encoding
Simple Case: Integer Arrays
When elements come from a restricted alphabet (e.g. integers), a plain delimiter works:
// Encode: [1, 23, 456] → "1,23,456"
string encode(vector<int>& nums) {
string s;
for (int i = 0; i < nums.size(); i++) {
if (i > 0) s += ',';
s += to_string(nums[i]);
}
return s;
}
// Decode: "1,23,456" → [1, 23, 456]
vector<int> decode(string s) {
vector<int> result;
stringstream ss(s);
string token;
while (getline(ss, token, ',')) {
result.push_back(stoi(token));
}
return result;
}
This is safe because integers never contain ,. For string arrays, use length-prefix encoding from the previous section.
Nested Arrays (Run-Length Encoding Variant)
For nested structures like [[1,2],[3],[4,5,6]], use a bracket-counting or recursive approach:
// Encode nested: use bracket notation directly
// "[[1,2],[3],[4,5,6]]"
// Decode by parsing brackets with a stack
vector<vector<int>> decodeNested(string s) {
vector<vector<int>> result;
int i = 1; // skip outer '['
while (i < s.size() - 1) {
if (s[i] == '[') {
vector<int> inner;
i++; // skip '['
while (s[i] != ']') {
int j = i;
while (j < s.size() && s[j] != ',' && s[j] != ']') j++;
inner.push_back(stoi(s.substr(i, j - i)));
i = (s[j] == ',') ? j + 1 : j;
}
result.push_back(inner);
i++; // skip ']'
if (i < s.size() && s[i] == ',') i++; // skip ','
}
}
return result;
}
Binary Tree Serialization — Preorder with Nulls
The Core Idea
A preorder traversal (Root -> Left -> Right) that explicitly marks null children with a sentinel (like #) produces an unambiguous encoding. The null markers capture the tree's shape.
Tree: 1 Preorder with nulls:
/ \
2 3 "1,2,#,#,3,4,#,#,5,#,#"
/ \
4 5
LeetCode 297: Serialize and Deserialize Binary Tree
Why Preorder + Nulls Works
Without null markers, you can't distinguish these two trees:
Tree A: 1 Tree B: 1
/ \
2 2
Both give preorder: [1, 2]
With nulls: A = "1,2,#,#,#" B = "1,#,2,#,#" ← now distinct
The null markers encode the structure. Since preorder visits the root first, the deserializer knows the first value is the root, then recursively builds left (until it hits #), then right.
Implementation
class Codec {
public:
// Serialize: preorder DFS, null nodes marked as "#"
string serialize(TreeNode* root) {
if (!root) return "#";
return to_string(root->val) + ","
+ serialize(root->left) + ","
+ serialize(root->right);
}
// Deserialize: consume tokens left to right
TreeNode* deserialize(string data) {
queue<string> tokens;
stringstream ss(data);
string token;
while (getline(ss, token, ',')) {
tokens.push(token);
}
return buildTree(tokens);
}
private:
TreeNode* buildTree(queue<string>& tokens) {
string val = tokens.front();
tokens.pop();
if (val == "#") return nullptr;
TreeNode* node = new TreeNode(stoi(val));
node->left = buildTree(tokens); // recursion consumes left subtree
node->right = buildTree(tokens); // then right subtree
return node;
}
};
Step-by-Step Trace
Serialize tree: 1
/ \
2 3
/ \
4 5
serialize(1)
= "1," + serialize(2) + "," + serialize(3)
= "1," + "2," + serialize(null) + "," + serialize(null) + "," + serialize(3)
= "1," + "2,#,#," + "3," + serialize(4) + "," + serialize(5)
= "1," + "2,#,#," + "3," + "4,#,#," + "5,#,#"
= "1,2,#,#,3,4,#,#,5,#,#"
Deserialize "1,2,#,#,3,4,#,#,5,#,#":
tokens: [1, 2, #, #, 3, 4, #, #, 5, #, #]
build() → pop "1" → node(1)
node(1).left = build() → pop "2" → node(2)
node(2).left = build() → pop "#" → null
node(2).right = build() → pop "#" → null
node(1).right = build() → pop "3" → node(3)
node(3).left = build() → pop "4" → node(4)
node(4).left = build() → pop "#" → null
node(4).right = build() → pop "#" → null
node(3).right = build() → pop "5" → node(5)
node(5).left = build() → pop "#" → null
node(5).right = build() → pop "#" → null
Complexity
| Time | Space | |
|---|---|---|
| Serialize | O(n) | O(n) for the string + O(h) call stack |
| Deserialize | O(n) | O(n) for queue + O(h) call stack |
Where n = number of nodes, h = height of tree.
Binary Tree Serialization — Level Order (BFS)
Why Level Order?
This is the format LeetCode uses: [1,2,3,null,null,4,5]. It's more human-readable for balanced trees and naturally maps to BFS.
Implementation
class Codec {
public:
string serialize(TreeNode* root) {
if (!root) return "";
string result;
queue<TreeNode*> q;
q.push(root);
while (!q.empty()) {
TreeNode* node = q.front();
q.pop();
if (!result.empty()) result += ",";
if (node) {
result += to_string(node->val);
q.push(node->left);
q.push(node->right);
} else {
result += "#";
}
}
// Optional: trim trailing nulls for compactness
return result;
}
TreeNode* deserialize(string data) {
if (data.empty()) return nullptr;
// Tokenize
vector<string> tokens;
stringstream ss(data);
string token;
while (getline(ss, token, ',')) tokens.push_back(token);
// Build tree level by level
TreeNode* root = new TreeNode(stoi(tokens[0]));
queue<TreeNode*> q;
q.push(root);
int i = 1;
while (!q.empty() && i < tokens.size()) {
TreeNode* parent = q.front();
q.pop();
// Left child
if (tokens[i] != "#") {
parent->left = new TreeNode(stoi(tokens[i]));
q.push(parent->left);
}
i++;
// Right child
if (i < tokens.size() && tokens[i] != "#") {
parent->right = new TreeNode(stoi(tokens[i]));
q.push(parent->right);
}
i++;
}
return root;
}
};
Preorder vs Level Order
| Aspect | Preorder (DFS) | Level Order (BFS) |
|---|---|---|
| Encoding | "1,2,#,#,3,4,#,#,5,#,#" |
"1,2,3,#,#,4,5,#,#,#,#" |
| Readability | Less intuitive | Matches visual tree layout |
| Streaming | Can start building immediately | Need all data for full tree |
| Trailing nulls | Always explicit | Can be trimmed |
| Implementation | Recursive (simpler) | Iterative with queue |
| Space for skewed tree | More compact (fewer trailing #) | Many trailing # values |
BST Serialization — Preorder Without Nulls
The Key Insight
A Binary Search Tree has an extra constraint: left < root < right. This means the structure is fully determined by the preorder traversal alone — no null markers needed.
Why? Given a preorder sequence, you can determine the split point between left and right subtrees using the BST property: all values less than root go left, all greater go right.
BST: 5 Preorder: "5,3,1,4,8,7,9"
/ \
3 8 (No null markers — 7 values instead of 15!)
/ \ / \
1 4 7 9
LeetCode 449: Serialize and Deserialize BST
Implementation
class Codec {
public:
string serialize(TreeNode* root) {
string s;
serializeHelper(root, s);
return s;
}
TreeNode* deserialize(string data) {
if (data.empty()) return nullptr;
queue<int> q;
stringstream ss(data);
string token;
while (getline(ss, token, ',')) q.push(stoi(token));
return buildBST(q, INT_MIN, INT_MAX);
}
private:
void serializeHelper(TreeNode* node, string& s) {
if (!node) return;
if (!s.empty()) s += ',';
s += to_string(node->val);
serializeHelper(node->left, s);
serializeHelper(node->right, s);
}
TreeNode* buildBST(queue<int>& q, int minVal, int maxVal) {
if (q.empty() || q.front() < minVal || q.front() > maxVal) {
return nullptr;
}
int val = q.front();
q.pop();
TreeNode* node = new TreeNode(val);
node->left = buildBST(q, minVal, val); // left subtree: values in (min, val)
node->right = buildBST(q, val, maxVal); // right subtree: values in (val, max)
return node;
}
};
Why This Is More Efficient
For a BST with n nodes:
- Generic binary tree serialization: 2n + 1 tokens (n values + n+1 null markers)
- BST serialization: exactly n tokens
This matters for large, sparse trees where null markers dominate the serialized size.
Step-by-Step Deserialization Trace
Input: "5,3,1,4,8,7,9"
Queue: [5, 3, 1, 4, 8, 7, 9]
buildBST(q, -INF, +INF)
pop 5 → node(5), range (-INF, +INF)
left = buildBST(q, -INF, 5)
pop 3 → node(3), range (-INF, 5)
left = buildBST(q, -INF, 3)
pop 1 → node(1), range (-INF, 3)
left = buildBST(q, -INF, 1)
front=4 > 1 → return null
right = buildBST(q, 1, 3)
front=4 > 3 → return null
right = buildBST(q, 3, 5)
pop 4 → node(4), range (3, 5)
left = buildBST(q, 3, 4)
front=8 > 4 → return null
right = buildBST(q, 4, 5)
front=8 > 5 → return null
right = buildBST(q, 5, +INF)
pop 8 → node(8), range (5, +INF)
left = buildBST(q, 5, 8)
pop 7 → node(7), range (5, 8)
left/right → both null (front=9 out of range)
right = buildBST(q, 8, +INF)
pop 9 → node(9), range (8, +INF)
left/right → both null (queue empty)
Result: 5
/ \
3 8
/ \ / \
1 4 7 9 ✓
N-ary Tree Serialization
The Challenge
Each node can have any number of children. We need to encode how many children each node has.
Strategy: Preorder with Child Count
N-ary tree: 1 Serialize: "1 3 2 0 3 2 6 0 7 0 4 0"
/ | \ ↑ ↑
2 3 4 value=1, 3 children
/ \
6 7
Each node is stored as value childCount, then recursively its children.
LeetCode 428: Serialize and Deserialize N-ary Tree
Implementation
class Codec {
public:
string serialize(Node* root) {
if (!root) return "";
string s;
serializeDFS(root, s);
return s;
}
Node* deserialize(string data) {
if (data.empty()) return nullptr;
queue<string> tokens;
stringstream ss(data);
string token;
while (ss >> token) tokens.push(token);
return deserializeDFS(tokens);
}
private:
void serializeDFS(Node* node, string& s) {
if (!s.empty()) s += ' ';
s += to_string(node->val) + ' ' + to_string(node->children.size());
for (Node* child : node->children) {
serializeDFS(child, s);
}
}
Node* deserializeDFS(queue<string>& tokens) {
if (tokens.empty()) return nullptr;
int val = stoi(tokens.front()); tokens.pop();
int childCount = stoi(tokens.front()); tokens.pop();
Node* node = new Node(val);
for (int i = 0; i < childCount; i++) {
node->children.push_back(deserializeDFS(tokens));
}
return node;
}
};
Alternative: Sentinel-Based (No Child Count)
Use a sentinel to mark the end of a node's children:
// Serialize: "1 2 $ 3 6 $ 7 $ $ 4 $ $"
// $ means "this node has no more children, go back to parent"
void serializeDFS(Node* node, string& s) {
s += to_string(node->val) + ' ';
for (Node* child : node->children) {
serializeDFS(child, s);
}
s += "$ "; // end-of-children marker
}
Classic Problems with Full Solutions
Problem 1: Serialize and Deserialize Binary Tree (LC 297)
Problem: Design an algorithm to serialize and deserialize a binary tree.
This is the preorder approach covered above. Here's the complete solution with edge cases:
class Codec {
public:
string serialize(TreeNode* root) {
if (!root) return "#";
return to_string(root->val) + ","
+ serialize(root->left) + ","
+ serialize(root->right);
}
TreeNode* deserialize(string data) {
queue<string> tokens;
stringstream ss(data);
string token;
while (getline(ss, token, ',')) tokens.push(token);
return build(tokens);
}
private:
TreeNode* build(queue<string>& q) {
string val = q.front(); q.pop();
if (val == "#") return nullptr;
TreeNode* node = new TreeNode(stoi(val));
node->left = build(q);
node->right = build(q);
return node;
}
};
// Time: O(n), Space: O(n)
Problem 2: Encode and Decode Strings (LC 271)
Problem: Design an algorithm to encode a list of strings to a string and decode it back. The strings may contain any character.
class Codec {
public:
string encode(vector<string>& strs) {
string result;
for (const string& s : strs) {
result += to_string(s.size()) + '#' + s;
}
return result;
}
vector<string> decode(string s) {
vector<string> result;
int i = 0;
while (i < s.size()) {
int j = s.find('#', i);
int len = stoi(s.substr(i, j - i));
result.push_back(s.substr(j + 1, len));
i = j + 1 + len;
}
return result;
}
};
// Time: O(n), Space: O(n)
Problem 3: Serialize and Deserialize BST (LC 449)
Problem: Same as LC 297 but for a BST — exploit the BST property for a more compact encoding.
class Codec {
public:
string serialize(TreeNode* root) {
string s;
preorder(root, s);
return s;
}
TreeNode* deserialize(string data) {
if (data.empty()) return nullptr;
queue<int> q;
stringstream ss(data);
string tok;
while (getline(ss, tok, ',')) q.push(stoi(tok));
return build(q, INT_MIN, INT_MAX);
}
private:
void preorder(TreeNode* node, string& s) {
if (!node) return;
if (!s.empty()) s += ',';
s += to_string(node->val);
preorder(node->left, s);
preorder(node->right, s);
}
TreeNode* build(queue<int>& q, int lo, int hi) {
if (q.empty() || q.front() < lo || q.front() > hi) return nullptr;
int val = q.front(); q.pop();
TreeNode* node = new TreeNode(val);
node->left = build(q, lo, val);
node->right = build(q, val, hi);
return node;
}
};
// Time: O(n), Space: O(n) — but serialized string is ~50% smaller than LC 297
Problem 4: Verify Preorder Serialization of a Binary Tree (LC 331)
Problem: Given a string of comma-separated values, verify whether it is a correct preorder serialization of a binary tree without reconstructing the tree.
Key Insight: Use a slot counting approach. Each non-null node consumes one slot and produces two. Each null consumes one slot and produces zero. A valid serialization starts with one slot and ends with zero.
bool isValidSerialization(string preorder) {
int slots = 1; // root gets one slot
stringstream ss(preorder);
string token;
while (getline(ss, token, ',')) {
slots--; // every node (null or not) consumes a slot
if (slots < 0) return false; // more nodes than slots
if (token != "#") {
slots += 2; // non-null node creates two child slots
}
}
return slots == 0; // all slots must be filled
}
// Time: O(n), Space: O(1) — no tree construction needed!
Trace:
"9,3,4,#,#,1,#,#,2,#,6,#,#"
Token | Action | Slots
--------|-------------------|------
(start) | | 1
9 | -1 +2 | 2
3 | -1 +2 | 3
4 | -1 +2 | 4
# | -1 | 3
# | -1 | 2
1 | -1 +2 | 3
# | -1 | 2
# | -1 | 1
2 | -1 +2 | 2
# | -1 | 1
6 | -1 +2 | 2
# | -1 | 1
# | -1 | 0 ← valid!
Problem 5: Construct Binary Tree from String (LC 536)
Problem: Construct a binary tree from a string consisting of parentheses and integers. The format is "root(left)(right)".
Example: "4(2(3)(1))(6(5))" builds:
4
/ \
2 6
/ \ /
3 1 5
class Solution {
public:
TreeNode* str2tree(string s) {
int i = 0;
return build(s, i);
}
private:
TreeNode* build(string& s, int& i) {
if (i >= s.size()) return nullptr;
// Parse the number (may be negative)
int sign = 1;
if (s[i] == '-') { sign = -1; i++; }
int num = 0;
while (i < s.size() && isdigit(s[i])) {
num = num * 10 + (s[i] - '0');
i++;
}
TreeNode* node = new TreeNode(sign * num);
// If '(' follows, parse left child
if (i < s.size() && s[i] == '(') {
i++; // skip '('
node->left = build(s, i);
i++; // skip ')'
}
// If another '(' follows, parse right child
if (i < s.size() && s[i] == '(') {
i++; // skip '('
node->right = build(s, i);
i++; // skip ')'
}
return node;
}
};
// Time: O(n), Space: O(h)
Choosing the Right Strategy — Decision Framework
What are you serializing?
│
├── List of strings with arbitrary content?
│ └── Length-prefix encoding (len#string)
│
├── Array of integers/fixed-format values?
│ └── Delimiter encoding (comma-separated)
│
├── Binary Search Tree?
│ └── Preorder without nulls (exploit BST property)
│
├── General binary tree?
│ ├── Need human readability? → Level-order (BFS)
│ └── Need simplicity/streaming? → Preorder with nulls (DFS)
│
├── N-ary tree?
│ └── Preorder with child count OR sentinel-based
│
└── Need to verify without reconstructing?
└── Slot counting (LC 331 approach)
Common Mistakes
1. Forgetting null markers for general binary trees
Without nulls, preorder [1, 2] is ambiguous:
1 1
/ or \
2 2
Always mark null children unless the structure has a property (like BST ordering) that makes it unnecessary.
2. Using a delimiter that appears in the data
Strings: ["a,b", "c"] Encoded: "a,b,c" Decoded: ["a", "b", "c"] ← WRONG
Use length-prefix encoding for arbitrary string data, not delimiters.
3. Off-by-one in BFS deserialization
The parent queue and token index must advance in lockstep — each dequeued parent consumes exactly two tokens (left and right child). Missing one causes the entire tree to shift.
4. Not handling negative numbers
stoi("-5") works, but if you're parsing character by character, remember to check for a leading -.
5. Trailing delimiter issues
// BAD: "1,2,3," has an empty trailing token
// GOOD: "1,2,3" or check for empty tokens during parsing
Summary Cheat Sheet
| Data Structure | Strategy | Serialized Format | Key Idea |
|---|---|---|---|
| String list | Length-prefix | "5#hello5#world" |
Length makes content safe |
| Integer array | Delimiter | "1,23,456" |
Restricted alphabet → delimiter OK |
| Binary tree (general) | Preorder + nulls | "1,2,#,#,3,#,#" |
Null markers encode shape |
| Binary tree (general) | Level order | "1,2,3,#,#,#,#" |
BFS order, human-readable |
| BST | Preorder only | "5,3,1,4,8,7,9" |
BST property determines structure |
| N-ary tree | Preorder + child count | "1 3 2 0 3 2" |
Child count replaces null markers |
| Validation only | Slot counting | O(1) space | No tree needed |
The universal principle: serialization is about encoding just enough information to reconstruct the original structure unambiguously — no more, no less. For trees, that means encoding the shape alongside the values. The more structural invariants your data has (like BST ordering), the less explicit shape information you need.