Encoding & Transformation

Serialization & Deserialization

Table of Contents

Converting data structures to flat strings and reconstructing them perfectly — the technique behind LeetCode's tree format, network protocols, and every JSON.stringify call you've ever made.



What Is Serialization?

Serialization converts a data structure or object into a sequence of bits (typically a string) so it can be stored in a file, sent over a network, or passed between processes.

Deserialization reverses the process — reconstructing the original data structure from the flat representation.

The core challenge: the serialized format must capture enough information to reconstruct the original structure unambiguously. A serialized binary tree needs to encode not just the values but also the shape — which nodes are missing, which child is left vs. right.

Original Structure          Serialize →        "1,2,#,#,3,#,#"
     1                                         
    / \                     ← Deserialize       
   2   3                                       

Key properties of a good serialization scheme:

Property Why it matters
Lossless Must reconstruct the exact original structure
Unambiguous One string maps to exactly one structure
Compact Minimize space overhead
Streamable Can begin decoding before receiving the entire string

String Serialization — Length-Prefix Encoding

The Problem

Encode a list of strings into a single string, then decode it back. The strings can contain any character — including your delimiter.

LeetCode 271: Encode and Decode Strings

Why Delimiters Alone Fail

If you join with ,, how do you decode "a,b,c"? Was the input ["a", "b", "c"] or ["a,b", "c"] or ["a,b,c"]? Ambiguous.

The Fix: Length-Prefix Encoding

Prepend each string with its length followed by a fixed separator (like #). The length tells you exactly how many characters to read — no ambiguity regardless of content.

Input:  ["hello", "world", "a,b"]

Encode: "5#hello5#world3#a,b"
         ↑             ↑
     len=5, read 5   len=3, read 3 chars including the comma

Implementation

class Codec {
public:
    // Encodes a list of strings to a single string
    string encode(vector<string>& strs) {
        string encoded;
        for (const string& s : strs) {
            encoded += to_string(s.size()) + '#' + s;
        }
        return encoded;
    }

    // Decodes a single string to a list of strings
    vector<string> decode(string s) {
        vector<string> result;
        int i = 0;
        while (i < s.size()) {
            int j = s.find('#', i);                  // find the separator
            int len = stoi(s.substr(i, j - i));      // extract length
            result.push_back(s.substr(j + 1, len));  // extract string
            i = j + 1 + len;                         // advance past this string
        }
        return result;
    }
};

Complexity

Time Space
Encode O(n) O(n)
Decode O(n) O(n)

Where n is total characters across all strings.

Why This Works

The length field is self-delimiting — you know where the payload ends without scanning for a separator character. This makes it safe for arbitrary content, including binary data and strings that contain # itself.


Array Serialization — Delimiter Encoding

Simple Case: Integer Arrays

When elements come from a restricted alphabet (e.g. integers), a plain delimiter works:

// Encode: [1, 23, 456] → "1,23,456"
string encode(vector<int>& nums) {
    string s;
    for (int i = 0; i < nums.size(); i++) {
        if (i > 0) s += ',';
        s += to_string(nums[i]);
    }
    return s;
}

// Decode: "1,23,456" → [1, 23, 456]
vector<int> decode(string s) {
    vector<int> result;
    stringstream ss(s);
    string token;
    while (getline(ss, token, ',')) {
        result.push_back(stoi(token));
    }
    return result;
}

This is safe because integers never contain ,. For string arrays, use length-prefix encoding from the previous section.

Nested Arrays (Run-Length Encoding Variant)

For nested structures like [[1,2],[3],[4,5,6]], use a bracket-counting or recursive approach:

// Encode nested: use bracket notation directly
// "[[1,2],[3],[4,5,6]]"
// Decode by parsing brackets with a stack

vector<vector<int>> decodeNested(string s) {
    vector<vector<int>> result;
    int i = 1;  // skip outer '['
    while (i < s.size() - 1) {
        if (s[i] == '[') {
            vector<int> inner;
            i++;  // skip '['
            while (s[i] != ']') {
                int j = i;
                while (j < s.size() && s[j] != ',' && s[j] != ']') j++;
                inner.push_back(stoi(s.substr(i, j - i)));
                i = (s[j] == ',') ? j + 1 : j;
            }
            result.push_back(inner);
            i++;  // skip ']'
            if (i < s.size() && s[i] == ',') i++;  // skip ','
        }
    }
    return result;
}

Binary Tree Serialization — Preorder with Nulls

The Core Idea

A preorder traversal (Root -> Left -> Right) that explicitly marks null children with a sentinel (like #) produces an unambiguous encoding. The null markers capture the tree's shape.

Tree:       1                Preorder with nulls:
           / \
          2   3              "1,2,#,#,3,4,#,#,5,#,#"
             / \
            4   5

LeetCode 297: Serialize and Deserialize Binary Tree

Why Preorder + Nulls Works

Without null markers, you can't distinguish these two trees:

Tree A:    1          Tree B:    1
          /                       \
         2                         2

Both give preorder: [1, 2]
With nulls: A = "1,2,#,#,#"    B = "1,#,2,#,#"   ← now distinct

The null markers encode the structure. Since preorder visits the root first, the deserializer knows the first value is the root, then recursively builds left (until it hits #), then right.

Implementation

class Codec {
public:
    // Serialize: preorder DFS, null nodes marked as "#"
    string serialize(TreeNode* root) {
        if (!root) return "#";
        return to_string(root->val) + "," 
             + serialize(root->left) + "," 
             + serialize(root->right);
    }

    // Deserialize: consume tokens left to right
    TreeNode* deserialize(string data) {
        queue<string> tokens;
        stringstream ss(data);
        string token;
        while (getline(ss, token, ',')) {
            tokens.push(token);
        }
        return buildTree(tokens);
    }

private:
    TreeNode* buildTree(queue<string>& tokens) {
        string val = tokens.front();
        tokens.pop();
        
        if (val == "#") return nullptr;
        
        TreeNode* node = new TreeNode(stoi(val));
        node->left = buildTree(tokens);    // recursion consumes left subtree
        node->right = buildTree(tokens);   // then right subtree
        return node;
    }
};

Step-by-Step Trace

Serialize tree:      1
                    / \
                   2   3
                      / \
                     4   5

serialize(1)
  = "1," + serialize(2) + "," + serialize(3)
  = "1," + "2," + serialize(null) + "," + serialize(null) + "," + serialize(3)
  = "1," + "2,#,#," + "3," + serialize(4) + "," + serialize(5)
  = "1," + "2,#,#," + "3," + "4,#,#," + "5,#,#"
  = "1,2,#,#,3,4,#,#,5,#,#"

Deserialize "1,2,#,#,3,4,#,#,5,#,#":
  tokens: [1, 2, #, #, 3, 4, #, #, 5, #, #]
  
  build() → pop "1" → node(1)
    node(1).left = build() → pop "2" → node(2)
      node(2).left = build() → pop "#" → null
      node(2).right = build() → pop "#" → null
    node(1).right = build() → pop "3" → node(3)
      node(3).left = build() → pop "4" → node(4)
        node(4).left = build() → pop "#" → null
        node(4).right = build() → pop "#" → null
      node(3).right = build() → pop "5" → node(5)
        node(5).left = build() → pop "#" → null
        node(5).right = build() → pop "#" → null

Complexity

Time Space
Serialize O(n) O(n) for the string + O(h) call stack
Deserialize O(n) O(n) for queue + O(h) call stack

Where n = number of nodes, h = height of tree.


Binary Tree Serialization — Level Order (BFS)

Why Level Order?

This is the format LeetCode uses: [1,2,3,null,null,4,5]. It's more human-readable for balanced trees and naturally maps to BFS.

Implementation

class Codec {
public:
    string serialize(TreeNode* root) {
        if (!root) return "";
        string result;
        queue<TreeNode*> q;
        q.push(root);
        
        while (!q.empty()) {
            TreeNode* node = q.front();
            q.pop();
            
            if (!result.empty()) result += ",";
            
            if (node) {
                result += to_string(node->val);
                q.push(node->left);
                q.push(node->right);
            } else {
                result += "#";
            }
        }
        
        // Optional: trim trailing nulls for compactness
        return result;
    }

    TreeNode* deserialize(string data) {
        if (data.empty()) return nullptr;
        
        // Tokenize
        vector<string> tokens;
        stringstream ss(data);
        string token;
        while (getline(ss, token, ',')) tokens.push_back(token);
        
        // Build tree level by level
        TreeNode* root = new TreeNode(stoi(tokens[0]));
        queue<TreeNode*> q;
        q.push(root);
        int i = 1;
        
        while (!q.empty() && i < tokens.size()) {
            TreeNode* parent = q.front();
            q.pop();
            
            // Left child
            if (tokens[i] != "#") {
                parent->left = new TreeNode(stoi(tokens[i]));
                q.push(parent->left);
            }
            i++;
            
            // Right child
            if (i < tokens.size() && tokens[i] != "#") {
                parent->right = new TreeNode(stoi(tokens[i]));
                q.push(parent->right);
            }
            i++;
        }
        
        return root;
    }
};

Preorder vs Level Order

Aspect Preorder (DFS) Level Order (BFS)
Encoding "1,2,#,#,3,4,#,#,5,#,#" "1,2,3,#,#,4,5,#,#,#,#"
Readability Less intuitive Matches visual tree layout
Streaming Can start building immediately Need all data for full tree
Trailing nulls Always explicit Can be trimmed
Implementation Recursive (simpler) Iterative with queue
Space for skewed tree More compact (fewer trailing #) Many trailing # values

BST Serialization — Preorder Without Nulls

The Key Insight

A Binary Search Tree has an extra constraint: left < root < right. This means the structure is fully determined by the preorder traversal alone — no null markers needed.

Why? Given a preorder sequence, you can determine the split point between left and right subtrees using the BST property: all values less than root go left, all greater go right.

BST:        5              Preorder: "5,3,1,4,8,7,9"
           / \             
          3   8            (No null markers — 7 values instead of 15!)
         / \ / \
        1  4 7  9

LeetCode 449: Serialize and Deserialize BST

Implementation

class Codec {
public:
    string serialize(TreeNode* root) {
        string s;
        serializeHelper(root, s);
        return s;
    }

    TreeNode* deserialize(string data) {
        if (data.empty()) return nullptr;
        queue<int> q;
        stringstream ss(data);
        string token;
        while (getline(ss, token, ',')) q.push(stoi(token));
        return buildBST(q, INT_MIN, INT_MAX);
    }

private:
    void serializeHelper(TreeNode* node, string& s) {
        if (!node) return;
        if (!s.empty()) s += ',';
        s += to_string(node->val);
        serializeHelper(node->left, s);
        serializeHelper(node->right, s);
    }

    TreeNode* buildBST(queue<int>& q, int minVal, int maxVal) {
        if (q.empty() || q.front() < minVal || q.front() > maxVal) {
            return nullptr;
        }
        
        int val = q.front();
        q.pop();
        
        TreeNode* node = new TreeNode(val);
        node->left = buildBST(q, minVal, val);    // left subtree: values in (min, val)
        node->right = buildBST(q, val, maxVal);   // right subtree: values in (val, max)
        return node;
    }
};

Why This Is More Efficient

For a BST with n nodes:

  • Generic binary tree serialization: 2n + 1 tokens (n values + n+1 null markers)
  • BST serialization: exactly n tokens

This matters for large, sparse trees where null markers dominate the serialized size.

Step-by-Step Deserialization Trace

Input: "5,3,1,4,8,7,9"
Queue: [5, 3, 1, 4, 8, 7, 9]

buildBST(q, -INF, +INF)
  pop 5 → node(5), range (-INF, +INF)
    left = buildBST(q, -INF, 5)
      pop 3 → node(3), range (-INF, 5)
        left = buildBST(q, -INF, 3)
          pop 1 → node(1), range (-INF, 3)
            left = buildBST(q, -INF, 1)
              front=4 > 1 → return null
            right = buildBST(q, 1, 3)
              front=4 > 3 → return null
        right = buildBST(q, 3, 5)
          pop 4 → node(4), range (3, 5)
            left = buildBST(q, 3, 4)
              front=8 > 4 → return null
            right = buildBST(q, 4, 5)
              front=8 > 5 → return null
    right = buildBST(q, 5, +INF)
      pop 8 → node(8), range (5, +INF)
        left = buildBST(q, 5, 8)
          pop 7 → node(7), range (5, 8)
            left/right → both null (front=9 out of range)
        right = buildBST(q, 8, +INF)
          pop 9 → node(9), range (8, +INF)
            left/right → both null (queue empty)

Result:     5
           / \
          3   8
         / \ / \
        1  4 7  9  ✓

N-ary Tree Serialization

The Challenge

Each node can have any number of children. We need to encode how many children each node has.

Strategy: Preorder with Child Count

N-ary tree:       1              Serialize: "1 3 2 0 3 2 6 0 7 0 4 0"
               /  |  \                       ↑ ↑
              2   3   4          value=1, 3 children
                 / \
                6   7

Each node is stored as value childCount, then recursively its children.

LeetCode 428: Serialize and Deserialize N-ary Tree

Implementation

class Codec {
public:
    string serialize(Node* root) {
        if (!root) return "";
        string s;
        serializeDFS(root, s);
        return s;
    }

    Node* deserialize(string data) {
        if (data.empty()) return nullptr;
        queue<string> tokens;
        stringstream ss(data);
        string token;
        while (ss >> token) tokens.push(token);
        return deserializeDFS(tokens);
    }

private:
    void serializeDFS(Node* node, string& s) {
        if (!s.empty()) s += ' ';
        s += to_string(node->val) + ' ' + to_string(node->children.size());
        for (Node* child : node->children) {
            serializeDFS(child, s);
        }
    }

    Node* deserializeDFS(queue<string>& tokens) {
        if (tokens.empty()) return nullptr;
        
        int val = stoi(tokens.front()); tokens.pop();
        int childCount = stoi(tokens.front()); tokens.pop();
        
        Node* node = new Node(val);
        for (int i = 0; i < childCount; i++) {
            node->children.push_back(deserializeDFS(tokens));
        }
        return node;
    }
};

Alternative: Sentinel-Based (No Child Count)

Use a sentinel to mark the end of a node's children:

// Serialize: "1 2 $ 3 6 $ 7 $ $ 4 $ $"
// $ means "this node has no more children, go back to parent"

void serializeDFS(Node* node, string& s) {
    s += to_string(node->val) + ' ';
    for (Node* child : node->children) {
        serializeDFS(child, s);
    }
    s += "$ ";  // end-of-children marker
}

Classic Problems with Full Solutions

Problem 1: Serialize and Deserialize Binary Tree (LC 297)

Problem: Design an algorithm to serialize and deserialize a binary tree.

This is the preorder approach covered above. Here's the complete solution with edge cases:

class Codec {
public:
    string serialize(TreeNode* root) {
        if (!root) return "#";
        return to_string(root->val) + "," 
             + serialize(root->left) + "," 
             + serialize(root->right);
    }

    TreeNode* deserialize(string data) {
        queue<string> tokens;
        stringstream ss(data);
        string token;
        while (getline(ss, token, ',')) tokens.push(token);
        return build(tokens);
    }

private:
    TreeNode* build(queue<string>& q) {
        string val = q.front(); q.pop();
        if (val == "#") return nullptr;
        TreeNode* node = new TreeNode(stoi(val));
        node->left = build(q);
        node->right = build(q);
        return node;
    }
};
// Time: O(n), Space: O(n)

Problem 2: Encode and Decode Strings (LC 271)

Problem: Design an algorithm to encode a list of strings to a string and decode it back. The strings may contain any character.

class Codec {
public:
    string encode(vector<string>& strs) {
        string result;
        for (const string& s : strs) {
            result += to_string(s.size()) + '#' + s;
        }
        return result;
    }

    vector<string> decode(string s) {
        vector<string> result;
        int i = 0;
        while (i < s.size()) {
            int j = s.find('#', i);
            int len = stoi(s.substr(i, j - i));
            result.push_back(s.substr(j + 1, len));
            i = j + 1 + len;
        }
        return result;
    }
};
// Time: O(n), Space: O(n)

Problem 3: Serialize and Deserialize BST (LC 449)

Problem: Same as LC 297 but for a BST — exploit the BST property for a more compact encoding.

class Codec {
public:
    string serialize(TreeNode* root) {
        string s;
        preorder(root, s);
        return s;
    }

    TreeNode* deserialize(string data) {
        if (data.empty()) return nullptr;
        queue<int> q;
        stringstream ss(data);
        string tok;
        while (getline(ss, tok, ',')) q.push(stoi(tok));
        return build(q, INT_MIN, INT_MAX);
    }

private:
    void preorder(TreeNode* node, string& s) {
        if (!node) return;
        if (!s.empty()) s += ',';
        s += to_string(node->val);
        preorder(node->left, s);
        preorder(node->right, s);
    }

    TreeNode* build(queue<int>& q, int lo, int hi) {
        if (q.empty() || q.front() < lo || q.front() > hi) return nullptr;
        int val = q.front(); q.pop();
        TreeNode* node = new TreeNode(val);
        node->left = build(q, lo, val);
        node->right = build(q, val, hi);
        return node;
    }
};
// Time: O(n), Space: O(n) — but serialized string is ~50% smaller than LC 297

Problem 4: Verify Preorder Serialization of a Binary Tree (LC 331)

Problem: Given a string of comma-separated values, verify whether it is a correct preorder serialization of a binary tree without reconstructing the tree.

Key Insight: Use a slot counting approach. Each non-null node consumes one slot and produces two. Each null consumes one slot and produces zero. A valid serialization starts with one slot and ends with zero.

bool isValidSerialization(string preorder) {
    int slots = 1;  // root gets one slot
    
    stringstream ss(preorder);
    string token;
    while (getline(ss, token, ',')) {
        slots--;  // every node (null or not) consumes a slot
        
        if (slots < 0) return false;  // more nodes than slots
        
        if (token != "#") {
            slots += 2;  // non-null node creates two child slots
        }
    }
    
    return slots == 0;  // all slots must be filled
}
// Time: O(n), Space: O(1) — no tree construction needed!

Trace:

"9,3,4,#,#,1,#,#,2,#,6,#,#"

Token   | Action            | Slots
--------|-------------------|------
(start) |                   | 1
9       | -1 +2             | 2
3       | -1 +2             | 3
4       | -1 +2             | 4
#       | -1                | 3
#       | -1                | 2
1       | -1 +2             | 3
#       | -1                | 2
#       | -1                | 1
2       | -1 +2             | 2
#       | -1                | 1
6       | -1 +2             | 2
#       | -1                | 1
#       | -1                | 0  ← valid!

Problem 5: Construct Binary Tree from String (LC 536)

Problem: Construct a binary tree from a string consisting of parentheses and integers. The format is "root(left)(right)".

Example: "4(2(3)(1))(6(5))" builds:

        4
       / \
      2   6
     / \ /
    3  1 5
class Solution {
public:
    TreeNode* str2tree(string s) {
        int i = 0;
        return build(s, i);
    }

private:
    TreeNode* build(string& s, int& i) {
        if (i >= s.size()) return nullptr;
        
        // Parse the number (may be negative)
        int sign = 1;
        if (s[i] == '-') { sign = -1; i++; }
        int num = 0;
        while (i < s.size() && isdigit(s[i])) {
            num = num * 10 + (s[i] - '0');
            i++;
        }
        
        TreeNode* node = new TreeNode(sign * num);
        
        // If '(' follows, parse left child
        if (i < s.size() && s[i] == '(') {
            i++;  // skip '('
            node->left = build(s, i);
            i++;  // skip ')'
        }
        
        // If another '(' follows, parse right child
        if (i < s.size() && s[i] == '(') {
            i++;  // skip '('
            node->right = build(s, i);
            i++;  // skip ')'
        }
        
        return node;
    }
};
// Time: O(n), Space: O(h)

Choosing the Right Strategy — Decision Framework

What are you serializing?
│
├── List of strings with arbitrary content?
│   └── Length-prefix encoding (len#string)
│
├── Array of integers/fixed-format values?
│   └── Delimiter encoding (comma-separated)
│
├── Binary Search Tree?
│   └── Preorder without nulls (exploit BST property)
│
├── General binary tree?
│   ├── Need human readability? → Level-order (BFS)
│   └── Need simplicity/streaming? → Preorder with nulls (DFS)
│
├── N-ary tree?
│   └── Preorder with child count OR sentinel-based
│
└── Need to verify without reconstructing?
    └── Slot counting (LC 331 approach)

Common Mistakes

1. Forgetting null markers for general binary trees

Without nulls, preorder [1, 2] is ambiguous:
    1           1
   /     or      \
  2               2

Always mark null children unless the structure has a property (like BST ordering) that makes it unnecessary.

2. Using a delimiter that appears in the data

Strings: ["a,b", "c"]    Encoded: "a,b,c"    Decoded: ["a", "b", "c"] ← WRONG

Use length-prefix encoding for arbitrary string data, not delimiters.

3. Off-by-one in BFS deserialization

The parent queue and token index must advance in lockstep — each dequeued parent consumes exactly two tokens (left and right child). Missing one causes the entire tree to shift.

4. Not handling negative numbers

stoi("-5") works, but if you're parsing character by character, remember to check for a leading -.

5. Trailing delimiter issues

// BAD: "1,2,3," has an empty trailing token
// GOOD: "1,2,3" or check for empty tokens during parsing

Summary Cheat Sheet

Data Structure Strategy Serialized Format Key Idea
String list Length-prefix "5#hello5#world" Length makes content safe
Integer array Delimiter "1,23,456" Restricted alphabet → delimiter OK
Binary tree (general) Preorder + nulls "1,2,#,#,3,#,#" Null markers encode shape
Binary tree (general) Level order "1,2,3,#,#,#,#" BFS order, human-readable
BST Preorder only "5,3,1,4,8,7,9" BST property determines structure
N-ary tree Preorder + child count "1 3 2 0 3 2" Child count replaces null markers
Validation only Slot counting O(1) space No tree needed

The universal principle: serialization is about encoding just enough information to reconstruct the original structure unambiguously — no more, no less. For trees, that means encoding the shape alongside the values. The more structural invariants your data has (like BST ordering), the less explicit shape information you need.