Data Structure Internals

Heap Internals (Priority Queue)

Heap Internals (Priority Queue)

What Is a Heap?

A heap is a complete binary tree stored in an array that satisfies the heap property:

  • Max-Heap: Every parent >= its children. The root holds the maximum.
  • Min-Heap: Every parent <= its children. The root holds the minimum.

Complete binary tree means every level is fully filled except possibly the last, which is filled from left to right. This property is what allows the array representation to work -- there are no "gaps."

50 30 40 10 20 Last level filled left to right 50 30 40 10 20 35 25 Fully filled 50 30 40 10 INVALID — gap on the left

Array Representation

The entire heap lives in a single contiguous array. No pointers, no heap allocation for nodes. This is what makes heaps cache-friendly and fast in practice.

Index formulas (0-indexed):

For node at index i:
  Left child:  2*i + 1
  Right child: 2*i + 2
  Parent:      (i - 1) / 2   (integer division)

Visual mapping:

500 301 402 103 204 355 256 500 301 402 103 204 355 256
Verification:
  parent(1) = (1-1)/2 = 0  -->  arr[0] = 50  >= arr[1] = 30  OK
  parent(2) = (2-1)/2 = 0  -->  arr[0] = 50  >= arr[2] = 40  OK
  parent(3) = (3-1)/2 = 1  -->  arr[1] = 30  >= arr[3] = 10  OK
  parent(4) = (4-1)/2 = 1  -->  arr[1] = 30  >= arr[4] = 20  OK
  parent(5) = (5-1)/2 = 2  -->  arr[2] = 40  >= arr[5] = 35  OK
  parent(6) = (6-1)/2 = 2  -->  arr[2] = 40  >= arr[6] = 25  OK

  left(0)  = 2*0+1 = 1  -->  30  (left child of 50)
  right(0) = 2*0+2 = 2  -->  40  (right child of 50)
  left(1)  = 2*1+1 = 3  -->  10  (left child of 30)
  right(1) = 2*1+2 = 4  -->  20  (right child of 30)

Why this works: In a complete binary tree, if we number nodes level by level from left to right starting at 0, the parent-child relationships follow the formulas above exactly. No node index is "wasted" because there are no gaps in a complete binary tree.

1-indexed variant (sometimes simpler):

For node at index i (1-indexed, root at index 1):
  Left child:  2*i
  Right child: 2*i + 1
  Parent:      i / 2

Core Operations

1. Heapify Up (Bubble Up / Sift Up)

Used after insertion. The new element is placed at the end of the array (to maintain the complete binary tree shape), then it "bubbles up" by repeatedly swapping with its parent until the heap property is restored.

Algorithm:

1. Place new element at index n (end of array)
2. Compare with parent at index (n-1)/2
3. If new element violates heap property, swap with parent
4. Repeat from step 2 with the new position
5. Stop when heap property is satisfied or we reach the root

Detailed walkthrough (Max-Heap): Insert 60 into [50, 30, 40, 10, 20, 35, 25]

Initial heap:
           50
          /  \
        30    40
       / \   / \
      10  20 35  25

Array: [50, 30, 40, 10, 20, 35, 25]

Step 0: Append 60 at index 7
           50
          /  \
        30    40
       / \   / \
      10  20 35  25
     /
    60       <-- added here (leftmost open position)

Array: [50, 30, 40, 10, 20, 35, 25, 60]
                     ^               ^
                  parent(7)=3     new element

Step 1: Compare 60 with parent arr[3]=10.  60 > 10?  YES --> swap
           50
          /  \
        30    40
       / \   / \
     [60] 20 35  25       60 and 10 swapped
     /
   [10]

Array: [50, 30, 40, 60, 20, 35, 25, 10]
             ^       ^
          parent(3)=1   60 is now at index 3

Step 2: Compare 60 with parent arr[1]=30.  60 > 30?  YES --> swap
           50
          /  \
       [60]   40
       / \   / \
      30  20 35  25       60 and 30 swapped
     /
    10

Array: [50, 60, 40, 30, 20, 35, 25, 10]
         ^   ^
      parent(1)=0   60 is now at index 1

Step 3: Compare 60 with parent arr[0]=50.  60 > 50?  YES --> swap
          [60]
          /  \
        50    40
       / \   / \
      30  20 35  25       60 is now the root!
     /
    10

Array: [60, 50, 40, 30, 20, 35, 25, 10]

Step 4: Index 0 is root. DONE.

Total swaps: 3 (height of tree = log(8) = 3)

2. Heapify Down (Bubble Down / Sift Down)

Used after extracting the root (removing the max or min). The last element in the array is moved to the root position, then it "bubbles down" by swapping with the larger (max-heap) or smaller (min-heap) child until the heap property is restored.

Algorithm:

1. Replace root with last element, remove last element
2. Compare current element with both children
3. If either child violates heap property, swap with the LARGER child (max-heap)
   or SMALLER child (min-heap)
4. Repeat from step 2 with the new position
5. Stop when heap property is satisfied or we reach a leaf

Why swap with the larger child (max-heap)? Because we need the new parent to be >= both children. If we swap with the smaller child, the larger child might be bigger than the new parent, violating the heap property.

Detailed walkthrough (Max-Heap): Extract max from [60, 50, 40, 30, 20, 35, 25, 10]

Step 0: Save root value 60 (this is the extracted max).
        Move last element (10) to root. Remove last position.

           10           <-- 10 moved to root
          /  \
        50    40
       / \   / \
      30  20 35  25

Array: [10, 50, 40, 30, 20, 35, 25]

Step 1: Compare 10 with children: left=50, right=40. Largest child = 50 (index 1).
        10 < 50?  YES --> swap

          [50]
          /  \
       [10]   40
       / \   / \
      30  20 35  25

Array: [50, 10, 40, 30, 20, 35, 25]

Step 2: At index 1. Children: left=30 (index 3), right=20 (index 4).
        Largest child = 30.
        10 < 30?  YES --> swap

           50
          /  \
       [30]   40
       / \   / \
    [10]  20 35  25

Array: [50, 30, 40, 10, 20, 35, 25]

Step 3: At index 3. Children: left would be index 7 (out of bounds).
        No children --> leaf node. DONE.

Final heap: [50, 30, 40, 10, 20, 35, 25]

           50
          /  \
        30    40
       / \   / \
      10  20 35  25

Heap property restored! Returned value: 60.
Total swaps: 2

3. Build Heap from Array -- O(n), NOT O(n log n)!

Given an arbitrary array, we can transform it into a valid heap in O(n) time.

The naive approach (WRONG complexity analysis): Insert elements one by one, each taking O(log n). Total: O(n log n). But we can do better!

The smart approach (Floyd's algorithm): Start from the last non-leaf node and call heapify-down on each node, working backwards to the root.

Why start from the last non-leaf? Leaf nodes are already valid heaps (a single element trivially satisfies the heap property). The last non-leaf node has index (n-2)/2 (0-indexed) or n/2 - 1.

Walkthrough: Build max-heap from [4, 10, 3, 5, 1, 8, 7]

Initial array (arbitrary order):
           4
          / \
        10    3
       / \   / \
      5   1 8   7

Array: [4, 10, 3, 5, 1, 8, 7]
Last non-leaf: index (7-2)/2 = 2  (value 3)

Step 1: heapify_down(index 2)  -- value 3, children: 8 (idx 5), 7 (idx 6)
        Largest child = 8 > 3 --> swap

           4
          / \
        10   [8]
       / \   / \
      5   1 [3]  7

Array: [4, 10, 8, 5, 1, 3, 7]

Step 2: heapify_down(index 1)  -- value 10, children: 5 (idx 3), 1 (idx 4)
        Largest child = 5 < 10 --> no swap needed

           4
          / \
        10    8
       / \   / \
      5   1 3   7

Array: [4, 10, 8, 5, 1, 3, 7]

Step 3: heapify_down(index 0)  -- value 4, children: 10 (idx 1), 8 (idx 2)
        Largest child = 10 > 4 --> swap

          [10]
          /  \
        [4]    8
       / \   / \
      5   1 3   7

        Continue: value 4 at index 1, children: 5 (idx 3), 1 (idx 4)
        Largest child = 5 > 4 --> swap

           10
          /  \
        [5]    8
       / \   / \
     [4]  1 3   7

        Continue: value 4 at index 3, children: index 7 (out of bounds)
        Leaf node --> done

Final heap: [10, 5, 8, 4, 1, 3, 7]

           10
          /  \
         5    8
       / \   / \
      4   1 3   7

Valid max-heap!

Why Is Build Heap O(n) and Not O(n log n)?

This is one of the most elegant results in data structures. The key insight is that most nodes are near the bottom of the tree and require very few swaps.

Level Nodes Max Swaps Each Total Work 0 (root)1log(n)1 * log(n) 12log(n) - 12 * (log(n)-1) 24log(n) - 24 * (log(n)-2) ............ log(n) - 1n/41n/4 * 1 log(n)n/2 (leaves)00 The most nodes do the LEAST work!

Formal calculation:

Total work = sum over k = 0 to log(n) of: (nodes at level k) * (height - k)
           = sum over k = 0 to h of: 2^k * (h - k)       where h = log(n)

Let j = h - k (number of swaps needed):
           = sum over j = 0 to h of: 2^(h-j) * j
           = 2^h * sum over j = 0 to h of: j / 2^j

The series sum(j / 2^j) converges:
  S = 1/2 + 2/4 + 3/8 + 4/16 + 5/32 + ...

To find S, use the standard technique:
  S   = 1/2 + 2/4 + 3/8 + 4/16 + ...
  S/2 =       1/4 + 2/8 + 3/16 + ...
  S - S/2 = 1/2 + 1/4 + 1/8 + 1/16 + ... = 1

So S/2 = 1, giving S = 2.

Total work = 2^h * 2 = 2 * n = O(n)

Intuition: Half the nodes are leaves (0 work). A quarter do 1 swap. An eighth do 2 swaps. The work decreases geometrically, summing to a constant times n.


Complete C++ Implementation

CPP
#include <bits/stdc++.h>
using namespace std;

class MinHeap {
    vector<int> heap;

    void heapifyUp(int i) {
        while (i > 0) {
            int par = (i - 1) / 2;
            if (heap[par] > heap[i]) {
                swap(heap[par], heap[i]);
                i = par;
            } else {
                break;
            }
        }
    }

    void heapifyDown(int i) {
        int n = heap.size();
        while (true) {
            int smallest = i;
            int left = 2 * i + 1;
            int right = 2 * i + 2;

            if (left < n && heap[left] < heap[smallest])
                smallest = left;
            if (right < n && heap[right] < heap[smallest])
                smallest = right;

            if (smallest == i) break;   // heap property satisfied
            swap(heap[i], heap[smallest]);
            i = smallest;
        }
    }

public:
    // Default constructor: empty heap
    MinHeap() {}

    // Build heap from array in O(n)
    MinHeap(vector<int>& arr) : heap(arr) {
        // Start from last non-leaf and heapify down
        for (int i = (int)heap.size() / 2 - 1; i >= 0; i--)
            heapifyDown(i);
    }

    // Insert: O(log n)
    void push(int val) {
        heap.push_back(val);
        heapifyUp(heap.size() - 1);
    }

    // Get minimum: O(1)
    int top() const {
        if (heap.empty()) throw runtime_error("Heap is empty");
        return heap[0];
    }

    // Remove minimum: O(log n)
    void pop() {
        if (heap.empty()) throw runtime_error("Heap is empty");
        heap[0] = heap.back();
        heap.pop_back();
        if (!heap.empty()) heapifyDown(0);
    }

    bool empty() const { return heap.empty(); }
    int size() const { return heap.size(); }

    // Peek at internal array (for debugging)
    void print() const {
        for (int x : heap) cout << x << " ";
        cout << endl;
    }
};

// Usage
int main() {
    MinHeap h;
    h.push(40);
    h.push(10);
    h.push(30);
    h.push(5);
    h.push(20);

    cout << "Min: " << h.top() << endl;  // 5

    h.pop();
    cout << "Min after pop: " << h.top() << endl;  // 10

    // Build from array
    vector<int> arr = {4, 10, 3, 5, 1, 8, 7};
    MinHeap h2(arr);
    while (!h2.empty()) {
        cout << h2.top() << " ";  // 1 3 4 5 7 8 10
        h2.pop();
    }
    cout << endl;

    return 0;
}

Max-Heap Implementation

The only difference from min-heap is the comparison direction.

CPP
class MaxHeap {
    vector<int> heap;

    void heapifyUp(int i) {
        while (i > 0) {
            int par = (i - 1) / 2;
            if (heap[par] < heap[i]) {     // changed to <
                swap(heap[par], heap[i]);
                i = par;
            } else break;
        }
    }

    void heapifyDown(int i) {
        int n = heap.size();
        while (true) {
            int largest = i;               // changed name
            int left = 2 * i + 1;
            int right = 2 * i + 2;

            if (left < n && heap[left] > heap[largest])      // changed to >
                largest = left;
            if (right < n && heap[right] > heap[largest])    // changed to >
                largest = right;

            if (largest == i) break;
            swap(heap[i], heap[largest]);
            i = largest;
        }
    }

public:
    MaxHeap() {}
    MaxHeap(vector<int>& arr) : heap(arr) {
        for (int i = (int)heap.size() / 2 - 1; i >= 0; i--)
            heapifyDown(i);
    }
    void push(int val) { heap.push_back(val); heapifyUp(heap.size() - 1); }
    int top() const { return heap[0]; }
    void pop() { heap[0] = heap.back(); heap.pop_back(); if (!heap.empty()) heapifyDown(0); }
    bool empty() const { return heap.empty(); }
    int size() const { return heap.size(); }
};

Complexity Analysis

Operation Time Explanation push (insert)O(log n)Bubble up at most height = log(n) levels top (peek min)O(1)Root is always at index 0 pop (extract min)O(log n)Bubble down at most log(n) levels build heapO(n)Floyd's algorithm (see proof above) heapify (single)O(log n)One bubble-down operation searchO(n)No ordering beyond parent-child decrease keyO(log n)Change value then bubble up increase keyO(log n)Change value then bubble down delete arbitraryO(log n)Decrease key to -inf, then extract min Space: O(n) — just the array, no extra pointers.

Important: Heaps do NOT support efficient search or arbitrary access. If you need to find a specific element, you must scan the entire array in O(n). This is why decrease-key in Dijkstra's algorithm requires tracking each node's position in the heap.


Decrease Key and Delete Arbitrary Element

Two operations that are crucial for algorithms like Dijkstra's:

CPP
class IndexableMinHeap {
    vector<int> heap;          // heap[i] = value at position i
    vector<int> pos;           // pos[id] = position of element id in heap
    vector<int> id;            // id[i] = which element is at position i
    int n;

    void bubbleUp(int i) {
        while (i > 0) {
            int par = (i - 1) / 2;
            if (heap[i] < heap[par]) {
                swapPos(i, par);
                i = par;
            } else break;
        }
    }

    void bubbleDown(int i) {
        while (true) {
            int smallest = i, l = 2*i+1, r = 2*i+2;
            if (l < n && heap[l] < heap[smallest]) smallest = l;
            if (r < n && heap[r] < heap[smallest]) smallest = r;
            if (smallest == i) break;
            swapPos(i, smallest);
            i = smallest;
        }
    }

    void swapPos(int i, int j) {
        swap(heap[i], heap[j]);
        swap(id[i], id[j]);
        pos[id[i]] = i;
        pos[id[j]] = j;
    }

public:
    IndexableMinHeap(int maxN) : heap(maxN, INT_MAX), pos(maxN), id(maxN), n(0) {
        iota(pos.begin(), pos.end(), 0);
        iota(id.begin(), id.end(), 0);
    }

    void push(int elemId, int val) {
        pos[elemId] = n;
        id[n] = elemId;
        heap[n] = val;
        bubbleUp(n);
        n++;
    }

    // Decrease key of element elemId to newVal
    void decreaseKey(int elemId, int newVal) {
        int i = pos[elemId];
        heap[i] = newVal;
        bubbleUp(i);
    }

    pair<int,int> extractMin() {
        int minVal = heap[0], minId = id[0];
        n--;
        swapPos(0, n);
        bubbleDown(0);
        return {minId, minVal};
    }

    bool empty() { return n == 0; }
};

C++ STL: priority_queue

The Standard Library provides priority_queue which is a max-heap by default.

Basic Usage

CPP
#include <bits/stdc++.h>
using namespace std;

int main() {
    // MAX heap (default) -- top() returns the LARGEST element
    priority_queue<int> maxPQ;
    maxPQ.push(30);
    maxPQ.push(10);
    maxPQ.push(50);
    maxPQ.push(20);

    cout << maxPQ.top() << endl;  // 50
    maxPQ.pop();
    cout << maxPQ.top() << endl;  // 30

    // MIN heap -- top() returns the SMALLEST element
    priority_queue<int, vector<int>, greater<int>> minPQ;
    minPQ.push(30);
    minPQ.push(10);
    minPQ.push(50);
    minPQ.push(20);

    cout << minPQ.top() << endl;  // 10
    minPQ.pop();
    cout << minPQ.top() << endl;  // 20

    return 0;
}

Custom Comparator

CPP
// Min heap of pairs, ordered by second element
auto cmp = [](const pair<int,int>& a, const pair<int,int>& b) {
    return a.second > b.second;  // NOTE: > for min-heap (reverse of sort)
};
priority_queue<pair<int,int>, vector<pair<int,int>>, decltype(cmp)> pq(cmp);

pq.push({1, 30});
pq.push({2, 10});
pq.push({3, 50});

cout << pq.top().first << endl;   // 2 (has smallest second element = 10)

Common confusion: The comparator for priority_queue is the opposite of what you'd use for sort. Think of it as: the comparator defines which element has lower priority. a.second > b.second means "a has lower priority if a.second is larger," so the smallest second value has the highest priority (min-heap behavior).

Using struct with custom comparator

CPP
struct Task {
    int id, priority, deadline;
};

struct TaskCmp {
    bool operator()(const Task& a, const Task& b) {
        if (a.priority != b.priority)
            return a.priority < b.priority;  // higher priority = higher in heap
        return a.deadline > b.deadline;       // earlier deadline = higher in heap
    }
};

priority_queue<Task, vector<Task>, TaskCmp> taskQueue;

STL priority_queue Caveats

  1. No decrease-key: You cannot change the priority of an element already in the queue. Workaround: push a new entry and mark the old one as invalid (lazy deletion).

  2. No iteration: You cannot iterate over elements or search for a specific one.

  3. No random access: You cannot access the k-th element.

CPP
// Lazy deletion pattern (common in Dijkstra's):
priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;
vector<int> dist(n, INT_MAX);

pq.push({0, source});
dist[source] = 0;

while (!pq.empty()) {
    auto [d, u] = pq.top(); pq.pop();
    if (d > dist[u]) continue;  // LAZY DELETION: skip outdated entry
    for (auto [v, w] : adj[u]) {
        if (dist[u] + w < dist[v]) {
            dist[v] = dist[u] + w;
            pq.push({dist[v], v});  // push new entry (don't remove old one)
        }
    }
}

Heap Sort

Heap sort uses the heap structure to sort an array in-place in O(n log n) time.

Algorithm:

  1. Build a max-heap from the array (O(n)).
  2. Repeatedly extract the max and place it at the end of the array.
CPP
void heapifyDown(vector<int>& arr, int n, int i) {
    while (true) {
        int largest = i;
        int left = 2 * i + 1;
        int right = 2 * i + 2;

        if (left < n && arr[left] > arr[largest])
            largest = left;
        if (right < n && arr[right] > arr[largest])
            largest = right;

        if (largest == i) break;
        swap(arr[i], arr[largest]);
        i = largest;
    }
}

void heapSort(vector<int>& arr) {
    int n = arr.size();

    // Phase 1: Build max-heap -- O(n)
    for (int i = n / 2 - 1; i >= 0; i--)
        heapifyDown(arr, n, i);

    // Phase 2: Extract max repeatedly -- O(n log n)
    for (int i = n - 1; i > 0; i--) {
        swap(arr[0], arr[i]);        // move max to the end
        heapifyDown(arr, i, 0);      // restore heap on remaining elements
    }
}

Full Walkthrough

Input:  [4, 10, 3, 5, 1]

Phase 1: Build max-heap
  heapify_down(index 1):  10 > 5, 10 > 1 --> no swap
  heapify_down(index 0):  4 < 10 --> swap(4, 10), then 4 < 5 --> swap(4, 5)

  Max-heap: [10, 5, 3, 4, 1]
           10
          /  \
         5    3
        / \
       4   1

Phase 2: Extract max

  Iteration 1: swap arr[0] and arr[4] --> [1, 5, 3, 4, 10]
                                                        ^^^ sorted region
               heapify_down on [1, 5, 3, 4] (size 4):
               1 < 5 --> swap --> [5, 1, 3, 4]
               1 < 4 --> swap --> [5, 4, 3, 1]
               Result: [5, 4, 3, 1, | 10]

  Iteration 2: swap arr[0] and arr[3] --> [1, 4, 3, 5, | 10]
                                                   ^^^ sorted
               heapify_down on [1, 4, 3] (size 3):
               1 < 4 --> swap --> [4, 1, 3]
               Result: [4, 1, 3, | 5, 10]

  Iteration 3: swap arr[0] and arr[2] --> [3, 1, 4, | 5, 10]
                                               ^^^ sorted
               heapify_down on [3, 1] (size 2):
               3 > 1 --> no swap
               Result: [3, 1, | 4, 5, 10]

  Iteration 4: swap arr[0] and arr[1] --> [1, 3, | 4, 5, 10]
                                           ^^^ sorted
               heapify_down on [1] (size 1): nothing to do
               Result: [1, | 3, 4, 5, 10]

  Final: [1, 3, 4, 5, 10]  -- SORTED!

Properties of Heap Sort:

Property Value Time (best)O(n log n) Time (average)O(n log n) Time (worst)O(n log n) — guaranteed! SpaceO(1) — in-place! Stable?No (elements can jump over equals) Adaptive?No (doesn't exploit existing order)

Heap Sort vs Quick Sort vs Merge Sort:

Algorithm Best Average Worst Space Stable Heap SortO(n log n)O(n log n)O(n log n)O(1)No Quick SortO(n log n)O(n log n)O(n²)O(log n)No Merge SortO(n log n)O(n log n)O(n log n)O(n)Yes Intro SortO(n log n)O(n log n)O(n log n)O(log n)No C++ std::sort

Heap sort's advantage: guaranteed O(n log n) worst case with O(1) extra space. Its disadvantage: poor cache locality (comparing elements far apart in array) makes it slower in practice than quicksort. That's why C++'s std::sort uses introsort (quicksort that switches to heapsort when recursion depth exceeds a limit).


Classic Problems

1. Kth Largest Element in an Array (LeetCode 215)

Problem: Find the k-th largest element in an unsorted array.

Approach 1: Min-heap of size k

Maintain a min-heap of size k. Walk through the array:

  • If heap size < k, push the element.
  • If current element > heap top, pop the top and push current element.

After processing all elements, the heap top is the k-th largest.

Intuition: The min-heap of size k holds the k largest elements seen so far. The smallest among them (the top) is the k-th largest overall.

CPP
int findKthLargest(vector<int>& nums, int k) {
    // Min-heap of size k
    priority_queue<int, vector<int>, greater<int>> minPQ;

    for (int x : nums) {
        minPQ.push(x);
        if ((int)minPQ.size() > k) {
            minPQ.pop();   // remove the smallest, keeping k largest
        }
    }

    return minPQ.top();   // the k-th largest
}
// Time: O(n log k), Space: O(k)

Approach 2: Max-heap, extract k times (simpler but slower)

CPP
int findKthLargest_v2(vector<int>& nums, int k) {
    priority_queue<int> maxPQ(nums.begin(), nums.end());  // build heap O(n)
    for (int i = 0; i < k - 1; i++)
        maxPQ.pop();   // extract max k-1 times
    return maxPQ.top();
}
// Time: O(n + k log n), Space: O(n)

Approach 3: Quickselect (optimal average case, O(n) expected)

CPP
int quickselect(vector<int>& nums, int l, int r, int k) {
    int pivot = nums[r];
    int storeIdx = l;
    for (int i = l; i < r; i++) {
        if (nums[i] <= pivot) {
            swap(nums[storeIdx], nums[i]);
            storeIdx++;
        }
    }
    swap(nums[storeIdx], nums[r]);

    if (storeIdx == k) return nums[k];
    else if (storeIdx < k) return quickselect(nums, storeIdx + 1, r, k);
    else return quickselect(nums, l, storeIdx - 1, k);
}

int findKthLargest_v3(vector<int>& nums, int k) {
    int n = nums.size();
    return quickselect(nums, 0, n - 1, n - k);
}
// Time: O(n) expected, O(n^2) worst case. Space: O(1).

2. Merge K Sorted Lists (LeetCode 23)

Problem: Merge k sorted linked lists into one sorted list.

Intuition: Use a min-heap to always extract the smallest element across all k lists. Push the head of each list into the heap. Pop the min, add it to the result, and push the next element from that list.

CPP
struct ListNode {
    int val;
    ListNode* next;
    ListNode(int x) : val(x), next(nullptr) {}
};

ListNode* mergeKLists(vector<ListNode*>& lists) {
    auto cmp = [](ListNode* a, ListNode* b) { return a->val > b->val; };
    priority_queue<ListNode*, vector<ListNode*>, decltype(cmp)> pq(cmp);

    // Push head of each non-empty list
    for (auto* head : lists) {
        if (head) pq.push(head);
    }

    ListNode dummy(0);
    ListNode* tail = &dummy;

    while (!pq.empty()) {
        ListNode* node = pq.top();
        pq.pop();
        tail->next = node;
        tail = node;
        if (node->next) {
            pq.push(node->next);
        }
    }

    return dummy.next;
}
// Time: O(N log k) where N = total number of nodes across all lists
// Space: O(k) for the heap

Why this is optimal: At any point, the heap contains at most k elements (one from each list). Each of the N total nodes is pushed and popped exactly once. Each push/pop is O(log k). Total: O(N log k).

3. Find Median from Data Stream (LeetCode 295)

Problem: Design a data structure that supports addNum(num) and findMedian().

Intuition: Maintain two heaps:

  • A max-heap for the lower half of the data.
  • A min-heap for the upper half of the data.

The max-heap's top is the largest of the small elements. The min-heap's top is the smallest of the large elements. The median is either the max-heap's top (odd count) or the average of both tops (even count).

Data: [1, 3, 5, 7, 9] Max-Heap (lower half) 5 3 1 Min-Heap (upper half) 7 9 Median = max-heap top = 5 (odd count) After adding 6: Max-Heap (lower half) 5 3 1 Min-Heap (upper half) 6 7 9 Median = (5 + 6) / 2 = 5.5 (even count)
CPP
class MedianFinder {
    priority_queue<int> maxPQ;                                    // lower half
    priority_queue<int, vector<int>, greater<int>> minPQ;        // upper half

public:
    void addNum(int num) {
        // Step 1: Add to appropriate heap
        if (maxPQ.empty() || num <= maxPQ.top()) {
            maxPQ.push(num);
        } else {
            minPQ.push(num);
        }

        // Step 2: Balance -- sizes should differ by at most 1
        // We keep maxPQ.size() >= minPQ.size()
        if ((int)maxPQ.size() > (int)minPQ.size() + 1) {
            minPQ.push(maxPQ.top());
            maxPQ.pop();
        } else if ((int)minPQ.size() > (int)maxPQ.size()) {
            maxPQ.push(minPQ.top());
            minPQ.pop();
        }
    }

    double findMedian() {
        if (maxPQ.size() > minPQ.size()) {
            return maxPQ.top();                      // odd count
        }
        return (maxPQ.top() + minPQ.top()) / 2.0;   // even count
    }
};
// addNum: O(log n), findMedian: O(1)

Invariants maintained:

  1. Every element in maxPQ <= every element in minPQ.
  2. maxPQ.size() == minPQ.size() or maxPQ.size() == minPQ.size() + 1.

4. Top K Frequent Elements (LeetCode 347)

Problem: Given an array, return the k most frequent elements.

Approach 1: Min-heap of size k

CPP
vector<int> topKFrequent(vector<int>& nums, int k) {
    // Step 1: Count frequencies
    unordered_map<int, int> freq;
    for (int x : nums) freq[x]++;

    // Step 2: Min-heap of size k, ordered by frequency
    auto cmp = [](const pair<int,int>& a, const pair<int,int>& b) {
        return a.second > b.second;   // min-heap by frequency
    };
    priority_queue<pair<int,int>, vector<pair<int,int>>, decltype(cmp)> pq(cmp);

    for (auto& [val, cnt] : freq) {
        pq.push({val, cnt});
        if ((int)pq.size() > k) pq.pop();
    }

    // Step 3: Extract results
    vector<int> result;
    while (!pq.empty()) {
        result.push_back(pq.top().first);
        pq.pop();
    }
    return result;
}
// Time: O(n + m log k) where m = number of unique elements
// Space: O(m + k)

Approach 2: Bucket sort (O(n) time, optimal)

CPP
vector<int> topKFrequent_v2(vector<int>& nums, int k) {
    unordered_map<int, int> freq;
    for (int x : nums) freq[x]++;

    int n = nums.size();
    vector<vector<int>> buckets(n + 1);   // buckets[f] = elements with frequency f
    for (auto& [val, f] : freq)
        buckets[f].push_back(val);

    vector<int> result;
    for (int f = n; f >= 0 && (int)result.size() < k; f--) {
        for (int val : buckets[f]) {
            result.push_back(val);
            if ((int)result.size() == k) break;
        }
    }
    return result;
}
// Time: O(n), Space: O(n)

5. Task Scheduler (LeetCode 621)

Problem: Given tasks (characters A-Z) and a cooldown period n, find the minimum time to execute all tasks. After executing a task, you must wait n intervals before executing the same task again.

Intuition: Always execute the task with the highest remaining count first (greedy). Use a max-heap to pick the most frequent available task at each step.

CPP
int leastInterval(vector<char>& tasks, int n) {
    int freq[26] = {};
    for (char c : tasks) freq[c - 'A']++;

    priority_queue<int> maxPQ;  // stores remaining counts
    for (int f : freq) {
        if (f > 0) maxPQ.push(f);
    }

    int time = 0;
    while (!maxPQ.empty()) {
        vector<int> temp;   // tasks executed in this cycle
        int cycle = n + 1;  // each cycle is n+1 slots

        while (cycle > 0 && !maxPQ.empty()) {
            int cnt = maxPQ.top(); maxPQ.pop();
            cnt--;
            if (cnt > 0) temp.push_back(cnt);
            time++;
            cycle--;
        }

        // Put unfinished tasks back
        for (int cnt : temp) maxPQ.push(cnt);

        // If there are still tasks left but cycle has idle slots, add idle time
        if (!maxPQ.empty()) time += cycle;  // idle slots
    }

    return time;
}
// Time: O(N * log(26)) = O(N), Space: O(26) = O(1)

Mathematical approach (O(n) without heap):

CPP
int leastInterval_v2(vector<char>& tasks, int n) {
    int freq[26] = {};
    for (char c : tasks) freq[c - 'A']++;

    int maxFreq = *max_element(freq, freq + 26);
    int maxCount = count(freq, freq + 26, maxFreq);  // how many tasks have max frequency

    // The minimum time is either:
    // 1. Total number of tasks (if cooldown never causes idle time)
    // 2. (maxFreq - 1) * (n + 1) + maxCount (if cooldown causes idle time)
    return max((int)tasks.size(), (maxFreq - 1) * (n + 1) + maxCount);
}

6. Sliding Window Maximum (LeetCode 239)

Problem: Given array and window size k, return the max of each window.

While a deque-based solution is optimal, a heap approach is instructive:

CPP
vector<int> maxSlidingWindow(vector<int>& nums, int k) {
    // Max-heap of (value, index)
    priority_queue<pair<int,int>> pq;
    vector<int> result;

    for (int i = 0; i < (int)nums.size(); i++) {
        pq.push({nums[i], i});

        if (i >= k - 1) {
            // Remove elements outside the window (lazy deletion)
            while (pq.top().second <= i - k) {
                pq.pop();
            }
            result.push_back(pq.top().first);
        }
    }
    return result;
}
// Time: O(n log n) -- each element pushed and popped at most once
// Space: O(n)

Heap Variants

D-ary Heap

Instead of 2 children per node, each node has d children.

3-ary heap (d = 3) 10 20 30 40 50 60 70 80 Array: [10, 20, 30, 40, 50, 60, 70, 80]
For node at index i (0-indexed):
  j-th child (j=0..d-1):  d*i + j + 1
  Parent:                  (i - 1) / d

Tradeoffs:

Property Binary Heap (d=2) D-ary Heap Heightlog₂(n)log_d(n) Insert (bubble up)O(log₂ n)O(log_d n) — FASTER Extract minO(2 · log₂ n) comparisonsO(d · log_d n) comparisons Decrease keyO(log₂ n)O(log_d n) — FASTER

A d-ary heap with d=4 or d=8 can be faster in practice for decrease-key heavy workloads (like Dijkstra's algorithm) because the shallower tree means fewer cache misses during bubble-up, and the comparison overhead of d children is offset by fewer levels.

Optimal d for Dijkstra: d = m/n where m = edges, n = vertices. This balances the cost of n extract-min operations (each O(d * log_d n)) against m decrease-key operations (each O(log_d n)).

Fibonacci Heap (Brief Overview)

The Fibonacci heap achieves the best amortized complexity for priority queue operations:

Operation Binary Heap Fibonacci Heap InsertO(log n)O(1) amortized Find minO(1)O(1) Extract minO(log n)O(log n) amortized Decrease keyO(log n)O(1) amortized Merge two heapsO(n)O(1) DeleteO(log n)O(log n) amortized

Why it matters: Dijkstra's algorithm with a Fibonacci heap runs in O(V log V + E) instead of O((V + E) log V) with a binary heap. For dense graphs (E ~ V^2), this is a significant improvement.

Why it's rarely used in practice:

  • Enormous constant factors (complex bookkeeping)
  • Poor cache performance (many pointers)
  • A simple binary heap with lazy deletion is almost always fast enough
  • In competitive programming, priority_queue with lazy deletion is standard

How it works (high level):

  • Maintains a collection of heap-ordered trees (a "forest")
  • Insert: add a single-node tree to the forest
  • Extract min: remove the root of the min tree, add its children as separate trees, then "consolidate" trees of the same degree (like binomial heap merge)
  • Decrease key: cut the node from its parent, add it as a new tree. If the parent lost too many children, "cascade cut" up the tree.

The key insight is that most operations are done lazily (just add to the forest), and the expensive consolidation only happens during extract-min, amortizing the cost.


Comparison: When to Use What

Need Best Data Structure Min/max queries onlyHeap (priority_queue) Sorted order iterationBalanced BST (set/map) or sorted array Kth element queriesOrder-statistics tree or sorted array Dynamic medianTwo heaps (max + min) Merge k sorted streamsMin-heap of size k Dijkstra / PrimMin-heap (or Fibonacci heap for theory) Scheduling / simulationPriority queue Range min/max queriesSegment tree or sparse table (NOT heap)

Common Mistakes

1. Confusing Max-Heap and Min-Heap in C++

CPP
// C++ priority_queue is MAX heap by default!
priority_queue<int> pq;       // MAX heap -- top() is the LARGEST
pq.push(1); pq.push(5); pq.push(3);
cout << pq.top();             // 5, NOT 1!

// For MIN heap, you need greater<int>:
priority_queue<int, vector<int>, greater<int>> minPQ;

2. Modifying Elements in the Heap

CPP
// You CANNOT change an element's priority and have the heap automatically fix itself.
// priority_queue does not support decrease-key or increase-key.
// Use lazy deletion instead (push new entry, skip old ones when popping).

3. Assuming Heap is Sorted

CPP
// A heap is NOT sorted! Only the root has a guaranteed position.
// [10, 30, 20, 50, 40] is a valid min-heap, but 30 and 20 are "out of order"
// relative to each other -- and that's fine.

4. Forgetting that Build Heap is O(n)

CPP
// DON'T do this:
for (int x : arr) pq.push(x);    // O(n log n)

// DO this if you have all elements upfront:
priority_queue<int> pq(arr.begin(), arr.end());  // O(n) -- uses Floyd's algorithm

Summary

                     HEAP (Priority Queue)
                     =====================

    Structure:  Complete binary tree stored as array.
    Property:   Parent >= children (max-heap) or Parent <= children (min-heap).

    Key operations:
      push     O(log n)   -- append + bubble up
      top      O(1)       -- peek at root
      pop      O(log n)   -- swap root with last + bubble down
      build    O(n)       -- Floyd's bottom-up algorithm

    Array layout (0-indexed):
      left(i)   = 2i + 1
      right(i)  = 2i + 2
      parent(i) = (i-1) / 2

    C++ STL:
      priority_queue<T>                        -- max heap
      priority_queue<T, vector<T>, greater<T>> -- min heap

    Classic applications:
      - K-th largest/smallest (min-heap of size k)
      - Merge K sorted lists (min-heap of size k)
      - Running median (two heaps)
      - Dijkstra's shortest path
      - Heap sort (in-place, O(n log n) guaranteed)
      - Event-driven simulation

    Variants:
      - D-ary heap (d children per node, shallower tree)
      - Fibonacci heap (O(1) amortized insert and decrease-key)
      - Indexed heap (supports decrease-key via position tracking)