Heap Internals (Priority Queue)
What Is a Heap?
A heap is a complete binary tree stored in an array that satisfies the heap property:
- Max-Heap: Every parent >= its children. The root holds the maximum.
- Min-Heap: Every parent <= its children. The root holds the minimum.
Complete binary tree means every level is fully filled except possibly the last, which is filled from left to right. This property is what allows the array representation to work -- there are no "gaps."
Array Representation
The entire heap lives in a single contiguous array. No pointers, no heap allocation for nodes. This is what makes heaps cache-friendly and fast in practice.
Index formulas (0-indexed):
For node at index i:
Left child: 2*i + 1
Right child: 2*i + 2
Parent: (i - 1) / 2 (integer division)
Visual mapping for array [50, 30, 40, 10, 20, 35, 25]:
          50              index 0
         /  \
       30    40           indices 1, 2
      /  \  /  \
    10  20 35  25         indices 3, 4, 5, 6
Verification:
parent(1) = (1-1)/2 = 0 --> arr[0] = 50 >= arr[1] = 30 OK
parent(2) = (2-1)/2 = 0 --> arr[0] = 50 >= arr[2] = 40 OK
parent(3) = (3-1)/2 = 1 --> arr[1] = 30 >= arr[3] = 10 OK
parent(4) = (4-1)/2 = 1 --> arr[1] = 30 >= arr[4] = 20 OK
parent(5) = (5-1)/2 = 2 --> arr[2] = 40 >= arr[5] = 35 OK
parent(6) = (6-1)/2 = 2 --> arr[2] = 40 >= arr[6] = 25 OK
left(0) = 2*0+1 = 1 --> 30 (left child of 50)
right(0) = 2*0+2 = 2 --> 40 (right child of 50)
left(1) = 2*1+1 = 3 --> 10 (left child of 30)
right(1) = 2*1+2 = 4 --> 20 (right child of 30)
Why this works: In a complete binary tree, if we number nodes level by level from left to right starting at 0, the parent-child relationships follow the formulas above exactly. No node index is "wasted" because there are no gaps in a complete binary tree.
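These formulas are easy to sanity-check in code. A minimal sketch (the helper names are illustrative) that verifies the max-heap property for the example array:

```cpp
#include <cassert>
#include <vector>

int parentOf(int i) { return (i - 1) / 2; }  // 0-indexed parent
int leftOf(int i)   { return 2 * i + 1; }    // 0-indexed left child
int rightOf(int i)  { return 2 * i + 2; }    // 0-indexed right child

// Check the max-heap property for every non-root node.
bool isMaxHeap(const std::vector<int>& arr) {
    for (int i = 1; i < (int)arr.size(); i++)
        if (arr[parentOf(i)] < arr[i]) return false;
    return true;
}
```

isMaxHeap({50, 30, 40, 10, 20, 35, 25}) returns true, matching the verification table above.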
1-indexed variant (sometimes simpler):
For node at index i (1-indexed, root at index 1):
Left child: 2*i
Right child: 2*i + 1
Parent: i / 2
Core Operations
1. Heapify Up (Bubble Up / Sift Up)
Used after insertion. The new element is placed at the end of the array (to maintain the complete binary tree shape), then it "bubbles up" by repeatedly swapping with its parent until the heap property is restored.
Algorithm:
1. Place new element at index n (end of array)
2. Compare with parent at index (n-1)/2
3. If new element violates heap property, swap with parent
4. Repeat from step 2 with the new position
5. Stop when heap property is satisfied or we reach the root
Detailed walkthrough (Max-Heap): Insert 60 into [50, 30, 40, 10, 20, 35, 25]
Initial heap:
50
/ \
30 40
/ \ / \
10 20 35 25
Array: [50, 30, 40, 10, 20, 35, 25]
Step 0: Append 60 at index 7
50
/ \
30 40
/ \ / \
10 20 35 25
/
60 <-- added here (leftmost open position)
Array: [50, 30, 40, 10, 20, 35, 25, 60]
^ ^
parent(7)=3 new element
Step 1: Compare 60 with parent arr[3]=10. 60 > 10? YES --> swap
50
/ \
30 40
/ \ / \
[60] 20 35 25 60 and 10 swapped
/
[10]
Array: [50, 30, 40, 60, 20, 35, 25, 10]
^ ^
parent(3)=1 60 is now at index 3
Step 2: Compare 60 with parent arr[1]=30. 60 > 30? YES --> swap
50
/ \
[60] 40
/ \ / \
30 20 35 25 60 and 30 swapped
/
10
Array: [50, 60, 40, 30, 20, 35, 25, 10]
^ ^
parent(1)=0 60 is now at index 1
Step 3: Compare 60 with parent arr[0]=50. 60 > 50? YES --> swap
[60]
/ \
50 40
/ \ / \
30 20 35 25 60 is now the root!
/
10
Array: [60, 50, 40, 30, 20, 35, 25, 10]
Step 4: Index 0 is root. DONE.
Total swaps: 3 (height of tree = log(8) = 3)
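The five steps above can be sketched as a free function (a minimal illustration; the full class-based implementation appears later in this section):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Insert into a max-heap: append, then bubble up until the property holds.
void pushMaxHeap(std::vector<int>& heap, int val) {
    heap.push_back(val);                  // step 1: place at the end
    int i = (int)heap.size() - 1;
    while (i > 0) {
        int par = (i - 1) / 2;            // step 2: find the parent
        if (heap[par] >= heap[i]) break;  // step 5: property satisfied
        std::swap(heap[par], heap[i]);    // step 3: swap with parent
        i = par;                          // step 4: continue from new position
    }
}
```

Pushing 60 into [50, 30, 40, 10, 20, 35, 25] reproduces the walkthrough's final array [60, 50, 40, 30, 20, 35, 25, 10].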
2. Heapify Down (Bubble Down / Sift Down)
Used after extracting the root (removing the max or min). The last element in the array is moved to the root position, then it "bubbles down" by swapping with the larger (max-heap) or smaller (min-heap) child until the heap property is restored.
Algorithm:
1. Replace root with last element, remove last element
2. Compare current element with both children
3. If either child violates heap property, swap with the LARGER child (max-heap)
or SMALLER child (min-heap)
4. Repeat from step 2 with the new position
5. Stop when heap property is satisfied or we reach a leaf
Why swap with the larger child (max-heap)? Because we need the new parent to be >= both children. If we swap with the smaller child, the larger child might be bigger than the new parent, violating the heap property.
Detailed walkthrough (Max-Heap): Extract max from [60, 50, 40, 30, 20, 35, 25, 10]
Step 0: Save root value 60 (this is the extracted max).
Move last element (10) to root. Remove last position.
10 <-- 10 moved to root
/ \
50 40
/ \ / \
30 20 35 25
Array: [10, 50, 40, 30, 20, 35, 25]
Step 1: Compare 10 with children: left=50, right=40. Largest child = 50 (index 1).
10 < 50? YES --> swap
[50]
/ \
[10] 40
/ \ / \
30 20 35 25
Array: [50, 10, 40, 30, 20, 35, 25]
Step 2: At index 1. Children: left=30 (index 3), right=20 (index 4).
Largest child = 30.
10 < 30? YES --> swap
50
/ \
[30] 40
/ \ / \
[10] 20 35 25
Array: [50, 30, 40, 10, 20, 35, 25]
Step 3: At index 3. Children: left would be index 7 (out of bounds).
No children --> leaf node. DONE.
Final heap: [50, 30, 40, 10, 20, 35, 25]
50
/ \
30 40
/ \ / \
10 20 35 25
Heap property restored! Returned value: 60.
Total swaps: 2
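The extraction steps can likewise be sketched as a free function (illustrative names, mirroring the walkthrough):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Remove and return the max: move the last element to the root, bubble down.
int popMaxHeap(std::vector<int>& heap) {
    int maxVal = heap[0];                  // save the root (the answer)
    heap[0] = heap.back();                 // last element replaces the root
    heap.pop_back();
    int n = (int)heap.size(), i = 0;
    while (true) {
        int largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && heap[l] > heap[largest]) largest = l;
        if (r < n && heap[r] > heap[largest]) largest = r;
        if (largest == i) break;           // both children smaller: done
        std::swap(heap[i], heap[largest]); // swap with the LARGER child
        i = largest;
    }
    return maxVal;
}
```

On [60, 50, 40, 30, 20, 35, 25, 10] this returns 60 and leaves [50, 30, 40, 10, 20, 35, 25], as in the walkthrough.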
3. Build Heap from Array -- O(n), NOT O(n log n)!
Given an arbitrary array, we can transform it into a valid heap in O(n) time.
The naive approach: insert elements one by one, each push taking O(log n), for O(n log n) total. That bound is tight for repeated insertion -- but a different algorithm does better!
The smart approach (Floyd's algorithm): Start from the last non-leaf node and call heapify-down on each node, working backwards to the root.
Why start from the last non-leaf? Leaf nodes are already valid heaps (a single element
trivially satisfies the heap property). The last non-leaf node has index (n-2)/2
(0-indexed) or n/2 - 1.
Walkthrough: Build max-heap from [4, 10, 3, 5, 1, 8, 7]
Initial array (arbitrary order):
4
/ \
10 3
/ \ / \
5 1 8 7
Array: [4, 10, 3, 5, 1, 8, 7]
Last non-leaf: index (7-2)/2 = 2 (value 3)
Step 1: heapify_down(index 2) -- value 3, children: 8 (idx 5), 7 (idx 6)
Largest child = 8 > 3 --> swap
4
/ \
10 [8]
/ \ / \
5 1 [3] 7
Array: [4, 10, 8, 5, 1, 3, 7]
Step 2: heapify_down(index 1) -- value 10, children: 5 (idx 3), 1 (idx 4)
Largest child = 5 < 10 --> no swap needed
4
/ \
10 8
/ \ / \
5 1 3 7
Array: [4, 10, 8, 5, 1, 3, 7]
Step 3: heapify_down(index 0) -- value 4, children: 10 (idx 1), 8 (idx 2)
Largest child = 10 > 4 --> swap
[10]
/ \
[4] 8
/ \ / \
5 1 3 7
Continue: value 4 at index 1, children: 5 (idx 3), 1 (idx 4)
Largest child = 5 > 4 --> swap
10
/ \
[5] 8
/ \ / \
[4] 1 3 7
Continue: value 4 at index 3, children: index 7 (out of bounds)
Leaf node --> done
Final heap: [10, 5, 8, 4, 1, 3, 7]
10
/ \
5 8
/ \ / \
4 1 3 7
Valid max-heap!
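The build can be reproduced with a short free-function sketch of Floyd's algorithm (the same logic appears inside the MinHeap constructor later in this section):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One sift-down step within the first n elements.
void siftDown(std::vector<int>& a, int n, int i) {
    while (true) {
        int largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) break;
        std::swap(a[i], a[largest]);
        i = largest;
    }
}

// Floyd's bottom-up build: sift down every non-leaf, last to first.
void buildMaxHeap(std::vector<int>& a) {
    for (int i = (int)a.size() / 2 - 1; i >= 0; i--)
        siftDown(a, (int)a.size(), i);
}
```

Applied to [4, 10, 3, 5, 1, 8, 7] it produces [10, 5, 8, 4, 1, 3, 7], matching the walkthrough.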
Why Is Build Heap O(n) and Not O(n log n)?
This is one of the most elegant results in data structures. The key insight is that most nodes are near the bottom of the tree and require very few swaps.
Formal calculation:
Total work = sum over k = 0 to log(n) of: (nodes at level k) * (height - k)
= sum over k = 0 to h of: 2^k * (h - k) where h = log(n)
Let j = h - k (number of swaps needed):
= sum over j = 0 to h of: 2^(h-j) * j
= 2^h * sum over j = 0 to h of: j / 2^j
The series sum(j / 2^j) converges:
S = 1/2 + 2/4 + 3/8 + 4/16 + 5/32 + ...
To find S, use the standard technique:
S = 1/2 + 2/4 + 3/8 + 4/16 + ...
S/2 = 1/4 + 2/8 + 3/16 + ...
S - S/2 = 1/2 + 1/4 + 1/8 + 1/16 + ... = 1
So S/2 = 1, giving S = 2.
Total work = 2^h * 2 = 2 * n = O(n)
Intuition: Half the nodes are leaves (0 work). A quarter do 1 swap. An eighth do 2 swaps. The work decreases geometrically, summing to a constant times n.
Complete C++ Implementation
#include <bits/stdc++.h>
using namespace std;
class MinHeap {
vector<int> heap;
void heapifyUp(int i) {
while (i > 0) {
int par = (i - 1) / 2;
if (heap[par] > heap[i]) {
swap(heap[par], heap[i]);
i = par;
} else {
break;
}
}
}
void heapifyDown(int i) {
int n = heap.size();
while (true) {
int smallest = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && heap[left] < heap[smallest])
smallest = left;
if (right < n && heap[right] < heap[smallest])
smallest = right;
if (smallest == i) break; // heap property satisfied
swap(heap[i], heap[smallest]);
i = smallest;
}
}
public:
// Default constructor: empty heap
MinHeap() {}
// Build heap from array in O(n)
MinHeap(vector<int>& arr) : heap(arr) {
// Start from last non-leaf and heapify down
for (int i = (int)heap.size() / 2 - 1; i >= 0; i--)
heapifyDown(i);
}
// Insert: O(log n)
void push(int val) {
heap.push_back(val);
heapifyUp(heap.size() - 1);
}
// Get minimum: O(1)
int top() const {
if (heap.empty()) throw runtime_error("Heap is empty");
return heap[0];
}
// Remove minimum: O(log n)
void pop() {
if (heap.empty()) throw runtime_error("Heap is empty");
heap[0] = heap.back();
heap.pop_back();
if (!heap.empty()) heapifyDown(0);
}
bool empty() const { return heap.empty(); }
int size() const { return heap.size(); }
// Peek at internal array (for debugging)
void print() const {
for (int x : heap) cout << x << " ";
cout << endl;
}
};
// Usage
int main() {
MinHeap h;
h.push(40);
h.push(10);
h.push(30);
h.push(5);
h.push(20);
cout << "Min: " << h.top() << endl; // 5
h.pop();
cout << "Min after pop: " << h.top() << endl; // 10
// Build from array
vector<int> arr = {4, 10, 3, 5, 1, 8, 7};
MinHeap h2(arr);
while (!h2.empty()) {
cout << h2.top() << " "; // 1 3 4 5 7 8 10
h2.pop();
}
cout << endl;
return 0;
}
Max-Heap Implementation
The only difference from min-heap is the comparison direction.
class MaxHeap {
vector<int> heap;
void heapifyUp(int i) {
while (i > 0) {
int par = (i - 1) / 2;
if (heap[par] < heap[i]) { // changed to <
swap(heap[par], heap[i]);
i = par;
} else break;
}
}
void heapifyDown(int i) {
int n = heap.size();
while (true) {
int largest = i; // changed name
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && heap[left] > heap[largest]) // changed to >
largest = left;
if (right < n && heap[right] > heap[largest]) // changed to >
largest = right;
if (largest == i) break;
swap(heap[i], heap[largest]);
i = largest;
}
}
public:
MaxHeap() {}
MaxHeap(vector<int>& arr) : heap(arr) {
for (int i = (int)heap.size() / 2 - 1; i >= 0; i--)
heapifyDown(i);
}
void push(int val) { heap.push_back(val); heapifyUp(heap.size() - 1); }
int top() const { return heap[0]; }
void pop() { heap[0] = heap.back(); heap.pop_back(); if (!heap.empty()) heapifyDown(0); }
bool empty() const { return heap.empty(); }
int size() const { return heap.size(); }
};
Complexity Analysis
Operation            Time        Notes
top / peek           O(1)        read the root
push / insert        O(log n)    append + bubble up
pop / extract        O(log n)    move last to root + bubble down
build from array     O(n)        Floyd's bottom-up algorithm
search / contains    O(n)        must scan the whole array
Important: Heaps do NOT support efficient search or arbitrary access. If you need to find a specific element, you must scan the entire array in O(n). This is why decrease-key in Dijkstra's algorithm requires tracking each node's position in the heap.
Decrease Key and Delete Arbitrary Element
Two operations that are crucial for algorithms like Dijkstra's:
class IndexableMinHeap {
vector<int> heap; // heap[i] = value at position i
vector<int> pos; // pos[id] = position of element id in heap
vector<int> id; // id[i] = which element is at position i
int n;
void bubbleUp(int i) {
while (i > 0) {
int par = (i - 1) / 2;
if (heap[i] < heap[par]) {
swapPos(i, par);
i = par;
} else break;
}
}
void bubbleDown(int i) {
while (true) {
int smallest = i, l = 2*i+1, r = 2*i+2;
if (l < n && heap[l] < heap[smallest]) smallest = l;
if (r < n && heap[r] < heap[smallest]) smallest = r;
if (smallest == i) break;
swapPos(i, smallest);
i = smallest;
}
}
void swapPos(int i, int j) {
swap(heap[i], heap[j]);
swap(id[i], id[j]);
pos[id[i]] = i;
pos[id[j]] = j;
}
public:
IndexableMinHeap(int maxN) : heap(maxN, INT_MAX), pos(maxN), id(maxN), n(0) {
iota(pos.begin(), pos.end(), 0);
iota(id.begin(), id.end(), 0);
}
void push(int elemId, int val) {
pos[elemId] = n;
id[n] = elemId;
heap[n] = val;
bubbleUp(n);
n++;
}
// Decrease key of element elemId to newVal
void decreaseKey(int elemId, int newVal) {
int i = pos[elemId];
heap[i] = newVal;
bubbleUp(i);
}
pair<int,int> extractMin() {
int minVal = heap[0], minId = id[0];
n--;
swapPos(0, n);
bubbleDown(0);
return {minId, minVal};
}
bool empty() { return n == 0; }
};
C++ STL: priority_queue
The Standard Library provides priority_queue which is a max-heap by default.
Basic Usage
#include <bits/stdc++.h>
using namespace std;
int main() {
// MAX heap (default) -- top() returns the LARGEST element
priority_queue<int> maxPQ;
maxPQ.push(30);
maxPQ.push(10);
maxPQ.push(50);
maxPQ.push(20);
cout << maxPQ.top() << endl; // 50
maxPQ.pop();
cout << maxPQ.top() << endl; // 30
// MIN heap -- top() returns the SMALLEST element
priority_queue<int, vector<int>, greater<int>> minPQ;
minPQ.push(30);
minPQ.push(10);
minPQ.push(50);
minPQ.push(20);
cout << minPQ.top() << endl; // 10
minPQ.pop();
cout << minPQ.top() << endl; // 20
return 0;
}
Custom Comparator
// Min heap of pairs, ordered by second element
auto cmp = [](const pair<int,int>& a, const pair<int,int>& b) {
return a.second > b.second; // NOTE: > for min-heap (reverse of sort)
};
priority_queue<pair<int,int>, vector<pair<int,int>>, decltype(cmp)> pq(cmp);
pq.push({1, 30});
pq.push({2, 10});
pq.push({3, 50});
cout << pq.top().first << endl; // 2 (has smallest second element = 10)
Common confusion: The comparator for priority_queue is the opposite of what you'd
use for sort. Think of it as: the comparator defines which element has lower priority.
a.second > b.second means "a has lower priority if a.second is larger," so the smallest
second value has the highest priority (min-heap behavior).
Using struct with custom comparator
struct Task {
int id, priority, deadline;
};
struct TaskCmp {
bool operator()(const Task& a, const Task& b) {
if (a.priority != b.priority)
return a.priority < b.priority; // higher priority = higher in heap
return a.deadline > b.deadline; // earlier deadline = higher in heap
}
};
priority_queue<Task, vector<Task>, TaskCmp> taskQueue;
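A quick usage sketch (the sample tasks are made up; the struct and comparator are repeated so the snippet stands alone):

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct Task {
    int id, priority, deadline;
};

struct TaskCmp {
    bool operator()(const Task& a, const Task& b) const {
        if (a.priority != b.priority)
            return a.priority < b.priority;  // higher priority = higher in heap
        return a.deadline > b.deadline;      // earlier deadline = higher in heap
    }
};

// Returns the id of the task that would be executed first.
int frontTaskId(const std::vector<Task>& tasks) {
    std::priority_queue<Task, std::vector<Task>, TaskCmp> q;
    for (const Task& t : tasks) q.push(t);
    return q.top().id;
}
```

With tasks {id 1, priority 5}, {id 2, priority 8, deadline 3}, and {id 3, priority 8, deadline 1}, the front task is id 3: highest priority, tie broken by the earlier deadline.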
STL priority_queue Caveats
- No decrease-key: You cannot change the priority of an element already in the queue. Workaround: push a new entry and mark the old one as invalid (lazy deletion).
- No iteration: You cannot iterate over elements or search for a specific one.
- No random access: You cannot access the k-th element.
// Lazy deletion pattern (common in Dijkstra's), wrapped in a function so the
// inputs (vertex count n, adjacency list adj, source) are explicit:
vector<int> dijkstra(int n, const vector<vector<pair<int,int>>>& adj, int source) {
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;
    vector<int> dist(n, INT_MAX);
    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue; // LAZY DELETION: skip outdated entry
        for (auto [v, w] : adj[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v}); // push new entry (don't remove old one)
            }
        }
    }
    return dist;
}
Heap Sort
Heap sort uses the heap structure to sort an array in-place in O(n log n) time.
Algorithm:
- Build a max-heap from the array (O(n)).
- Repeatedly extract the max and place it at the end of the array.
void heapifyDown(vector<int>& arr, int n, int i) {
while (true) {
int largest = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && arr[left] > arr[largest])
largest = left;
if (right < n && arr[right] > arr[largest])
largest = right;
if (largest == i) break;
swap(arr[i], arr[largest]);
i = largest;
}
}
void heapSort(vector<int>& arr) {
int n = arr.size();
// Phase 1: Build max-heap -- O(n)
for (int i = n / 2 - 1; i >= 0; i--)
heapifyDown(arr, n, i);
// Phase 2: Extract max repeatedly -- O(n log n)
for (int i = n - 1; i > 0; i--) {
swap(arr[0], arr[i]); // move max to the end
heapifyDown(arr, i, 0); // restore heap on remaining elements
}
}
Full Walkthrough
Input: [4, 10, 3, 5, 1]
Phase 1: Build max-heap
heapify_down(index 1): 10 > 5, 10 > 1 --> no swap
heapify_down(index 0): 4 < 10 --> swap(4, 10), then 4 < 5 --> swap(4, 5)
Max-heap: [10, 5, 3, 4, 1]
10
/ \
5 3
/ \
4 1
Phase 2: Extract max
Iteration 1: swap arr[0] and arr[4] --> [1, 5, 3, 4, 10]
^^^ sorted region
heapify_down on [1, 5, 3, 4] (size 4):
1 < 5 --> swap --> [5, 1, 3, 4]
1 < 4 --> swap --> [5, 4, 3, 1]
Result: [5, 4, 3, 1, | 10]
Iteration 2: swap arr[0] and arr[3] --> [1, 4, 3, 5, | 10]
^^^ sorted
heapify_down on [1, 4, 3] (size 3):
1 < 4 --> swap --> [4, 1, 3]
Result: [4, 1, 3, | 5, 10]
Iteration 3: swap arr[0] and arr[2] --> [3, 1, 4, | 5, 10]
^^^ sorted
heapify_down on [3, 1] (size 2):
3 > 1 --> no swap
Result: [3, 1, | 4, 5, 10]
Iteration 4: swap arr[0] and arr[1] --> [1, 3, | 4, 5, 10]
^^^ sorted
heapify_down on [1] (size 1): nothing to do
Result: [1, | 3, 4, 5, 10]
Final: [1, 3, 4, 5, 10] -- SORTED!
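For reference, the same two phases exist in the standard library as std::make_heap (an O(n) bottom-up build) and std::sort_heap (repeated extraction). A sketch confirming they sort the walkthrough input:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Heap sort via the STL: build a max-heap, then pop the max to the back n times.
std::vector<int> heapSortSTL(std::vector<int> v) {
    std::make_heap(v.begin(), v.end());  // phase 1: O(n) build
    std::sort_heap(v.begin(), v.end());  // phase 2: O(n log n) extraction
    return v;
}
```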
Properties of Heap Sort:
- In-place: O(1) extra space (the array itself serves as the heap).
- Not stable: equal elements may be reordered.
- O(n log n) in the worst, average, and best case.
Heap Sort vs Quick Sort vs Merge Sort:
Heap sort's advantage: guaranteed O(n log n) worst case with O(1) extra space.
Its disadvantage: poor cache locality (comparing elements far apart in array) makes it
slower in practice than quicksort. That's why C++'s std::sort uses introsort (quicksort
that switches to heapsort when recursion depth exceeds a limit).
Classic Problems
1. Kth Largest Element in an Array (LeetCode 215)
Problem: Find the k-th largest element in an unsorted array.
Approach 1: Min-heap of size k
Maintain a min-heap of size k. Walk through the array:
- If heap size < k, push the element.
- If current element > heap top, pop the top and push current element.
After processing all elements, the heap top is the k-th largest.
Intuition: The min-heap of size k holds the k largest elements seen so far. The smallest among them (the top) is the k-th largest overall.
int findKthLargest(vector<int>& nums, int k) {
// Min-heap of size k
priority_queue<int, vector<int>, greater<int>> minPQ;
for (int x : nums) {
minPQ.push(x);
if ((int)minPQ.size() > k) {
minPQ.pop(); // remove the smallest, keeping k largest
}
}
return minPQ.top(); // the k-th largest
}
// Time: O(n log k), Space: O(k)
Approach 2: Max-heap, pop k-1 times then read the top (simpler but slower)
int findKthLargest_v2(vector<int>& nums, int k) {
priority_queue<int> maxPQ(nums.begin(), nums.end()); // build heap O(n)
for (int i = 0; i < k - 1; i++)
maxPQ.pop(); // extract max k-1 times
return maxPQ.top();
}
// Time: O(n + k log n), Space: O(n)
Approach 3: Quickselect (optimal average case, O(n) expected)
int quickselect(vector<int>& nums, int l, int r, int k) {
int pivot = nums[r];
int storeIdx = l;
for (int i = l; i < r; i++) {
if (nums[i] <= pivot) {
swap(nums[storeIdx], nums[i]);
storeIdx++;
}
}
swap(nums[storeIdx], nums[r]);
if (storeIdx == k) return nums[k];
else if (storeIdx < k) return quickselect(nums, storeIdx + 1, r, k);
else return quickselect(nums, l, storeIdx - 1, k);
}
int findKthLargest_v3(vector<int>& nums, int k) {
int n = nums.size();
return quickselect(nums, 0, n - 1, n - k);
}
// Time: O(n) expected, O(n^2) worst case. Space: O(1).
2. Merge K Sorted Lists (LeetCode 23)
Problem: Merge k sorted linked lists into one sorted list.
Intuition: Use a min-heap to always extract the smallest element across all k lists. Push the head of each list into the heap. Pop the min, add it to the result, and push the next element from that list.
struct ListNode {
int val;
ListNode* next;
ListNode(int x) : val(x), next(nullptr) {}
};
ListNode* mergeKLists(vector<ListNode*>& lists) {
auto cmp = [](ListNode* a, ListNode* b) { return a->val > b->val; };
priority_queue<ListNode*, vector<ListNode*>, decltype(cmp)> pq(cmp);
// Push head of each non-empty list
for (auto* head : lists) {
if (head) pq.push(head);
}
ListNode dummy(0);
ListNode* tail = &dummy;
while (!pq.empty()) {
ListNode* node = pq.top();
pq.pop();
tail->next = node;
tail = node;
if (node->next) {
pq.push(node->next);
}
}
return dummy.next;
}
// Time: O(N log k) where N = total number of nodes across all lists
// Space: O(k) for the heap
Why this is optimal: At any point, the heap contains at most k elements (one from each list). Each of the N total nodes is pushed and popped exactly once. Each push/pop is O(log k). Total: O(N log k).
3. Find Median from Data Stream (LeetCode 295)
Problem: Design a data structure that supports addNum(num) and findMedian().
Intuition: Maintain two heaps:
- A max-heap for the lower half of the data.
- A min-heap for the upper half of the data.
The max-heap's top is the largest of the small elements. The min-heap's top is the smallest of the large elements. The median is either the max-heap's top (odd count) or the average of both tops (even count).
class MedianFinder {
priority_queue<int> maxPQ; // lower half
priority_queue<int, vector<int>, greater<int>> minPQ; // upper half
public:
void addNum(int num) {
// Step 1: Add to appropriate heap
if (maxPQ.empty() || num <= maxPQ.top()) {
maxPQ.push(num);
} else {
minPQ.push(num);
}
// Step 2: Balance -- sizes should differ by at most 1
// We keep maxPQ.size() >= minPQ.size()
if ((int)maxPQ.size() > (int)minPQ.size() + 1) {
minPQ.push(maxPQ.top());
maxPQ.pop();
} else if ((int)minPQ.size() > (int)maxPQ.size()) {
maxPQ.push(minPQ.top());
minPQ.pop();
}
}
double findMedian() {
if (maxPQ.size() > minPQ.size()) {
return maxPQ.top(); // odd count
}
return (maxPQ.top() + minPQ.top()) / 2.0; // even count
}
};
// addNum: O(log n), findMedian: O(1)
Invariants maintained:
- Every element in maxPQ <= every element in minPQ.
- maxPQ.size() == minPQ.size() or maxPQ.size() == minPQ.size() + 1.
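The same two-heap idea in condensed free-function form (illustrative names), returning the running median after each insertion:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// lo is a max-heap (lower half), hi is a min-heap (upper half).
std::vector<double> runningMedians(const std::vector<int>& stream) {
    std::priority_queue<int> lo;
    std::priority_queue<int, std::vector<int>, std::greater<int>> hi;
    std::vector<double> medians;
    for (int x : stream) {
        if (lo.empty() || x <= lo.top()) lo.push(x); else hi.push(x);
        // Rebalance: lo.size() must equal hi.size() or hi.size() + 1.
        if (lo.size() > hi.size() + 1) { hi.push(lo.top()); lo.pop(); }
        else if (hi.size() > lo.size()) { lo.push(hi.top()); hi.pop(); }
        medians.push_back(lo.size() > hi.size()
                              ? lo.top()
                              : (lo.top() + hi.top()) / 2.0);
    }
    return medians;
}
```

For the stream 5, 15, 1, 3 the running medians are 5, 10, 5, 4.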
4. Top K Frequent Elements (LeetCode 347)
Problem: Given an array, return the k most frequent elements.
Approach 1: Min-heap of size k
vector<int> topKFrequent(vector<int>& nums, int k) {
// Step 1: Count frequencies
unordered_map<int, int> freq;
for (int x : nums) freq[x]++;
// Step 2: Min-heap of size k, ordered by frequency
auto cmp = [](const pair<int,int>& a, const pair<int,int>& b) {
return a.second > b.second; // min-heap by frequency
};
priority_queue<pair<int,int>, vector<pair<int,int>>, decltype(cmp)> pq(cmp);
for (auto& [val, cnt] : freq) {
pq.push({val, cnt});
if ((int)pq.size() > k) pq.pop();
}
// Step 3: Extract results
vector<int> result;
while (!pq.empty()) {
result.push_back(pq.top().first);
pq.pop();
}
return result;
}
// Time: O(n + m log k) where m = number of unique elements
// Space: O(m + k)
Approach 2: Bucket sort (O(n) time, optimal)
vector<int> topKFrequent_v2(vector<int>& nums, int k) {
unordered_map<int, int> freq;
for (int x : nums) freq[x]++;
int n = nums.size();
vector<vector<int>> buckets(n + 1); // buckets[f] = elements with frequency f
for (auto& [val, f] : freq)
buckets[f].push_back(val);
vector<int> result;
for (int f = n; f >= 0 && (int)result.size() < k; f--) {
for (int val : buckets[f]) {
result.push_back(val);
if ((int)result.size() == k) break;
}
}
return result;
}
// Time: O(n), Space: O(n)
5. Task Scheduler (LeetCode 621)
Problem: Given tasks (characters A-Z) and a cooldown period n, find the minimum time to execute all tasks. After executing a task, you must wait n intervals before executing the same task again.
Intuition: Always execute the task with the highest remaining count first (greedy). Use a max-heap to pick the most frequent available task at each step.
int leastInterval(vector<char>& tasks, int n) {
int freq[26] = {};
for (char c : tasks) freq[c - 'A']++;
priority_queue<int> maxPQ; // stores remaining counts
for (int f : freq) {
if (f > 0) maxPQ.push(f);
}
int time = 0;
while (!maxPQ.empty()) {
vector<int> temp; // tasks executed in this cycle
int cycle = n + 1; // each cycle is n+1 slots
while (cycle > 0 && !maxPQ.empty()) {
int cnt = maxPQ.top(); maxPQ.pop();
cnt--;
if (cnt > 0) temp.push_back(cnt);
time++;
cycle--;
}
// Put unfinished tasks back
for (int cnt : temp) maxPQ.push(cnt);
// If there are still tasks left but cycle has idle slots, add idle time
if (!maxPQ.empty()) time += cycle; // idle slots
}
return time;
}
// Time: O(N * log(26)) = O(N), Space: O(26) = O(1)
Mathematical approach (O(n) without heap):
int leastInterval_v2(vector<char>& tasks, int n) {
int freq[26] = {};
for (char c : tasks) freq[c - 'A']++;
int maxFreq = *max_element(freq, freq + 26);
int maxCount = count(freq, freq + 26, maxFreq); // how many tasks have max frequency
// The minimum time is either:
// 1. Total number of tasks (if cooldown never causes idle time)
// 2. (maxFreq - 1) * (n + 1) + maxCount (if cooldown causes idle time)
return max((int)tasks.size(), (maxFreq - 1) * (n + 1) + maxCount);
}
6. Sliding Window Maximum (LeetCode 239)
Problem: Given array and window size k, return the max of each window.
While a deque-based solution is optimal, a heap approach is instructive:
vector<int> maxSlidingWindow(vector<int>& nums, int k) {
// Max-heap of (value, index)
priority_queue<pair<int,int>> pq;
vector<int> result;
for (int i = 0; i < (int)nums.size(); i++) {
pq.push({nums[i], i});
if (i >= k - 1) {
// Remove elements outside the window (lazy deletion)
while (pq.top().second <= i - k) {
pq.pop();
}
result.push_back(pq.top().first);
}
}
return result;
}
// Time: O(n log n) -- each element pushed and popped at most once
// Space: O(n)
Heap Variants
D-ary Heap
Instead of 2 children per node, each node has d children.
For node at index i (0-indexed):
j-th child (j=0..d-1): d*i + j + 1
Parent: (i - 1) / d
Tradeoffs: the tree is shallower (log_d n levels instead of log_2 n), so inserts and bubble-up get faster, but each sift-down step must examine up to d children, so extract-min does more comparisons per level. A d-ary heap with d=4 or d=8 can be faster in practice for decrease-key heavy workloads (like Dijkstra's algorithm) because the shallower tree means fewer cache misses during bubble-up, and the comparison overhead of d children is offset by fewer levels.
Optimal d for Dijkstra: d = m/n where m = edges, n = vertices. This balances
the cost of n extract-min operations (each O(d * log_d n)) against m decrease-key
operations (each O(log_d n)).
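A round-trip check of the d-ary index formulas (d = 4 chosen for illustration; the names are made up):

```cpp
#include <cassert>

const int D = 4;  // branching factor, illustrative choice

int daryChild(int i, int j) { return D * i + j + 1; }  // j-th child, 0 <= j < D
int daryParent(int i)       { return (i - 1) / D; }
```

Every child index maps back to its parent: daryParent(daryChild(i, j)) == i for all i >= 0 and 0 <= j < D.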
Fibonacci Heap (Brief Overview)
The Fibonacci heap achieves the best amortized complexity for priority queue operations: O(1) amortized insert, find-min, merge, and decrease-key, and O(log n) amortized extract-min.
Why it matters: Dijkstra's algorithm with a Fibonacci heap runs in O(V log V + E) instead of O((V + E) log V) with a binary heap. For dense graphs (E ~ V^2), this is a significant improvement.
Why it's rarely used in practice:
- Enormous constant factors (complex bookkeeping)
- Poor cache performance (many pointers)
- A simple binary heap with lazy deletion is almost always fast enough
- In competitive programming, priority_queue with lazy deletion is standard
How it works (high level):
- Maintains a collection of heap-ordered trees (a "forest")
- Insert: add a single-node tree to the forest
- Extract min: remove the root of the min tree, add its children as separate trees, then "consolidate" trees of the same degree (like binomial heap merge)
- Decrease key: cut the node from its parent, add it as a new tree. If the parent lost too many children, "cascade cut" up the tree.
The key insight is that most operations are done lazily (just add to the forest), and the expensive consolidation only happens during extract-min, amortizing the cost.
Comparison: When to Use What
- std::priority_queue: the default choice; fast and simple, but no decrease-key (use lazy deletion).
- Indexed binary heap: when you need true decrease-key (e.g., textbook Dijkstra).
- D-ary heap (d = 4 or 8): decrease-key-heavy workloads; shallower tree, better cache behavior.
- Fibonacci heap: best asymptotic bounds, but large constants; rarely worth it in practice.
Common Mistakes
1. Confusing Max-Heap and Min-Heap in C++
// C++ priority_queue is MAX heap by default!
priority_queue<int> pq; // MAX heap -- top() is the LARGEST
pq.push(1); pq.push(5); pq.push(3);
cout << pq.top(); // 5, NOT 1!
// For MIN heap, you need greater<int>:
priority_queue<int, vector<int>, greater<int>> minPQ;
2. Modifying Elements in the Heap
// You CANNOT change an element's priority and have the heap automatically fix itself.
// priority_queue does not support decrease-key or increase-key.
// Use lazy deletion instead (push new entry, skip old ones when popping).
3. Assuming Heap is Sorted
// A heap is NOT sorted! Only the root has a guaranteed position.
// [10, 30, 20, 50, 40] is a valid min-heap, but 30 and 20 are "out of order"
// relative to each other -- and that's fine.
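The standard library can check this directly: std::is_heap with std::greater tests the min-heap property, while std::is_sorted shows the same array is not sorted:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// True iff v satisfies the min-heap property but is not fully sorted.
bool minHeapButUnsorted(const std::vector<int>& v) {
    return std::is_heap(v.begin(), v.end(), std::greater<int>())
        && !std::is_sorted(v.begin(), v.end());
}
```

[10, 30, 20, 50, 40] passes: a valid min-heap whose elements are not in sorted order. (A sorted ascending array is also a valid min-heap, so it fails only the "unsorted" half.)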
4. Forgetting that Build Heap is O(n)
// DON'T do this:
for (int x : arr) pq.push(x); // O(n log n)
// DO this if you have all elements upfront:
priority_queue<int> pq(arr.begin(), arr.end()); // O(n) -- uses Floyd's algorithm
Summary
HEAP (Priority Queue)
=====================
Structure: Complete binary tree stored as array.
Property: Parent >= children (max-heap) or Parent <= children (min-heap).
Key operations:
push O(log n) -- append + bubble up
top O(1) -- peek at root
pop O(log n) -- swap root with last + bubble down
build O(n) -- Floyd's bottom-up algorithm
Array layout (0-indexed):
left(i) = 2i + 1
right(i) = 2i + 2
parent(i) = (i-1) / 2
C++ STL:
priority_queue<T> -- max heap
priority_queue<T, vector<T>, greater<T>> -- min heap
Classic applications:
- K-th largest/smallest (min-heap of size k)
- Merge K sorted lists (min-heap of size k)
- Running median (two heaps)
- Dijkstra's shortest path
- Heap sort (in-place, O(n log n) guaranteed)
- Event-driven simulation
Variants:
- D-ary heap (d children per node, shallower tree)
- Fibonacci heap (O(1) amortized insert and decrease-key)
- Indexed heap (supports decrease-key via position tracking)