DSA Fundamentals: Heaps & Priority Queues - Efficient Top-K and Scheduling Problems
Master heaps and priority queues for solving top-K, scheduling, and streaming problems efficiently. Covering MinHeap/MaxHeap concepts, heapify operations, and essential LeetCode patterns with Python solutions.
Heaps and Priority Queues are specialized tree-based data structures that efficiently maintain ordered elements, making them ideal for problems involving "top K elements," task scheduling, and real-time data streaming. Unlike binary search trees that maintain complete ordering, heaps only guarantee that the parent is greater (or smaller) than its children, enabling faster operations.
This guide explores heap fundamentals, time complexity analysis, and demonstrates essential patterns through practical LeetCode problems.
Understanding Heaps: Core Concepts
What is a Heap?
A Heap is a complete binary tree that satisfies the heap property. It's a specialized data structure that allows efficient access to the minimum (MinHeap) or maximum (MaxHeap) element.
Key Characteristics:
- Complete Binary Tree: All levels are fully filled except possibly the last, which fills from left to right
- Heap Property: Defines the relationship between parent and child nodes
- Array Representation: Despite being conceptually a tree, heaps are typically stored as arrays
Array Representation:
For a node at index i:
- Left child:
2 * i + 1 - Right child:
2 * i + 2 - Parent:
(i - 1) // 2
# Heap as array
[1, 3, 5, 7, 9, 8, 10]
# Visualized as tree:
1
/ \
3 5
/ \ / \
7 9 8 10
MinHeap vs MaxHeap
MinHeap:
- Property: Parent value ≤ children values
- Root: Contains the minimum element
- Use case: Finding smallest elements, ascending priority
# MinHeap example
1
/ \
3 5
/ \
7 9
# Array: [1, 3, 5, 7, 9]
MaxHeap:
- Property: Parent value ≥ children values
- Root: Contains the maximum element
- Use case: Finding largest elements, descending priority
# MaxHeap example
9
/ \
7 5
/ \
3 1
# Array: [9, 7, 5, 3, 1]
Time Complexity
| Operation | Time Complexity | Description |
|---|---|---|
heapify() | O(n) | Convert array to heap |
push() | O(log n) | Insert element |
pop() | O(log n) | Extract min/max |
peek() | O(1) | View min/max |
| Heap Sort | O(n log n) | Sort using heap |
| Build Heap | O(n) | Bottom-up construction |
Why heapify is O(n), not O(n log n): Building a heap from scratch uses bottom-up approach where most elements require few swaps. Mathematical analysis shows this sums to O(n).
Priority Queue
A Priority Queue is an abstract data type where each element has a priority. Elements are served based on priority, not insertion order.
Implementation:
- Typically implemented using heaps
- MinHeap: Lower value = higher priority
- MaxHeap: Higher value = higher priority
Real-World Applications:
- Task Scheduling: OS process scheduling, CPU task management
- Dijkstra's Algorithm: Finding shortest paths in graphs
- Huffman Coding: Data compression algorithms
- Event-Driven Simulation: Processing events by timestamp
- Median Maintenance: Finding median in streaming data
Python's heapq Module
Python provides the heapq module for heap operations. Important: Python's heapq only implements MinHeap.
Basic Operations:
import heapq
# Create a MinHeap
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 3)
heapq.heappush(heap, 7)
# heap = [3, 5, 7]
# Extract minimum
min_val = heapq.heappop(heap) # 3
# Convert list to heap (in-place)
nums = [5, 2, 8, 1, 9]
heapq.heapify(nums) # [1, 2, 8, 5, 9]
# Peek at minimum (without removing)
min_val = heap[0] # Don't use heappop if you just want to peek
# K largest elements
k_largest = heapq.nlargest(3, [1, 5, 2, 9, 3]) # [9, 5, 3]
# K smallest elements
k_smallest = heapq.nsmallest(3, [1, 5, 2, 9, 3]) # [1, 2, 3]
Simulating MaxHeap: Since Python only has MinHeap, negate values to simulate MaxHeap:
# MaxHeap trick: negate values
max_heap = []
heapq.heappush(max_heap, -5) # Store as -5
heapq.heappush(max_heap, -3) # Store as -3
heapq.heappush(max_heap, -7) # Store as -7
max_val = -heapq.heappop(max_heap) # Get 7 (largest)
Essential Heap Patterns
Pattern 1: Last Stone Weight (MaxHeap Simulation)
Problem: You have stones with different weights. On each turn, take the two heaviest stones and smash them together. If they have different weights, the stone with weight x - y remains. Return the weight of the last remaining stone (or 0 if none remain).
Example:
Input: stones = [2,7,4,1,8,1]
Output: 1
Explanation:
- Smash 8 and 7 → 1 remains: [2,4,1,1,1]
- Smash 4 and 2 → 2 remains: [2,1,1,1]
- Smash 2 and 1 → 1 remains: [1,1,1]
- Smash 1 and 1 → nothing: [1]
- Result: 1
Approach: Use MaxHeap to efficiently access the two largest stones. Simulate the smashing process until ≤1 stone remains.
Time: O(n log n) | Space: O(n)
import heapq
class Solution:
def lastStoneWeight(self, stones: List[int]) -> int:
# Convert to MaxHeap by negating values
stones = [-stone for stone in stones]
heapq.heapify(stones) # O(n)
# Continue until 0 or 1 stone remains
while len(stones) > 1:
# Extract two largest stones
first = heapq.heappop(stones) # Largest (most negative)
second = heapq.heappop(stones) # Second largest
# Calculate difference
res = first - second # Still negative
# If stones have different weights, add the difference
if res != 0:
heapq.heappush(stones, res)
# Return last stone weight (or 0 if empty)
return -stones[0] if stones else 0
Key Insight: Negating values simulates MaxHeap. The heap automatically maintains the largest elements at the top, enabling efficient O(log n) access.
Pattern 2: Kth Largest Element
Problem: Find the kth largest element in an unsorted array.
Example:
Input: nums = [3,2,1,5,6,4], k = 2
Output: 5
Explanation: The 2nd largest element is 5
Approach: Maintain a MinHeap of size k. The root will always be the kth largest element. When the heap exceeds size k, remove the smallest element.
Why MinHeap? By keeping only the k largest elements in a MinHeap, the smallest of these k elements (the root) is the kth largest overall.
Time: O(n log k) | Space: O(k)
import heapq
class Solution:
def findKthLargest(self, nums: List[int], k: int) -> int:
min_heap = []
for num in nums:
heapq.heappush(min_heap, num)
# Maintain heap size of k
if len(min_heap) > k:
heapq.heappop(min_heap) # Remove smallest
# Root of MinHeap is kth largest
return min_heap[0]
Optimization - Quick Select: Quick Select achieves O(n) average time, but heap approach is simpler and guaranteed O(n log k).
Key Insight: For "Kth largest" problems, use MinHeap of size k. For "Kth smallest" problems, use MaxHeap of size k (or negate for MinHeap).
Pattern 3: K Closest Points to Origin
Problem: Given an array of points [x, y] on a 2D plane, find the k closest points to the origin (0, 0). Distance is Euclidean: √(x² + y²).
Example:
Input: points = [[1,3],[-2,2],[5,8],[0,1]], k = 2
Output: [[-2,2],[0,1]]
Explanation:
- Distance [1,3]: √(1² + 3²) = √10 ≈ 3.16
- Distance [-2,2]: √(4 + 4) = √8 ≈ 2.83
- Distance [5,8]: √(25 + 64) = √89 ≈ 9.43
- Distance [0,1]: √(0 + 1) = 1
- 2 closest: [0,1] and [-2,2]
Approach: Use MaxHeap to maintain k closest points. Negate distances to simulate MaxHeap. When heap exceeds size k, remove the farthest point.
Optimization: Skip √ calculation—just use x² + y² since ordering is preserved.
Time: O(n log k) | Space: O(k)
import heapq
class Solution:
def kClosest(self, points: List[List[int]], k: int) -> List[List[int]]:
min_heap = []
for point in points:
# Calculate squared distance (skip sqrt for efficiency)
dist_squared = (point[0] ** 2) + (point[1] ** 2)
# Use MaxHeap by negating distance
heapq.heappush(min_heap, (-dist_squared, point))
# Maintain heap size of k
if len(min_heap) > k:
heapq.heappop(min_heap) # Remove farthest point
# Extract points (ignore distances)
return [point for _, point in min_heap]
Key Insight: For "K closest/smallest" problems with custom metrics, pair the metric with the data: (metric, data). Python's heapq compares by the first tuple element.
Pattern 4: Task Scheduler (Greedy + Math)
Problem: Given tasks represented by characters and a cooldown period n between identical tasks, return the minimum time units needed to complete all tasks.
Example:
Input: tasks = ["A","A","A","B","B","B"], n = 2
Output: 8
Explanation: A -> B -> idle -> A -> B -> idle -> A -> B
Approach: This is a math problem disguised as a heap problem. The task with maximum frequency determines the minimum time.
Formula:
minTime = max(
len(tasks),
(max_freq - 1) * (n + 1) + max_count
)
Where:
max_freq: Highest task frequencymax_count: Number of tasks with max frequencyn: Cooldown period
Why this works:
- Arrange tasks with max frequency first, separated by n cooldown slots
- Fill cooldown slots with other tasks
- If multiple tasks have max frequency, they all appear in the final slot
Time: O(n) | Space: O(1)
from collections import Counter
class Solution:
def leastInterval(self, tasks: List[str], n: int) -> int:
# Count frequency of each task
freq = Counter(tasks)
# Find maximum frequency
max_freq = max(freq.values())
# Count how many tasks have max frequency
max_count = list(freq.values()).count(max_freq)
# Formula: (max_freq-1) chunks of (n+1) slots + max_count tasks in final chunk
# Or just len(tasks) if that's larger (when cooldown isn't a constraint)
return max(len(tasks), ((max_freq - 1) * (n + 1) + max_count))
Example Walkthrough:
tasks = ["A","A","A","B","B","B"], n = 2
freq = {"A": 3, "B": 3}
max_freq = 3
max_count = 2 (both A and B have frequency 3)
Formula: (3-1) * (2+1) + 2 = 2 * 3 + 2 = 8
Layout: A -> B -> idle -> A -> B -> idle -> A -> B
[ chunk 1 ] [ chunk 2 ] [final]
Key Insight: Recognize when a problem is mathematical rather than algorithmic. Not all heap-tagged problems require heap implementation.
Pattern 5: Kth Largest in Stream (Design Problem)
Problem: Design a class to find the kth largest element in a stream of numbers. Implement:
KthLargest(k, nums): Initialize with k and initial arrayadd(val): Add value to stream and return kth largest
Example:
KthLargest kthLargest = new KthLargest(3, [4, 5, 8, 2]);
kthLargest.add(3); // returns 4
kthLargest.add(5); // returns 5
kthLargest.add(10); // returns 5
kthLargest.add(9); // returns 8
kthLargest.add(4); // returns 8
Approach: Maintain a MinHeap of size k. The root is always the kth largest element. When adding a new element, maintain heap size by removing the smallest when it exceeds k.
Time: __init__: O(n log k), add: O(log k) | Space: O(k)
import heapq
class KthLargest:
def __init__(self, k: int, nums: List[int]):
self.min_heap = nums
self.k = k
# Build initial heap
heapq.heapify(self.min_heap) # O(n)
# Reduce to size k
while len(self.min_heap) > k:
heapq.heappop(self.min_heap)
def add(self, val: int) -> int:
# Add new value
heapq.heappush(self.min_heap, val)
# Maintain size k
if len(self.min_heap) > self.k:
heapq.heappop(self.min_heap)
# Root is kth largest
return self.min_heap[0]
# Usage:
# obj = KthLargest(k, nums)
# result = obj.add(val)
Optimization Consideration:
# Alternative: Check before pushing (slight optimization)
def add(self, val: int) -> int:
if len(self.min_heap) < self.k:
heapq.heappush(self.min_heap, val)
elif val > self.min_heap[0]:
heapq.heapreplace(self.min_heap, val) # Pop and push in one operation
return self.min_heap[0]
Key Insight: For streaming "top K" problems, maintain a heap of size k. This provides O(log k) insertion and O(1) access to the kth element—much better than sorting the entire stream O(n log n) on each addition.
Common Heap Patterns & Techniques
1. Top K Elements Pattern
When to use: Finding k largest/smallest elements
Approach:
- K largest: Use MinHeap of size k (counterintuitive but correct!)
- K smallest: Use MaxHeap of size k (or negate for MinHeap)
Why it works: By maintaining size k, the root represents the "boundary" element (kth largest or kth smallest).
Time: O(n log k) vs O(n log n) for full sort
2. MaxHeap Simulation
Problem: Python only provides MinHeap
Solution: Negate all values
# For integers
max_heap = [-x for x in nums]
heapq.heapify(max_heap)
# When extracting
max_val = -heapq.heappop(max_heap)
# For tuples (e.g., priority, data)
heapq.heappush(max_heap, (-priority, data))
3. Two Heaps Pattern
Use case: Finding median in a stream
Structure:
- MaxHeap for smaller half of numbers
- MinHeap for larger half of numbers
- Keep heaps balanced (sizes differ by at most 1)
Median:
- If sizes equal: average of two roots
- If unequal: root of larger heap
class MedianFinder:
def __init__(self):
self.small = [] # MaxHeap (negate values)
self.large = [] # MinHeap
def addNum(self, num: int) -> None:
# Add to MaxHeap (small)
heapq.heappush(self.small, -num)
# Balance: ensure max(small) <= min(large)
if self.small and self.large and (-self.small[0] > self.large[0]):
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
# Balance sizes
if len(self.small) > len(self.large) + 1:
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
if len(self.large) > len(self.small) + 1:
val = heapq.heappop(self.large)
heapq.heappush(self.small, -val)
def findMedian(self) -> float:
if len(self.small) > len(self.large):
return -self.small[0]
if len(self.large) > len(self.small):
return self.large[0]
return (-self.small[0] + self.large[0]) / 2
4. Custom Comparisons with Tuples
Technique: Python's heapq compares tuples lexicographically
# Compare by priority first, then by value
heapq.heappush(heap, (priority, value))
# Compare by distance, store entire object
heapq.heappush(heap, (distance, (x, y)))
# Multiple criteria
heapq.heappush(heap, (priority, timestamp, task_id, data))
5. Heapify vs Repeated Push
Building a heap:
# Method 1: heapify (preferred for initial construction)
nums = [5, 2, 8, 1, 9]
heapq.heapify(nums) # O(n)
# Method 2: repeated push
heap = []
for num in nums:
heapq.heappush(heap, num) # O(n log n)
Use heapify when converting an existing list to a heap. It's faster: O(n) vs O(n log n).
Heap vs Other Data Structures
| Data Structure | Insert | Delete | Find Min/Max | Find Kth | Use Case |
|---|---|---|---|---|---|
| Heap | O(log n) | O(log n) | O(1) | O(k log n) | Priority, streaming top K |
| Sorted Array | O(n) | O(n) | O(1) | O(1) | Static data, frequent queries |
| BST | O(log n) | O(log n) | O(log n) | O(k) | Dynamic, range queries |
| Hash Table | O(1) | O(1) | O(n) | O(n log n) | Fast lookup, no ordering |
When to use Heap:
- ✅ Streaming/dynamic data with priority
- ✅ Finding top K elements efficiently
- ✅ Scheduling and event processing
- ✅ Median maintenance
- ❌ Need to search arbitrary elements (use BST/Hash)
- ❌ Need all elements sorted (use sort or TreeMap)
Real-World Applications
1. Task Scheduling (Operating Systems)
class TaskScheduler:
def __init__(self):
self.tasks = [] # MinHeap of (priority, task)
def schedule(self, priority: int, task: str):
heapq.heappush(self.tasks, (priority, task))
def execute_next(self) -> str:
if self.tasks:
priority, task = heapq.heappop(self.tasks)
return task
return None
2. Merge K Sorted Lists
def mergeKLists(lists):
heap = []
result = []
# Initialize heap with first element from each list
for i, lst in enumerate(lists):
if lst:
heapq.heappush(heap, (lst[0], i, 0)) # (value, list_idx, element_idx)
while heap:
val, list_idx, elem_idx = heapq.heappop(heap)
result.append(val)
# Add next element from same list
if elem_idx + 1 < len(lists[list_idx]):
next_val = lists[list_idx][elem_idx + 1]
heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
return result
3. Dijkstra's Shortest Path
def dijkstra(graph, start):
distances = {node: float('inf') for node in graph}
distances[start] = 0
heap = [(0, start)] # (distance, node)
while heap:
current_dist, node = heapq.heappop(heap)
if current_dist > distances[node]:
continue
for neighbor, weight in graph[node]:
distance = current_dist + weight
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(heap, (distance, neighbor))
return distances
4. Event-Driven Simulation
class EventSimulator:
def __init__(self):
self.events = [] # MinHeap of (timestamp, event)
def schedule_event(self, timestamp: float, event: str):
heapq.heappush(self.events, (timestamp, event))
def run_simulation(self):
while self.events:
timestamp, event = heapq.heappop(self.events)
print(f"[{timestamp}] {event}")
# Process event...
Practice Strategy
Foundation (Easy)
- Last Stone Weight - MaxHeap simulation
- Kth Largest Element - MinHeap pattern
- Top K Frequent Elements - Heap + HashMap
Core Patterns (Medium)
- K Closest Points - Custom metrics with heap
- Kth Largest in Stream - Design problem
- Meeting Rooms II - Interval scheduling
- Task Scheduler - Math + greedy
Advanced (Hard)
- Merge K Sorted Lists - Multi-way merge
- Find Median from Data Stream - Two heaps
- Sliding Window Median - Two heaps + balancing
- IPO - Greedy + two heaps
Tips for Success
-
Identify heap problems:
- Keywords: "top K", "kth largest/smallest", "schedule", "priority", "stream"
- Need fast access to min/max with dynamic data
-
Choose heap type:
- MinHeap for kth largest (counterintuitive!)
- MaxHeap for kth smallest
- Two heaps for median
-
Python specifics:
- Only MinHeap available → negate for MaxHeap
heapify()is O(n) → use for initialization- Tuples compare lexicographically → use for custom priority
-
Common mistakes:
- Using heap for problems better solved with sorting
- Forgetting to negate when simulating MaxHeap
- Not maintaining fixed heap size for top K problems
- Using
heappop()when you just want topeek()(useheap[0])
Key Takeaways
- Heaps provide O(1) access to min/max element with O(log n) updates
- For top K problems, maintain heap of size k (O(n log k) vs O(n log n) sort)
- Python only has MinHeap → negate values for MaxHeap simulation
- heapify is O(n) → faster than n pushes for initialization
- Use tuples for custom priorities and comparisons
- Two heaps pattern powerful for median and percentile problems
- Not all "heap" problems need heaps → recognize mathematical patterns
Heaps are essential for competitive programming, system design, and real-world applications like task scheduling, event processing, and streaming analytics. Master these patterns, and you'll efficiently solve a wide range of priority-based problems.
Continue building your DSA foundation. Heaps bridge the gap between simple data structures and graph algorithms. Practice consistently, and these patterns will become second nature.