Heap Sort
Heap is a data structure that is a complete binary tree that satisfies the heap property.
Usually, we use tree to represent the heap.
The parent node is always greater (or smaller) than the child node.
In python, min heap is implemented by heapq library.
In C++, we can use priority_queue to implement the heap.
A heap is a complete binary tree that satisfies the heap property: every parent is less than or equal to its children (min-heap) or greater than or equal to its children (max-heap). Because a complete binary tree can be stored contiguously in an array, the heap doubles as an in-place priority queue.
Heap sort exploits this by first building a max-heap from the input, then repeatedly swapping the root (the current maximum) with the last unsorted slot and shrinking the heap by one. After extractions, the array is sorted in ascending order.
Codes
- Python
- C++
Basic usage of the heapq module (min-heap):
import heapq
heap = []
heapq.heappush(heap, 1)
heapq.heappush(heap, 2)
heapq.heappush(heap, 3)
print(heapq.heappop(heap))
A full heap sort built on top of heapq:
import heapq
def solve(xs):
"""Return xs sorted ascending using a heap."""
h = list(xs)
heapq.heapify(h) # O(n) bottom-up build
return [heapq.heappop(h) for _ in range(len(h))]
Basic usage of std::priority_queue (max-heap by default):
#include <queue>
priority_queue<int> heap;
heap.push(1);
heap.push(2);
heap.push(3);
cout << heap.top() << endl;
In-place heap sort using the lower-level STL routines:
#include <algorithm>
#include <vector>
std::vector<int> heap_sort(std::vector<int> xs) {
std::make_heap(xs.begin(), xs.end()); // max-heap, O(n)
for (auto end = xs.end(); end != xs.begin(); --end) {
std::pop_heap(xs.begin(), end); // move max to end-1
}
return xs; // now ascending
}
Example
Description
Time complexity: . The heap structure can also be maintained by an AVL tree if you want explicit balanced-tree guarantees, but the array representation is simpler and faster in practice.
Space complexity: for the Python version, auxiliary when the sort is done in place on the input array.
Run time analysis
.
Building the heap with heapify runs in by the classical siftdown
argument — the sum of heights over all nodes is bounded by a convergent
geometric-style series and collapses to . Each of the extractions
does one siftdown of depth , giving overall. The
bound is tight in the worst and average case.
Space analysis
auxiliary when implemented in-place on the input array, as in the C++
version above. The Python version allocates a new list and thus uses
extra space — that's a cost of working with immutable integers and the
heapq API rather than a property of the algorithm itself.
Proof of correctness
Let hold the input. After make_heap, satisfies the max-heap
property: for all . We then
maintain the invariant "after iterations of the extraction loop,
contains the largest elements in sorted ascending order, and
is a max-heap". The base case is the state right after make_heap.
On step , pop_heap swaps (the current max) with ,
then restores the heap on by sifting down — preserving both
halves of the invariant. After iterations, is sorted ascending.
Extensions
Applications
References
- Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms, 3rd ed., Chapter 6 (Heapsort).
- cp-algorithms — Heap and heap sort