Skip to main content

Heap Sort

Heap is a data structure that is a complete binary tree that satisfies the heap property.

Usually, we use tree to represent the heap.

The parent node is always greater (or smaller) than the child node.

In python, min heap is implemented by heapq library.

In C++, we can use priority_queue to implement the heap.

A heap is a complete binary tree that satisfies the heap property: every parent is less than or equal to its children (min-heap) or greater than or equal to its children (max-heap). Because a complete binary tree can be stored contiguously in an array, the heap doubles as an in-place priority queue.

Heap sort exploits this by first building a max-heap from the input, then repeatedly swapping the root (the current maximum) with the last unsorted slot and shrinking the heap by one. After n1n-1 extractions, the array is sorted in ascending order.

Codes

Basic usage of the heapq module (min-heap):

import heapq

heap = []
heapq.heappush(heap, 1)
heapq.heappush(heap, 2)
heapq.heappush(heap, 3)

print(heapq.heappop(heap))

A full heap sort built on top of heapq:

import heapq

def solve(xs):
"""Return xs sorted ascending using a heap."""
h = list(xs)
heapq.heapify(h) # O(n) bottom-up build
return [heapq.heappop(h) for _ in range(len(h))]

Example

Loading Python runner...

Description

Time complexity: O(nlogn)O(n \log n). The heap structure can also be maintained by an AVL tree if you want explicit balanced-tree guarantees, but the array representation is simpler and faster in practice.

Space complexity: O(n)O(n) for the Python version, O(1)O(1) auxiliary when the sort is done in place on the input array.

Run time analysis

O(nlogn)O(n \log n).

Building the heap with heapify runs in O(n)O(n) by the classical siftdown argument — the sum of heights over all nodes is bounded by a convergent geometric-style series and collapses to O(n)O(n). Each of the nn extractions does one siftdown of depth O(logn)O(\log n), giving O(nlogn)O(n \log n) overall. The bound is tight in the worst and average case.

Space analysis

O(1)O(1) auxiliary when implemented in-place on the input array, as in the C++ version above. The Python version allocates a new list and thus uses O(n)O(n) extra space — that's a cost of working with immutable integers and the heapq API rather than a property of the algorithm itself.

Proof of correctness

Let A[0..n)A[0..n) hold the input. After make_heap, AA satisfies the max-heap property: A[(i1)/2]A[i]A[\lfloor (i-1)/2 \rfloor] \ge A[i] for all i>0i > 0. We then maintain the invariant "after kk iterations of the extraction loop, A[nk..n)A[n-k..n) contains the kk largest elements in sorted ascending order, and A[0..nk)A[0..n-k) is a max-heap". The base case k=0k = 0 is the state right after make_heap. On step k+1k+1, pop_heap swaps A[0]A[0] (the current max) with A[nk1]A[n-k-1], then restores the heap on A[0..nk1)A[0..n-k-1) by sifting down — preserving both halves of the invariant. After nn iterations, AA is sorted ascending.

Extensions

Applications

References