Skip to content
Home Β» Hash Tables

Hash Tables


πŸ“˜ Hash Tables (Hashing)

A Hash Table is a data structure that stores key–value pairs and provides extremely fast operations:

  • Search β†’ O(1) average
  • Insert β†’ O(1) average
  • Delete β†’ O(1) average

Hash tables achieve this performance using a mathematical function called a hash function.


πŸ“Œ Why Hash Tables?

Hash Tables are used when:

  • Fast lookup is required
  • Ordering of data does NOT matter
  • Keys may be large (strings, objects)
  • We need efficient symbol tables, dictionaries, indexing

Examples:

  • Python dictionary (dict)
  • Hash sets
  • Databases for indexing
  • Compilers for symbol tables

πŸ“˜ Hash Function

A hash function maps a key to an index of the hash table:

[
h(k) = k \mod m
]

Where:

  • k = key
  • m = size of table (number of slots/buckets)

Good hash function must:

  • Be fast
  • Distribute keys uniformly
  • Minimize collisions

πŸ“˜ Collision

A collision occurs when two different keys map to the same index.

No matter how good the hash function is, collisions cannot be avoided completely.

Therefore, hashing includes collision handling techniques.


πŸ“˜ Collision Handling Techniques

There are two main types:


1. Open Addressing

Everything is stored inside the table itself.
When collision occurs, find another free slot.

Methods:

  • Linear Probing:
    [
    h(k, i) = (h(k) + i) \mod m
    ]
  • Quadratic Probing:
    [
    h(k, i) = (h(k) + i^2) \mod m
    ]
  • Double Hashing:
    [
    h(k, i) = (h(k) + i \cdot h_2(k)) \mod m
    ]

Pros:

  • Easy to implement
  • Cache-friendly

Cons:

  • Clustering problem
  • Table must have free slots

2. Separate Chaining

Each slot points to a linked list (or dynamic array) of keys.

Example:

index 2 --> [43 β†’ 62 β†’ 55]

Pros:

  • Easy to implement
  • Good for large load factors
  • No clustering

Cons:

  • Extra memory for pointers
  • Worst case search = O(n)

Python dict uses open addressing with collision resolution (not chaining).


πŸ“˜ Load Factor (Ξ±)

[
\alpha = \frac{\text{Number of keys}}{\text{Number of buckets}}
]

  • If Ξ± becomes large β†’ performance drops
  • Hash tables resize (rehash) when load factor crosses threshold

Typical:

  • Ξ± ≀ 0.7 (70%) for open addressing
  • Ξ± can exceed 1 for chaining

πŸ“˜ Performance Summary

OperationAverage CaseWorst Case
SearchO(1)O(n)
InsertO(1)O(n)
DeleteO(1)O(n)

Worst case occurs when:

  • All keys hash to same index
  • Or load factor becomes too high

πŸ“˜ Applications of Hash Tables

  • Databases (indexing)
  • Caches & memory management
  • Symbol tables in compilers
  • Sets & dictionaries
  • Network routers & firewall rules
  • Password storage (cryptographic hashing)

πŸ“˜ Python Implementation of Hash Table (Separate Chaining)

Below is a simple custom implementation of a hash table using chaining.

class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # buckets

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)

        # check if key exists - update
        for pair in self.table[index]:
            if pair[0] == key:
                pair[1] = value
                return
        
        # else insert new key-value pair
        self.table[index].append([key, value])

    def search(self, key):
        index = self.hash_function(key)
        for pair in self.table[index]:
            if pair[0] == key:
                return pair[1]
        return None  # not found

    def delete(self, key):
        index = self.hash_function(key)
        for pair in self.table[index]:
            if pair[0] == key:
                self.table[index].remove(pair)
                return True
        return False

    def __repr__(self):
        return str(self.table)


# Example usage
ht = HashTable(5)
ht.insert("apple", 10)
ht.insert("banana", 20)
ht.insert("orange", 30)
print("Search apple:", ht.search("apple"))
ht.delete("banana")
print("Table now:", ht)

πŸ“˜ Example Output

Search apple: 10
Table now: [['apple', 10], [], ['orange', 30], [], []]

πŸ“˜ Summary (Exam Notes)

  • Hash Table = key–value mapping with O(1) average operations
  • Uses hash function to compute an index
  • Collisions handled by open addressing or separate chaining
  • Load factor affects performance
  • Used in dictionaries, databases, symbol tables, caches
  • Worst case O(n), but average O(1)