📘 Hash Tables (Hashing)

A Hash Table is a data structure that stores key–value pairs and provides extremely fast operations:

Search → O(1) average
Insert → O(1) average
Delete → O(1) average

Hash tables achieve this performance using a mathematical function called a hash function.

📌 Why Hash Tables?

Hash Tables are used when:

Fast lookup is required
Ordering of data does NOT matter
Keys may be large (strings, objects)
We need efficient symbol tables, dictionaries, indexing

Examples:

Python dictionary (dict)
Hash sets
Databases for indexing
Compilers for symbol tables

📘 Hash Function

A hash function maps a key to an index of the hash table:

[
h(k) = k \mod m
]

Where:

k = key
m = size of table (number of slots/buckets)

Good hash function must:

Be fast
Distribute keys uniformly
Minimize collisions

📘 Collision

A collision occurs when two different keys map to the same index.

No matter how good the hash function is, collisions cannot be avoided completely.

Therefore, hashing includes collision handling techniques.

📘 Collision Handling Techniques

There are two main types:

1. Open Addressing

Everything is stored inside the table itself.
When collision occurs, find another free slot.

Methods:

Linear Probing:
[
h(k, i) = (h(k) + i) \mod m
]
Quadratic Probing:
[
h(k, i) = (h(k) + i^2) \mod m
]
Double Hashing:
[
h(k, i) = (h(k) + i \cdot h_2(k)) \mod m
]

Pros:

Easy to implement
Cache-friendly

Cons:

Clustering problem
Table must have free slots

2. Separate Chaining

Each slot points to a linked list (or dynamic array) of keys.

Example:

index 2 --> [43 → 62 → 55]

Pros:

Easy to implement
Good for large load factors
No clustering

Cons:

Extra memory for pointers
Worst case search = O(n)

Python dict uses open addressing with collision resolution (not chaining).

📘 Load Factor (α)

[
\alpha = \frac{\text{Number of keys}}{\text{Number of buckets}}
]

If α becomes large → performance drops
Hash tables resize (rehash) when load factor crosses threshold

Typical:

α ≤ 0.7 (70%) for open addressing
α can exceed 1 for chaining

📘 Performance Summary

Operation	Average Case	Worst Case
Search	O(1)	O(n)
Insert	O(1)	O(n)
Delete	O(1)	O(n)

Worst case occurs when:

All keys hash to same index
Or load factor becomes too high

📘 Applications of Hash Tables

Databases (indexing)
Caches & memory management
Symbol tables in compilers
Sets & dictionaries
Network routers & firewall rules
Password storage (cryptographic hashing)

📘 Python Implementation of Hash Table (Separate Chaining)

Below is a simple custom implementation of a hash table using chaining.

class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # buckets

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)

        # check if key exists - update
        for pair in self.table[index]:
            if pair[0] == key:
                pair[1] = value
                return
        
        # else insert new key-value pair
        self.table[index].append([key, value])

    def search(self, key):
        index = self.hash_function(key)
        for pair in self.table[index]:
            if pair[0] == key:
                return pair[1]
        return None  # not found

    def delete(self, key):
        index = self.hash_function(key)
        for pair in self.table[index]:
            if pair[0] == key:
                self.table[index].remove(pair)
                return True
        return False

    def __repr__(self):
        return str(self.table)


# Example usage
ht = HashTable(5)
ht.insert("apple", 10)
ht.insert("banana", 20)
ht.insert("orange", 30)
print("Search apple:", ht.search("apple"))
ht.delete("banana")
print("Table now:", ht)

📘 Example Output

Search apple: 10
Table now: [['apple', 10], [], ['orange', 30], [], []]

📘 Summary (Exam Notes)

Hash Table = key–value mapping with O(1) average operations
Uses hash function to compute an index
Collisions handled by open addressing or separate chaining
Load factor affects performance
Used in dictionaries, databases, symbol tables, caches
Worst case O(n), but average O(1)