π Hash Tables (Hashing)
A Hash Table is a data structure that stores keyβvalue pairs and provides extremely fast operations:
- Search β O(1) average
- Insert β O(1) average
- Delete β O(1) average
Hash tables achieve this performance using a mathematical function called a hash function.
π Why Hash Tables?
Hash Tables are used when:
- Fast lookup is required
- Ordering of data does NOT matter
- Keys may be large (strings, objects)
- We need efficient symbol tables, dictionaries, indexing
Examples:
- Python dictionary (
dict) - Hash sets
- Databases for indexing
- Compilers for symbol tables
π Hash Function
A hash function maps a key to an index of the hash table:
[
h(k) = k \mod m
]
Where:
k= keym= size of table (number of slots/buckets)
Good hash function must:
- Be fast
- Distribute keys uniformly
- Minimize collisions
π Collision
A collision occurs when two different keys map to the same index.
No matter how good the hash function is, collisions cannot be avoided completely.
Therefore, hashing includes collision handling techniques.
π Collision Handling Techniques
There are two main types:
1. Open Addressing
Everything is stored inside the table itself.
When collision occurs, find another free slot.
Methods:
- Linear Probing:
[
h(k, i) = (h(k) + i) \mod m
] - Quadratic Probing:
[
h(k, i) = (h(k) + i^2) \mod m
] - Double Hashing:
[
h(k, i) = (h(k) + i \cdot h_2(k)) \mod m
]
Pros:
- Easy to implement
- Cache-friendly
Cons:
- Clustering problem
- Table must have free slots
2. Separate Chaining
Each slot points to a linked list (or dynamic array) of keys.
Example:
index 2 --> [43 β 62 β 55]
Pros:
- Easy to implement
- Good for large load factors
- No clustering
Cons:
- Extra memory for pointers
- Worst case search = O(n)
Python dict uses open addressing with collision resolution (not chaining).
π Load Factor (Ξ±)
[
\alpha = \frac{\text{Number of keys}}{\text{Number of buckets}}
]
- If Ξ± becomes large β performance drops
- Hash tables resize (rehash) when load factor crosses threshold
Typical:
- Ξ± β€ 0.7 (70%) for open addressing
- Ξ± can exceed 1 for chaining
π Performance Summary
| Operation | Average Case | Worst Case |
|---|---|---|
| Search | O(1) | O(n) |
| Insert | O(1) | O(n) |
| Delete | O(1) | O(n) |
Worst case occurs when:
- All keys hash to same index
- Or load factor becomes too high
π Applications of Hash Tables
- Databases (indexing)
- Caches & memory management
- Symbol tables in compilers
- Sets & dictionaries
- Network routers & firewall rules
- Password storage (cryptographic hashing)
π Python Implementation of Hash Table (Separate Chaining)
Below is a simple custom implementation of a hash table using chaining.
class HashTable:
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)] # buckets
def hash_function(self, key):
return hash(key) % self.size
def insert(self, key, value):
index = self.hash_function(key)
# check if key exists - update
for pair in self.table[index]:
if pair[0] == key:
pair[1] = value
return
# else insert new key-value pair
self.table[index].append([key, value])
def search(self, key):
index = self.hash_function(key)
for pair in self.table[index]:
if pair[0] == key:
return pair[1]
return None # not found
def delete(self, key):
index = self.hash_function(key)
for pair in self.table[index]:
if pair[0] == key:
self.table[index].remove(pair)
return True
return False
def __repr__(self):
return str(self.table)
# Example usage
ht = HashTable(5)
ht.insert("apple", 10)
ht.insert("banana", 20)
ht.insert("orange", 30)
print("Search apple:", ht.search("apple"))
ht.delete("banana")
print("Table now:", ht)
π Example Output
Search apple: 10
Table now: [['apple', 10], [], ['orange', 30], [], []]
π Summary (Exam Notes)
- Hash Table = keyβvalue mapping with O(1) average operations
- Uses hash function to compute an index
- Collisions handled by open addressing or separate chaining
- Load factor affects performance
- Used in dictionaries, databases, symbol tables, caches
- Worst case O(n), but average O(1)
