๐ Hash Functions (under Hashing)
A Hash Function is a mathematical function that converts a key into a fixed-size integer (hash value or index) used to store the key in a hash table.
[
h(k) = \text{index in hash table}
]
The goal of a hash function is to map keys uniformly across the table so that collisions are minimized.
๐ Why Hash Functions Are Important?
A good hash function ensures:
- Uniform distribution
- Low collision rate
- Fast computation
- Efficient search, insert, delete
- Consistent O(1) average performance
๐ Properties of a Good Hash Function
1. Deterministic
Same key โ same hash value every time.
2. Uniform Distribution
Keys should spread evenly across the table.
3. Fast to compute
Must run in constant time O(1).
4. Minimize Collisions
Different keys should rarely map to same index.
5. Use full range of hash table
Avoid clustering.
๐ Types of Hash Functions
1. Division (Mod) Method
[
h(k) = k \bmod m
]
Where:
k= keym= table size (preferably prime)
Example:
h(100) = 100 % 7 = 2
Pros: Simple, fast
Cons: Poor if m has patterns with key distribution.
2. Multiplication Method (Knuth)
[
h(k) = \lfloor m (kA – \lfloor kA \rfloor ) \rfloor
]
Where:
- A is a constant, usually 0.6180339887 (fraction of golden ratio)
Pros: Works well for any table size
Cons: Slightly more expensive to compute
3. Mid-Square Method
- Square the key.
- Extract middle digits as hash value.
Example:
Key = 123
123ยฒ = 15129
Middle digits = 512 โ h(k) = 512
Pros: Very uniform
Cons: Expensive for large keys
4. Folding Method
Break key into equal parts and add them.
Example:
Key = 123456 โ (123 + 456) = 579
Pros: Works for large integers
Cons: Still may collide
5. Universal Hashing
Use a family of hash functions, randomly choose one at runtime.
[
h_{a,b}(k) = ((ak + b) \bmod p) \bmod m
]
Where:
- p is a large prime
- a โ 0, b < p
Pros: Minimizes worst-case collisions
Used in cryptography & randomized algorithms
6. Cryptographic Hash Functions (not used for hash tables)
Examples:
- MD5
- SHA-1
- SHA-256
Properties: irreversible, secure
Usage: password hashing, digital signatures
(Not used for normal hash tables because more expensive)
๐ Hashing for Strings
Common technique:
Polynomial Rolling Hash
[
h(s) = (s_0 a^0 + s_1 a^1 + \dots + s_{n-1} a^{n-1}) \bmod m
]
Where:
- characters โ integers
- a = constant like 31 or 37
- m = large prime
Used by Java, Python, C++ strings internally.
๐ Hash Function Performance
A good hash function should:
- Avoid clustering
- Reduce collisions
- Spread keys uniformly
- Work fast in O(1) time
Bad hash functions cause:
- Long chains (in chaining)
- Long probe sequences (in open addressing)
โ Result: performance degrades to O(n)
๐ Python Example: Simple Hash Functions
def division_hash(key, table_size):
return key % table_size
def multiplication_hash(key, table_size):
A = 0.6180339887 # suggested constant
return int(table_size * ((key * A) % 1))
def mid_square_hash(key, table_size):
square = key * key
mid = (square // 10) % 1000 # extract middle digits
return mid % table_size
def string_hash(s, table_size):
h = 0
a = 31 # multiplier
for char in s:
h = (h * a + ord(char)) % table_size
return h
๐ Summary (Exam Notes)
- Hash function maps a key โ index
- Must be fast, uniform, and minimize collisions
- Techniques:
- Division method
- Multiplication method
- Mid-square
- Folding
- Universal hashing
- Polynomial rolling hash for strings
- Good hash function ensures O(1) average performance
- Bad hash function causes clustering and O(n) performance
