16.1 The Hash Function
Hashing removes the searching almost completely. Instead of looking for where a record is, you calculate where it must be. You feed the record's key (a unique number, like a student ID) into a small formula called a hash function. The formula gives you an address — the exact index in the array where the record lives.
address = Key MOD nwherenis the number of slots.- MOD gives the remainder, which is always between 0 and n-1 — a valid array index.
# Think of n lockers numbered 0 to n-1.
# A book with code 3547 in 2000 lockers goes to:
# 3547 MOD 2000 = 1547 (because 3547 - 2000 = 1547)
# Another book with code 5547:
# 5547 MOD 2000 = 1547 (because 5547 - 4000 = 1547)
# Same address! -> collision!- How many whole 200s fit into 3024?
- 3024 DIV 200 = 15 (15 x 200 = 3000)
- Subtract: 3024 - 3000 = ?
- How many whole 200s fit into 5012?
- Then subtract
- Does 50 divide 250 exactly?
- 50 x 5 = 250, remainder 0
- 100 x 12 = 1200
- 1207 - 1200 = ?
- Shortcut: MOD 100 is just the last two digits
The Hash Function
16.2 What is a Collision?
The hash function is fast, but it is not perfect. Different keys can produce the same address. For example, with MOD 2000:
3547 MOD 2000 = 1547
5547 MOD 2000 = 1547 # same address — collision!- A collision occurs when two different keys produce the same hash address.
- With n slots and infinitely many possible keys, collisions are inevitable (pigeonhole principle).
- They cannot be eliminated — only minimised and handled.
- Calculate 4821 MOD 2000
- Calculate 6821 MOD 2000
- Compare the two results
- Calculate both hash values
- Compare
Collisions
16.3 How to Deal with Collisions
There are three standard ways to handle a collision. Cambridge accepts any of them on the theory paper:
- 1. Linear probing (open addressing): start at the hashed slot and move forward one slot at a time until you find the first free space; store the record there.
- 2. Overflow area: keep a separate "spare" array; when the main slot is full, store the record in the next free space in the overflow area.
- 3. Chaining: each slot holds a reference to a linked list (chain) of records; the new record is added to that chain.
- The Paper 4 program in 9618/42 uses method 2 — a separate array called Spare.
- The main table (HashTable) has 200 slots; the overflow array (Spare) has 100 slots.
- When the calculated slot in HashTable is already taken, the record goes to the next free space in Spare.
- 305 MOD 100 = 5, slot 5 empty
- 412 MOD 100 = 12, slot 12 empty
- 205 MOD 100 = 5, slot 5 taken (collision!) goes to Spare[0]
- 740 MOD 100 = 40, empty
- 218 MOD 100 = 18, empty
- 540 MOD 100 = 40, taken (collision) goes to Spare[0]
- 318 MOD 100 = 18, taken (collision) goes to Spare[1]
- Start at the hashed slot
- Check next slots one by one
- Store at the first free space
- What does each slot point to?
- Colliding records are added to what?
Handling Collisions
16.3B Interactive Simulator
Try the hashing algorithm yourself. Enter keys (comma-separated), set the table and spare sizes, then click Start. Watch each key being hashed (Key MOD n), stored directly in the HashTable (green), or sent to the Spare overflow array (amber) on collision (red).
Simulation
HashTable (main)
Spare (overflow)
Status: Click Start to begin.
Code
1HashValue = CalculateHash(TheRecord.Key)2if HashTable[HashValue].Key == -1:3 HashTable[HashValue] = TheRecord4else: # collision!5 for i in range(100):6 if Spare[i].Key == -1:7 Spare[i] = TheRecord8 break16.4 Coding It — The Full Paper 4 Routine
This is the complete routine asked in 9618/42 May/June 2025. We build it piece by piece: the record class, two arrays, Initialise, CalculateHash, InsertIntoHash, CreateHashTable, and PrintSpare.
20.1.4a The record class
class NewRecord:
def __init__(self, key, item1, item2):
self.Key = key
self.Item1 = item1
self.Item2 = item220.1.4b The two arrays
HashTable = [] # main table, 200 slots
Spare = [] # overflow, 100 slots20.1.4c Initialise()
def Initialise():
global HashTable, Spare
HashTable = [NewRecord(-1, -1, -1) for i in range(200)]
Spare = [NewRecord(-1, -1, -1) for i in range(100)]- An empty record has -1 in each field.
- This is how we later know a slot is free — if
HashTable[i].Key == -1, the slot is available.
20.1.4d CalculateHash()
def CalculateHash(Key):
return Key % 20020.1.4e InsertIntoHash()
def InsertIntoHash(TheRecord):
global HashTable, Spare
HashValue = CalculateHash(TheRecord.Key)
if HashTable[HashValue].Key == -1: # slot is free
HashTable[HashValue] = TheRecord
else: # collision
for i in range(100):
if Spare[i].Key == -1:
Spare[i] = TheRecord
break # first free slot only- Once you find the first free Spare slot and store the record,
break. - Otherwise you would store the same record in every free slot.
20.1.4f CreateHashTable() — load from file
def CreateHashTable():
try:
File = open("HashData.txt")
for Line in File:
Data = Line.strip().split(",")
R = NewRecord(int(Data[0]), int(Data[1]), int(Data[2]))
InsertIntoHash(R)
File.close()
except:
print("Cannot open file")20.1.4g PrintSpare() and the main program
def PrintSpare():
for i in range(100):
if Spare[i].Key != -1:
print(Spare[i].Key)
# main program
Initialise()
CreateHashTable()
PrintSpare()- Same shape as NewRecord — class with __init__
- Three parameters, three self assignments
- Same shape, three parameters
- Only one number changes
- Use Key % 150
- Only one number changes
- Same pattern, different sizes
- HashTable = 50, Spare = 20
- Fill with NewRecord(-1, -1, -1)
- Loop through all 200 slots
- Count where Key is not -1
- Return the count
The Record Class & Arrays
InsertIntoHash & CreateHashTable
16.5 Full Exam-Style Question
The Aymaan Health clinic stores patient data in a 1D array of records, Clinic, with space for 150 records. The index for each record is calculated from the key field using Key MOD 150. If two keys give the same index there is a collision; the colliding record is stored in the next free space in a second array, Reserve, which has space for 75 records.
PatientRecord with integer fields PatientID, Age and RoomNo.- Class with __init__ constructor
- Three integer parameters
Clinic and Reserve. [1] (ii) Write ClearAll() that stores an empty record (-1 in each field) in every element of both arrays. [2]- Declare as empty lists
- ClearAll uses global
- Fill with PatientRecord(-1,-1,-1)
GetSlot() that takes a key as a parameter and returns Key MOD 150.- Function header with one parameter
- Return Key % 150
AddPatient(P) that takes a PatientRecord, uses GetSlot() on its key, stores it in Clinic if that slot is empty, otherwise stores it in the first free slot of Reserve.- Call GetSlot(P.PatientID) to get the slot
- Check if Clinic[slot].PatientID == -1 (free)
- If free: store there
- If collision: loop Reserve, find first free, store, break
LoadPatients() that reads patients.txt (three comma-separated integers per line), builds a record from each row, and calls AddPatient(). Include exception handling.- Open file inside try
- Loop, strip, split on comma
- Create PatientRecord, call AddPatient
- Close file, except: print error
| Part | Marking points | Max |
|---|---|---|
| (a) |
| 2 |
| (b)(i) |
| 1 |
| (b)(ii) |
| 2 |
| (c) |
| 2 |
| (d) |
| 6 |
| (e) |
| 3 |
Key Points Recap
✓ Key Points Summary
16.6 Practice Tasks
Fifteen exam-style hashing tasks. Each shows only the question — click Hint for the thought process, or Help for the worked solution.
- For every hashing task, the marker checks: (1) correct hash function (Key MOD n), (2) correct collision detection, (3) correct overflow handling (first free Spare slot + break), (4) correct record class with all fields.
- The most common mark loss is forgetting to break after storing in Spare.
Question Bank
Answer all questions, then press Submit Quiz to see your score.
Question 1Multiple Choice
What is the standard hash function in 9618?
Question 2True / False
3024 MOD 200 = 24.
Question 3Multiple Choice
What is a collision?
Question 4Multiple Choice
Which is NOT a collision handling method?
Question 5True / False
Empty slots are represented by Key = -1.
Question 6Multiple Choice
What does InsertIntoHash do on a collision?
Question 7Multiple Choice
What does Initialise() do?
Question 8True / False
4821 and 6821 collide in a table of 2000 slots.
Question 9Multiple Choice
What does CalculateHash(Key) return for a 200-slot table?
Question 10Multiple Choice
Linear probing means:
Question 11True / False
Chaining stores a linked list at each slot for colliding records.
Question 12Multiple Choice
What does CreateHashTable do?
Answer all 12 questions to enable submission.