Searching, Sorting & Hashing

Hashing

Map keys to addresses for near-instant data retrieval — the hash function (Key MOD n), collisions, three collision handling methods (linear probing, overflow area, chaining), and the full Python hash table program with record class, InsertIntoHash, CreateHashTable, and a 16-mark exam-style task.

16.1 The Hash Function

Hashing removes the searching almost completely. Instead of looking for where a record is, you calculate where it must be. You feed the record's key (a unique number, like a student ID) into a small formula called a hash function. The formula gives you an address — the exact index in the array where the record lives.

The standard hash function:
  • address = Key MOD n where n is the number of slots.
  • MOD gives the remainder, which is always between 0 and n-1 — a valid array index.
# Think of n lockers numbered 0 to n-1.
# A book with code 3547 in 2000 lockers goes to:
#   3547 MOD 2000 = 1547  (because 3547 - 2000 = 1547)

# Another book with code 5547:
#   5547 MOD 2000 = 1547  (because 5547 - 4000 = 1547)
#   Same address! -> collision!
Your Turn — Calculate a hash value [1 marks]
Calculate the hash value of the record key 3024 for a table with 200 slots, using address = Key MOD 200.
Hint:
  • How many whole 200s fit into 3024?
  • 3024 DIV 200 = 15 (15 x 200 = 3000)
  • Subtract: 3024 - 3000 = ?
Your Turn
Calculate the hash value of the record key 5012 for a table with 200 slots.
Hint:
  • How many whole 200s fit into 5012?
  • Then subtract
Your Turn — MOD practice
Find 250 MOD 50.
Hint:
  • Does 50 divide 250 exactly?
  • 50 x 5 = 250, remainder 0
Your Turn
Find 1207 MOD 100.
Hint:
  • 100 x 12 = 1200
  • 1207 - 1200 = ?
  • Shortcut: MOD 100 is just the last two digits

16.2 What is a Collision?

The hash function is fast, but it is not perfect. Different keys can produce the same address. For example, with MOD 2000:

3547 MOD 2000 = 1547
5547 MOD 2000 = 1547   # same address — collision!
Key rule:
  • A collision occurs when two different keys produce the same hash address.
  • With n slots and infinitely many possible keys, collisions are inevitable (pigeonhole principle).
  • They cannot be eliminated — only minimised and handled.
Your Turn — Check for a collision [2 marks]
State whether keys 4821 and 6821 collide in a table of 2000 slots.
Hint:
  • Calculate 4821 MOD 2000
  • Calculate 6821 MOD 2000
  • Compare the two results
Your Turn
State whether keys 3010 and 5025 collide in a table of 2000 slots.
Hint:
  • Calculate both hash values
  • Compare

16.3 How to Deal with Collisions

There are three standard ways to handle a collision. Cambridge accepts any of them on the theory paper:

Key rule — three methods:
  • 1. Linear probing (open addressing): start at the hashed slot and move forward one slot at a time until you find the first free space; store the record there.
  • 2. Overflow area: keep a separate "spare" array; when the main slot is full, store the record in the next free space in the overflow area.
  • 3. Chaining: each slot holds a reference to a linked list (chain) of records; the new record is added to that chain.
Exam tip:
  • The Paper 4 program in 9618/42 uses method 2 — a separate array called Spare.
  • The main table (HashTable) has 200 slots; the overflow array (Spare) has 100 slots.
  • When the calculated slot in HashTable is already taken, the record goes to the next free space in Spare.
Your Turn — Overflow worked example [3 marks]
A table has 100 slots (MOD 100) and a spare overflow array. Records arrive with keys 305, 412, 205. Show where each is stored.
Hint:
  • 305 MOD 100 = 5, slot 5 empty
  • 412 MOD 100 = 12, slot 12 empty
  • 205 MOD 100 = 5, slot 5 taken (collision!) goes to Spare[0]
Your Turn
Same table (MOD 100, with overflow). Records arrive with keys 740, 218, 540, 318. Show where each is stored.
Hint:
  • 740 MOD 100 = 40, empty
  • 218 MOD 100 = 18, empty
  • 540 MOD 100 = 40, taken (collision) goes to Spare[0]
  • 318 MOD 100 = 18, taken (collision) goes to Spare[1]
Your Turn — Define linear probing
Define linear probing in one sentence.
Hint:
  • Start at the hashed slot
  • Check next slots one by one
  • Store at the first free space
Your Turn
Define chaining in one sentence.
Hint:
  • What does each slot point to?
  • Colliding records are added to what?

16.3B Interactive Simulator

Try the hashing algorithm yourself. Enter keys (comma-separated), set the table and spare sizes, then click Start. Watch each key being hashed (Key MOD n), stored directly in the HashTable (green), or sent to the Spare overflow array (amber) on collision (red).

Interactive Simulator

Simulation

HashTable (main)

·
0
·
1
·
2
·
3
·
4
·
5
·
6
·
7
·
8
·
9

Spare (overflow)

·
0
·
1
·
2
·
3
·
4

Status: Click Start to begin.

Code

1HashValue = CalculateHash(TheRecord.Key)
2if HashTable[HashValue].Key == -1:
3 HashTable[HashValue] = TheRecord
4else: # collision!
5 for i in range(100):
6 if Spare[i].Key == -1:
7 Spare[i] = TheRecord
8 break
Speed:

16.4 Coding It — The Full Paper 4 Routine

This is the complete routine asked in 9618/42 May/June 2025. We build it piece by piece: the record class, two arrays, Initialise, CalculateHash, InsertIntoHash, CreateHashTable, and PrintSpare.

20.1.4a The record class

class NewRecord:
    def __init__(self, key, item1, item2):
        self.Key   = key
        self.Item1 = item1
        self.Item2 = item2

20.1.4b The two arrays

HashTable = []   # main table, 200 slots
Spare     = []   # overflow, 100 slots

20.1.4c Initialise()

def Initialise():
    global HashTable, Spare
    HashTable = [NewRecord(-1, -1, -1) for i in range(200)]
    Spare     = [NewRecord(-1, -1, -1) for i in range(100)]
Why -1?
  • An empty record has -1 in each field.
  • This is how we later know a slot is free — if HashTable[i].Key == -1, the slot is available.

20.1.4d CalculateHash()

def CalculateHash(Key):
    return Key % 200

20.1.4e InsertIntoHash()

def InsertIntoHash(TheRecord):
    global HashTable, Spare
    HashValue = CalculateHash(TheRecord.Key)

    if HashTable[HashValue].Key == -1:        # slot is free
        HashTable[HashValue] = TheRecord
    else:                                       # collision
        for i in range(100):
            if Spare[i].Key == -1:
                Spare[i] = TheRecord
                break                          # first free slot only
Key rule — break after storing in Spare:
  • Once you find the first free Spare slot and store the record, break.
  • Otherwise you would store the same record in every free slot.

20.1.4f CreateHashTable() — load from file

def CreateHashTable():
    try:
        File = open("HashData.txt")
        for Line in File:
            Data = Line.strip().split(",")
            R = NewRecord(int(Data[0]), int(Data[1]), int(Data[2]))
            InsertIntoHash(R)
        File.close()
    except:
        print("Cannot open file")

20.1.4g PrintSpare() and the main program

def PrintSpare():
    for i in range(100):
        if Spare[i].Key != -1:
            print(Spare[i].Key)

# main program
Initialise()
CreateHashTable()
PrintSpare()
Your Turn — Write a StudentRecord class
Write a StudentRecord class with fields StudentID, Marks, Age (all integers).
Hint:
  • Same shape as NewRecord — class with __init__
  • Three parameters, three self assignments
Your Turn — Write a BookRecord class
Write a BookRecord class with fields BookID, TitleID, PageCount.
Hint:
  • Same shape, three parameters
Your Turn — CalculateHash for 150 slots
Write CalculateHash() for a table of 150 slots.
Hint:
  • Only one number changes
  • Use Key % 150
Your Turn — CalculateHash for 500 slots
Write CalculateHash() for a table of 500 slots.
Hint:
  • Only one number changes
Your Turn — Initialise for 50+20 slots
Write an Initialise() that fills a 50-slot table and 20-slot spare with empty records.
Hint:
  • Same pattern, different sizes
  • HashTable = 50, Spare = 20
  • Fill with NewRecord(-1, -1, -1)
Your Turn — CountUsed — count non-empty slots
Write a CountUsed() that returns how many slots in HashTable (200 slots) are not empty.
Hint:
  • Loop through all 200 slots
  • Count where Key is not -1
  • Return the count

16.5 Full Exam-Style Question

The Aymaan Health clinic stores patient data in a 1D array of records, Clinic, with space for 150 records. The index for each record is calculated from the key field using Key MOD 150. If two keys give the same index there is a collision; the colliding record is stored in the next free space in a second array, Reserve, which has space for 75 records.

(a) — PatientRecord class [2 marks] Write program code to declare a record type PatientRecord with integer fields PatientID, Age and RoomNo.
Your Turn — (a) Model answer
Write PatientRecord class with PatientID, Age, RoomNo. [2 marks]
Hint:
  • Class with __init__ constructor
  • Three integer parameters
(b) — Declare arrays + ClearAll [3 marks] (i) Write code to declare the global arrays Clinic and Reserve. [1] (ii) Write ClearAll() that stores an empty record (-1 in each field) in every element of both arrays. [2]
Your Turn — (b) Model answer
Declare Clinic (150) and Reserve (75), then write ClearAll() that fills both with empty records. [3 marks]
Hint:
  • Declare as empty lists
  • ClearAll uses global
  • Fill with PatientRecord(-1,-1,-1)
(c) — GetSlot [2 marks] Write the function GetSlot() that takes a key as a parameter and returns Key MOD 150.
Your Turn — (c) Model answer
Write GetSlot(Key) that returns Key MOD 150. [2 marks]
Hint:
  • Function header with one parameter
  • Return Key % 150
(d) — AddPatient [6 marks] Write AddPatient(P) that takes a PatientRecord, uses GetSlot() on its key, stores it in Clinic if that slot is empty, otherwise stores it in the first free slot of Reserve.
Your Turn — (d) Model answer
Write AddPatient(P) that uses GetSlot, stores in Clinic if empty, else first free Reserve slot. [6 marks]
Hint:
  • Call GetSlot(P.PatientID) to get the slot
  • Check if Clinic[slot].PatientID == -1 (free)
  • If free: store there
  • If collision: loop Reserve, find first free, store, break
(e) — LoadPatients [3 marks] Write LoadPatients() that reads patients.txt (three comma-separated integers per line), builds a record from each row, and calls AddPatient(). Include exception handling.
Your Turn — (e) Model answer
Write LoadPatients() that reads patients.txt, builds records, calls AddPatient, with exception handling. [3 marks]
Hint:
  • Open file inside try
  • Loop, strip, split on comma
  • Create PatientRecord, call AddPatient
  • Close file, except: print error
Mark scheme (16 marks total):
PartMarking pointsMax
(a)
  • Class/record header with constructor
  • All three fields assigned
2
(b)(i)
  • Clinic (150) and Reserve (75) declared as global
1
(b)(ii)
  • Procedure that loops both arrays
  • Stores empty record (-1 in all fields)
2
(c)
  • Function taking one parameter
  • Returns parameter MOD 150
2
(d)
  • Procedure taking record parameter
  • Calls GetSlot with the key
  • Checks if Clinic at that index is empty
  • If empty stores there
  • Otherwise locates next free in Reserve
  • Stores at that index only (break)
6
(e)
  • Opens and reads patients.txt, splitting on commas
  • Creates record and calls AddPatient for each
  • Exception handling with catch and output
3

Key Points Summary

Hashing calculates where a record must be — no searching needed.
The hash function: address = Key MOD n (n = number of slots).
MOD gives the remainder, always 0 to n-1 — a valid array index.
A collision occurs when two different keys produce the same hash address.
Collisions are inevitable (pigeonhole principle) — they can only be handled.
Three collision methods: linear probing, overflow area (Spare), chaining.
Linear probing: move forward from the hashed slot until a free space is found.
Overflow area: store colliding records in a separate Spare array.
Chaining: each slot holds a linked list of colliding records.
The 9618/42 Paper 4 uses the overflow area method (Spare array).
Empty records have Key = -1 in each field — this marks a slot as free.
Record class groups related fields (Key + data items) into one object.
InsertIntoHash: calculate hash, check if free, store there or in first free Spare slot.
Always break after storing in Spare to avoid storing the same record twice.
CreateHashTable reads from a file, creates records, calls InsertIntoHash for each.
The full exam program: class + arrays + Initialise + GetSlot + AddPatient + LoadPatients.

16.6 Practice Tasks

Fifteen exam-style hashing tasks. Each shows only the question — click Hint for the thought process, or Help for the worked solution.

Exam tip:
  • For every hashing task, the marker checks: (1) correct hash function (Key MOD n), (2) correct collision detection, (3) correct overflow handling (first free Spare slot + break), (4) correct record class with all fields.
  • The most common mark loss is forgetting to break after storing in Spare.
1Practice Task — Calculate a hash value [1 marks]
Calculate the hash value of key 3024 for a table with 200 slots (Key MOD 200).
2Practice Task — Calculate hash for 5012 [1 marks]
Calculate the hash value of key 5012 for a table with 200 slots.
3Practice Task — Check for collision [2 marks]
State whether keys 4821 and 6821 collide in a table of 2000 slots.
4Practice Task — Overflow with 3 keys [3 marks]
A table has 100 slots (MOD 100) and a spare overflow array. Records arrive with keys 305, 412, 205. Show where each is stored.
5Practice Task — Overflow with 4 keys [4 marks]
Same table (MOD 100, with overflow). Records: 740, 218, 540, 318. Show where each is stored.
6Practice Task — Define linear probing [1 marks]
Define linear probing in one sentence.
7Practice Task — Define chaining [1 marks]
Define chaining in one sentence.
8Practice Task — Record class [2 marks]
Write a PatientRecord class with integer fields PatientID, Age, RoomNo.
9Practice Task — Initialise [2 marks]
Write ClearAll() that fills Clinic (150 slots) and Reserve (75 slots) with empty PatientRecord(-1,-1,-1).
10Practice Task — GetSlot [2 marks]
Write GetSlot(Key) that returns Key MOD 150.
11Practice Task — AddPatient [6 marks]
Write AddPatient(P) that uses GetSlot on P.PatientID, stores in Clinic if empty, else first free Reserve slot.
12Practice Task — LoadPatients [3 marks]
Write LoadPatients() that reads patients.txt (3 comma-separated ints per line), builds records, calls AddPatient, with exception handling.
13Practice Task — CountUsed [3 marks]
Write CountUsed() that returns how many slots in HashTable (200 slots) are not empty.
14Practice Task — Why use MOD? [2 marks]
Explain why Key MOD n is used as the hash function, rather than Key DIV n or Key + n.
15Practice Task — Why break in Spare? [2 marks]
Explain why you must break after storing a record in the first free Spare slot.

Question Bank

Answer all questions, then press Submit Quiz to see your score.

0/12 answered

Question 1Multiple Choice

What is the standard hash function in 9618?

Question 2True / False

3024 MOD 200 = 24.

Question 3Multiple Choice

What is a collision?

Question 4Multiple Choice

Which is NOT a collision handling method?

Question 5True / False

Empty slots are represented by Key = -1.

Question 6Multiple Choice

What does InsertIntoHash do on a collision?

Question 7Multiple Choice

What does Initialise() do?

Question 8True / False

4821 and 6821 collide in a table of 2000 slots.

Question 9Multiple Choice

What does CalculateHash(Key) return for a 200-slot table?

Question 10Multiple Choice

Linear probing means:

Question 11True / False

Chaining stores a linked list at each slot for colliding records.

Question 12Multiple Choice

What does CreateHashTable do?

Answer all 12 questions to enable submission.