Problem description:

Design a HashSet without using any built-in hash table libraries.

To be specific, your design should include these functions:

add(value): Insert a value into the HashSet.
contains(value): Return whether the value exists in the HashSet or not.
remove(value): Remove a value in the HashSet. If the value does not exist in the HashSet, do nothing.

Example:

MyHashSet hashSet = new MyHashSet();
hashSet.add(1);
hashSet.add(2);
hashSet.contains(1); // returns true
hashSet.contains(3); // returns false (not found)
hashSet.add(2);
hashSet.contains(2); // returns true
hashSet.remove(2);
hashSet.contains(2); // returns false (already removed)

Note:

All values will be in the range of [0, 1000000].
The number of operations will be in the range of [1, 10000].
Please do not use the built-in HashSet library.

Solution:

I divide this problem into two parts:

  • the hash function
  • handling conflicts

I implement the hash function in an intuitive way: pick a prime number as the seed (the number of buckets) and take the value modulo the seed to get the key, i.e. the index of the bucket where the value is stored.
For example:

hash seed = 13, this means we have 13 buckets.
val = 40
40 % 13 = 1
so we'll store 40 in bucket[1]
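
The same arithmetic as a minimal Python sketch. The helper name `bucket_index` and its default of 13 buckets are illustrative assumptions, not part of the solution itself.

```python
def bucket_index(val: int, num_buckets: int = 13) -> int:
    """Map a non-negative value to a bucket index via modulo."""
    return val % num_buckets


print(bucket_index(40))  # 40 % 13 = 1
print(bucket_index(1))   # 1 % 13 = 1, so 1 and 40 share a bucket
print(bucket_index(26))  # 26 % 13 = 0
```

Values 1 and 40 land in the same bucket, which is exactly the conflict the second part of the design has to handle.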

Then I use a linked list in each bucket to store the values that hash to the same bucket.
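
To make the chaining concrete, here is a small sketch of how colliding values end up in one list. It assumes 13 buckets, each starting with a sentinel head node; the helper `chain_values` is hypothetical and exists only for printing. For brevity this sketch does not check for duplicates.

```python
class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def chain_values(head):
    """Collect the values stored after the sentinel head of one bucket."""
    out = []
    node = head.next
    while node:
        out.append(node.val)
        node = node.next
    return out


NUM_BUCKETS = 13  # assumed bucket count, matching the example above
buckets = [Node(None) for _ in range(NUM_BUCKETS)]  # one sentinel per bucket

for val in (1, 14, 40, 5):
    tail = buckets[val % NUM_BUCKETS]
    while tail.next:          # walk to the end of the chain
        tail = tail.next
    tail.next = Node(val)     # append the new value (no duplicate check here)

print(chain_values(buckets[1]))  # [1, 14, 40]
print(chain_values(buckets[5]))  # [5]
```

The sentinel head means insertion and removal never need a special case for an empty bucket.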

Improvement:
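We should use a bigger prime number. A larger prime means more buckets, so fewer values collide in the same bucket and each linked list stays short. When the number of buckets is comparable to the number of stored values, the average cost of an operation approaches $O(1)$, because there is almost nothing left to search within a bucket. The code below keeps 13 buckets; switching to a larger prime only requires changing self.size.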

class Node:

    def __init__(self, val, next):
        self.val = val
        self.next = next

class MyHashSet:

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.size = 13 # prime number
        # each bucket starts with a sentinel head node (val is None)
        self.h = [Node(None, None) for _ in range(self.size)]

    def add(self, key: int) -> None:
        p = self.h[key % self.size]
        node = p.next
        while node:
            if node.val == key:
                # key already present, nothing to insert
                break
            p = node
            node = node.next
        else:
            # reached the end of the chain without finding key: append it
            p.next = Node(key, None)

    def remove(self, key: int) -> None:
        p = self.h[key % self.size]
        node = p.next
        while node:
            if node.val == key:
                # unlink the matching node from the chain
                p.next = node.next
                break
            p = node
            node = node.next

    def contains(self, key: int) -> bool:
        """
        Returns true if this set contains the specified element
        """
        # start after the sentinel head, as add and remove do
        node = self.h[key % self.size].next
        while node:
            if node.val == key:
                return True
            node = node.next
        return False

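time complexity: $O(n)$ per operation in the worst case, when every value lands in the same bucket; with a large prime number of buckets, the average is close to $O(1)$
space complexity: $O(n + m)$, where $m$ is the number of buckets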
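
A rough way to see the effect of the bucket count is to compare the longest chain for a small and a large prime. The sketch below is an illustration only: the helper `longest_chain`, the choice of 10007 as the larger prime, and the random input are my assumptions, sized to the limits in the Note (values up to 1000000, up to 10000 operations).

```python
import random


def longest_chain(values, num_buckets):
    """Length of the longest chain if the distinct values are hashed by modulo."""
    counts = [0] * num_buckets
    for v in set(values):
        counts[v % num_buckets] += 1
    return max(counts)


random.seed(0)  # fixed seed so repeated runs use the same input
values = random.sample(range(1_000_001), 10_000)  # 10000 distinct values

small = longest_chain(values, 13)      # 13 buckets
large = longest_chain(values, 10007)   # 10007 buckets (also prime)
print("longest chain with 13 buckets:", small)
print("longest chain with 10007 buckets:", large)
```

With 13 buckets, some chain must hold at least 770 of the 10000 values (10000 / 13 rounded up), so a lookup may walk hundreds of nodes. With roughly one bucket per value, the chains are far shorter, which is what brings the average cost close to $O(1)$.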
reference:
related problem: