Problem description:

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

For example,
[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

void addNum(int num) - Add a integer number from the data stream to the data structure.
double findMedian() - Return the median of all elements so far.

Example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2

Follow up:

If all integer numbers from the stream are between 0 and 100, how would you optimize it?
If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?

Solution:

Questions to ask:

  • input always integers?
  • what kind of output for median, double or round to integer?
  • space complexity constraint?

Use heap to solve this question. Let’s say we have input like this: 1 7 3 4 5 8 2 6. If use maxheap to store it, it’ll become 1 2 3 4 5 6 7 8 where 8 is the top. The trick of this problem is to use two maxheap to store the left half and right half. In the left heap, simply push number in it; however, in right heap, push the negative number in it, so the smallest element will at top.

For example:
input: 1 7 3 4 5 8 2 6

1 2 3 4
smaller: 4 is the top
-8 -7 -6 -5
larger: -5 is the top

Another thing to mention is that, always keep left.size() > right.size(). This is to track if the total number of elements is odd or even.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class MedianFinder:
def __init__(self):
self.small = [] # the smaller half of the list, max heap (invert min-heap)
self.large = [] # the larger half of the list, min heap

def addNum(self, num):
if len(self.small) == len(self.large):
heappush(self.large, -heappushpop(self.small, -num))
else:
heappush(self.small, -heappushpop(self.large, num))

def findMedian(self):
if len(self.small) == len(self.large):
return float(self.large[0] - self.small[0]) / 2.0
else:
return float(self.large[0])
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
class MedianFinder {
public:
/** initialize your data structure here. */
priority_queue<int> left, right;
MedianFinder() {

}

void addNum(int num) {
left.push(num); //O(logn)
right.push(-left.top());
left.pop();

if(left.size() < right.size()){ //make left size always bigger than right, for finding median
left.push(-right.top());
right.pop();
}
}

double findMedian() {
return left.size()> right.size()? left.top(): 0.5*(left.top()-right.top()); //in right, numbers are negative
}
};

/**
* Your MedianFinder object will be instantiated and called as such:
* MedianFinder obj = new MedianFinder();
* obj.addNum(num);
* double param_2 = obj.findMedian();
*/

Use binary search to find insertion index.

If len(nums) is odd, return the nums[mid] value. If it’s even, return (nums[mid]+nums[mid-1])/2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class MedianFinder:
def __init__(self):
self.nums = []

def addNum(self, num: int) -> None:
index = bisect_left(self.nums, num)
self.nums.insert(index, num)

def findMedian(self) -> float:
if len(self.nums) % 2:
return self.nums[len(self.nums)//2]
else:
mid = len(self.nums) // 2
return (self.nums[mid] + self.nums[mid-1]) / 2

# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()

time complexity: $O(logn)$ for insert into heap, $O(1)$ for findMedian
space complexity: $O(n)$
reference:
http://www.cnblogs.com/grandyang/p/4896673.html
https://goo.gl/if9kLS