692. Top K Frequent Words | linlaw Techblog

Problem description:

Given a non-empty list of words, return the k most frequent elements.

Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.

Example 1:

Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words.
    Note that "i" comes before "love" due to a lower alphabetical order.

Example 2:

Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    with the number of occurrence being 4, 3, 2 and 1 respectively.

Note:
You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
Input words contain only lowercase letters.
Follow up:
Try to solve it in O(n log k) time and O(n) extra space.

Solution:

We can use map and bucket sort to solve. The reason to choose map instead of unordered_map is because of the alphabetical order.

Count the appearance of each word
Create buckets. Only need to create words.size() buckets.

class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        // use map to count the appearance of word
        // then use bucket sort to get the top k frequent elements
        map<string, int> map;
        for(auto word: words)
            map[word]++;
        
        vector<vector<string>> buckets(words.size());
        for(auto it: map)
            buckets[it.second].push_back(it.first);
        
        vector<string> res;
        for(int i= buckets.size()-1; i>= 0 && k> 0; i--){
            if(buckets[i].size() == 0) continue;
            //have element in the bucket, find out how many we need to get
            //because we need to follow alphabetical order
            int n= min(k, (int)buckets[i].size());
            res.insert(res.end(), buckets[i].begin(), buckets[i].begin()+n);
            k-=n;
        }
        return res;
    }
};

Solution python heap

We could do without the count since by default the lower count will be popped first in a heapq implementation - the idea is to create a heap of length k containing the tuple (count,word). When length exceeds k, the tuple with lower count is popped, thereby always having the top k count words.

However, if count is same in 2 tuples, the alphabetically lower word is popped out since that is the default ordering in a heapq (minheap) implementation, i.e for (count, word1) and (count, word2), if word1 < word2, (count, word1) is popped. We create a custom class to reverse this ordering such that if word1 < word2, (count, word2) is popped. Our heap will then have the top k frequent words, with alphabetically lower words being treated as "greater" when the count is same in two tuples.

class ReverseWordOrder:
    def __init__(self, word):
        self.word = word
    
    def __lt__(self, other):
        # guaranteed no equality since we are comparing keys in counter
        # flipping the lesser than comparator for our purpose to keep greater word in heap
        return self.word > other.word

class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        counter = collections.Counter(words)    # O(n) space and time
        heap = []    # max length will be k, so O(k) extra space
        for word,freq in counter.items():    # O(n) time to iterate counter
            heapq.heappush(heap, (freq,ReverseWordOrder(word)))    # O(logk) for each heappush operation
            if len(heap) > k:    # by not letting the heap exceed more than k length
                heapq.heappop(heap)    # O(logk) for each heappop operation
        output = []    # O(k) space, same length as heap
        while heap:    # O(k) since len(heap) = k
            _freq, reverseWordObj = heapq.heappop(heap)    # O(logk)
            output.append(reverseWordObj.word)
        return output[::-1]    # O(k)

Solution python Trie:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0

class Trie:
    def __init__(self):
        self.root = TrieNode()
        
    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.freq += 1
    
    def counter(self):
        def find_words(node, s):
            if node.freq > 0:
                lst.append((-node.freq, s))
            for char in node.children:
                find_words(node.children[char], s + char)
                
        lst = []
        find_words(self.root, '')
        return lst
        
class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        trie = Trie()
        for word in words:
            trie.insert(word)
        heap = trie.counter()
        heapq.heapify(heap)
        res = []
        while heap and k > 0:
            res.append(heapq.heappop(heap)[1])
            k -= 1
        return res

time complexity: $O(nlogn)$
space complexity: $O(n)$
reference:
https://goo.gl/Eznk8c
https://goo.gl/oMLd8T