269. Alien Dictionary | linlaw Techblog

Problem description:

There is a new alien language which uses the latin alphabet. However, the order among letters are unknown to you. You receive a list of non-empty words from the dictionary, where words are sorted lexicographically by the rules of this new language. Derive the order of letters in this language.

Example 1:

Input:
[
  "wrt",
  "wrf",
  "er",
  "ett",
  "rftt"
]

Output: "wertf"

Example 2:

Input:
[
  "z",
  "x"
]

Output: "zx"

Example 3:

Input:
[
  "z",
  "x",
  "z"
] 

Output: "" 

Explanation: The order is invalid, so return "".

Note:

You may assume all letters are in lowercase.
You may assume that if a is a prefix of b, then a must appear before b in the given dictionary.
If the order is invalid, return an empty string.
There may be multiple valid order of letters, return any one of them is fine.

Solution:

find characters in words
build graph and indegree, graph contains all neighbors of a characters c, indegree is how many prerequisite need to complete before this character c
find character if it’s indegree is 0
topological sort, remove indegree count for neighbor when we complete c
if the result length is not equal to indegree size, then return ""

class Solution:
    def alienOrder(self, words: List[str]) -> str:
        graph = defaultdict(list)
        indegree = defaultdict(int)
        
        for word in words:
            for c in word:
                graph[c] = []
                indegree[c] = 0
        
        for i in range(1, len(words)):
            word1 = words[i-1]
            word2 = words[i]
            
            index, l = 0, min(len(word1), len(word2))
            while index < l and word1[index] == word2[index]:
                index += 1
            
            # edge case for ["abc", "ab"]
            if index == l and len(word1) > len(word2):
                return ''
            
            if index < l:
                graph[word1[index]].append(word2[index])
                indegree[word2[index]] += 1
        
        queue = deque()
        for key, val in indegree.items():
            if val == 0:
                queue.append(key)
        
        result = []
        
        while queue:
            cur = queue.popleft()
            result.append(cur)
            
            for nei in graph[cur]:
                indegree[nei] -= 1
                if indegree[nei] == 0:
                    queue.append(nei)
        return ''.join(result) if len(result) == len(indegree) else ''

First, build a degree map for each character in all the words:

w:0
r:0
t:0
f:0
e:0

Then build the hashmap by comparing the adjacent words, the first character that is different between two adjacent words reflect the lexicographical order. For example:

"wrt",
"wrf",
   first different character is 3rd letter, so t comes before f

"wrf",
"er",
   first different character is 1rd letter, so w comes before e

The characters in set come after the key. x->y means letter x comes before letter y. x -> set: y,z,t,w means x comes before all the letters in the set. The final HashMap “map” looks like.

t -> set: f    
w -> set: e
r -> set: t
e -> set: r

and final HashMap “degree” looks like, the number means “how many letters come before the key”:

w:0
r:1
t:1
f:1
e:1

class Solution {
public:
    string alienOrder(vector<string>& words) {
        if(words.size() == 0) return "";
        unordered_map<char, int> indegree;
        unordered_map<char, multiset<char>> hash;
        for(auto word: words)
            for(auto ch: word)
                indegree[ch]= 0;
        
        for(int i= 1; i< words.size(); i++){
            int k= 0, len1= words[i-1].length(), len2= words[i].length();
            //example: wrt, wrf
            while(words[i-1][k] == words[i][k]) //skip same characters
                k++; //ex: wrt, wrf. skip 'w', 'r'
            
            if(k>= min(len1, len2)) continue; //exceed one of the length
            indegree[words[i][k]]++; //ex: 'f' comes after 't', so indegree[f]++.
            hash[words[i-1][k]].insert(words[i][k]); //since wrf is after wrt. hash[t]= [f, ...]
        }
        
        string ans;
        for(int i= 0; i< indegree.size(); i++){
            char ch= ' ';
            for(auto val: indegree){ //ex: indegree[t]= 0, indegree[f]= 1
                if(!val.second){ //find node that indegree == 0
                    ch= val.first;
                    break;
                }
            }
            if(ch == ' ') return ""; //no one is indegree 0, that means have a cycle and is invalid
            ans+= ch; //append character with indegree= 0 to result
            //cout<<"ch: "<<ch<<" ,"<<indegree[ch];
            indegree[ch]--; 
            //cout<<", "<<indegree[ch]<<endl;
            for(auto val: hash[ch]) 
                indegree[val]--; //every string comes after 
        }
        return ans;
    }
};

time complexity: $O(n+V))$, n: words averge length, V: alpha number of characters in given alphabet
space complexity: $O(V)$

The first step to create a graph takes O(n + alhpa) time where n is number of given words and alpha is number of characters in given alphabet. The second step is also topological sorting. Note that there would be alpha vertices and at-most (n-1) edges in the graph. The time complexity of topological sorting is O(V+E) which is O(n + aplha) here. So overall time complexity is O(n + aplha) + O(n + aplha) which is O(n + aplha).

Second time:

Question to ask:

word in same length?
contain duplicate?
only character?

class Solution {
public:
    string alienOrder(vector<string>& words) {
        /*
        1. construct graph
            indegree table
            edge table
        2. use graph to output dictionary
        */
        if(words.empty()) return "";
        
        
        unordered_map<char, int> indegree;
        unordered_map<char, multiset<char>> edge;
        
        //1. construct graph
        //1.1 init every character as 0
        for(auto word: words){
            for(auto c: word)
                indegree[c]= 0;
        }
        
        for(int i= 1; i< words.size(); i++){
            int l1= words[i-1].length(), l2= words[i].length(), k= 0;
            while(words[i-1][k] == words[i][k])
                k++;
            
            //stop at the first position with different characters
            if(k>= min(l1, l2)) continue; //if all the same but one of them reaches end
            indegree[words[i][k]]++;
            edge[words[i-1][k]].insert(words[i][k]);
        }
        

        for(auto x: edge['a'])
            cout<<x<<" ";
        cout<<endl;
        for(auto x: edge['b'])
            cout<<x<<" ";
        cout<<endl;
        for(auto x: edge['z'])
            cout<<x<<" ";
        cout<<endl;
        for(auto x: edge['c'])
            cout<<x<<" ";
        cout<<endl;
        
        for(auto x: indegree){
            cout<<x.first<<" "<<x.second<<endl;
        }
        
        string res;
        for(int i= 0; i< indegree.size(); i++){
            char tmp= ' ';
            for(auto c: indegree){
                if(c.second == 0){ //no indegree, process it first
                    tmp= c.first;
                    cout<<c.first<<endl;
                    break;
                }
            }
            
            if(tmp == ' ') return ""; //can not find a valid character
            res+= tmp;
            indegree[tmp]--;
            for(auto c: edge[tmp]){
                indegree[c]--;
            }
        }
        return res;
    }
};

reference:
https://goo.gl/fKXTmg
https://goo.gl/DDsvde
https://goo.gl/Lb26cZ