Problem description:

There is a new alien language which uses the latin alphabet. However, the order among letters are unknown to you. You receive a list of non-empty words from the dictionary, where words are sorted lexicographically by the rules of this new language. Derive the order of letters in this language.

Example 1:

1
2
3
4
5
6
7
8
9
10
Input:
[
"wrt",
"wrf",
"er",
"ett",
"rftt"
]

Output: "wertf"

Example 2:

1
2
3
4
5
6
7
Input:
[
"z",
"x"
]

Output: "zx"

Example 3:

1
2
3
4
5
6
7
8
9
10
Input:
[
"z",
"x",
"z"
]

Output: ""

Explanation: The order is invalid, so return "".

Note:

You may assume all letters are in lowercase.
You may assume that if a is a prefix of b, then a must appear before b in the given dictionary.
If the order is invalid, return an empty string.
There may be multiple valid order of letters, return any one of them is fine.

Solution:

  1. find characters in words
  2. build graph and indegree, graph contains all neighbors of a characters c, indegree is how many prerequisite need to complete before this character c
  3. find character if it’s indegree is 0
  4. topological sort, remove indegree count for neighbor when we complete c
  5. if the result length is not equal to indegree size, then return ""
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
class Solution:
def alienOrder(self, words: List[str]) -> str:
graph = defaultdict(list)
indegree = defaultdict(int)

for word in words:
for c in word:
graph[c] = []
indegree[c] = 0

for i in range(1, len(words)):
word1 = words[i-1]
word2 = words[i]

index, l = 0, min(len(word1), len(word2))
while index < l and word1[index] == word2[index]:
index += 1

# edge case for ["abc", "ab"]
if index == l and len(word1) > len(word2):
return ''

if index < l:
graph[word1[index]].append(word2[index])
indegree[word2[index]] += 1

queue = deque()
for key, val in indegree.items():
if val == 0:
queue.append(key)

result = []

while queue:
cur = queue.popleft()
result.append(cur)

for nei in graph[cur]:
indegree[nei] -= 1
if indegree[nei] == 0:
queue.append(nei)
return ''.join(result) if len(result) == len(indegree) else ''

First, build a degree map for each character in all the words:

1
2
3
4
5
w:0
r:0
t:0
f:0
e:0

Then build the hashmap by comparing the adjacent words, the first character that is different between two adjacent words reflect the lexicographical order. For example:

1
2
3
4
5
6
7
"wrt",
"wrf",
first different character is 3rd letter, so t comes before f

"wrf",
"er",
first different character is 1rd letter, so w comes before e

The characters in set come after the key. x->y means letter x comes before letter y. x -> set: y,z,t,w means x comes before all the letters in the set. The final HashMap “map” looks like.

1
2
3
4
t -> set: f    
w -> set: e
r -> set: t
e -> set: r

and final HashMap “degree” looks like, the number means “how many letters come before the key”:

1
2
3
4
5
w:0
r:1
t:1
f:1
e:1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
class Solution {
public:
string alienOrder(vector<string>& words) {
if(words.size() == 0) return "";
unordered_map<char, int> indegree;
unordered_map<char, multiset<char>> hash;
for(auto word: words)
for(auto ch: word)
indegree[ch]= 0;

for(int i= 1; i< words.size(); i++){
int k= 0, len1= words[i-1].length(), len2= words[i].length();
//example: wrt, wrf
while(words[i-1][k] == words[i][k]) //skip same characters
k++; //ex: wrt, wrf. skip 'w', 'r'

if(k>= min(len1, len2)) continue; //exceed one of the length
indegree[words[i][k]]++; //ex: 'f' comes after 't', so indegree[f]++.
hash[words[i-1][k]].insert(words[i][k]); //since wrf is after wrt. hash[t]= [f, ...]
}

string ans;
for(int i= 0; i< indegree.size(); i++){
char ch= ' ';
for(auto val: indegree){ //ex: indegree[t]= 0, indegree[f]= 1
if(!val.second){ //find node that indegree == 0
ch= val.first;
break;
}
}
if(ch == ' ') return ""; //no one is indegree 0, that means have a cycle and is invalid
ans+= ch; //append character with indegree= 0 to result
//cout<<"ch: "<<ch<<" ,"<<indegree[ch];
indegree[ch]--;
//cout<<", "<<indegree[ch]<<endl;
for(auto val: hash[ch])
indegree[val]--; //every string comes after
}
return ans;
}
};

time complexity: $O(n+V))$, n: words averge length, V: alpha number of characters in given alphabet
space complexity: $O(V)$

The first step to create a graph takes O(n + alhpa) time where n is number of given words and alpha is number of characters in given alphabet. The second step is also topological sorting. Note that there would be alpha vertices and at-most (n-1) edges in the graph. The time complexity of topological sorting is O(V+E) which is O(n + aplha) here. So overall time complexity is O(n + aplha) + O(n + aplha) which is O(n + aplha).

Second time:

Question to ask:

  • word in same length?
  • contain duplicate?
  • only character?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
class Solution {
public:
string alienOrder(vector<string>& words) {
/*
1. construct graph
indegree table
edge table
2. use graph to output dictionary
*/
if(words.empty()) return "";


unordered_map<char, int> indegree;
unordered_map<char, multiset<char>> edge;

//1. construct graph
//1.1 init every character as 0
for(auto word: words){
for(auto c: word)
indegree[c]= 0;
}

for(int i= 1; i< words.size(); i++){
int l1= words[i-1].length(), l2= words[i].length(), k= 0;
while(words[i-1][k] == words[i][k])
k++;

//stop at the first position with different characters
if(k>= min(l1, l2)) continue; //if all the same but one of them reaches end
indegree[words[i][k]]++;
edge[words[i-1][k]].insert(words[i][k]);
}


for(auto x: edge['a'])
cout<<x<<" ";
cout<<endl;
for(auto x: edge['b'])
cout<<x<<" ";
cout<<endl;
for(auto x: edge['z'])
cout<<x<<" ";
cout<<endl;
for(auto x: edge['c'])
cout<<x<<" ";
cout<<endl;

for(auto x: indegree){
cout<<x.first<<" "<<x.second<<endl;
}

string res;
for(int i= 0; i< indegree.size(); i++){
char tmp= ' ';
for(auto c: indegree){
if(c.second == 0){ //no indegree, process it first
tmp= c.first;
cout<<c.first<<endl;
break;
}
}

if(tmp == ' ') return ""; //can not find a valid character
res+= tmp;
indegree[tmp]--;
for(auto c: edge[tmp]){
indegree[c]--;
}
}
return res;
}
};

reference:
https://goo.gl/fKXTmg
https://goo.gl/DDsvde
https://goo.gl/Lb26cZ