hardDynamic Programming 1DAI-applied~45 min
Compound Token Detector
A tokenizer's vocabulary contains both atomic tokens and compounds. To prune redundancy, find every vocabulary entry that is itself a concatenation of two or more other entries in the vocabulary.
Problem
Given a list of distinct words, return all words that are a concatenation of at least two shorter words from the same list. A qualifying word must be formed entirely from other list words (using at least two pieces).
Input
An array words of distinct strings (1 ≤ count ≤ 10^4, total length ≤ 10^5).
Output
A list of the concatenated words (any order).
Constraints
- All words are distinct
- A word must split into ≥ 2 list words to qualify
- Empty string is not part of the input
Examples
Example 1
Input
words = ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]
Output
["catsdogcats","dogcatsdog","ratcatdogcat"]
Each listed word splits fully into ≥ 2 other words.
Example 2
Input
words = ["a","b","ab","abc"]
Output
["ab"]
'ab' = 'a'+'b'; 'abc' cannot be fully split into list words.