mediumDynamic Programming 1DAI-applied~26 min
Tokenizer Vocab Segment
A subword tokenizer must decide whether an input string can be split end-to-end into pieces that all belong to its vocabulary, with no leftover characters. The classic algorithm underneath is word break.
Problem
Given a string s and a vocabulary of strings, return true if s can be segmented into a sequence of one or more vocabulary words (each word may be reused).
Input
A string s of length n (1 ≤ n ≤ 300) and a vocab list of up to 1000 words, each up to length 20.
Output
A boolean — true if s is fully segmentable.
Constraints
- 1 ≤ n ≤ 300
- Words may be reused any number of times
- Lowercase ASCII letters
Examples
Example 1
Input
s = "applepenapple", vocab = ["apple", "pen"]
Output
true
"apple" + "pen" + "apple" covers the whole string.
Example 2
Input
s = "catsandog", vocab = ["cats", "dog", "sand", "and", "cat"]
Output
false
No segmentation consumes the trailing "og".