Medium · Heap / Priority Queue · AI-applied · ~24 min
Top K Frequent Tokens
Before training a tokenizer, you profile a corpus and need the k most frequent token ids to seed the vocabulary. The tokens are just integers, so the task reduces to classic top-k selection by frequency.
Problem
Given an array tokens and an integer k, return the k most frequent token values. The set of k values is guaranteed to be unique, and the values may be returned in any order.
Input
An array tokens of length n (1 ≤ n ≤ 10^5), values in [-10^9, 10^9], and k (1 ≤ k ≤ number of distinct tokens).
Output
An array of the k most frequent token values.
Constraints
- 1 ≤ k ≤ number of distinct values
- The top-k set is unique (no frequency ties at the boundary)
- Aim for better than a full sort of all distinct counts, for example O(n + d log k) for d distinct values rather than O(n + d log d)
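One way to meet the last constraint is a min-heap capped at k entries, which matches the Heap / Priority Queue tag. The sketch below is a minimal illustration rather than a reference solution. The function name top_k_frequent is an assumption, and the code relies on the guarantee that there are no frequency ties at the boundary.

```python
import heapq
from collections import Counter


def top_k_frequent(tokens, k):
    """Return the k most frequent values in tokens, in no particular order.

    Hypothetical helper name; assumes 1 <= k <= number of distinct values.
    """
    counts = Counter(tokens)  # value -> number of occurrences, O(n)

    heap = []  # min-heap of (count, value) pairs, never larger than k
    for value, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, value))
        elif count > heap[0][0]:
            # The new value is more frequent than the weakest candidate,
            # so it takes that candidate's place.
            heapq.heapreplace(heap, (count, value))

    return [value for _, value in heap]


print(sorted(top_k_frequent([1, 1, 1, 2, 2, 3], 2)))  # [1, 2]
print(top_k_frequent([7], 1))  # [7]
```

Each of the d distinct values costs at most one O(log k) heap operation, so the total is O(n + d log k). The result comes back in heap order, which is acceptable because any order among the k values is allowed.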
Examples
Example 1
Input
tokens = [1, 1, 1, 2, 2, 3], k = 2
Output
[1, 2]
1 appears three times and 2 appears twice, so they are the two most frequent values.
Example 2
Input
tokens = [7], k = 1
Output
[7]
Only one distinct token.
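Because any order among the k values is accepted, a candidate answer is best compared as a set. The following sketch shows one way to check an answer against a brute-force reference built on collections.Counter. The names reference_top_k and is_accepted are hypothetical, and the reference is meant only for verification, not as the intended efficient solution.

```python
from collections import Counter


def reference_top_k(tokens, k):
    """Brute-force reference: the set of the k most frequent values."""
    return {value for value, _ in Counter(tokens).most_common(k)}


def is_accepted(tokens, k, answer):
    """True if answer holds exactly the top-k values, in any order."""
    return len(answer) == k and set(answer) == reference_top_k(tokens, k)


print(is_accepted([1, 1, 1, 2, 2, 3], 2, [2, 1]))  # True: order is ignored
print(is_accepted([7], 1, [7]))                    # True
print(is_accepted([1, 1, 1, 2, 2, 3], 2, [1, 3]))  # False: 3 is not in the top 2
```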