Hard · Bit Manipulation · Pure DSA · ~35 min
Validate a UTF-8 Byte Stream
Given a sequence of integers representing bytes, decide whether they form a valid UTF-8 encoding according to the leading-bit rules.
Problem
Given an array data of integers in which only the least-significant 8 bits of each integer are meaningful (one byte per integer), return true if data is a valid UTF-8 encoding and false otherwise. A character is 1 to 4 bytes long. A 1-byte character starts with the bit 0. An n-byte character (2 <= n <= 4) starts with n leading 1 bits followed by a 0, and is followed by n - 1 continuation bytes, each of which starts with the bits 10.
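One way to read these rules is as a count of the leading 1 bits in each byte. The sketch below uses a hypothetical helper named `byte_role` (not part of the problem) that masks a value to its low 8 bits and reports the role of that byte. It returns the expected character length for a lead byte, 0 for a continuation byte, or -1 for a pattern the rules never allow (five or more leading 1s).

```python
def byte_role(value: int) -> int:
    """Classify one byte by its leading bits (hypothetical helper for illustration).

    Returns 1 for 0xxxxxxx (single-byte character),
    2, 3, or 4 for 110xxxxx, 1110xxxx, 11110xxx (lead of an n-byte character),
    0 for 10xxxxxx (continuation byte),
    -1 for 11111xxx (five or more leading 1s, never valid).
    """
    b = value & 0xFF  # only the low 8 bits are meaningful
    ones = 0
    mask = 0x80  # start at the most significant bit
    while mask and b & mask:
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1
    if ones == 1:
        return 0
    if ones <= 4:
        return ones
    return -1


print([byte_role(b) for b in [197, 130, 1]])  # [2, 0, 1]
```

Counting leading ones makes the 1/2/3/4-byte cases fall out of a single loop instead of four separate prefix comparisons.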
Input
An array data of byte values (only the low 8 bits matter).
Output
A boolean: whether data is valid UTF-8.
Constraints
- 1 <= data.length <= 2*10^4
- 0 <= data[i] <= 255, so every value fits in a single byte.
- Characters span 1 to 4 bytes.
Examples
Example 1
Input
data = [197,130,1]
Output
true
11000101 10000010 forms a 2-byte character (a 110xxxxx lead byte followed by one 10xxxxxx continuation byte), and 00000001 is a 1-byte character.
Example 2
Input
data = [235,140,4]
Output
false
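235 = 11101011 is the lead byte of a 3-byte character, so two continuation bytes must follow. 140 = 10001100 is a valid continuation byte, but 4 = 00000100 does not start with 10, so the encoding is invalid.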
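A common approach, sketched here as one possible solution rather than a reference answer, scans the bytes left to right while tracking how many continuation bytes the current character still needs. The function name `valid_utf8` is assumed for illustration. It checks only the leading-bit rules stated in the problem; it does not reject overlong encodings or out-of-range code points, which a full UTF-8 decoder (such as Python's `bytes.decode("utf-8")`) would reject.

```python
from typing import List


def valid_utf8(data: List[int]) -> bool:
    """Check the problem's leading-bit rules in one left-to-right pass."""
    remaining = 0  # continuation bytes still expected for the current character
    for value in data:
        b = value & 0xFF  # only the low 8 bits are meaningful
        if remaining > 0:
            # Inside a multi-byte character: the byte must look like 10xxxxxx.
            if b >> 6 != 0b10:
                return False
            remaining -= 1
        elif b >> 7 == 0:  # 0xxxxxxx: single-byte character
            continue
        elif b >> 5 == 0b110:  # 110xxxxx: lead of a 2-byte character
            remaining = 1
        elif b >> 4 == 0b1110:  # 1110xxxx: lead of a 3-byte character
            remaining = 2
        elif b >> 3 == 0b11110:  # 11110xxx: lead of a 4-byte character
            remaining = 3
        else:
            # 10xxxxxx with no open character, or 11111xxx: invalid start.
            return False
    # A character cut off at the end of the input is invalid.
    return remaining == 0


print(valid_utf8([197, 130, 1]))  # True
print(valid_utf8([235, 140, 4]))  # False
```

Each byte is inspected once with constant work, so this runs in O(n) time with O(1) extra space, comfortably within the 2*10^4 length limit.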