Encode in the style of utf-8 (see wikipedia article on utf-8)
The variation is that we accept some input that a conventional UTF-8 encoder rejects, for example illegal code points such as isolated Unicode surrogates (surrogates that are not part of a surrogate pair).
We also assume that any character needing a 4-byte representation is handed to us as a surrogate pair.
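
A minimal sketch of the behaviour described above, assuming the input is a sequence of 16-bit code units given as integers. The function name `encode_utf8_style` and its signature are hypothetical and chosen only for illustration; they are not taken from the original code.

```python
def encode_utf8_style(units):
    """Encode a sequence of 16-bit code units (ints in 0..0xFFFF) to bytes.

    Hypothetical helper: follows the UTF-8 bit layout, but does not
    reject isolated surrogates.
    """
    out = bytearray()
    i = 0
    n = len(units)
    while i < n:
        cp = units[i]
        i += 1
        # A high surrogate followed by a low surrogate forms one code
        # point above U+FFFF, which takes the 4-byte form.
        if 0xD800 <= cp <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        if cp < 0x80:
            # 1 byte: 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:
            # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            # Isolated surrogates land here and are encoded, not rejected.
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

For example, `encode_utf8_style([0xD83D, 0xDE00])` yields the 4-byte sequence `F0 9F 98 80`, while the isolated surrogate `[0xD800]` yields the 3-byte sequence `ED A0 80`, which a strict UTF-8 encoder would refuse to produce.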