UTF-16: Difference between revisions
Content removed Content added
m Bot: Cosmetic changes
→Code points U+10000 to U+10FFFF: Typo fix, Grammar fix, Link addition, LÊ TIẾN TIẾP Tags: Mobile edit, Mobile app edit
Line 13:
The first plane (code points U+0000 to U+FFFF) contains the most frequently used characters and is called the [[Basic Multilingual Plane]] or ''BMP''. Both UTF-16 and UCS-2 encode code points in this range as single 16-bit code units that are numerically equal to the corresponding code points. The code points in the BMP are the ''only'' code points that can be represented in UCS-2.
-->
<!--
Code points from the other planes (called Supplementary Planes) are encoded in UTF-16 as a pair of 16-bit code units called a ''surrogate pair'', by the following scheme:
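The scheme itself is not reproduced in this excerpt. As a minimal sketch, assuming the standard UTF-16 surrogate computation defined by the Unicode Standard, the encoding of a single code point can be written as follows. The function name `encode_utf16_code_units` is hypothetical and chosen only for illustration.

```python
from typing import List


def encode_utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (as integers) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid scalar values")
    if code_point < 0x10000:
        # BMP: one code unit, numerically equal to the code point.
        return [code_point]
    # Supplementary planes: subtract 0x10000 to get a 20-bit value,
    # then split it into two 10-bit halves.
    offset = code_point - 0x10000
    lead = 0xD800 + (offset >> 10)      # high 10 bits -> lead surrogate
    trail = 0xDC00 + (offset & 0x3FF)   # low 10 bits  -> trail surrogate
    return [lead, trail]


if __name__ == "__main__":
    # Print the code units for U+1F600 in hexadecimal.
    print([hex(u) for u in encode_utf16_code_units(0x1F600)])
```

The result can be cross-checked against Python's built-in codec, for example by comparing with `chr(0x1F600).encode("utf-16-be")`.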
Line 44:
Since the ranges for the lead surrogates, trail surrogates, and valid BMP characters are disjoint, searches are simplified: it is not possible for part of one character to match a different part of another character. It also means that UTF-16 is ''self-synchronizing'': the start of the next character following a given code unit can be found by examining only that one code unit. [[UTF-8]] shares these advantages, but many earlier encoding schemes did not allow unambiguous searching and could only be synchronized by re-parsing from the start of the string.
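The self-synchronizing property can be illustrated with a short sketch. It assumes well-formed UTF-16 held as a list of integer code units; the helper names `is_lead_surrogate`, `is_trail_surrogate`, and `next_character_start` are hypothetical and introduced only for this example.

```python
from typing import List


def is_lead_surrogate(unit: int) -> bool:
    """True if the code unit lies in the lead (high) surrogate range."""
    return 0xD800 <= unit <= 0xDBFF


def is_trail_surrogate(unit: int) -> bool:
    """True if the code unit lies in the trail (low) surrogate range."""
    return 0xDC00 <= unit <= 0xDFFF


def next_character_start(units: List[int], index: int) -> int:
    """Return the index where the next character after units[index] begins.

    Only units[index] is inspected. Assumes well-formed UTF-16: a lead
    surrogate is always followed by exactly one trail surrogate.
    """
    if is_lead_surrogate(units[index]):
        # The current character occupies this unit and the next one.
        return min(index + 2, len(units))
    # A BMP code unit or a trail surrogate ends its character here.
    return index + 1


if __name__ == "__main__":
    # "A", U+1F600 (as a surrogate pair), "B"
    sample = [0x0041, 0xD83D, 0xDE00, 0x0042]
    print([next_character_start(sample, i) for i in range(len(sample))])
```

Because the three ranges are disjoint, the classification never needs to look backwards, which is what allows a reader to resynchronize from an arbitrary position in the stream.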
Because the most commonly used characters are all in the Basic Multilingual Plane, handling of surrogate pairs is often not thoroughly tested. This leads to persistent bugs and potential security holes, even in popular and well-reviewed application software (e.g. CVE-2008-2938, CVE-2012-2135).<ref>https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-2938 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-2135</ref>
-->
=== Code points U+D800 to U+DFFF ===
<!--