{{Short description|Different glyphs that look visually similar}}
{{Multiple issues|
{{original research|date=July 2016}}
{{more citations needed|date=July 2016}}
}}
[[File:Homoglyph a.svg|thumb|[[Latin alphabet|Latin]] small letter a (Unicode U+0061) and [[Cyrillic]] small letter a (Unicode U+0430). Both characters are set in [[Helvetica]] LT Std.]]
[[File:Homoglyphs in Serif and Sans.png|thumb|In some fonts, certain characters are nearly indistinguishable from one another and so become homoglyphs.]]
In [[orthography]] and [[typography]], a '''homoglyph''' is one of two or more [[grapheme]]s, [[character (computing)|characters]], or [[glyph]]s with shapes that appear identical or very similar in a given [[font]], but that may differ in meaning. The designation is also applied to sequences of characters sharing these properties.
 
In 2008, the [[Unicode Consortium]] published its Technical Report #36<ref name=":0">{{cite web|url=https://www.unicode.org/reports/tr36/|title=UTR #36: Unicode Security Considerations|website=www.unicode.org}}</ref> on a range of issues deriving from the visual similarity of characters, both within a single script and between characters in different scripts.
 
A notable example of homoglyphic confusion arises from the use of {{angbr|y}} to represent {{angbr|þ}} ([[Thorn (letter)|thorn]]) when setting older English texts in typefaces that do not contain the latter character. In modern times this has led to such phenomena as ''[[Ye olde]] shoppe'', implying incorrectly that the word ''the'' was formerly written ''ye'' {{IPAc-en|j|iː}} rather than ''þe''.
 
Examples of homoglyphic symbols are (a) the [[diaeresis]] and the umlaut (both a pair of dots, but with different meanings, although [[character encoding|encoded]] with the same [[code point]]s); and (b) the [[hyphen]] and the [[minus sign]] (both a short horizontal stroke, but with different meanings, although often encoded with [[hyphen-minus|the same code point]]). Among [[numerical digit|digits]] and [[letter (alphabet)|letters]], the digit [[1]] and lowercase [[l]] are always encoded separately but are given very similar glyphs in many [[font]]s, as are the digit [[0]] and capital [[O]]. Virtually every homoglyphic pair of characters could be differentiated graphically with clearly distinguishable glyphs and separate code points, but this is not always done. [[Typeface]]s that do not emphatically distinguish the one/el and zero/oh homoglyphs are considered unsuitable for writing [[formula]]s, [[URL]]s, [[source code]], IDs and other text where characters cannot always be differentiated without [[context (language use)|context]]. Fonts that distinguish these glyphs, for example by means of a [[slashed zero]], are preferred for those uses.
 
The term [[homograph]] is sometimes used [[synonym]]ously with homoglyph, but in the usual linguistic sense, homographs are [[word]]s that are spelled the same but have different meanings, which is a property of words, not characters.
 
('''Synoglyphs''' are glyphs that look different but mean the same thing. Synoglyphs are also known informally as ''display variants''.)
== Umlaut and diaeresis ==
In the days of mechanical typewriters these were typed with the same key, which was also used for a double inverted comma. However, the umlaut originated specifically as a pair of short vertical lines, not two dots (see [[Sütterlin]]). Incidentally, the two dots above the letter E in Albanian are described as a diaeresis but do not fulfil the function of a diaeresis.
<ref>Describing these as homoglyphs is questionable as there are probably no languages in which the glyph can fulfil both these roles. It would be just as valid to describe, say, a grave accent as a homoglyph because it fulfils different roles in different languages.</ref>
 
== 0 and O; 1, l and I ==
Two common and important sets of homoglyphs in use today are the digit zero and the capital letter O (i.e. 0 and O); and the digit one, the lowercase letter L and the uppercase i (i.e. 1, l and I). In the early days of mechanical typewriters there was very little or no visual difference between these glyphs, and typists treated them interchangeably as keyboarding shortcuts. In fact, most keyboards did not even have a key for the digit "1", requiring users to type the letter "l" instead, and some also omitted 0. As these same typists transitioned in the 1970s and 1980s to being computer keyboard operators, they carried their old keyboarding habits with them, which was an occasional source of confusion.
 
Most current type designs carefully distinguish between these homoglyphs, usually by drawing the digit zero narrower and drawing the digit one with prominent [[serifs]]. Early computer print-outs went even further and marked the zero with a slash or dot, which led to a new conflict involving the [[Scandinavian language|Scandinavian]] letter "[[Ø]]" and the Greek letter Φ ([[phi]]). The redesigning of character types to differentiate these characters has meant less confusion. The degree to which two different characters appear the same to a given observer is called the "visual similarity".<ref name="helfrich">{{cite conference |last1=Helfrich |first1=James |first2=Rick |last2=Neff |title=Dual canonicalization: An answer to the homograph attack |conference= eCrime Researchers Summit (eCrime), 2012 |year=2012|doi=10.1109/eCrime.2012.6489517 }}</ref>
 
== Multi-letter homoglyphs ==
[[File:Stefan Szczotkowski (1767-1836).jpg|thumb|225px|'''''St'''efan Szczotkowski'' looks like '''''A'''effan Szczotkowski'' on the gravestone.]]
Some other combinations of letters look similar, for instance '''rn''' looks similar to '''m''', '''cl''' looks similar to '''d''', and '''vv''' looks similar to '''w'''.
 
In certain narrow-spaced fonts (such as [[Tahoma (typeface)|Tahoma]]), placing the letter '''c''' next to a letter such as j, l or i will create a homoglyph, such as <span style="font: 8pt Tahoma, sans-serif;">cj cl ci</span> (g d a).
 
When some characters are placed next to each other, seen together at a glance they give the visual impression of another, unrelated character. A more precise way of saying this is that some [[Typographical ligature|typographic ligatures]] can look similar to standalone glyphs. For example, the '''{{not a typo|fi}}''' ligature ('''fi''') can look similar to '''A''' in some typefaces or fonts. This potential for confusion is sometimes an argument made against the use of ligatures.{{Citation needed|date=August 2009}}
 
== Unicode homoglyphs ==
[[File:Venn diagram gr la ru.svg|thumb|right|The three most prominent European alphabets (Greek, Cyrillic and Latin) share many letter forms that are encoded in Unicode under separate code points.]]
The [[Unicode]] [[character set]] contains many strongly homoglyphic characters, known as "confusables".<ref name=":0" /> These present security risks in a variety of situations (addressed in UTR #36) and have drawn particular attention in regard to [[internationalized domain name]]s. One might deliberately spoof a domain name by replacing one character with its homoglyph, thus creating a second domain name, not readily distinguishable from the first, that can be exploited in [[phishing]] (''see main article [[IDN homograph attack]]'').

In many [[typeface|fonts]] the [[Greek alphabet|Greek]] letter 'Α', the [[Cyrillic]] letter 'А' and the [[Latin alphabet|Latin]] letter 'A' are visually identical, as are the Latin letter 'a' and the Cyrillic letter 'а' (the same applies to the Latin letters "aBeHKopcTxy" and the Cyrillic letters "{{Script|Cyrl|аВеНКорсТху}}"). A domain name can be spoofed simply by substituting one of these forms for another in a separately registered name. There are also many near-homoglyphs within the same script, such as 'í' (i with an acute accent) and 'i'; 'É' (E-acute), 'Ė' (E with dot above) and 'È' (E-grave); and 'Í' (capital I with an acute accent) and 'ĺ' (lowercase L with acute).

When discussing this specific security issue, any two sequences of similar characters may be assessed in terms of their potential to be taken as a 'homoglyph pair' or, if the sequences clearly appear to be words, as 'pseudo-homographs' (noting again that these terms may themselves cause confusion in other contexts). In the [[Chinese language]], many [[simplified Chinese characters]] are homoglyphs of the corresponding [[traditional Chinese characters]].
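
As an illustration of why such look-alikes are distinct to software, the following minimal Python sketch prints the code point and Unicode name of three of the characters mentioned above. The helper names <code>describe</code> and <code>script_of</code> are hypothetical, and taking the first word of the character name as its "script" is only a rough heuristic assumed for this example, not the official Unicode Script property.

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def script_of(ch):
    """Rough script guess (assumption): first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]

latin_a = "\u0061"      # Latin small letter a
cyrillic_a = "\u0430"   # Cyrillic small letter a
greek_alpha = "\u0391"  # Greek capital letter alpha

for ch in (latin_a, cyrillic_a, greek_alpha):
    print(describe(ch), "->", script_of(ch))
# U+0061 LATIN SMALL LETTER A -> LATIN
# U+0430 CYRILLIC SMALL LETTER A -> CYRILLIC
# U+0391 GREEK CAPITAL LETTER ALPHA -> GREEK

# The strings compare unequal even though they may render identically.
print(latin_a == cyrillic_a)  # False
```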
 
Efforts by [[DNS registry|TLD registries]] and [[Web browser]] designers are under way to minimize the risks of homoglyphic confusion. Commonly, this is achieved by prohibiting names which mix character sets from multiple languages ([[Toys "R" Us|toys-Я-us.org]], using the Cyrillic letter [[Ya (Cyrillic)|Я]], would be invalid, but [[Uncyclopedia#Portuguese – Desciclopédia|wíkipedia.org]] and [[Wikipedia|wiki-indonesia.club]] still exist as different websites); Canada's [[.ca]] registry goes one step further by requiring names which differ only in [[diacritic]]s to have the same owner and same registrar.<ref>{{cite web |url=http://www.cira.ca/why-ca/french-ca/ |title=Archived copy |access-date=2013-03-29 |url-status=dead |archive-url=https://web.archive.org/web/20130328203849/http://www.cira.ca/why-ca/french-ca/ |archive-date=2013-03-28 }}</ref> The handling of Chinese characters varies: in [[.org]] and [[.info]] registration of one variant renders the other unavailable to anyone, while in [[.biz]] the traditional and simplified versions of the same name are delivered as a two-domain bundle which both point to the same [[domain name server]].
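
The two kinds of registry rule described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the "script" of a letter is guessed from the first word of its Unicode name rather than from the official Script property, and real registries apply more detailed policies than this.

```python
import unicodedata

def scripts_used(label):
    """Rough set of scripts in a label (assumption: first word of each letter's Unicode name)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def is_mixed_script(label):
    """True if the letters of the label appear to come from more than one script."""
    return len(scripts_used(label)) > 1

def fold_diacritics(label):
    """Decompose to NFD and drop combining marks, so that 'í' compares equal to 'i'."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(is_mixed_script("toys-\u042f-us"))   # True: Latin letters plus Cyrillic Ya
print(is_mixed_script("wikipedia"))        # False
print(fold_diacritics("w\u00edkipedia"))   # wikipedia
```

A mixed-script check alone does not catch a name written entirely in one script with an accented look-alike, which is why the diacritic-folding comparison is shown separately.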
 
Relevant documentation can be found both on the developers' websites and on an IDN Forum<ref>{{cite web|url=http://forum.icann.org/lists/idn-guidelines/|title=ICANN Email Archives: [idn-guidelines]|website=forum.icann.org}}</ref> provided by [[ICANN]].
 
== Canonicalization ==
 
Homoglyphs of all kinds can be detected through a process called 'dual canonicalization'.<ref name="helfrich"/> The first step in this process is to identify homoglyph sets, namely characters appearing the same to a given observer. From here, a single token is specified to represent the homoglyph set. This token is called a canon. The next step is to convert each character in the text to the corresponding canon in a process called canonicalization. If the canons of two runs of text are the same but the original text is different, then a homoglyph exists in the text.
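
The steps described above can be sketched in Python as follows. This is an illustrative assumption, not the procedure from the cited paper: the homoglyph sets are a small hand-picked sample, the names <code>HOMOGLYPH_SETS</code>, <code>CANON</code>, <code>canonicalize</code> and <code>is_homoglyph_pair</code> are hypothetical, and the choice of the smallest code point as the canon is arbitrary.

```python
# Small hand-picked homoglyph sets (assumption; a real table would be far larger).
HOMOGLYPH_SETS = [
    {"a", "\u0430"},   # Latin a, Cyrillic a
    {"e", "\u0435"},   # Latin e, Cyrillic ie
    {"o", "\u043e"},   # Latin o, Cyrillic o
    {"0", "O"},        # digit zero, capital letter O
    {"1", "l", "I"},   # digit one, lowercase L, capital i
]

# One token (the canon) represents each set; here, the member with the smallest code point.
CANON = {ch: min(group) for group in HOMOGLYPH_SETS for ch in group}

def canonicalize(text):
    """Replace every character by the canon of its homoglyph set, if it has one."""
    return "".join(CANON.get(ch, ch) for ch in text)

def is_homoglyph_pair(a, b):
    """True if the texts differ but their canonical forms are identical."""
    return a != b and canonicalize(a) == canonicalize(b)

print(is_homoglyph_pair("paypal", "p\u0430yp\u0430l"))  # True (Cyrillic a substituted)
print(is_homoglyph_pair("GOOGLE", "G00GLE"))            # True (zero for capital O)
print(is_homoglyph_pair("paypal", "paypal"))            # False (identical text)
```

This character-by-character mapping does not cover multi-letter homoglyphs such as '''rn''' and '''m'''; handling those would require matching sequences rather than single characters.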
 
== See also ==
*{{anli|IDN homograph attack}}
*[[Duplicate characters in Unicode]]
*[[Serif]]
*{{anli|minim (palaeography)}}
*[[Vehicle registration plates of Bosnia and Herzegovina]] use only numbers and letters that look the same in the Latin and Cyrillic alphabets.
 
== References ==
{{Reflist}}
 
== External links ==
{{Wiktionary}}
*[https://www.unicode.org/Public/security/latest/confusables.txt confusables.txt], the recommended confusable mapping for IDN.