Homoglyph

From Wikipedia, the free encyclopedia

In typography, a homoglyph is one of a pair of characters whose shapes are either identical or cannot be told apart by quick visual inspection. The designation is also applied to sequences of characters sharing these properties. The antonym is a synoglyph, one of a set of glyphs that look different but mean the same thing. Synoglyphs, also known as display variants, are the glyph-level counterpart of synonyms, which are words that mean the same thing.

The term homograph is sometimes used synonymously with homoglyph, but this typographic sense is not part of the definition normally applied in linguistics. In that context, homography is a property of words, not characters, and homographs are a type of homonym. References to characters in terms of the similarity of their appearance might therefore best be made without reliance on specialized vocabulary, for example, as 'seemingly identical', 'visually similar', 'visually confusable' or 'look-alike' characters. The Unicode Consortium has published its Technical Report #36 [1] on a range of issues deriving from the visual similarity of characters, both within a single script and between different scripts.

Two common and important pairs of homoglyphs in use today are the digit zero and the capital letter O (i.e. 0 & O), and the digit one and the lowercase letter L (i.e. 1 & l). In the days of mechanical typewriters there was little or no visual difference between these glyphs (some typewriters even omitted the 1 and 0 keys completely), and typists treated them interchangeably as keyboarding shortcuts. As these same typists transitioned in the 1970s and 1980s to being computer keyboard operators, their old keyboarding habits betrayed them and became a source of great confusion, because a computer treats each member of a pair as a different character. Most current type designs carefully distinguish them, usually by drawing the digit zero narrower and by drawing the digit one with prominent serifs. Early computer print-outs went even further and marked the zero with a slash or dot. The re-drawing of type designs to split these homoglyphs, combined with the passing of keyboard operators trained on mechanical typewriters, has seen the prevalence of these particular homoglyph typos greatly diminish.

Still, the lowercase letter L sometimes resembles the digit 1 in serif fonts (l & 1), and the capital letter I in sans-serif fonts (l & I).

The Unicode character set contains many strongly homoglyphic characters. These present security risks in a variety of situations (addressed in UTR#36) and have been called to particular attention in regard to internationalized domain names. One might deliberately spoof a domain name by substituting one character with its homoglyph, thus creating a second domain name, not readily distinguishable from the first, that can be exploited in phishing (see main article IDN homograph attack). In many fonts the Greek letter 'Α', the Cyrillic letter 'А' and the Latin letter 'A' are visually identical, as are the Latin letter 'a' and the Cyrillic letter 'а'. A domain name can be spoofed simply by substituting one of these forms for another in a separately registered name. There are also many examples of near-homoglyphs within the same script, such as 'í' (with an acute accent) and 'i'. When discussing this specific security issue, any two sequences of similar characters may be assessed in terms of their potential to be taken as a 'homoglyph pair', or, if the sequences clearly appear to be words, as 'pseudo-homographs' (noting again that these terms may themselves cause confusion in other contexts).
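
The distinction between these look-alike letters can be illustrated with a minimal Python sketch. It uses only the standard unicodedata module. The helper names (script_of, scripts_in, looks_mixed_script) and the sample label are hypothetical, and taking the first word of a character's Unicode name as its script is a simplifying assumption, not the method described in UTR#36 or used by any registry or browser.

```python
import unicodedata

# Three capital letters that many fonts draw identically.
LATIN_A = "\u0041"     # LATIN CAPITAL LETTER A
GREEK_A = "\u0391"     # GREEK CAPITAL LETTER ALPHA
CYRILLIC_A = "\u0410"  # CYRILLIC CAPITAL LETTER A


def script_of(ch):
    """Rough script guess: the first word of the character's Unicode name.

    Assumption for illustration only; Unicode names usually begin with
    the script name (LATIN, GREEK, CYRILLIC), but this is not a formal
    script property lookup.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(label):
    """Collect the rough scripts of all alphabetic characters in a label."""
    return {script_of(ch) for ch in label if ch.isalpha()}


def looks_mixed_script(label):
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


if __name__ == "__main__":
    for ch in (LATIN_A, GREEK_A, CYRILLIC_A):
        print("U+%04X" % ord(ch), unicodedata.name(ch))

    # Hypothetical spoofed label: the second letter is Cyrillic 'а' (U+0430).
    spoofed = "p\u0430ypal"
    print(looks_mixed_script("paypal"), looks_mixed_script(spoofed))
```

A check of this kind only catches labels that mix scripts. It does not flag a label written entirely in one look-alike script, nor same-script near-homoglyphs such as 'í' and 'i'.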

Efforts are underway by TLD registries and Web browser designers to minimize the risks of homoglyphic confusion to the fullest extent possible. Relevant documentation will be found both on the developers' Web sites, and on an IDN Forum [2] provided by ICANN.

A familiar historical manifestation of homoglyphic confusion results from the use of a 'y' to represent a 'þ' when setting older English texts in fonts that do not contain the latter character. This has led to the mistaken supposition that the word 'The' was formerly written and pronounced as 'Ye', as in 'Ye olde shoppe', instead of the intended 'Þe olde shoppe' (discussed in detail in a separate article on the thorn).
