X-Nico

22 unusual facts about Unicode

.am

Unicode compatible names will not be instituted at AM-NIC until all issues related to IPv6 are resolved.

In 2000-2001, Bhurgri developed the first-ever Sindhi Unicode font, obtained support for Sindhi on the Windows platform, developed the resources needed to make standard Sindhi usable on Windows, and made these freely available on the Internet.

Batak script

Batak script was added to the Unicode Standard in October 2010 with the release of version 6.0.

Biangbiang noodles

The character has not been added to Unicode yet, but is being considered by the IRG for inclusion in the CJK Unified Ideographs Extension E block.

Climm

It is internationalized; German, English, and other translations are available, and it supports sending and receiving acknowledged and non-acknowledged Unicode-encoded messages (it even understands UTF-8 messages for message types the ICQ protocol does not use them for).

Directory traversal attack

When Microsoft added Unicode support to their Web server, a new way of encoding .

`Domain Name System`

To make this possible, ICANN approved the Internationalizing Domain Names in Applications (IDNA) system, by which user applications, such as web browsers, map Unicode strings into the valid DNS character set using Punycode.
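
The mapping can be tried out with Python's standard codecs. This is a minimal sketch, not how any particular browser implements it: the label "bücher" and the domain "bücher.example" are illustrative assumptions, and Python's built-in `idna` codec follows the older IDNA 2003 rules.

```python
# Illustrative label; "bücher" is an assumption, not taken from the text above.
label = "bücher"

# Raw Punycode (RFC 3492) of the label, without the ACE prefix.
puny = label.encode("punycode")

# The built-in "idna" codec (IDNA 2003) converts each label and adds "xn--".
ace = "bücher.example".encode("idna")

print(puny)  # b'bcher-kva'
print(ace)   # b'xn--bcher-kva.example'

# Decoding the ASCII form restores the Unicode name.
roundtrip = ace.decode("idna")
print(roundtrip)
```

The result contains only letters, digits, and hyphens, which is why it fits the valid DNS character set.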

`Dynamic recompilation`

FreeKEYB is a Unicode-based dynamically configurable successor of K3PLUS supporting most keyboard layouts, code pages, and country codes.

`Esperanto Wikipedia`

The introduction of support for the Esperanto alphabet in January 2002 by Brion Vibber, an Esperanto speaker and later the Wikimedia Foundation's first employee, paved the way for alphabets of languages other than English and initiated the transition of the whole Wikipedia to Unicode.

`Glyph Bitmap Distribution Format`

The string name of this particular character is "U+0041", expressing in the Unicode convention the code point hexadecimal 41 (decimal 65, the ASCII character "A").
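
The relationship between the "U+0041" notation, the numeric code point, and the character can be checked in a few lines of Python. The helper name `u_notation` is hypothetical and exists only for this sketch.

```python
name = "U+0041"

# Strip the "U+" prefix and parse the rest as hexadecimal.
code_point = int(name[2:], 16)
char = chr(code_point)

def u_notation(ch: str) -> str:
    """Format a single character in the U+XXXX convention (at least 4 hex digits)."""
    return f"U+{ord(ch):04X}"

print(code_point)        # 65
print(char)              # A
print(u_notation("A"))   # U+0041
```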

`Heap spraying`

Depending on how the browser implements strings, either ASCII or Unicode characters can be used in the string.

`Hossein Derakhshan`

On September 25, 2001, he started his weblog in the Persian language, using Unicode.

`I Ching`

The other encoded I Ching symbols were added to the Unicode Standard in April 2003 with the release of version 4.0.

In the table below, each hexagram's translation is accompanied by a form of R. Wilhelm's translation (which is the source for the Unicode names), followed by a retranslation.

I Ching trigrams were added to the Unicode Standard in June 1993 with the release of version 1.1.

`Intrusion detection system evasion techniques`

In the past, an adversary could use Unicode character encodings to craft attack packets that an IDS would not recognize but that an IIS web server would decode, leaving the server open to attack.
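
One well-known family of such tricks is the overlong UTF-8 form, in which a character is written with more bytes than necessary. The sketch below assumes that is the kind of encoding meant here (the excerpt does not say). It shows that a strict decoder rejects the two-byte sequence `C0 AF`, while a lenient decoder that only masks and shifts the bits would read it as "/". The function `lenient_two_byte` is a hypothetical stand-in for such a decoder, not code from any real server.

```python
# Overlong two-byte encoding of "/" (U+002F); valid UTF-8 uses the single byte 0x2F.
overlong_slash = b"\xc0\xaf"

# A strict decoder such as Python's refuses the sequence.
try:
    overlong_slash.decode("utf-8")
    accepted_by_strict_decoder = True
except UnicodeDecodeError:
    accepted_by_strict_decoder = False

def lenient_two_byte(seq: bytes) -> str:
    """Hypothetical lenient decoder: combines the payload bits without
    checking that the shortest possible form was used."""
    return chr(((seq[0] & 0x1F) << 6) | (seq[1] & 0x3F))

print(accepted_by_strict_decoder)        # False
print(lenient_two_byte(overlong_slash))  # /
```

A filter that matches on raw bytes and a server that decodes leniently will therefore disagree about what the request contains.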

`Kaithi`

Kaithi script was added to the Unicode Standard in October 2009 with the release of version 5.2.

`Natalie Wong`

In fact, the character was only recently available as an add-on to the Unicode software Extension-B package.

`Ş`

In early versions of Unicode, Ș (S-comma) was not present in the Unicode Standard.

`Scribe Mail`

Scribe uses Unicode internally and can display text correctly even if the glyphs are not available in the current font.

`Sweet Dew Incident`

Li Xun and Zheng also had six eunuchs who had conflicts with Wang Shoucheng previously, Tian Yuancao (田元操), Liu Xingshen (劉行深), Zhou Yuanzhen (周元稹), Xue Shigan (薛士幹), Sixian Yiyi (似先義逸), and Liu Yingchan (劉英, final character not in Unicode) to be sent out of Chang'an to survey six remote circuits, planning to eventually send edicts drafted by the imperial scholar Gu Shiyong (顧師邕) to the six circuits to order them to commit suicide later.

`Trie`

Such compression is also typically used in the implementation of the various fast lookup tables needed to retrieve Unicode character properties (for example to represent case mapping tables, or lookup tables containing the combination of base and combining characters needed to support Unicode normalization).
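
As a rough illustration of the idea, the sketch below builds a two-stage table, which can be read as a fixed-depth trie keyed on the high and low bits of the code point. Identical 256-entry blocks are stored once and shared. The block size, the function names, and the choice of the general category as the property are assumptions made for this example; real libraries use more elaborate layouts.

```python
import unicodedata

BLOCK = 256  # code points per block (an arbitrary choice for this sketch)

def build_two_stage(prop, limit=0x110000):
    """Return (index, blocks): index maps a block number to a shared block."""
    index, blocks, seen = [], [], {}
    for start in range(0, limit, BLOCK):
        block = tuple(prop(cp) for cp in range(start, start + BLOCK))
        if block not in seen:          # store each distinct block only once
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks

def lookup(index, blocks, cp):
    """Two array accesses: first by the high bits, then by the low bits."""
    high, low = divmod(cp, BLOCK)
    return blocks[index[high]][low]

index, blocks = build_two_stage(lambda cp: unicodedata.category(chr(cp)))

print(len(index), len(blocks))          # index entries vs. distinct blocks stored
print(lookup(index, blocks, ord("A")))  # Lu
```

Because large stretches of the code space share the same property value, far fewer blocks are stored than there are index entries.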

`Adobe InDesign`

InDesign supports Unicode character encoding and there is a special Middle East version supporting complex text layout for Arabic and Hebrew types of complex script.

`Apple Keyboard`

Some of these keys have unique symbols defined in the Unicode block Miscellaneous Technical.

`Arial Unicode MS`

The font is also apparently licensed to Apple, which announced on October 16, 2007, that its flagship operating system, Mac OS X v10.5 ("Leopard"), would be bundled with Arial Unicode.

`Batch file`

The non-ASCII parts of these are incompatible with the Unicode or Windows character sets otherwise used in Windows, so care needs to be taken.

`Bush hid the facts`

While "Bush hid the facts" is the sentence most commonly presented on the Internet to induce the error, the bug can be triggered by many sentences with characters and spaces in a particular order so that the bytes match the UTF-16LE encoding of valid (if nonsensical) Chinese Unicode characters.
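
The byte coincidence is easy to reproduce. The sketch below only reinterprets the 18 ASCII bytes as UTF-16LE; it does not reproduce the Windows detection heuristic itself.

```python
raw = b"Bush hid the facts"

# Reinterpret each pair of ASCII bytes as one little-endian 16-bit code unit.
text = raw.decode("utf-16-le")

# Check that every resulting character falls in the CJK Unified Ideographs block.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)

print(len(raw), len(text))  # 18 9
print(all_cjk)              # True
print(text)
```

Each pair of ASCII bytes happens to form a code unit between U+4E00 and U+9FFF, so nothing in the data itself marks the UTF-16LE reading as invalid.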

`C0 and C1 control codes`

While the C1 control characters are used in conjunction with the ISO/IEC 8859 series of graphical character sets among others, and are integrated into Unicode, they are rarely used directly, except on specific platforms such as OpenVMS.

`CDuce`

a rich type algebra, with recursive types and arbitrary boolean combinations (union, intersection, complement) allows precise definitions of data structures and XML types; general purpose types and types constructors are taken seriously (products, extensible records, arbitrary precision integers with interval constraints, Unicode characters);

`Core Foundation`

The most prevalent use of Core Foundation is for passing its own primitive types for data, including raw bytes, Unicode strings, numbers, calendar dates, and UUIDs, as well as collections such as arrays, sets, and dictionaries, to numerous OS X C routines, primarily those that are GUI-related.

`DirectWrite`

Supported Unicode features include BIDI, line breaking, surrogates, UVS, language-guided script itemization, number substitution, and glyph shaping.

`Egyptian Hieroglyphs`

Egyptian Hieroglyphs (Unicode block), a block of Unicode characters containing the characters of the Gardiner sign list.

`Ellipsis`

Unicode recognizes a series of three period characters (U+002E) as compatibility equivalent (though not canonical) to the horizontal ellipsis character.
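
The difference between canonical and compatibility equivalence shows up directly in normalization. A minimal check using Python's standard `unicodedata` module:

```python
import unicodedata

ellipsis = "\u2026"  # HORIZONTAL ELLIPSIS

# Canonical normalization leaves the character alone...
nfc = unicodedata.normalize("NFC", ellipsis)
# ...while compatibility normalization replaces it with three full stops.
nfkc = unicodedata.normalize("NFKC", ellipsis)

print(nfc == ellipsis)                      # True
print(nfkc)                                 # ...
print(unicodedata.decomposition(ellipsis))  # <compat> 002E 002E 002E
```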

`Fontconfig`

To perform font matching, fontconfig stores typesetting information about all of the installed fonts, including the name of the font family, style, weight, DPI, and Unicode coverage.

`GOST`

GOST 10859: A 1964 character set for computers, includes non-ASCII/non-Unicode characters required when programming in the ALGOL programming language.

`ISO 15919`

For example, Tahoma supports most Unicode characters needed for Indic language transliteration.

`ISO/IEC 646`

In the table above, the cells with non-white background emphasize the differences from the US variant used in the Basic Latin subset of ISO/IEC 10646 and Unicode.

`Lām with bar`

ݪ (Unicode name: Arabic Letter Lam With Bar) is an additional letter of the Arabic script, derived from lām (ل) with the addition of a bar.

`Mensural notation`

For quoting mensural notation symbols in inline text, a number of characters have been included in the character encoding standard Unicode, in the "musical symbols" block.

`MUFI`

Medieval Unicode Font Initiative, a project which aims to coordinate the encoding and display of special characters in medieval texts written in the Latin alphabet, which are not encoded as part of Unicode.

`Musical Symbols`

Byzantine Musical Symbols, a Unicode block of Byzantine era musical notation symbols.

`NewLISP`

It also provides the functions expected of a modern scripting language, including support for regular expressions, XML, Unicode (UTF-8), TCP/IP and UDP networking, matrix and array processing, advanced math, statistics and Bayesian statistical analysis, financial mathematics, and distributed computing support.

`Numerals in Unicode`

According to the Unicode standard version 3.0, these characters are called Hangzhou style numerals.

`Roli Mosimann`

Roli Mosimann has most recently worked with Jojo Mayer and Miloopa (engineering and mixing the latter band's "Unicode" album).

`Sīn with four dots above`

ݜ (Unicode name: Arabic Letter Seen With Four Dots Above) is an additional letter of the Arabic script, derived from sīn (ﺱ) with the addition of four dots or two horizontal lines above the letter.

`Standard Compression Scheme for Unicode`

SQL Server 2008 R2 uses SCSU to compress Unicode values stored in nchar(n) and nvarchar(n) columns, achieving space savings between 15% and 50%, depending on the language of the data.

`Telegraph code`

(Demand for accented vowels led to the 8-bit ISO 8859-1. Demand for Cyrillic, Hebrew, Arabic, Greek, and other letters led to many variations of ISO 8859. Demand for using more than one of these sets of letters at the same time led to the 16-bit Unicode. Demand for Chinese characters led to 21-bit Unicode, which is clearly impossible for any typewriter-like, formed-character mechanical teleprinter, although dot-matrix printers can print Chinese characters.)
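
The step beyond 16 bits can be made concrete with UTF-16, where a code point above U+FFFF is split into two 16-bit surrogates that together carry 20 bits of payload. The function name `to_surrogates` and the sample code point U+20000 (a CJK Extension B ideograph) are choices made for this sketch.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    value = cp - 0x10000             # 20-bit payload
    high = 0xD800 + (value >> 10)    # top 10 bits
    low = 0xDC00 + (value & 0x3FF)   # bottom 10 bits
    return high, low

high, low = to_surrogates(0x20000)
print(f"{high:04X} {low:04X}")  # D840 DC00

# Cross-check against Python's own UTF-16 encoder.
encoded = chr(0x20000).encode("utf-16-be")
print(encoded.hex())  # d840dc00
```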

`UTF-EBCDIC`

IBM EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support.

`VNI`

Due to its longstanding use, VIQR was a natural choice for computer word processing, prior to the appearance of VNI, VPS, VISCII, and Unicode.

`Z-variant`

The Unicode philosophy of codepoint allocation for CJK languages is organized along three “axes.”
