X-Nico

22 unusual facts about Unicode


.am

Unicode compatible names will not be instituted at AM-NIC until all issues related to IPv6 are resolved.

Abdul-Majid Bhurgri

In 2000-2001, Bhurgri developed the first Sindhi Unicode font, obtained support for Sindhi on the Windows platform, developed the resources needed to make standard Sindhi usable on Windows, and made these freely available on the Internet.

Batak script

Batak script was added to the Unicode Standard in October 2010 with the release of version 6.0.

Biangbiang noodles

The character has not been added to Unicode yet, but is being considered by the IRG for inclusion in the CJK Unified Ideographs Extension E block.

Climm

It is internationalized; German, English, and other translations are available, and it supports sending and receiving acknowledged and non-acknowledged Unicode-encoded messages (it even understands UTF-8 messages for message types the ICQ protocol does not use them for).

Directory traversal attack

When Microsoft added Unicode support to their Web server, a new way of encoding .

Domain Name System

To make this possible, ICANN approved the Internationalizing Domain Names in Applications (IDNA) system, by which user applications, such as web browsers, map Unicode strings into the valid DNS character set using Punycode.
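The mapping described above can be sketched with Python's standard `punycode` and `idna` codecs. This is only an illustration: the `idna` codec follows the older IDNA 2003 rules, and the name `bücher.example` is an assumed example, not one taken from the text.

```python
# Illustrative names; not taken from the text above.
label = "bücher"
domain = "bücher.example"

# Raw Punycode of a single label (no "xn--" prefix).
puny = label.encode("punycode")

# The "idna" codec converts each non-ASCII label to Punycode
# and adds the "xn--" prefix that marks an encoded label.
ascii_domain = domain.encode("idna")

# Decoding reverses the mapping, e.g. for display to the user.
round_trip = ascii_domain.decode("idna")

print(puny)          # b'bcher-kva'
print(ascii_domain)  # b'xn--bcher-kva.example'
print(round_trip)    # bücher.example
```

Only the ASCII form travels through DNS; the Unicode form exists solely in the application.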

Dynamic recompilation

FreeKEYB is a Unicode-based dynamically configurable successor of K3PLUS supporting most keyboard layouts, code pages, and country codes.

Esperanto Wikipedia

The introduction of support for the Esperanto alphabet in January 2002 by Brion Vibber, an Esperanto speaker and later the Wikimedia Foundation's first employee, paved the way for alphabets of languages other than English and initiated the transition of the whole of Wikipedia to Unicode.

Glyph Bitmap Distribution Format

The string name of this particular character is "U+0041", expressing in the Unicode convention the code point hexadecimal 41 (decimal 65, the ASCII character "A").
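The relationship between the "U+0041" label, hexadecimal 41, decimal 65, and the letter "A" can be checked with a few lines of Python. The helper names `code_point_label` and `char_from_label` are hypothetical and are not part of the BDF format or of any library.

```python
def code_point_label(ch: str) -> str:
    """Format one character as a Unicode code point label such as 'U+0041'."""
    return f"U+{ord(ch):04X}"


def char_from_label(label: str) -> str:
    """Parse a 'U+XXXX' label back into the character it names."""
    return chr(int(label[2:], 16))


print(ord("A"))                   # 65
print(code_point_label("A"))      # U+0041
print(char_from_label("U+0041"))  # A
```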

Heap spraying

Depending on how the browser implements strings, either ASCII or Unicode characters can be used in the string.

Hossein Derakhshan

On September 25, 2001, he started his weblog in the Persian language, using Unicode.

I Ching

The other encoded I Ching symbols were added to the Unicode Standard in April 2003 with the release of version 4.0.

In the table below, each hexagram's translation is accompanied by a form of R. Wilhelm's translation (which is the source for the Unicode names), followed by a retranslation.

I Ching trigrams were added to the Unicode Standard in June 1993 with the release of version 1.1.

Intrusion detection system evasion techniques

In the past, an adversary using the Unicode character could encode attack packets that an IDS would not recognize but that an IIS web server would decode, leaving the server open to attack.
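The excerpt does not say which encoding trick was involved. One commonly cited form of this kind of evasion is an overlong UTF-8 sequence, and the sketch below assumes that form. The function `lenient_decode` is a hypothetical stand-in for a lax decoder and does not reproduce any real server's code.

```python
def lenient_decode(data: bytes) -> str:
    """Hypothetical lax decoder: accepts 2-byte UTF-8 sequences even when
    they are overlong (i.e. encode a character that fits in one byte)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("unsupported byte sequence")
    return "".join(out)


def strict_utf8_rejects(data: bytes) -> bool:
    """Return True if Python's standards-conforming UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# 0xC0 0xAF is an overlong encoding of "/" (U+002F).
payload = b"..\xc0\xaf..\xc0\xafetc"

print(b"../" in payload)             # False: a byte-pattern filter sees nothing
print(lenient_decode(payload))       # ../../etc
print(strict_utf8_rejects(payload))  # True
```

A filter that matches raw bytes and a decoder that tolerates malformed input disagree about what the request contains. Rejecting overlong sequences closes that gap.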

Kaithi

Kaithi script was added to the Unicode Standard in October 2009 with the release of version 5.2.

Natalie Wong

In fact, the character was only recently available as an add-on to the Unicode software Extension-B package.

Ş

In early versions of Unicode, Ș (S-comma) was not present in the Unicode Standard.

Scribe Mail

Scribe uses Unicode internally and can display text correctly even if the glyphs are not available in the current font.

Sweet Dew Incident

Li Xun and Zheng also had six eunuchs who had conflicts with Wang Shoucheng previously, Tian Yuancao (田元操), Liu Xingshen (劉行深), Zhou Yuanzhen (周元稹), Xue Shigan (薛士幹), Sixian Yiyi (似先義逸), and Liu Yingchan (劉英, final character not in Unicode) to be sent out of Chang'an to survey six remote circuits, planning to eventually send edicts drafted by the imperial scholar Gu Shiyong (顧師邕) to the six circuits to order them to commit suicide later.

Trie

Such compression is also typically used in the implementation of the various fast lookup tables needed to retrieve Unicode character properties (for example to represent case mapping tables, or lookup tables containing the combination of base and combining characters needed to support Unicode normalization).
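As a minimal sketch of a compressed property lookup, the code below builds a two-stage table, a simple relative of the trie structures mentioned above, and is not necessarily the scheme any particular library uses. The block size, the function names, and the choice of `str.isupper` as the property are all assumptions made for illustration.

```python
BLOCK = 256  # assumed block size; real implementations tune this


def build_two_stage(prop, limit=0x10000):
    """Build (stage1, stage2) for a per-code-point property over [0, limit).

    stage2 holds each distinct block of property values once;
    stage1 maps a block number to its index in stage2.
    """
    seen = {}      # block contents -> index in stage2
    stage1 = []
    stage2 = []
    for start in range(0, limit, BLOCK):
        block = tuple(prop(chr(cp)) for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(stage2)
            stage2.append(block)
        stage1.append(seen[block])
    return stage1, stage2


def lookup(stage1, stage2, cp):
    """Two array indexings replace a scan or a full-size table."""
    return stage2[stage1[cp // BLOCK]][cp % BLOCK]


stage1, stage2 = build_two_stage(str.isupper)

print(lookup(stage1, stage2, ord("A")))  # True
print(lookup(stage1, stage2, ord("a")))  # False
print(len(stage2) < len(stage1))         # True: identical blocks are shared
```

The saving comes from large ranges, such as the CJK ideographs, whose blocks are identical and are therefore stored only once.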


Adobe InDesign

InDesign supports Unicode character encoding and there is a special Middle East version supporting complex text layout for Arabic and Hebrew types of complex script.

Apple Keyboard

Some of these keys have unique symbols defined in the Unicode block Miscellaneous Technical.

Arial Unicode MS

The font is also apparently licensed to Apple, who announced on October 16, 2007 that their flagship operating system, Mac OS X v10.5 ("Leopard"), would be bundled with Arial Unicode.

Batch file

The non-ASCII parts of these are incompatible with the Unicode or Windows character sets otherwise used in Windows, so care needs to be taken.

Bush hid the facts

While "Bush hid the facts" is the sentence most commonly presented on the Internet to induce the error, the bug can be triggered by many sentences with characters and spaces in a particular order so that the bytes match the UTF-16LE encoding of valid (if nonsensical) Chinese Unicode characters.
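The misreading can be reproduced in a few lines. This sketch only shows the byte-level coincidence; it does not model the detection heuristic that made the wrong guess.

```python
sentence = "Bush hid the facts"

# The file on disk holds 18 single-byte ASCII characters.
raw = sentence.encode("ascii")

# Misreading those bytes as UTF-16LE pairs them up into 9 code units.
misread = raw.decode("utf-16-le")

# Every resulting character falls inside the CJK Unified Ideographs block.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)

print(len(raw), len(misread))  # 18 9
print(all_cjk)                 # True
print(misread)
```

Each pair of ASCII bytes forms one 16-bit code unit whose high byte is a lowercase letter (0x61 to 0x7A), which lands in the ideograph range.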

C0 and C1 control codes

While the C1 control characters are used in conjunction with the ISO/IEC 8859 series of graphical character sets among others, and are integrated into Unicode, they are rarely used directly, except on specific platforms such as OpenVMS.

CDuce

a rich type algebra, with recursive types and arbitrary boolean combinations (union, intersection, complement), allows precise definitions of data structures and XML types; general purpose types and type constructors are taken seriously (products, extensible records, arbitrary precision integers with interval constraints, Unicode characters);

Core Foundation

The most prevalent use of Core Foundation is for passing its own primitive types for data, including raw bytes, Unicode strings, numbers, calendar dates, and UUIDs, as well as collections such as arrays, sets, and dictionaries, to numerous OS X C routines, primarily those that are GUI-related.

DirectWrite

Supported Unicode features include BIDI, line breaking, surrogates, UVS, language-guided script itemization, number substitution, and glyph shaping.

Egyptian Hieroglyphs

Egyptian Hieroglyphs (Unicode block), a block of Unicode characters containing the characters of the Gardiner sign list.

Ellipsis

Unicode recognizes a series of three period characters (U+002E) as compatibility equivalent (though not canonical) to the horizontal ellipsis character.
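The difference between compatibility and canonical equivalence shows up directly in normalization, as this short check with Python's standard `unicodedata` module illustrates.

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS

# Canonical normalization leaves the character alone...
nfc = unicodedata.normalize("NFC", ELLIPSIS)

# ...while compatibility normalization expands it to three full stops.
nfkc = unicodedata.normalize("NFKC", ELLIPSIS)

print(nfc == ELLIPSIS)                       # True
print(nfkc == "...")                         # True
print(unicodedata.decomposition(ELLIPSIS))   # <compat> 002E 002E 002E
```

Text compared after NFC therefore still distinguishes "…" from "...", whereas text compared after NFKC does not.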

Fontconfig

To perform font matching, fontconfig stores typesetting information about all of the installed fonts, including the name of the font family, style, weight, DPI, and Unicode coverage.

GOST

GOST 10859: A 1964 character set for computers, includes non-ASCII/non-Unicode characters required when programming in the ALGOL programming language.

ISO 15919

For example, Tahoma supports most Unicode characters needed for Indic language transliteration.

ISO/IEC 646

In the table above, the cells with non-white background emphasize the differences from the US variant used in the Basic Latin subset of ISO/IEC 10646 and Unicode.

Lām with bar

ݪ (Unicode name: Arabic Letter Lam With Bar) is an additional letter of the Arabic script, derived from lām (ل) with the addition of a bar.

Mensural notation

For quoting mensural notation symbols in inline text, a number of characters have been included in the character encoding standard Unicode, in the "musical symbols" block.

MUFI

Medieval Unicode Font Initiative, a project that aims to coordinate the encoding and display of special characters in medieval texts written in the Latin alphabet that are not encoded as part of Unicode.

Musical Symbols

Byzantine Musical Symbols, a Unicode block of Byzantine era musical notation symbols.

NewLISP

It also provides the functions expected of a modern scripting language, including support for regular expressions, XML, Unicode (UTF-8), TCP/IP and UDP networking, matrix and array processing, advanced math, statistics and Bayesian statistical analysis, financial mathematics, and distributed computing support.

Numerals in Unicode

According to the Unicode standard version 3.0, these characters are called Hangzhou style numerals.

Roli Mosimann

Roli Mosimann has most recently worked with Jojo Mayer and Miloopa (engineering and mixing the latter band's "Unicode" album).

Sīn with four dots above

ݜ (Unicode name: Arabic Letter Seen With Four Dots Above) is an additional letter of the Arabic script, derived from sīn (ﺱ) with the addition of four dots or two horizontal lines above the letter.

Standard Compression Scheme for Unicode

SQL Server 2008 R2 uses SCSU to compress Unicode values stored in nchar(n) and nvarchar(n) columns, achieving space savings between 15% and 50%, depending on the language of the data.

Telegraph code

(Demand for accented vowels led to the 8-bit ISO 8859-1. Demand for Cyrillic, Hebrew, Arabic, Greek, and other letters led to many variations of ISO 8859. Demand for using more than one of these sets of letters at the same time led to the 16-bit Unicode. Demand for Chinese characters led to 20-bit Unicode, which is clearly impossible for any typewriter-like, formed-character mechanical teleprinter, although dot-matrix printers can print Chinese characters.)

UTF-EBCDIC

IBM EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support.

VNI

Due to its longstanding use, VIQR was a natural choice for computer word processing, prior to the appearance of VNI, VPS, VISCII, and Unicode.

Z-variant

The Unicode philosophy of codepoint allocation for CJK languages is organized along three “axes.”