» 
Arabic Bulgarian Chinese Croatian Czech Danish Dutch English Estonian Finnish French German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Korean Latvian Lithuanian Malagasy Norwegian Persian Polish Portuguese Romanian Russian Serbian Slovak Slovenian Spanish Swedish Thai Turkish Vietnamese
Arabic Bulgarian Chinese Croatian Czech Danish Dutch English Estonian Finnish French German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Korean Latvian Lithuanian Malagasy Norwegian Persian Polish Portuguese Romanian Russian Serbian Slovak Slovenian Spanish Swedish Thai Turkish Vietnamese

definition - UTF-32/UCS-4

definition of Wikipedia

   Advertizing ▼

Wikipedia

UTF-32/UCS-4

From Wikipedia, the free encyclopedia

Jump to: navigation, search
Unicode
Character encodings
UCS
Mapping
Bi-directional text
BOM
Han unification
Unicode and HTML
Unicode and E-mail
Unicode typefaces

UTF-32 (or UCS-4) is a protocol for encoding Unicode characters that uses exactly 32 bits for each Unicode code point. All other Unicode transformation formats use variable-length encodings.

Because UTF-32 uses 4 bytes for every character it is quite space inefficient. Specifically, non-BMP characters are so rare in most texts, they may as well be considered non-existent for sizing discussions, making UTF-32 between two and four times the size of other encodings.

Though a fixed number of bytes per code point seems convenient, it is not used as much as the other Unicode encodings. It makes truncation slightly easier but not significantly so compared to UTF-8 and UTF-16. It does not make calculating the displayed width of a string any easier except in very limited cases, since even with a “fixed width” font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks also mean editors cannot treat one code point as being the same as one unit for editing.

History

The original ISO 10646 standard defines a 31-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit friendly code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.

UCS-4 is sufficient to represent all of the Unicode code space, which has 1114112 (= 220 + 216) code points and therefore requires only up to hexadecimal 10FFFF. Some people consider it wasteful to reserve such a large code space for mapping a relatively small set of code points, so a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the 0 to 10FFFF code space.

UTF-32 was originally a subset of the UCS-4 standard, but the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, and has removed former provisions for private-use code positions in groups 60 to 7F and in planes E0 to FF.

Accordingly UCS-4 and UTF-32 are now identical except that the UTF-32 standard has additional Unicode semantics.

See also

External links

 

All translations of UTF-32/UCS-4


sensagent's content

  • definitions
  • synonyms
  • antonyms
  • encyclopedia

Dictionary and translator for handheld

⇨ New : sensagent is now available on your handheld

   Advertising ▼

sensagent's office

Shortkey or widget. Free.

Windows Shortkey: sensagent. Free.

Vista Widget : sensagent. Free.

Webmaster Solution

Alexandria

A windows (pop-into) of information (full-content of Sensagent) triggered by double-clicking any word on your webpage. Give contextual explanation and translation from your sites !

Try here  or   get the code

SensagentBox

With a SensagentBox, visitors to your site can access reliable information on over 5 million pages provided by Sensagent.com. Choose the design that fits your site.

Business solution

Improve your site content

Add new content to your site from Sensagent by XML.

Crawl products or adds

Get XML access to reach the best products.

Index images and define metadata

Get XML access to fix the meaning of your metadata.


Please, email us to describe your idea.

WordGame

The English word games are:
○   Anagrams
○   Wildcard, crossword
○   Lettris
○   Boggle.

Lettris

Lettris is a curious tetris-clone game where all the bricks have the same square shape but different content. Each square carries a letter. To make squares disappear and save space for other squares you have to assemble English words (left, right, up, down) from the falling squares.

boggle

Boggle gives you 3 minutes to find as many words (3 letters or more) as you can in a grid of 16 letters. You can also try the grid of 16 letters. Letters must be adjacent and longer words score better. See if you can get into the grid Hall of Fame !

English dictionary
Main references

Most English definitions are provided by WordNet .
English thesaurus is mainly derived from The Integral Dictionary (TID).
English Encyclopedia is licensed by Wikipedia (GNU).

Copyrights

The wordgames anagrams, crossword, Lettris and Boggle are provided by Memodata.
The web service Alexandria is granted from Memodata for the Ebay search.
The SensagentBox are offered by sensAgent.

Translation

Change the target language to find translations.
Tips: browse the semantic fields (see From ideas to words) in two languages to learn more.

last searches on the dictionary :

3374 online visitors

computed in 0.062s

I would like to report:
section :
a spelling or a grammatical mistake
an offensive content(racist, pornographic, injurious, etc.)
a copyright violation
an error
a missing statement
other
please precise:

Advertize

Partnership

Company informations

My account

login

registration

   Advertising ▼