Showing posts with label UTS #39. Show all posts

Friday, April 13, 2018

Last Call on Unicode 11.0 Review

The beta review period for Unicode 11.0 and related technical standards will close on April 23, 2018. This is the last opportunity for technical comments before version 11.0 is released in Q2 2018. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments.

Unicode 11.0 adds seven new scripts, including Hanifi Rohingya, and 66 additional emoji characters, including four new components for hair color (for a total of 157 emoji sequences). The set of Georgian Mtavruli capital letters has been added to support modern casing practices.
In addition to the Unicode core specification, five Unicode Standard Annexes and two Unicode Technical Standards have significant specification and/or data file updates that are correlated with the new additions for Unicode 11.0.0. Review of those changes is strongly encouraged during the beta review period.

UAX #14, Unicode Line Breaking Algorithm
  • Uses Extended_Pictographic property for future-proofing
UAX #29, Unicode Text Segmentation
  • New support for Indic virama handling
  • Uses Extended_Pictographic property for future-proofing
  • A new table of formal regex definitions
UAX #31, Unicode Identifier and Pattern Syntax
  • Refines the use of ZWJ in identifiers
  • Broadens the definition of hashtag identifiers
UAX #38, Unicode Han Database (Unihan)
  • Five new fields and improved regular expressions
  • Documents extension of Unihan properties to non-Unihan
UAX #44, Unicode Character Database
  • New property Equivalent_Unified_Ideograph
  • New regular expressions for Bidi_Paired_Bracket & Equivalent_Unified_Ideograph
  • More discussion of emoji variation sequences
  • Clarification of values allowed for the Age property
UTS #10, Unicode Collation Algorithm
  • Updates data to Unicode 11.0
  • Clarification of search tailoring in visual-order scripts
UTS #39, Unicode Security Mechanisms
  • Updates data to Unicode 11.0
  • Enhances discussions of joining controls & combining sequences
UTS #46, Unicode IDNA Compatibility Processing
  • Updates data to Unicode 11.0
  • Changes the format of the test file for arbitrary input settings
  • Updates input setting for Transitional_Processing
UTS #51, Unicode Emoji
  • Supplies Extended_Pictographic property for future-proofing
  • Simplifies emoji sequence definitions
  • EBNF and Regex expressions for loose matches
  • More proposed guidelines: gender-neutral emoji, skin-tone modifiers, ZWJ visible fallbacks, hair-style components
  • Mechanism for changing the “facing” direction for emoji
Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 23, 2018. Feedback instructions are in each public review page. For more information, see the open public review issues.


Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.


Tuesday, June 21, 2016

Announcing The Unicode® Standard, Version 9.0

🥂 Version 9.0 of the Unicode Standard is now available. Version 9.0 adds exactly 7,500 characters, for a total of 128,172 characters. These additions include six new scripts and 72 new emoji characters.

The new scripts and characters in Version 9.0 add support for lesser-used languages worldwide, including:
  • Osage, a Native American language
  • Nepal Bhasa, a language of Nepal
  • Fulani and other African languages
  • The Bravanese dialect of Swahili, used in Somalia
  • The Warsh orthography for Arabic, used in North and West Africa
  • Tangut, a major historic script of China
Important symbol additions include:
  • 19 symbols for the new 4K TV standard
  • 72 emoji characters such as the following
Smileys & people: 🤣 ROLLING ON THE FLOOR LAUGHING, 🤦 FACE PALM
Hand gestures: 🤞 HAND WITH INDEX AND MIDDLE FINGERS CROSSED
Animals: 🦋 BUTTERFLY
Food: 🥑 AVOCADO, 🥘 SHALLOW PAN OF FOOD
Drink: 🥂 CLINKING GLASSES
Travel: 🛵 MOTOR SCOOTER
Sports: 🤸 PERSON DOING CARTWHEEL

For the full list, see emoji additions for Unicode 9.0. For a detailed description of support for emoji characters by the Unicode Standard, see UTR #51, Unicode Emoji.

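Three other important Unicode specifications have been updated for Version 9.0:
  • UTS #10, Unicode Collation Algorithm
  • UTS #39, Unicode Security Mechanisms
  • UTS #46, Unicode IDNA Compatibility Processing
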
Some of the changes in Version 9.0 and associated Unicode technical standards and reports may require modifications in implementations. For more information, see Unicode 9.0 Migration and the migration sections of UTS #10, UTS #39, UTS #46, and UTR #51. For full details on Version 9.0, see http://unicode.org/versions/Unicode9.0.0/

Wednesday, June 17, 2015

Announcing The Unicode® Standard, Version 8.0

Version 8.0 of the Unicode Standard is now available. It includes 41 new emoji characters (including five modifiers for diversity), 5,771 new ideographs for Chinese, Japanese, and Korean, the new Georgian lari currency symbol, and 86 lowercase Cherokee syllables. It also adds letters to existing scripts to support Arwi (the Tamil language written in the Arabic script), the Ik language in Uganda, Kulango in Côte d’Ivoire, and other languages of Africa. In total, this version adds 7,716 new characters and six new scripts.

The first version of Unicode Technical Report #51, Unicode Emoji, is being released at the same time. That document describes the new emoji characters. It provides design guidelines and data for improving emoji interoperability across platforms, gives background information about emoji symbols, and describes how they are selected for inclusion in the Unicode Standard. The data is used to support emoji characters in implementations, specifying which symbols are commonly displayed as emoji, how the new skin-tone modifiers work, and how composite emoji can be formed with joiners. The Unicode website now supplies charts of emoji characters, showing vendor variations and providing other useful information.
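
As a rough illustration of the modifier and joiner mechanisms mentioned above, the following Python sketch builds two sequences by plain string concatenation. The code points chosen here (U+1F44D as a base, U+1F3FD as a skin-tone modifier, and a man/woman/girl sequence joined with U+200D ZERO WIDTH JOINER) are illustrative assumptions, not examples taken from this announcement, and the helper names are hypothetical. Whether a sequence renders as a single image depends on the fonts and platform in use.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER, used to request a composite emoji


def with_modifier(base: str, modifier: str) -> str:
    """Place an emoji modifier immediately after its base character."""
    return base + modifier


def zwj_sequence(*parts: str) -> str:
    """Join several emoji with ZWJ so they may display as one glyph."""
    return ZWJ.join(parts)


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


# Illustrative choices (assumptions, not from the announcement):
thumbs_up = "\U0001F44D"
medium_tone = "\U0001F3FD"  # one of the five modifiers U+1F3FB..U+1F3FF
toned = with_modifier(thumbs_up, medium_tone)
family = zwj_sequence("\U0001F468", "\U0001F469", "\U0001F467")

print(code_points(toned))
print(code_points(family))
```

A platform without support for a given sequence typically falls back to showing the individual emoji side by side, which is why the sequence is kept as ordinary characters rather than a private encoding.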

The 41 new emoji in Unicode 8.0 include the following:

Diversity: five emoji modifiers
Faces and Hands: NERD FACE, FACE WITH ROLLING EYES, ROBOT FACE
Food-Related: HOT DOG, TACO, CHEESE WEDGE, POPCORN
Sports: CRICKET BAT AND BALL, VOLLEYBALL, BOW AND ARROW
Animals: UNICORN FACE, LION FACE, CRAB, SCORPION
Religious: MOSQUE, SYNAGOGUE, PRAYER BEADS

(For the full list, including images, see emoji additions for Unicode 8.0.)

Phones and computers often need operating system updates to support new emoji, which may take some time. It is also now clear which existing characters, such as the often requested SHOPPING BAGS, can be used as emoji. Once phones and computers support these characters, people will be able to see colorful images such as the BOTTLE WITH POPPING CORK above.

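Three other important Unicode specifications are updated for Version 8.0:
  • UTS #10, Unicode Collation Algorithm
  • UTS #39, Unicode Security Mechanisms
  • UTS #46, Unicode IDNA Compatibility Processing
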
Some of the changes in Version 8.0 and associated Unicode technical standards may require modifications in implementations. For more information, see Unicode 8.0 Migration and the migration sections of UTS #10, UTS #39, and UTS #46. For full details on Version 8.0, see Unicode 8.0.

Thursday, February 12, 2015

Unicode Security Mechanisms (UTS #39) Proposed Update Available


UTS #39, Unicode Security Mechanisms, has a proposed update available, with modifications for alignment with the future Unicode 8.0 and some revisions to data and format. The most notable changes are the addition of the new Unicode 8.0 characters, the removal of the SL, SA, and ML data, changes in some values of the General Security Profile Identifier Types, and changes in the format for both the Status and Types.

Feedback is welcome through April 27, 2015. For further information and details about how to provide feedback, please see Public Review Issue #292.

Wednesday, May 28, 2014

Unicode Security Data: Beta Review

The documents and data for two Public Review Issues have recently been revised. The issues are PRI #272, Proposed Update UTR #36, Unicode Security Considerations, and PRI #273, Proposed Update UTS #39, Unicode Security Mechanisms.

We have revised the draft data for UTS #39: Unicode Security Mechanisms (confusables and identifier restrictions) for review. There are also some small changes to the text of UTR #36: Unicode Security Considerations, mostly cleanup in preparation for publication. The most important data files for review are:
These files can be downloaded from http://unicode.org/Public/security/7.0.0 and diffed against the corresponding files in http://unicode.org/Public/security/6.3.0
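
One possible way to carry out the comparison described above, once both versions of a file have been saved locally, is a unified diff. The sketch below uses Python's standard difflib module; the function names, the default labels, and the sample lines are made up for illustration and are not real data from either release.

```python
import difflib
from pathlib import Path
from typing import List


def diff_texts(old: str, new: str,
               old_label: str = "6.3.0", new_label: str = "7.0.0") -> List[str]:
    """Return unified-diff lines describing how `new` differs from `old`."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))


def diff_files(old_path: str, new_path: str) -> List[str]:
    """Diff two locally saved copies of the same data file."""
    # utf-8-sig tolerates a leading byte order mark if one is present.
    old = Path(old_path).read_text(encoding="utf-8-sig")
    new = Path(new_path).read_text(encoding="utf-8-sig")
    return diff_texts(old, new, old_path, new_path)


# Made-up sample lines, not real data from either release.
old_sample = "0041 ; example-a\n0042 ; example-b\n"
new_sample = "0041 ; example-a\n0042 ; example-c\n0043 ; example-d\n"
changes = diff_texts(old_sample, new_sample)
print("\n".join(changes))
```

With real downloads, a call such as diff_files("security-6.3.0/confusables.txt", "security-7.0.0/confusables.txt") would compare the two copies; those local paths are hypothetical and depend on where the files were saved.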
 
The comment period ends July 28, 2014.

Monday, March 31, 2014

Proposed Updates for Unicode Security-Related Publications

Proposed updates are now available for UTR #36, Unicode Security Considerations, and UTS #39, Unicode Security Mechanisms. These are both being updated to correspond with Unicode 7.0.

PRI #272, Proposed Update UTR #36, Unicode Security Considerations:
This UTR is being updated. In this draft, a description has been added about the downside of displaying URLs as Punycode. A note has also been added on the use of Catalan in identifiers.

PRI #273, Proposed Update UTS #39, Unicode Security Mechanisms:
This UTS is being updated to correspond with Unicode 7.0. Text has been added about the use of NFC, and on the use of Catalan in identifiers. A note has been added on the collection of confusable data outside of Status=allowed, such as for non-NFKC characters.
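
To make the distinction between NFC and NFKC concrete, here is a minimal Python sketch using the standard unicodedata module. The sample strings (a decomposed "é" and the ligature U+FB01) and the helper names are illustrative assumptions, not examples taken from the proposed update.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


def is_nfkc(text: str) -> bool:
    """True if the text is unchanged by NFKC normalization."""
    return unicodedata.normalize("NFKC", text) == text


# "e" followed by COMBINING ACUTE ACCENT; NFC composes the pair into U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# LATIN SMALL LIGATURE FI is already in NFC, but NFKC replaces it with "fi",
# so it is an example of a non-NFKC character.
ligature = "\uFB01"
folded = unicodedata.normalize("NFKC", ligature)

print(is_nfc(decomposed), is_nfc(composed))
print(is_nfc(ligature), is_nfkc(ligature), folded)
```

A character like the ligature illustrates why confusable data can be useful even for characters that an NFKC-based identifier profile would not allow: the character still appears in text and still resembles other strings.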

Review notes solicit feedback on whether to (a) add multi-character sequences to the data file, (b) change some of the Type values, (c) base the data more on CLDR exemplars, and/or (d) change the format of the data files.

The closing date for both of these issues is April 28, 2014. For information about how to discuss these Public Review Issues and how to supply formal feedback, please see the feedback and discussion instructions on the PRI pages.

The Public Review Issues page is: http://www.unicode.org/review/