Path: blob/aarch64-shenandoah-jdk8u272-b10/jdk/src/share/classes/java/lang/Character.java
/*
 * Copyright (c) 2002, 2019, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

package java.lang;

import java.util.Arrays;
import java.util.Map;
import java.util.HashMap;
import java.util.Locale;

/**
 * The {@code Character} class wraps a value of the primitive
 * type {@code char} in an object.
 * An object of class
 * {@code Character} contains a single field whose type is
 * {@code char}.
 * <p>
 * In addition, this class provides a large number of static methods for
 * determining a character's category (lowercase letter, digit, etc.)
 * and for converting characters from uppercase to lowercase and vice
 * versa.
 *
 * <h3><a id="conformance">Unicode Conformance</a></h3>
 * <p>
 * The fields and methods of class {@code Character} are defined in terms
 * of character information from the Unicode Standard, specifically the
 * <i>UnicodeData</i> file that is part of the Unicode Character Database.
 * This file specifies properties including name and category for every
 * assigned Unicode code point or character range. The file is available
 * from the Unicode Consortium at
 * <a href="http://www.unicode.org">http://www.unicode.org</a>.
 * <p>
 * The Java SE 8 Platform uses character information from version 6.2
 * of the Unicode Standard, with two extensions. First, the Java SE 8 Platform
 * allows an implementation of class {@code Character} to use the Japanese Era
 * code point, {@code U+32FF}, from the first version of the Unicode Standard
 * after 6.2 that assigns the code point. Second, in recognition of the fact
 * that new currencies appear frequently, the Java SE 8 Platform allows an
 * implementation of class {@code Character} to use the Currency Symbols
 * block from version 10.0 of the Unicode Standard.
 * Consequently, the
 * behavior of fields and methods of class {@code Character} may vary across
 * implementations of the Java SE 8 Platform when processing the aforementioned
 * code points ( outside of version 6.2 ), except for the following methods
 * that define Java identifiers:
 * {@link #isJavaIdentifierStart(int)}, {@link #isJavaIdentifierStart(char)},
 * {@link #isJavaIdentifierPart(int)}, and {@link #isJavaIdentifierPart(char)}.
 * Code points in Java identifiers must be drawn from version 6.2 of
 * the Unicode Standard.
 *
 * <h3><a name="unicode">Unicode Character Representations</a></h3>
 *
 * <p>The {@code char} data type (and therefore the value that a
 * {@code Character} object encapsulates) are based on the
 * original Unicode specification, which defined characters as
 * fixed-width 16-bit entities. The Unicode Standard has since been
 * changed to allow for characters whose representation requires more
 * than 16 bits. The range of legal <em>code point</em>s is now
 * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>.
 * (Refer to the <a
 * href="http://www.unicode.org/reports/tr27/#notation"><i>
 * definition</i></a> of the U+<i>n</i> notation in the Unicode
 * Standard.)
 *
 * <p><a name="BMP">The set of characters from U+0000 to U+FFFF</a> is
 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
 * <a name="supplementary">Characters</a> whose code points are greater
 * than U+FFFF are called <em>supplementary character</em>s. The Java
 * platform uses the UTF-16 representation in {@code char} arrays and
 * in the {@code String} and {@code StringBuffer} classes.
 * In this representation, supplementary characters are represented as a pair
 * of {@code char} values, the first from the <em>high-surrogates</em>
 * range, (\uD800-\uDBFF), the second from the
 * <em>low-surrogates</em> range (\uDC00-\uDFFF).
 *
 * <p>A {@code char} value, therefore, represents Basic
 * Multilingual Plane (BMP) code points, including the surrogate
 * code points, or code units of the UTF-16 encoding. An
 * {@code int} value represents all Unicode code points,
 * including supplementary code points. The lower (least significant)
 * 21 bits of {@code int} are used to represent Unicode code
 * points and the upper (most significant) 11 bits must be zero.
 * Unless otherwise specified, the behavior with respect to
 * supplementary characters and surrogate {@code char} values is
 * as follows:
 *
 * <ul>
 * <li>The methods that only accept a {@code char} value cannot support
 * supplementary characters. They treat {@code char} values from the
 * surrogate ranges as undefined characters. For example,
 * {@code Character.isLetter('\u005CuD840')} returns {@code false}, even though
 * this specific value if followed by any low-surrogate value in a string
 * would represent a letter.
 *
 * <li>The methods that accept an {@code int} value support all
 * Unicode characters, including supplementary characters. For
 * example, {@code Character.isLetter(0x2F81A)} returns
 * {@code true} because the code point value represents a letter
 * (a CJK ideograph).
 * </ul>
 *
 * <p>In the Java SE API documentation, <em>Unicode code point</em> is
 * used for character values in the range between U+0000 and U+10FFFF,
 * and <em>Unicode code unit</em> is used for 16-bit
 * {@code char} values that are code units of the <em>UTF-16</em>
 * encoding.
 * For more information on Unicode terminology, refer to the
 * <a href="http://www.unicode.org/glossary/">Unicode Glossary</a>.
 *
 * @author Lee Boynton
 * @author Guy Steele
 * @author Akira Tanaka
 * @author Martin Buchholz
 * @author Ulf Zibis
 * @since 1.0
 */
public final
class Character implements java.io.Serializable, Comparable<Character> {
    /**
     * The minimum radix available for conversion to and from strings.
     * The constant value of this field is the smallest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)
     */
    public static final int MIN_RADIX = 2;

    /**
     * The maximum radix available for conversion to and from strings.
     * The constant value of this field is the largest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)
     */
    public static final int MAX_RADIX = 36;

    /**
     * The constant value of this field is the smallest value of type
     * {@code char}, {@code '\u005Cu0000'}.
     *
     * @since 1.0.2
     */
    public static final char MIN_VALUE = '\u0000';

    /**
     * The constant value of this field is the largest value of type
     * {@code char}, {@code '\u005CuFFFF'}.
     *
     * @since 1.0.2
     */
    public static final char MAX_VALUE = '\uFFFF';

    /**
     * The {@code Class} instance representing the primitive type
     * {@code char}.
     *
     * @since 1.1
     */
    @SuppressWarnings("unchecked")
    public static final Class<Character> TYPE =
        (Class<Character>) Class.getPrimitiveClass("char");

    /*
     * Normative general types
     */

    /*
     * General character types
     */

    /**
     * General category "Cn" in the Unicode specification.
     * @since 1.1
     */
    public static final byte UNASSIGNED = 0;

    /**
     * General category "Lu" in the Unicode specification.
     * @since 1.1
     */
    public static final byte UPPERCASE_LETTER = 1;

    /**
     * General category "Ll" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LOWERCASE_LETTER = 2;

    /**
     * General category "Lt" in the Unicode specification.
     * @since 1.1
     */
    public static final byte TITLECASE_LETTER = 3;

    /**
     * General category "Lm" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MODIFIER_LETTER = 4;

    /**
     * General category "Lo" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_LETTER = 5;

    /**
     * General category "Mn" in the Unicode specification.
     * @since 1.1
     */
    public static final byte NON_SPACING_MARK = 6;

    /**
     * General category "Me" in the Unicode specification.
     * @since 1.1
     */
    public static final byte ENCLOSING_MARK = 7;

    /**
     * General category "Mc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte COMBINING_SPACING_MARK = 8;

    /**
     * General category "Nd" in the Unicode specification.
     * @since 1.1
     */
    public static final byte DECIMAL_DIGIT_NUMBER = 9;

    /**
     * General category "Nl" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LETTER_NUMBER = 10;

    /**
     * General category "No" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_NUMBER = 11;

    /**
     * General category "Zs" in the Unicode specification.
     * @since 1.1
     */
    public static final byte SPACE_SEPARATOR = 12;

    /**
     * General category "Zl" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LINE_SEPARATOR =
        13;

    /**
     * General category "Zp" in the Unicode specification.
     * @since 1.1
     */
    public static final byte PARAGRAPH_SEPARATOR = 14;

    /**
     * General category "Cc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CONTROL = 15;

    /**
     * General category "Cf" in the Unicode specification.
     * @since 1.1
     */
    public static final byte FORMAT = 16;

    /**
     * General category "Co" in the Unicode specification.
     * @since 1.1
     */
    public static final byte PRIVATE_USE = 18;

    /**
     * General category "Cs" in the Unicode specification.
     * @since 1.1
     */
    public static final byte SURROGATE = 19;

    /**
     * General category "Pd" in the Unicode specification.
     * @since 1.1
     */
    public static final byte DASH_PUNCTUATION = 20;

    /**
     * General category "Ps" in the Unicode specification.
     * @since 1.1
     */
    public static final byte START_PUNCTUATION = 21;

    /**
     * General category "Pe" in the Unicode specification.
     * @since 1.1
     */
    public static final byte END_PUNCTUATION = 22;

    /**
     * General category "Pc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CONNECTOR_PUNCTUATION = 23;

    /**
     * General category "Po" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_PUNCTUATION = 24;

    /**
     * General category "Sm" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MATH_SYMBOL = 25;

    /**
     * General category "Sc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CURRENCY_SYMBOL = 26;

    /**
     * General category "Sk" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MODIFIER_SYMBOL = 27;

    /**
     * General category "So" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_SYMBOL = 28;

    /**
     * General category "Pi" in the Unicode specification.
     * @since 1.4
     */
    public static final byte INITIAL_QUOTE_PUNCTUATION
        = 29;

    /**
     * General category "Pf" in the Unicode specification.
     * @since 1.4
     */
    public static final byte FINAL_QUOTE_PUNCTUATION = 30;

    /**
     * Error flag. Use int (code point) to avoid confusion with U+FFFF.
     */
    static final int ERROR = 0xFFFFFFFF;


    /**
     * Undefined bidirectional character type. Undefined {@code char}
     * values have undefined directionality in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_UNDEFINED = -1;

    /**
     * Strong bidirectional character type "L" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT = 0;

    /**
     * Strong bidirectional character type "R" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT = 1;

    /**
     * Strong bidirectional character type "AL" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC = 2;

    /**
     * Weak bidirectional character type "EN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER = 3;

    /**
     * Weak bidirectional character type "ES" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR = 4;

    /**
     * Weak bidirectional character type "ET" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR = 5;

    /**
     * Weak bidirectional character type "AN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_ARABIC_NUMBER = 6;

    /**
     * Weak bidirectional character type "CS" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_COMMON_NUMBER_SEPARATOR = 7;

    /**
     * Weak bidirectional character type "NSM" in the Unicode specification.
     * @since 1.4
     */
    public static final byte
        DIRECTIONALITY_NONSPACING_MARK = 8;

    /**
     * Weak bidirectional character type "BN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_BOUNDARY_NEUTRAL = 9;

    /**
     * Neutral bidirectional character type "B" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_PARAGRAPH_SEPARATOR = 10;

    /**
     * Neutral bidirectional character type "S" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_SEGMENT_SEPARATOR = 11;

    /**
     * Neutral bidirectional character type "WS" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_WHITESPACE = 12;

    /**
     * Neutral bidirectional character type "ON" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_OTHER_NEUTRALS = 13;

    /**
     * Strong bidirectional character type "LRE" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING = 14;

    /**
     * Strong bidirectional character type "LRO" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE = 15;

    /**
     * Strong bidirectional character type "RLE" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING = 16;

    /**
     * Strong bidirectional character type "RLO" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE = 17;

    /**
     * Weak bidirectional character type "PDF" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_POP_DIRECTIONAL_FORMAT = 18;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuD800'}.
     * A high-surrogate is also
     * known as a <i>leading-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MIN_HIGH_SURROGATE = '\uD800';

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDBFF'}.
     * A high-surrogate is also known as a <i>leading-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MAX_HIGH_SURROGATE = '\uDBFF';

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDC00'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MIN_LOW_SURROGATE = '\uDC00';

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MAX_LOW_SURROGATE = '\uDFFF';

    /**
     * The minimum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuD800'}.
     *
     * @since 1.5
     */
    public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;

    /**
     * The maximum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     *
     * @since 1.5
     */
    public static final char MAX_SURROGATE = MAX_LOW_SURROGATE;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#supplementary_code_point">
     * Unicode supplementary code point</a>, constant {@code U+10000}.
     *
     * @since 1.5
     */
    public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point</a>,
     * constant {@code U+0000}.
     *
     * @since 1.5
     */
    public static final int MIN_CODE_POINT = 0x000000;

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point</a>, constant {@code U+10FFFF}.
     *
     * @since 1.5
     */
    public static final int MAX_CODE_POINT = 0X10FFFF;


    /**
     * Instances of this class represent particular subsets of the Unicode
     * character set. The only family of subsets defined in the
     * {@code Character} class is {@link Character.UnicodeBlock}.
     * Other portions of the Java API may define other subsets for their
     * own purposes.
     *
     * @since 1.2
     */
    public static class Subset {

        private String name;

        /**
         * Constructs a new {@code Subset} instance.
         *
         * @param name The name of this subset
         * @exception NullPointerException if name is {@code null}
         */
        protected Subset(String name) {
            if (name == null) {
                throw new NullPointerException("name");
            }
            this.name = name;
        }

        /**
         * Compares two {@code Subset} objects for equality.
         * This method returns {@code true} if and only if
         * {@code this} and the argument refer to the same
         * object; since this method is {@code final}, this
         * guarantee holds for all subclasses.
         */
        public final boolean equals(Object obj) {
            return (this == obj);
        }

        /**
         * Returns the standard hash code as defined by the
         * {@link Object#hashCode} method. This method
         * is {@code final} in order to ensure that the
         * {@code equals} and {@code hashCode} methods will
         * be consistent in all subclasses.
         */
        public final int hashCode() {
            return super.hashCode();
        }

        /**
         * Returns the name of this subset.
         */
        public final String toString() {
            return name;
        }
    }

    // See http://www.unicode.org/Public/UNIDATA/Blocks.txt
    // for the latest specification of Unicode Blocks.

    /**
     * A family of character subsets representing the character blocks in the
     * Unicode specification.
     * Character blocks generally define characters
     * used for a specific script or purpose. A character is contained by
     * at most one Unicode block.
     *
     * @since 1.2
     */
    public static final class UnicodeBlock extends Subset {

        private static Map<String, UnicodeBlock> map = new HashMap<>(256);

        /**
         * Creates a UnicodeBlock with the given identifier name.
         * This name must be the same as the block identifier.
         */
        private UnicodeBlock(String idName) {
            super(idName);
            map.put(idName, this);
        }

        /**
         * Creates a UnicodeBlock with the given identifier name and
         * alias name.
         */
        private UnicodeBlock(String idName, String alias) {
            this(idName);
            map.put(alias, this);
        }

        /**
         * Creates a UnicodeBlock with the given identifier name and
         * alias names.
         */
        private UnicodeBlock(String idName, String... aliases) {
            this(idName);
            for (String alias : aliases)
                map.put(alias, this);
        }

        /**
         * Constant for the "Basic Latin" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BASIC_LATIN =
            new UnicodeBlock("BASIC_LATIN",
                             "BASIC LATIN",
                             "BASICLATIN");

        /**
         * Constant for the "Latin-1 Supplement" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_1_SUPPLEMENT =
            new UnicodeBlock("LATIN_1_SUPPLEMENT",
                             "LATIN-1 SUPPLEMENT",
                             "LATIN-1SUPPLEMENT");

        /**
         * Constant for the "Latin Extended-A" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_A =
            new UnicodeBlock("LATIN_EXTENDED_A",
                             "LATIN EXTENDED-A",
                             "LATINEXTENDED-A");

        /**
         * Constant for the "Latin Extended-B" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_B =
            new UnicodeBlock("LATIN_EXTENDED_B",
                             "LATIN EXTENDED-B",
                             "LATINEXTENDED-B");

        /**
         * Constant for the "IPA Extensions" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock IPA_EXTENSIONS =
            new
                UnicodeBlock("IPA_EXTENSIONS",
                             "IPA EXTENSIONS",
                             "IPAEXTENSIONS");

        /**
         * Constant for the "Spacing Modifier Letters" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SPACING_MODIFIER_LETTERS =
            new UnicodeBlock("SPACING_MODIFIER_LETTERS",
                             "SPACING MODIFIER LETTERS",
                             "SPACINGMODIFIERLETTERS");

        /**
         * Constant for the "Combining Diacritical Marks" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS =
            new UnicodeBlock("COMBINING_DIACRITICAL_MARKS",
                             "COMBINING DIACRITICAL MARKS",
                             "COMBININGDIACRITICALMARKS");

        /**
         * Constant for the "Greek and Coptic" Unicode character block.
         * <p>
         * This block was previously known as the "Greek" block.
         *
         * @since 1.2
         */
        public static final UnicodeBlock GREEK =
            new UnicodeBlock("GREEK",
                             "GREEK AND COPTIC",
                             "GREEKANDCOPTIC");

        /**
         * Constant for the "Cyrillic" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CYRILLIC =
            new UnicodeBlock("CYRILLIC");

        /**
         * Constant for the "Armenian" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARMENIAN =
            new UnicodeBlock("ARMENIAN");

        /**
         * Constant for the "Hebrew" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HEBREW =
            new UnicodeBlock("HEBREW");

        /**
         * Constant for the "Arabic" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC =
            new UnicodeBlock("ARABIC");

        /**
         * Constant for the "Devanagari" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock DEVANAGARI =
            new UnicodeBlock("DEVANAGARI");

        /**
         * Constant for the "Bengali" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BENGALI =
            new UnicodeBlock("BENGALI");

        /**
         * Constant for the "Gurmukhi" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock
            GURMUKHI =
            new UnicodeBlock("GURMUKHI");

        /**
         * Constant for the "Gujarati" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GUJARATI =
            new UnicodeBlock("GUJARATI");

        /**
         * Constant for the "Oriya" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ORIYA =
            new UnicodeBlock("ORIYA");

        /**
         * Constant for the "Tamil" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TAMIL =
            new UnicodeBlock("TAMIL");

        /**
         * Constant for the "Telugu" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TELUGU =
            new UnicodeBlock("TELUGU");

        /**
         * Constant for the "Kannada" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KANNADA =
            new UnicodeBlock("KANNADA");

        /**
         * Constant for the "Malayalam" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MALAYALAM =
            new UnicodeBlock("MALAYALAM");

        /**
         * Constant for the "Thai" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock THAI =
            new UnicodeBlock("THAI");

        /**
         * Constant for the "Lao" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LAO =
            new UnicodeBlock("LAO");

        /**
         * Constant for the "Tibetan" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TIBETAN =
            new UnicodeBlock("TIBETAN");

        /**
         * Constant for the "Georgian" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GEORGIAN =
            new UnicodeBlock("GEORGIAN");

        /**
         * Constant for the "Hangul Jamo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_JAMO =
            new UnicodeBlock("HANGUL_JAMO",
                             "HANGUL JAMO",
                             "HANGULJAMO");

        /**
         * Constant for the "Latin Extended Additional" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_ADDITIONAL
            = new UnicodeBlock("LATIN_EXTENDED_ADDITIONAL",
                               "LATIN EXTENDED ADDITIONAL",
                               "LATINEXTENDEDADDITIONAL");

        /**
         * Constant for the "Greek Extended" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GREEK_EXTENDED =
            new UnicodeBlock("GREEK_EXTENDED",
                             "GREEK EXTENDED",
                             "GREEKEXTENDED");

        /**
         * Constant for the "General Punctuation" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GENERAL_PUNCTUATION =
            new UnicodeBlock("GENERAL_PUNCTUATION",
                             "GENERAL PUNCTUATION",
                             "GENERALPUNCTUATION");

        /**
         * Constant for the "Superscripts and Subscripts" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock SUPERSCRIPTS_AND_SUBSCRIPTS =
            new UnicodeBlock("SUPERSCRIPTS_AND_SUBSCRIPTS",
                             "SUPERSCRIPTS AND SUBSCRIPTS",
                             "SUPERSCRIPTSANDSUBSCRIPTS");

        /**
         * Constant for the "Currency Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CURRENCY_SYMBOLS =
            new UnicodeBlock("CURRENCY_SYMBOLS",
                             "CURRENCY SYMBOLS",
                             "CURRENCYSYMBOLS");

        /**
         * Constant for the "Combining Diacritical Marks for Symbols" Unicode
         * character block.
         * <p>
         * This block was previously known as "Combining Marks for Symbols".
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_MARKS_FOR_SYMBOLS =
            new UnicodeBlock("COMBINING_MARKS_FOR_SYMBOLS",
                             "COMBINING DIACRITICAL MARKS FOR SYMBOLS",
                             "COMBININGDIACRITICALMARKSFORSYMBOLS",
                             "COMBINING MARKS FOR SYMBOLS",
                             "COMBININGMARKSFORSYMBOLS");

        /**
         * Constant for the "Letterlike Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LETTERLIKE_SYMBOLS =
            new UnicodeBlock("LETTERLIKE_SYMBOLS",
                             "LETTERLIKE SYMBOLS",
                             "LETTERLIKESYMBOLS");

        /**
         * Constant for the "Number Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock NUMBER_FORMS =
            new UnicodeBlock("NUMBER_FORMS",
                             "NUMBER FORMS",
                             "NUMBERFORMS");

        /**
         * Constant for the "Arrows" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARROWS =
            new UnicodeBlock("ARROWS");

        /**
         * Constant for the "Mathematical Operators" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MATHEMATICAL_OPERATORS =
            new UnicodeBlock("MATHEMATICAL_OPERATORS",
                             "MATHEMATICAL OPERATORS",
                             "MATHEMATICALOPERATORS");

        /**
         * Constant for the "Miscellaneous Technical" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MISCELLANEOUS_TECHNICAL =
            new UnicodeBlock("MISCELLANEOUS_TECHNICAL",
                             "MISCELLANEOUS TECHNICAL",
                             "MISCELLANEOUSTECHNICAL");

        /**
         * Constant for the "Control Pictures" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CONTROL_PICTURES =
            new UnicodeBlock("CONTROL_PICTURES",
                             "CONTROL PICTURES",
                             "CONTROLPICTURES");

        /**
         * Constant for the "Optical Character Recognition" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock OPTICAL_CHARACTER_RECOGNITION =
            new UnicodeBlock("OPTICAL_CHARACTER_RECOGNITION",
                             "OPTICAL CHARACTER RECOGNITION",
                             "OPTICALCHARACTERRECOGNITION");

        /**
         * Constant for the "Enclosed Alphanumerics" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ENCLOSED_ALPHANUMERICS =
            new UnicodeBlock("ENCLOSED_ALPHANUMERICS",
                             "ENCLOSED ALPHANUMERICS",
                             "ENCLOSEDALPHANUMERICS");

        /**
         * Constant for the "Box Drawing" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BOX_DRAWING =
            new UnicodeBlock("BOX_DRAWING",
                             "BOX DRAWING",
                             "BOXDRAWING");

        /**
         * Constant for the "Block Elements" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BLOCK_ELEMENTS =
            new UnicodeBlock("BLOCK_ELEMENTS",
                             "BLOCK ELEMENTS",
                             "BLOCKELEMENTS");

        /**
         * Constant for the "Geometric Shapes" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GEOMETRIC_SHAPES =
            new UnicodeBlock("GEOMETRIC_SHAPES",
                             "GEOMETRIC SHAPES",
                             "GEOMETRICSHAPES");

        /**
         * Constant for the "Miscellaneous Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS =
            new UnicodeBlock("MISCELLANEOUS_SYMBOLS",
                             "MISCELLANEOUS SYMBOLS",
                             "MISCELLANEOUSSYMBOLS");

        /**
         * Constant for the "Dingbats" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock DINGBATS =
            new UnicodeBlock("DINGBATS");

        /**
         * Constant for the "CJK Symbols and Punctuation" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_SYMBOLS_AND_PUNCTUATION =
            new UnicodeBlock("CJK_SYMBOLS_AND_PUNCTUATION",
                             "CJK SYMBOLS AND PUNCTUATION",
                             "CJKSYMBOLSANDPUNCTUATION");

        /**
         * Constant for the "Hiragana" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HIRAGANA =
            new UnicodeBlock("HIRAGANA");

        /**
         * Constant for the "Katakana" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KATAKANA =
            new UnicodeBlock("KATAKANA");

        /**
         * Constant for the "Bopomofo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BOPOMOFO =
            new UnicodeBlock("BOPOMOFO");

        /**
         * Constant for the "Hangul Compatibility Jamo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_COMPATIBILITY_JAMO =
            new UnicodeBlock("HANGUL_COMPATIBILITY_JAMO",
                             "HANGUL COMPATIBILITY JAMO",
                             "HANGULCOMPATIBILITYJAMO");

        /**
         * Constant for the "Kanbun" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KANBUN =
            new UnicodeBlock("KANBUN");

        /**
         * Constant for the "Enclosed CJK Letters and Months" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ENCLOSED_CJK_LETTERS_AND_MONTHS =
            new UnicodeBlock("ENCLOSED_CJK_LETTERS_AND_MONTHS",
                             "ENCLOSED CJK LETTERS AND MONTHS",
                             "ENCLOSEDCJKLETTERSANDMONTHS");

        /**
         * Constant for the "CJK Compatibility" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY =
            new UnicodeBlock("CJK_COMPATIBILITY",
                             "CJK COMPATIBILITY",
                             "CJKCOMPATIBILITY");

        /**
         * Constant for the "CJK Unified Ideographs" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS",
                             "CJK UNIFIED IDEOGRAPHS",
                             "CJKUNIFIEDIDEOGRAPHS");

        /**
         * Constant for the "Hangul Syllables" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_SYLLABLES =
            new UnicodeBlock("HANGUL_SYLLABLES",
                             "HANGUL SYLLABLES",
                             "HANGULSYLLABLES");

        /**
         * Constant for the "Private Use Area" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock PRIVATE_USE_AREA =
            new UnicodeBlock("PRIVATE_USE_AREA",
                             "PRIVATE USE AREA",
                             "PRIVATEUSEAREA");

        /**
         * Constant for the "CJK Compatibility Ideographs" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS =
            new UnicodeBlock("CJK_COMPATIBILITY_IDEOGRAPHS",
                             "CJK COMPATIBILITY IDEOGRAPHS",
                             "CJKCOMPATIBILITYIDEOGRAPHS");

        /**
         * Constant for the "Alphabetic Presentation Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ALPHABETIC_PRESENTATION_FORMS =
            new UnicodeBlock("ALPHABETIC_PRESENTATION_FORMS",
                             "ALPHABETIC PRESENTATION FORMS",
                             "ALPHABETICPRESENTATIONFORMS");

        /**
         * Constant for the "Arabic Presentation Forms-A" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_A
            = new UnicodeBlock("ARABIC_PRESENTATION_FORMS_A",
                               "ARABIC PRESENTATION FORMS-A",
                               "ARABICPRESENTATIONFORMS-A");

        /**
         * Constant for the "Combining Half Marks" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_HALF_MARKS =
            new UnicodeBlock("COMBINING_HALF_MARKS",
                             "COMBINING HALF MARKS",
                             "COMBININGHALFMARKS");

        /**
         * Constant for the "CJK Compatibility Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_FORMS =
            new UnicodeBlock("CJK_COMPATIBILITY_FORMS",
                             "CJK COMPATIBILITY FORMS",
                             "CJKCOMPATIBILITYFORMS");

        /**
         * Constant for the "Small Form Variants" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SMALL_FORM_VARIANTS =
            new UnicodeBlock("SMALL_FORM_VARIANTS",
                             "SMALL FORM VARIANTS",
                             "SMALLFORMVARIANTS");

        /**
         * Constant for the "Arabic Presentation Forms-B" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_B =
            new UnicodeBlock("ARABIC_PRESENTATION_FORMS_B",
                             "ARABIC PRESENTATION FORMS-B",
                             "ARABICPRESENTATIONFORMS-B");

        /**
         * Constant for the "Halfwidth and Fullwidth Forms" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock HALFWIDTH_AND_FULLWIDTH_FORMS =
            new UnicodeBlock("HALFWIDTH_AND_FULLWIDTH_FORMS",
                             "HALFWIDTH AND FULLWIDTH FORMS",
                             "HALFWIDTHANDFULLWIDTHFORMS");

        /**
         * Constant for the "Specials" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SPECIALS =
            new UnicodeBlock("SPECIALS");

        /**
         * @deprecated As of J2SE 5, use {@link #HIGH_SURROGATES},
         *             {@link #HIGH_PRIVATE_USE_SURROGATES}, and
         *             {@link #LOW_SURROGATES}.
         *             These new constants match
         *             the block definitions of the Unicode Standard.
         *             The {@link #of(char)} and {@link #of(int)} methods
         *             return the new constants, not SURROGATES_AREA.
         */
        @Deprecated
        public static final UnicodeBlock SURROGATES_AREA =
            new UnicodeBlock("SURROGATES_AREA");

        /**
         * Constant for the "Syriac" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock SYRIAC =
            new UnicodeBlock("SYRIAC");

        /**
         * Constant for the "Thaana" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock THAANA =
            new UnicodeBlock("THAANA");

        /**
         * Constant for the "Sinhala" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock SINHALA =
            new UnicodeBlock("SINHALA");

        /**
         * Constant for the "Myanmar" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock MYANMAR =
            new UnicodeBlock("MYANMAR");

        /**
         * Constant for the "Ethiopic" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock ETHIOPIC =
            new UnicodeBlock("ETHIOPIC");

        /**
         * Constant for the "Cherokee" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CHEROKEE =
            new UnicodeBlock("CHEROKEE");

        /**
         * Constant for the "Unified Canadian Aboriginal Syllabics" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS =
            new UnicodeBlock("UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS",
                             "UNIFIED CANADIAN ABORIGINAL SYLLABICS",
                             "UNIFIEDCANADIANABORIGINALSYLLABICS");

        /**
         * Constant for the "Ogham" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock OGHAM =
            new UnicodeBlock("OGHAM");

        /**
         * Constant for the "Runic" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock RUNIC =
            new UnicodeBlock("RUNIC");

        /**
         * Constant for the "Khmer" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock KHMER =
            new UnicodeBlock("KHMER");

        /**
         * Constant for the "Mongolian" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock MONGOLIAN =
            new UnicodeBlock("MONGOLIAN");

        /**
         * Constant for the "Braille Patterns" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock BRAILLE_PATTERNS =
            new UnicodeBlock("BRAILLE_PATTERNS",
                             "BRAILLE PATTERNS",
                             "BRAILLEPATTERNS");

        /**
         * Constant for the "CJK Radicals Supplement" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CJK_RADICALS_SUPPLEMENT =
            new UnicodeBlock("CJK_RADICALS_SUPPLEMENT",
                             "CJK RADICALS SUPPLEMENT",
                             "CJKRADICALSSUPPLEMENT");

        /**
         * Constant for the "Kangxi Radicals" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock KANGXI_RADICALS =
            new UnicodeBlock("KANGXI_RADICALS",
                             "KANGXI RADICALS",
                             "KANGXIRADICALS");

        /**
         * Constant for the "Ideographic Description Characters" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock IDEOGRAPHIC_DESCRIPTION_CHARACTERS =
            new UnicodeBlock("IDEOGRAPHIC_DESCRIPTION_CHARACTERS",
                             "IDEOGRAPHIC DESCRIPTION CHARACTERS",
                             "IDEOGRAPHICDESCRIPTIONCHARACTERS");

        /**
         * Constant for the "Bopomofo Extended" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock BOPOMOFO_EXTENDED =
            new UnicodeBlock("BOPOMOFO_EXTENDED",
                             "BOPOMOFO EXTENDED",
                             "BOPOMOFOEXTENDED");

        /**
         * Constant for the "CJK Unified Ideographs Extension A" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION A",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIONA");

        /**
         * Constant for
         * the "Yi Syllables" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock YI_SYLLABLES =
            new UnicodeBlock("YI_SYLLABLES",
                             "YI SYLLABLES",
                             "YISYLLABLES");

        /**
         * Constant for the "Yi Radicals" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock YI_RADICALS =
            new UnicodeBlock("YI_RADICALS",
                             "YI RADICALS",
                             "YIRADICALS");

        /**
         * Constant for the "Cyrillic Supplementary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CYRILLIC_SUPPLEMENTARY =
            new UnicodeBlock("CYRILLIC_SUPPLEMENTARY",
                             "CYRILLIC SUPPLEMENTARY",
                             "CYRILLICSUPPLEMENTARY",
                             "CYRILLIC SUPPLEMENT",
                             "CYRILLICSUPPLEMENT");

        /**
         * Constant for the "Tagalog" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGALOG =
            new UnicodeBlock("TAGALOG");

        /**
         * Constant for the "Hanunoo" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock HANUNOO =
            new UnicodeBlock("HANUNOO");

        /**
         * Constant for the "Buhid" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock BUHID =
            new UnicodeBlock("BUHID");

        /**
         * Constant for the "Tagbanwa" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGBANWA =
            new UnicodeBlock("TAGBANWA");

        /**
         * Constant for the "Limbu" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LIMBU =
            new UnicodeBlock("LIMBU");

        /**
         * Constant for the "Tai Le" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAI_LE =
            new UnicodeBlock("TAI_LE",
                             "TAI LE",
                             "TAILE");

        /**
         * Constant for the "Khmer Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock KHMER_SYMBOLS =
            new UnicodeBlock("KHMER_SYMBOLS",
                             "KHMER SYMBOLS",
                             "KHMERSYMBOLS");

        /**
         * Constant
         * for the "Phonetic Extensions" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock PHONETIC_EXTENSIONS =
            new UnicodeBlock("PHONETIC_EXTENSIONS",
                             "PHONETIC EXTENSIONS",
                             "PHONETICEXTENSIONS");

        /**
         * Constant for the "Miscellaneous Mathematical Symbols-A" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A =
            new UnicodeBlock("MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A",
                             "MISCELLANEOUS MATHEMATICAL SYMBOLS-A",
                             "MISCELLANEOUSMATHEMATICALSYMBOLS-A");

        /**
         * Constant for the "Supplemental Arrows-A" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_ARROWS_A =
            new UnicodeBlock("SUPPLEMENTAL_ARROWS_A",
                             "SUPPLEMENTAL ARROWS-A",
                             "SUPPLEMENTALARROWS-A");

        /**
         * Constant for the "Supplemental Arrows-B" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_ARROWS_B =
            new UnicodeBlock("SUPPLEMENTAL_ARROWS_B",
                             "SUPPLEMENTAL ARROWS-B",
                             "SUPPLEMENTALARROWS-B");

        /**
         * Constant for the "Miscellaneous Mathematical Symbols-B" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B =
            new UnicodeBlock("MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B",
                             "MISCELLANEOUS MATHEMATICAL SYMBOLS-B",
                             "MISCELLANEOUSMATHEMATICALSYMBOLS-B");

        /**
         * Constant for the "Supplemental Mathematical Operators" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_MATHEMATICAL_OPERATORS =
            new UnicodeBlock("SUPPLEMENTAL_MATHEMATICAL_OPERATORS",
                             "SUPPLEMENTAL MATHEMATICAL OPERATORS",
                             "SUPPLEMENTALMATHEMATICALOPERATORS");

        /**
         * Constant for the "Miscellaneous Symbols and Arrows" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_AND_ARROWS =
            new
            UnicodeBlock("MISCELLANEOUS_SYMBOLS_AND_ARROWS",
                         "MISCELLANEOUS SYMBOLS AND ARROWS",
                         "MISCELLANEOUSSYMBOLSANDARROWS");

        /**
         * Constant for the "Katakana Phonetic Extensions" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock KATAKANA_PHONETIC_EXTENSIONS =
            new UnicodeBlock("KATAKANA_PHONETIC_EXTENSIONS",
                             "KATAKANA PHONETIC EXTENSIONS",
                             "KATAKANAPHONETICEXTENSIONS");

        /**
         * Constant for the "Yijing Hexagram Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock YIJING_HEXAGRAM_SYMBOLS =
            new UnicodeBlock("YIJING_HEXAGRAM_SYMBOLS",
                             "YIJING HEXAGRAM SYMBOLS",
                             "YIJINGHEXAGRAMSYMBOLS");

        /**
         * Constant for the "Variation Selectors" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock VARIATION_SELECTORS =
            new UnicodeBlock("VARIATION_SELECTORS",
                             "VARIATION SELECTORS",
                             "VARIATIONSELECTORS");

        /**
         * Constant for the "Linear B Syllabary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LINEAR_B_SYLLABARY =
            new UnicodeBlock("LINEAR_B_SYLLABARY",
                             "LINEAR B SYLLABARY",
                             "LINEARBSYLLABARY");

        /**
         * Constant for the "Linear B Ideograms" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LINEAR_B_IDEOGRAMS =
            new UnicodeBlock("LINEAR_B_IDEOGRAMS",
                             "LINEAR B IDEOGRAMS",
                             "LINEARBIDEOGRAMS");

        /**
         * Constant for the "Aegean Numbers" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock AEGEAN_NUMBERS =
            new UnicodeBlock("AEGEAN_NUMBERS",
                             "AEGEAN NUMBERS",
                             "AEGEANNUMBERS");

        /**
         * Constant for the "Old Italic" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock OLD_ITALIC =
            new UnicodeBlock("OLD_ITALIC",
                             "OLD ITALIC",
                             "OLDITALIC");

        /**
         * Constant for the "Gothic" Unicode character block.
         * @since 1.5
         */
        public
        static final UnicodeBlock GOTHIC =
            new UnicodeBlock("GOTHIC");

        /**
         * Constant for the "Ugaritic" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock UGARITIC =
            new UnicodeBlock("UGARITIC");

        /**
         * Constant for the "Deseret" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock DESERET =
            new UnicodeBlock("DESERET");

        /**
         * Constant for the "Shavian" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SHAVIAN =
            new UnicodeBlock("SHAVIAN");

        /**
         * Constant for the "Osmanya" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock OSMANYA =
            new UnicodeBlock("OSMANYA");

        /**
         * Constant for the "Cypriot Syllabary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CYPRIOT_SYLLABARY =
            new UnicodeBlock("CYPRIOT_SYLLABARY",
                             "CYPRIOT SYLLABARY",
                             "CYPRIOTSYLLABARY");

        /**
         * Constant for the "Byzantine Musical Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock BYZANTINE_MUSICAL_SYMBOLS =
            new UnicodeBlock("BYZANTINE_MUSICAL_SYMBOLS",
                             "BYZANTINE MUSICAL SYMBOLS",
                             "BYZANTINEMUSICALSYMBOLS");

        /**
         * Constant for the "Musical Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock MUSICAL_SYMBOLS =
            new UnicodeBlock("MUSICAL_SYMBOLS",
                             "MUSICAL SYMBOLS",
                             "MUSICALSYMBOLS");

        /**
         * Constant for the "Tai Xuan Jing Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAI_XUAN_JING_SYMBOLS =
            new UnicodeBlock("TAI_XUAN_JING_SYMBOLS",
                             "TAI XUAN JING SYMBOLS",
                             "TAIXUANJINGSYMBOLS");

        /**
         * Constant for the "Mathematical Alphanumeric Symbols" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock MATHEMATICAL_ALPHANUMERIC_SYMBOLS =
            new
            UnicodeBlock("MATHEMATICAL_ALPHANUMERIC_SYMBOLS",
                         "MATHEMATICAL ALPHANUMERIC SYMBOLS",
                         "MATHEMATICALALPHANUMERICSYMBOLS");

        /**
         * Constant for the "CJK Unified Ideographs Extension B" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION B",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIONB");

        /**
         * Constant for the "CJK Compatibility Ideographs Supplement" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT =
            new UnicodeBlock("CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT",
                             "CJK COMPATIBILITY IDEOGRAPHS SUPPLEMENT",
                             "CJKCOMPATIBILITYIDEOGRAPHSSUPPLEMENT");

        /**
         * Constant for the "Tags" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGS =
            new UnicodeBlock("TAGS");

        /**
         * Constant for the "Variation Selectors Supplement" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock VARIATION_SELECTORS_SUPPLEMENT =
            new UnicodeBlock("VARIATION_SELECTORS_SUPPLEMENT",
                             "VARIATION SELECTORS SUPPLEMENT",
                             "VARIATIONSELECTORSSUPPLEMENT");

        /**
         * Constant for the "Supplementary Private Use Area-A" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_A =
            new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_A",
                             "SUPPLEMENTARY PRIVATE USE AREA-A",
                             "SUPPLEMENTARYPRIVATEUSEAREA-A");

        /**
         * Constant for the "Supplementary Private Use Area-B" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_B =
            new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_B",
                             "SUPPLEMENTARY PRIVATE USE AREA-B",
                             "SUPPLEMENTARYPRIVATEUSEAREA-B");

        /**
         * Constant for the "High Surrogates" Unicode
         * character block.
         * This block represents codepoint values in the high surrogate
         * range: U+D800 through U+DB7F
         *
         * @since 1.5
         */
        public static final UnicodeBlock HIGH_SURROGATES =
            new UnicodeBlock("HIGH_SURROGATES",
                             "HIGH SURROGATES",
                             "HIGHSURROGATES");

        /**
         * Constant for the "High Private Use Surrogates" Unicode character
         * block.
         * This block represents codepoint values in the private use high
         * surrogate range: U+DB80 through U+DBFF
         *
         * @since 1.5
         */
        public static final UnicodeBlock HIGH_PRIVATE_USE_SURROGATES =
            new UnicodeBlock("HIGH_PRIVATE_USE_SURROGATES",
                             "HIGH PRIVATE USE SURROGATES",
                             "HIGHPRIVATEUSESURROGATES");

        /**
         * Constant for the "Low Surrogates" Unicode character block.
         * This block represents codepoint values in the low surrogate
         * range: U+DC00 through U+DFFF
         *
         * @since 1.5
         */
        public static final UnicodeBlock LOW_SURROGATES =
            new UnicodeBlock("LOW_SURROGATES",
                             "LOW SURROGATES",
                             "LOWSURROGATES");

        /**
         * Constant for the "Arabic Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ARABIC_SUPPLEMENT =
            new UnicodeBlock("ARABIC_SUPPLEMENT",
                             "ARABIC SUPPLEMENT",
                             "ARABICSUPPLEMENT");

        /**
         * Constant for the "NKo" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock NKO =
            new UnicodeBlock("NKO");

        /**
         * Constant for the "Samaritan" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SAMARITAN =
            new UnicodeBlock("SAMARITAN");

        /**
         * Constant for the "Mandaic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MANDAIC =
            new UnicodeBlock("MANDAIC");

        /**
         * Constant for the "Ethiopic Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_SUPPLEMENT =
            new UnicodeBlock("ETHIOPIC_SUPPLEMENT",
                             "ETHIOPIC SUPPLEMENT",
                             "ETHIOPICSUPPLEMENT");

        /**
         * Constant for the "Unified Canadian Aboriginal Syllabics Extended"
         * Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED =
            new UnicodeBlock("UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED",
                             "UNIFIED CANADIAN ABORIGINAL SYLLABICS EXTENDED",
                             "UNIFIEDCANADIANABORIGINALSYLLABICSEXTENDED");

        /**
         * Constant for the "New Tai Lue" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock NEW_TAI_LUE =
            new UnicodeBlock("NEW_TAI_LUE",
                             "NEW TAI LUE",
                             "NEWTAILUE");

        /**
         * Constant for the "Buginese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BUGINESE =
            new UnicodeBlock("BUGINESE");

        /**
         * Constant for the "Tai Tham" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TAI_THAM =
            new UnicodeBlock("TAI_THAM",
                             "TAI THAM",
                             "TAITHAM");

        /**
         * Constant for the "Balinese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BALINESE =
            new UnicodeBlock("BALINESE");

        /**
         * Constant for the "Sundanese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SUNDANESE =
            new UnicodeBlock("SUNDANESE");

        /**
         * Constant for the "Batak" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BATAK =
            new UnicodeBlock("BATAK");

        /**
         * Constant for the "Lepcha" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LEPCHA =
            new UnicodeBlock("LEPCHA");

        /**
         * Constant for the "Ol Chiki" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OL_CHIKI =
            new UnicodeBlock("OL_CHIKI",
                             "OL CHIKI",
                             "OLCHIKI");

        /**
         * Constant for the "Vedic Extensions" Unicode character block.
         * @since 1.7
         */
        public
        static final UnicodeBlock VEDIC_EXTENSIONS =
            new UnicodeBlock("VEDIC_EXTENSIONS",
                             "VEDIC EXTENSIONS",
                             "VEDICEXTENSIONS");

        /**
         * Constant for the "Phonetic Extensions Supplement" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock PHONETIC_EXTENSIONS_SUPPLEMENT =
            new UnicodeBlock("PHONETIC_EXTENSIONS_SUPPLEMENT",
                             "PHONETIC EXTENSIONS SUPPLEMENT",
                             "PHONETICEXTENSIONSSUPPLEMENT");

        /**
         * Constant for the "Combining Diacritical Marks Supplement" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS_SUPPLEMENT =
            new UnicodeBlock("COMBINING_DIACRITICAL_MARKS_SUPPLEMENT",
                             "COMBINING DIACRITICAL MARKS SUPPLEMENT",
                             "COMBININGDIACRITICALMARKSSUPPLEMENT");

        /**
         * Constant for the "Glagolitic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock GLAGOLITIC =
            new UnicodeBlock("GLAGOLITIC");

        /**
         * Constant for the "Latin Extended-C" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LATIN_EXTENDED_C =
            new UnicodeBlock("LATIN_EXTENDED_C",
                             "LATIN EXTENDED-C",
                             "LATINEXTENDED-C");

        /**
         * Constant for the "Coptic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock COPTIC =
            new UnicodeBlock("COPTIC");

        /**
         * Constant for the "Georgian Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock GEORGIAN_SUPPLEMENT =
            new UnicodeBlock("GEORGIAN_SUPPLEMENT",
                             "GEORGIAN SUPPLEMENT",
                             "GEORGIANSUPPLEMENT");

        /**
         * Constant for the "Tifinagh" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TIFINAGH =
            new UnicodeBlock("TIFINAGH");

        /**
         * Constant for the "Ethiopic Extended" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_EXTENDED =
            new
            UnicodeBlock("ETHIOPIC_EXTENDED",
                         "ETHIOPIC EXTENDED",
                         "ETHIOPICEXTENDED");

        /**
         * Constant for the "Cyrillic Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CYRILLIC_EXTENDED_A =
            new UnicodeBlock("CYRILLIC_EXTENDED_A",
                             "CYRILLIC EXTENDED-A",
                             "CYRILLICEXTENDED-A");

        /**
         * Constant for the "Supplemental Punctuation" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SUPPLEMENTAL_PUNCTUATION =
            new UnicodeBlock("SUPPLEMENTAL_PUNCTUATION",
                             "SUPPLEMENTAL PUNCTUATION",
                             "SUPPLEMENTALPUNCTUATION");

        /**
         * Constant for the "CJK Strokes" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CJK_STROKES =
            new UnicodeBlock("CJK_STROKES",
                             "CJK STROKES",
                             "CJKSTROKES");

        /**
         * Constant for the "Lisu" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LISU =
            new UnicodeBlock("LISU");

        /**
         * Constant for the "Vai" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock VAI =
            new UnicodeBlock("VAI");

        /**
         * Constant for the "Cyrillic Extended-B" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CYRILLIC_EXTENDED_B =
            new UnicodeBlock("CYRILLIC_EXTENDED_B",
                             "CYRILLIC EXTENDED-B",
                             "CYRILLICEXTENDED-B");

        /**
         * Constant for the "Bamum" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BAMUM =
            new UnicodeBlock("BAMUM");

        /**
         * Constant for the "Modifier Tone Letters" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MODIFIER_TONE_LETTERS =
            new UnicodeBlock("MODIFIER_TONE_LETTERS",
                             "MODIFIER TONE LETTERS",
                             "MODIFIERTONELETTERS");

        /**
         * Constant for the "Latin Extended-D" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LATIN_EXTENDED_D =
            new
            UnicodeBlock("LATIN_EXTENDED_D",
                         "LATIN EXTENDED-D",
                         "LATINEXTENDED-D");

        /**
         * Constant for the "Syloti Nagri" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SYLOTI_NAGRI =
            new UnicodeBlock("SYLOTI_NAGRI",
                             "SYLOTI NAGRI",
                             "SYLOTINAGRI");

        /**
         * Constant for the "Common Indic Number Forms" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock COMMON_INDIC_NUMBER_FORMS =
            new UnicodeBlock("COMMON_INDIC_NUMBER_FORMS",
                             "COMMON INDIC NUMBER FORMS",
                             "COMMONINDICNUMBERFORMS");

        /**
         * Constant for the "Phags-pa" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PHAGS_PA =
            new UnicodeBlock("PHAGS_PA",
                             "PHAGS-PA");

        /**
         * Constant for the "Saurashtra" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SAURASHTRA =
            new UnicodeBlock("SAURASHTRA");

        /**
         * Constant for the "Devanagari Extended" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock DEVANAGARI_EXTENDED =
            new UnicodeBlock("DEVANAGARI_EXTENDED",
                             "DEVANAGARI EXTENDED",
                             "DEVANAGARIEXTENDED");

        /**
         * Constant for the "Kayah Li" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KAYAH_LI =
            new UnicodeBlock("KAYAH_LI",
                             "KAYAH LI",
                             "KAYAHLI");

        /**
         * Constant for the "Rejang" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock REJANG =
            new UnicodeBlock("REJANG");

        /**
         * Constant for the "Hangul Jamo Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock HANGUL_JAMO_EXTENDED_A =
            new UnicodeBlock("HANGUL_JAMO_EXTENDED_A",
                             "HANGUL JAMO EXTENDED-A",
                             "HANGULJAMOEXTENDED-A");

        /**
         * Constant for the "Javanese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock JAVANESE =
            new
            UnicodeBlock("JAVANESE");

        /**
         * Constant for the "Cham" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CHAM =
            new UnicodeBlock("CHAM");

        /**
         * Constant for the "Myanmar Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MYANMAR_EXTENDED_A =
            new UnicodeBlock("MYANMAR_EXTENDED_A",
                             "MYANMAR EXTENDED-A",
                             "MYANMAREXTENDED-A");

        /**
         * Constant for the "Tai Viet" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TAI_VIET =
            new UnicodeBlock("TAI_VIET",
                             "TAI VIET",
                             "TAIVIET");

        /**
         * Constant for the "Ethiopic Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_EXTENDED_A =
            new UnicodeBlock("ETHIOPIC_EXTENDED_A",
                             "ETHIOPIC EXTENDED-A",
                             "ETHIOPICEXTENDED-A");

        /**
         * Constant for the "Meetei Mayek" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MEETEI_MAYEK =
            new UnicodeBlock("MEETEI_MAYEK",
                             "MEETEI MAYEK",
                             "MEETEIMAYEK");

        /**
         * Constant for the "Hangul Jamo Extended-B" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock HANGUL_JAMO_EXTENDED_B =
            new UnicodeBlock("HANGUL_JAMO_EXTENDED_B",
                             "HANGUL JAMO EXTENDED-B",
                             "HANGULJAMOEXTENDED-B");

        /**
         * Constant for the "Vertical Forms" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock VERTICAL_FORMS =
            new UnicodeBlock("VERTICAL_FORMS",
                             "VERTICAL FORMS",
                             "VERTICALFORMS");

        /**
         * Constant for the "Ancient Greek Numbers" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ANCIENT_GREEK_NUMBERS =
            new UnicodeBlock("ANCIENT_GREEK_NUMBERS",
                             "ANCIENT GREEK NUMBERS",
                             "ANCIENTGREEKNUMBERS");

        /**
         * Constant for the "Ancient Symbols" Unicode character block.
         * @since 1.7
         */
        public static final
        UnicodeBlock ANCIENT_SYMBOLS =
            new UnicodeBlock("ANCIENT_SYMBOLS",
                             "ANCIENT SYMBOLS",
                             "ANCIENTSYMBOLS");

        /**
         * Constant for the "Phaistos Disc" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PHAISTOS_DISC =
            new UnicodeBlock("PHAISTOS_DISC",
                             "PHAISTOS DISC",
                             "PHAISTOSDISC");

        /**
         * Constant for the "Lycian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LYCIAN =
            new UnicodeBlock("LYCIAN");

        /**
         * Constant for the "Carian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CARIAN =
            new UnicodeBlock("CARIAN");

        /**
         * Constant for the "Old Persian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OLD_PERSIAN =
            new UnicodeBlock("OLD_PERSIAN",
                             "OLD PERSIAN",
                             "OLDPERSIAN");

        /**
         * Constant for the "Imperial Aramaic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock IMPERIAL_ARAMAIC =
            new UnicodeBlock("IMPERIAL_ARAMAIC",
                             "IMPERIAL ARAMAIC",
                             "IMPERIALARAMAIC");

        /**
         * Constant for the "Phoenician" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PHOENICIAN =
            new UnicodeBlock("PHOENICIAN");

        /**
         * Constant for the "Lydian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LYDIAN =
            new UnicodeBlock("LYDIAN");

        /**
         * Constant for the "Kharoshthi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KHAROSHTHI =
            new UnicodeBlock("KHAROSHTHI");

        /**
         * Constant for the "Old South Arabian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OLD_SOUTH_ARABIAN =
            new UnicodeBlock("OLD_SOUTH_ARABIAN",
                             "OLD SOUTH ARABIAN",
                             "OLDSOUTHARABIAN");

        /**
         * Constant for the "Avestan" Unicode character block.
         * @since 1.7
         */
        public static final
        UnicodeBlock AVESTAN =
            new UnicodeBlock("AVESTAN");

        /**
         * Constant for the "Inscriptional Parthian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock INSCRIPTIONAL_PARTHIAN =
            new UnicodeBlock("INSCRIPTIONAL_PARTHIAN",
                             "INSCRIPTIONAL PARTHIAN",
                             "INSCRIPTIONALPARTHIAN");

        /**
         * Constant for the "Inscriptional Pahlavi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock INSCRIPTIONAL_PAHLAVI =
            new UnicodeBlock("INSCRIPTIONAL_PAHLAVI",
                             "INSCRIPTIONAL PAHLAVI",
                             "INSCRIPTIONALPAHLAVI");

        /**
         * Constant for the "Old Turkic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OLD_TURKIC =
            new UnicodeBlock("OLD_TURKIC",
                             "OLD TURKIC",
                             "OLDTURKIC");

        /**
         * Constant for the "Rumi Numeral Symbols" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock RUMI_NUMERAL_SYMBOLS =
            new UnicodeBlock("RUMI_NUMERAL_SYMBOLS",
                             "RUMI NUMERAL SYMBOLS",
                             "RUMINUMERALSYMBOLS");

        /**
         * Constant for the "Brahmi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BRAHMI =
            new UnicodeBlock("BRAHMI");

        /**
         * Constant for the "Kaithi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KAITHI =
            new UnicodeBlock("KAITHI");

        /**
         * Constant for the "Cuneiform" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CUNEIFORM =
            new UnicodeBlock("CUNEIFORM");

        /**
         * Constant for the "Cuneiform Numbers and Punctuation" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock CUNEIFORM_NUMBERS_AND_PUNCTUATION =
            new UnicodeBlock("CUNEIFORM_NUMBERS_AND_PUNCTUATION",
                             "CUNEIFORM NUMBERS AND PUNCTUATION",
                             "CUNEIFORMNUMBERSANDPUNCTUATION");

        /**
         * Constant for the "Egyptian Hieroglyphs" Unicode character block.
         * @since
1.72336*/2337public static final UnicodeBlock EGYPTIAN_HIEROGLYPHS =2338new UnicodeBlock("EGYPTIAN_HIEROGLYPHS",2339"EGYPTIAN HIEROGLYPHS",2340"EGYPTIANHIEROGLYPHS");23412342/**2343* Constant for the "Bamum Supplement" Unicode character block.2344* @since 1.72345*/2346public static final UnicodeBlock BAMUM_SUPPLEMENT =2347new UnicodeBlock("BAMUM_SUPPLEMENT",2348"BAMUM SUPPLEMENT",2349"BAMUMSUPPLEMENT");23502351/**2352* Constant for the "Kana Supplement" Unicode character block.2353* @since 1.72354*/2355public static final UnicodeBlock KANA_SUPPLEMENT =2356new UnicodeBlock("KANA_SUPPLEMENT",2357"KANA SUPPLEMENT",2358"KANASUPPLEMENT");23592360/**2361* Constant for the "Ancient Greek Musical Notation" Unicode character2362* block.2363* @since 1.72364*/2365public static final UnicodeBlock ANCIENT_GREEK_MUSICAL_NOTATION =2366new UnicodeBlock("ANCIENT_GREEK_MUSICAL_NOTATION",2367"ANCIENT GREEK MUSICAL NOTATION",2368"ANCIENTGREEKMUSICALNOTATION");23692370/**2371* Constant for the "Counting Rod Numerals" Unicode character block.2372* @since 1.72373*/2374public static final UnicodeBlock COUNTING_ROD_NUMERALS =2375new UnicodeBlock("COUNTING_ROD_NUMERALS",2376"COUNTING ROD NUMERALS",2377"COUNTINGRODNUMERALS");23782379/**2380* Constant for the "Mahjong Tiles" Unicode character block.2381* @since 1.72382*/2383public static final UnicodeBlock MAHJONG_TILES =2384new UnicodeBlock("MAHJONG_TILES",2385"MAHJONG TILES",2386"MAHJONGTILES");23872388/**2389* Constant for the "Domino Tiles" Unicode character block.2390* @since 1.72391*/2392public static final UnicodeBlock DOMINO_TILES =2393new UnicodeBlock("DOMINO_TILES",2394"DOMINO TILES",2395"DOMINOTILES");23962397/**2398* Constant for the "Playing Cards" Unicode character block.2399* @since 1.72400*/2401public static final UnicodeBlock PLAYING_CARDS =2402new UnicodeBlock("PLAYING_CARDS",2403"PLAYING CARDS",2404"PLAYINGCARDS");24052406/**2407* Constant for the "Enclosed Alphanumeric Supplement" Unicode character2408* block.2409* @since 
1.72410*/2411public static final UnicodeBlock ENCLOSED_ALPHANUMERIC_SUPPLEMENT =2412new UnicodeBlock("ENCLOSED_ALPHANUMERIC_SUPPLEMENT",2413"ENCLOSED ALPHANUMERIC SUPPLEMENT",2414"ENCLOSEDALPHANUMERICSUPPLEMENT");24152416/**2417* Constant for the "Enclosed Ideographic Supplement" Unicode character2418* block.2419* @since 1.72420*/2421public static final UnicodeBlock ENCLOSED_IDEOGRAPHIC_SUPPLEMENT =2422new UnicodeBlock("ENCLOSED_IDEOGRAPHIC_SUPPLEMENT",2423"ENCLOSED IDEOGRAPHIC SUPPLEMENT",2424"ENCLOSEDIDEOGRAPHICSUPPLEMENT");24252426/**2427* Constant for the "Miscellaneous Symbols And Pictographs" Unicode2428* character block.2429* @since 1.72430*/2431public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS =2432new UnicodeBlock("MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS",2433"MISCELLANEOUS SYMBOLS AND PICTOGRAPHS",2434"MISCELLANEOUSSYMBOLSANDPICTOGRAPHS");24352436/**2437* Constant for the "Emoticons" Unicode character block.2438* @since 1.72439*/2440public static final UnicodeBlock EMOTICONS =2441new UnicodeBlock("EMOTICONS");24422443/**2444* Constant for the "Transport And Map Symbols" Unicode character block.2445* @since 1.72446*/2447public static final UnicodeBlock TRANSPORT_AND_MAP_SYMBOLS =2448new UnicodeBlock("TRANSPORT_AND_MAP_SYMBOLS",2449"TRANSPORT AND MAP SYMBOLS",2450"TRANSPORTANDMAPSYMBOLS");24512452/**2453* Constant for the "Alchemical Symbols" Unicode character block.2454* @since 1.72455*/2456public static final UnicodeBlock ALCHEMICAL_SYMBOLS =2457new UnicodeBlock("ALCHEMICAL_SYMBOLS",2458"ALCHEMICAL SYMBOLS",2459"ALCHEMICALSYMBOLS");24602461/**2462* Constant for the "CJK Unified Ideographs Extension C" Unicode2463* character block.2464* @since 1.72465*/2466public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C =2467new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C",2468"CJK UNIFIED IDEOGRAPHS EXTENSION C",2469"CJKUNIFIEDIDEOGRAPHSEXTENSIONC");24702471/**2472* Constant for the "CJK Unified Ideographs Extension D" 
Unicode2473* character block.2474* @since 1.72475*/2476public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D =2477new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D",2478"CJK UNIFIED IDEOGRAPHS EXTENSION D",2479"CJKUNIFIEDIDEOGRAPHSEXTENSIOND");24802481/**2482* Constant for the "Arabic Extended-A" Unicode character block.2483* @since 1.82484*/2485public static final UnicodeBlock ARABIC_EXTENDED_A =2486new UnicodeBlock("ARABIC_EXTENDED_A",2487"ARABIC EXTENDED-A",2488"ARABICEXTENDED-A");24892490/**2491* Constant for the "Sundanese Supplement" Unicode character block.2492* @since 1.82493*/2494public static final UnicodeBlock SUNDANESE_SUPPLEMENT =2495new UnicodeBlock("SUNDANESE_SUPPLEMENT",2496"SUNDANESE SUPPLEMENT",2497"SUNDANESESUPPLEMENT");24982499/**2500* Constant for the "Meetei Mayek Extensions" Unicode character block.2501* @since 1.82502*/2503public static final UnicodeBlock MEETEI_MAYEK_EXTENSIONS =2504new UnicodeBlock("MEETEI_MAYEK_EXTENSIONS",2505"MEETEI MAYEK EXTENSIONS",2506"MEETEIMAYEKEXTENSIONS");25072508/**2509* Constant for the "Meroitic Hieroglyphs" Unicode character block.2510* @since 1.82511*/2512public static final UnicodeBlock MEROITIC_HIEROGLYPHS =2513new UnicodeBlock("MEROITIC_HIEROGLYPHS",2514"MEROITIC HIEROGLYPHS",2515"MEROITICHIEROGLYPHS");25162517/**2518* Constant for the "Meroitic Cursive" Unicode character block.2519* @since 1.82520*/2521public static final UnicodeBlock MEROITIC_CURSIVE =2522new UnicodeBlock("MEROITIC_CURSIVE",2523"MEROITIC CURSIVE",2524"MEROITICCURSIVE");25252526/**2527* Constant for the "Sora Sompeng" Unicode character block.2528* @since 1.82529*/2530public static final UnicodeBlock SORA_SOMPENG =2531new UnicodeBlock("SORA_SOMPENG",2532"SORA SOMPENG",2533"SORASOMPENG");25342535/**2536* Constant for the "Chakma" Unicode character block.2537* @since 1.82538*/2539public static final UnicodeBlock CHAKMA =2540new UnicodeBlock("CHAKMA");25412542/**2543* Constant for the "Sharada" Unicode character block.2544* 
         * @since 1.8
         */
        public static final UnicodeBlock SHARADA =
            new UnicodeBlock("SHARADA");

        /**
         * Constant for the "Takri" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock TAKRI =
            new UnicodeBlock("TAKRI");

        /**
         * Constant for the "Miao" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MIAO =
            new UnicodeBlock("MIAO");

        /**
         * Constant for the "Arabic Mathematical Alphabetic Symbols" Unicode
         * character block.
         * @since 1.8
         */
        public static final UnicodeBlock ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS =
            new UnicodeBlock("ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS",
                "ARABIC MATHEMATICAL ALPHABETIC SYMBOLS",
                "ARABICMATHEMATICALALPHABETICSYMBOLS");

        private static final int[] blockStarts = {
            0x0000, // 0000..007F; Basic Latin
            0x0080, // 0080..00FF; Latin-1 Supplement
            0x0100, // 0100..017F; Latin Extended-A
            0x0180, // 0180..024F; Latin Extended-B
            0x0250, // 0250..02AF; IPA Extensions
            0x02B0, // 02B0..02FF; Spacing Modifier Letters
            0x0300, // 0300..036F; Combining Diacritical Marks
            0x0370, // 0370..03FF; Greek and Coptic
            0x0400, // 0400..04FF; Cyrillic
            0x0500, // 0500..052F; Cyrillic Supplement
            0x0530, // 0530..058F; Armenian
            0x0590, // 0590..05FF; Hebrew
            0x0600, // 0600..06FF; Arabic
            0x0700, // 0700..074F; Syriac
            0x0750, // 0750..077F; Arabic Supplement
            0x0780, // 0780..07BF; Thaana
            0x07C0, // 07C0..07FF; NKo
            0x0800, // 0800..083F; Samaritan
            0x0840, // 0840..085F; Mandaic
            0x0860, // unassigned
            0x08A0, // 08A0..08FF; Arabic Extended-A
            0x0900, // 0900..097F; Devanagari
            0x0980, // 0980..09FF; Bengali
            0x0A00, // 0A00..0A7F; Gurmukhi
            0x0A80, // 0A80..0AFF; Gujarati
            0x0B00, // 0B00..0B7F; Oriya
            0x0B80, // 0B80..0BFF; Tamil
            0x0C00, // 0C00..0C7F; Telugu
            0x0C80, // 0C80..0CFF; Kannada
            0x0D00, // 0D00..0D7F; Malayalam
            0x0D80, // 0D80..0DFF; Sinhala
            0x0E00, // 0E00..0E7F; Thai
            0x0E80, // 0E80..0EFF; Lao
            0x0F00, // 0F00..0FFF; Tibetan
            0x1000, // 1000..109F; Myanmar
            0x10A0, // 10A0..10FF; Georgian
            0x1100, // 1100..11FF; Hangul Jamo
            0x1200, // 1200..137F; Ethiopic
            0x1380, // 1380..139F; Ethiopic Supplement
            0x13A0, // 13A0..13FF; Cherokee
            0x1400, // 1400..167F; Unified Canadian Aboriginal Syllabics
            0x1680, // 1680..169F; Ogham
            0x16A0, // 16A0..16FF; Runic
            0x1700, // 1700..171F; Tagalog
            0x1720, // 1720..173F; Hanunoo
            0x1740, // 1740..175F; Buhid
            0x1760, // 1760..177F; Tagbanwa
            0x1780, // 1780..17FF; Khmer
            0x1800, // 1800..18AF; Mongolian
            0x18B0, // 18B0..18FF; Unified Canadian Aboriginal Syllabics Extended
            0x1900, // 1900..194F; Limbu
            0x1950, // 1950..197F; Tai Le
            0x1980, // 1980..19DF; New Tai Lue
            0x19E0, // 19E0..19FF; Khmer Symbols
            0x1A00, // 1A00..1A1F; Buginese
            0x1A20, // 1A20..1AAF; Tai Tham
            0x1AB0, // unassigned
            0x1B00, // 1B00..1B7F; Balinese
            0x1B80, // 1B80..1BBF; Sundanese
            0x1BC0, // 1BC0..1BFF; Batak
            0x1C00, // 1C00..1C4F; Lepcha
            0x1C50, // 1C50..1C7F; Ol Chiki
            0x1C80, // unassigned
            0x1CC0, // 1CC0..1CCF; Sundanese Supplement
            0x1CD0, // 1CD0..1CFF; Vedic Extensions
            0x1D00, // 1D00..1D7F; Phonetic Extensions
            0x1D80, // 1D80..1DBF; Phonetic Extensions Supplement
            0x1DC0, // 1DC0..1DFF; Combining Diacritical Marks Supplement
            0x1E00, // 1E00..1EFF; Latin Extended Additional
            0x1F00, // 1F00..1FFF; Greek Extended
            0x2000, // 2000..206F; General Punctuation
            0x2070, // 2070..209F; Superscripts and Subscripts
            0x20A0, // 20A0..20CF; Currency Symbols
            0x20D0, // 20D0..20FF; Combining Diacritical Marks for Symbols
            0x2100, // 2100..214F; Letterlike Symbols
            0x2150, // 2150..218F; Number Forms
            0x2190, // 2190..21FF; Arrows
            0x2200, // 2200..22FF; Mathematical Operators
            0x2300, // 2300..23FF; Miscellaneous Technical
            0x2400, // 2400..243F; Control Pictures
            0x2440, // 2440..245F; Optical Character Recognition
            0x2460, // 2460..24FF; Enclosed Alphanumerics
            0x2500, // 2500..257F; Box Drawing
            0x2580, // 2580..259F; Block Elements
            0x25A0, // 25A0..25FF; Geometric Shapes
            0x2600, // 2600..26FF; Miscellaneous Symbols
            0x2700, // 2700..27BF; Dingbats
            0x27C0, // 27C0..27EF; Miscellaneous Mathematical Symbols-A
            0x27F0, // 27F0..27FF; Supplemental Arrows-A
            0x2800, // 2800..28FF; Braille Patterns
            0x2900, // 2900..297F; Supplemental Arrows-B
            0x2980, // 2980..29FF; Miscellaneous Mathematical Symbols-B
            0x2A00, // 2A00..2AFF; Supplemental Mathematical Operators
            0x2B00, // 2B00..2BFF; Miscellaneous Symbols and Arrows
            0x2C00, // 2C00..2C5F; Glagolitic
            0x2C60, // 2C60..2C7F; Latin Extended-C
            0x2C80, // 2C80..2CFF; Coptic
            0x2D00, // 2D00..2D2F; Georgian Supplement
            0x2D30, // 2D30..2D7F; Tifinagh
            0x2D80, // 2D80..2DDF; Ethiopic Extended
            0x2DE0, // 2DE0..2DFF; Cyrillic Extended-A
            0x2E00, // 2E00..2E7F; Supplemental Punctuation
            0x2E80, // 2E80..2EFF; CJK Radicals Supplement
            0x2F00, // 2F00..2FDF; Kangxi Radicals
            0x2FE0, // unassigned
            0x2FF0, // 2FF0..2FFF; Ideographic Description Characters
            0x3000, // 3000..303F; CJK Symbols and Punctuation
            0x3040, // 3040..309F; Hiragana
            0x30A0, // 30A0..30FF; Katakana
            0x3100, // 3100..312F; Bopomofo
            0x3130, // 3130..318F; Hangul Compatibility Jamo
            0x3190, // 3190..319F; Kanbun
            0x31A0, // 31A0..31BF; Bopomofo Extended
            0x31C0, // 31C0..31EF; CJK Strokes
            0x31F0, // 31F0..31FF; Katakana Phonetic Extensions
            0x3200, // 3200..32FF; Enclosed CJK Letters and Months
            0x3300, // 3300..33FF; CJK Compatibility
            0x3400, // 3400..4DBF; CJK Unified Ideographs Extension A
            0x4DC0, // 4DC0..4DFF; Yijing Hexagram Symbols
            0x4E00, // 4E00..9FFF; CJK Unified Ideographs
            0xA000, // A000..A48F; Yi Syllables
            0xA490, // A490..A4CF; Yi Radicals
            0xA4D0, // A4D0..A4FF; Lisu
            0xA500, // A500..A63F; Vai
            0xA640, // A640..A69F; Cyrillic Extended-B
            0xA6A0, // A6A0..A6FF; Bamum
            0xA700, // A700..A71F; Modifier Tone Letters
            0xA720, // A720..A7FF; Latin Extended-D
            0xA800, // A800..A82F; Syloti Nagri
            0xA830, // A830..A83F; Common Indic Number Forms
            0xA840, // A840..A87F; Phags-pa
            0xA880, // A880..A8DF; Saurashtra
            0xA8E0, // A8E0..A8FF; Devanagari Extended
            0xA900, // A900..A92F; Kayah Li
            0xA930, // A930..A95F; Rejang
            0xA960, // A960..A97F; Hangul Jamo Extended-A
            0xA980, // A980..A9DF; Javanese
            0xA9E0, // unassigned
            0xAA00, // AA00..AA5F; Cham
            0xAA60, // AA60..AA7F; Myanmar Extended-A
            0xAA80, // AA80..AADF; Tai Viet
            0xAAE0, // AAE0..AAFF; Meetei Mayek Extensions
            0xAB00, // AB00..AB2F; Ethiopic Extended-A
            0xAB30, // unassigned
            0xABC0, // ABC0..ABFF; Meetei Mayek
            0xAC00, // AC00..D7AF; Hangul Syllables
            0xD7B0, // D7B0..D7FF; Hangul Jamo Extended-B
            0xD800, // D800..DB7F; High Surrogates
            0xDB80, // DB80..DBFF; High Private Use Surrogates
            0xDC00, // DC00..DFFF; Low Surrogates
            0xE000, // E000..F8FF; Private Use Area
            0xF900, // F900..FAFF; CJK Compatibility Ideographs
            0xFB00, // FB00..FB4F; Alphabetic Presentation Forms
            0xFB50, // FB50..FDFF; Arabic Presentation Forms-A
            0xFE00, // FE00..FE0F; Variation Selectors
            0xFE10, // FE10..FE1F; Vertical Forms
            0xFE20, // FE20..FE2F; Combining Half Marks
            0xFE30, // FE30..FE4F; CJK Compatibility Forms
            0xFE50, // FE50..FE6F; Small Form Variants
            0xFE70, // FE70..FEFF; Arabic Presentation Forms-B
            0xFF00, // FF00..FFEF; Halfwidth and Fullwidth Forms
            0xFFF0, // FFF0..FFFF; Specials
            0x10000, // 10000..1007F; Linear B Syllabary
            0x10080, // 10080..100FF; Linear B Ideograms
            0x10100, // 10100..1013F; Aegean Numbers
            0x10140, // 10140..1018F; Ancient Greek Numbers
            0x10190, // 10190..101CF; Ancient Symbols
            0x101D0, // 101D0..101FF; Phaistos Disc
            0x10200, // unassigned
            0x10280, // 10280..1029F; Lycian
            0x102A0, // 102A0..102DF; Carian
            0x102E0, // unassigned
            0x10300, // 10300..1032F; Old Italic
            0x10330, // 10330..1034F; Gothic
            0x10350, // unassigned
            0x10380, // 10380..1039F; Ugaritic
            0x103A0, // 103A0..103DF; Old Persian
            0x103E0, // unassigned
            0x10400, // 10400..1044F; Deseret
            0x10450, // 10450..1047F; Shavian
            0x10480, // 10480..104AF; Osmanya
            0x104B0, // unassigned
            0x10800, // 10800..1083F; Cypriot Syllabary
            0x10840, // 10840..1085F; Imperial Aramaic
            0x10860, // unassigned
            0x10900, // 10900..1091F; Phoenician
            0x10920, // 10920..1093F; Lydian
            0x10940, // unassigned
            0x10980, // 10980..1099F; Meroitic Hieroglyphs
            0x109A0, // 109A0..109FF; Meroitic Cursive
            0x10A00, // 10A00..10A5F; Kharoshthi
            0x10A60, // 10A60..10A7F; Old South Arabian
            0x10A80, // unassigned
            0x10B00, // 10B00..10B3F; Avestan
            0x10B40, // 10B40..10B5F; Inscriptional Parthian
            0x10B60, // 10B60..10B7F; Inscriptional Pahlavi
            0x10B80, // unassigned
            0x10C00, // 10C00..10C4F; Old Turkic
            0x10C50, // unassigned
            0x10E60, // 10E60..10E7F; Rumi Numeral Symbols
            0x10E80, // unassigned
            0x11000, // 11000..1107F; Brahmi
            0x11080, // 11080..110CF; Kaithi
            0x110D0, // 110D0..110FF; Sora Sompeng
            0x11100, // 11100..1114F; Chakma
            0x11150, // unassigned
            0x11180, // 11180..111DF; Sharada
            0x111E0, // unassigned
            0x11680, // 11680..116CF; Takri
            0x116D0, // unassigned
            0x12000, // 12000..123FF; Cuneiform
            0x12400, // 12400..1247F; Cuneiform Numbers and Punctuation
            0x12480, // unassigned
            0x13000, // 13000..1342F; Egyptian Hieroglyphs
            0x13430, // unassigned
            0x16800, // 16800..16A3F; Bamum Supplement
            0x16A40, // unassigned
            0x16F00, // 16F00..16F9F; Miao
            0x16FA0, // unassigned
            0x1B000, // 1B000..1B0FF; Kana Supplement
            0x1B100, // unassigned
            0x1D000, // 1D000..1D0FF; Byzantine Musical Symbols
            0x1D100, // 1D100..1D1FF; Musical Symbols
            0x1D200, // 1D200..1D24F; Ancient Greek Musical Notation
            0x1D250, // unassigned
            0x1D300, // 1D300..1D35F; Tai Xuan Jing Symbols
            0x1D360, // 1D360..1D37F; Counting Rod Numerals
            0x1D380, // unassigned
            0x1D400, // 1D400..1D7FF; Mathematical Alphanumeric Symbols
            0x1D800, // unassigned
            0x1EE00, // 1EE00..1EEFF; Arabic Mathematical Alphabetic Symbols
            0x1EF00, // unassigned
            0x1F000, // 1F000..1F02F; Mahjong Tiles
            0x1F030, // 1F030..1F09F; Domino Tiles
            0x1F0A0, // 1F0A0..1F0FF; Playing Cards
            0x1F100, // 1F100..1F1FF; Enclosed Alphanumeric Supplement
            0x1F200, // 1F200..1F2FF; Enclosed Ideographic Supplement
            0x1F300, // 1F300..1F5FF; Miscellaneous Symbols And Pictographs
            0x1F600, // 1F600..1F64F; Emoticons
            0x1F650, // unassigned
            0x1F680, // 1F680..1F6FF; Transport And Map Symbols
            0x1F700, // 1F700..1F77F; Alchemical Symbols
            0x1F780, // unassigned
            0x20000, // 20000..2A6DF; CJK Unified Ideographs Extension B
            0x2A6E0, // unassigned
            0x2A700, // 2A700..2B73F; CJK Unified Ideographs Extension C
            0x2B740, // 2B740..2B81F; CJK Unified Ideographs Extension D
            0x2B820, // unassigned
            0x2F800, // 2F800..2FA1F; CJK Compatibility Ideographs Supplement
            0x2FA20, // unassigned
            0xE0000, // E0000..E007F; Tags
            0xE0080, // unassigned
            0xE0100, // E0100..E01EF; Variation Selectors Supplement
            0xE01F0, // unassigned
            0xF0000, // F0000..FFFFF; Supplementary Private Use Area-A
            0x100000 // 100000..10FFFF; Supplementary Private Use Area-B
        };

        private static final UnicodeBlock[] blocks =
        {
            BASIC_LATIN,
            LATIN_1_SUPPLEMENT,
            LATIN_EXTENDED_A,
            LATIN_EXTENDED_B,
            IPA_EXTENSIONS,
            SPACING_MODIFIER_LETTERS,
            COMBINING_DIACRITICAL_MARKS,
            GREEK,
            CYRILLIC,
            CYRILLIC_SUPPLEMENTARY,
            ARMENIAN,
            HEBREW,
            ARABIC,
            SYRIAC,
            ARABIC_SUPPLEMENT,
            THAANA,
            NKO,
            SAMARITAN,
            MANDAIC,
            null,
            ARABIC_EXTENDED_A,
            DEVANAGARI,
            BENGALI,
            GURMUKHI,
            GUJARATI,
            ORIYA,
            TAMIL,
            TELUGU,
            KANNADA,
            MALAYALAM,
            SINHALA,
            THAI,
            LAO,
            TIBETAN,
            MYANMAR,
            GEORGIAN,
            HANGUL_JAMO,
            ETHIOPIC,
            ETHIOPIC_SUPPLEMENT,
            CHEROKEE,
            UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS,
            OGHAM,
            RUNIC,
            TAGALOG,
            HANUNOO,
            BUHID,
            TAGBANWA,
            KHMER,
            MONGOLIAN,
            UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED,
            LIMBU,
            TAI_LE,
            NEW_TAI_LUE,
            KHMER_SYMBOLS,
            BUGINESE,
            TAI_THAM,
            null,
            BALINESE,
            SUNDANESE,
            BATAK,
            LEPCHA,
            OL_CHIKI,
            null,
            SUNDANESE_SUPPLEMENT,
            VEDIC_EXTENSIONS,
            PHONETIC_EXTENSIONS,
            PHONETIC_EXTENSIONS_SUPPLEMENT,
            COMBINING_DIACRITICAL_MARKS_SUPPLEMENT,
            LATIN_EXTENDED_ADDITIONAL,
            GREEK_EXTENDED,
            GENERAL_PUNCTUATION,
            SUPERSCRIPTS_AND_SUBSCRIPTS,
            CURRENCY_SYMBOLS,
            COMBINING_MARKS_FOR_SYMBOLS,
            LETTERLIKE_SYMBOLS,
            NUMBER_FORMS,
            ARROWS,
            MATHEMATICAL_OPERATORS,
            MISCELLANEOUS_TECHNICAL,
            CONTROL_PICTURES,
            OPTICAL_CHARACTER_RECOGNITION,
            ENCLOSED_ALPHANUMERICS,
            BOX_DRAWING,
            BLOCK_ELEMENTS,
            GEOMETRIC_SHAPES,
            MISCELLANEOUS_SYMBOLS,
            DINGBATS,
            MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A,
            SUPPLEMENTAL_ARROWS_A,
            BRAILLE_PATTERNS,
            SUPPLEMENTAL_ARROWS_B,
            MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B,
            SUPPLEMENTAL_MATHEMATICAL_OPERATORS,
            MISCELLANEOUS_SYMBOLS_AND_ARROWS,
            GLAGOLITIC,
            LATIN_EXTENDED_C,
            COPTIC,
            GEORGIAN_SUPPLEMENT,
            TIFINAGH,
            ETHIOPIC_EXTENDED,
            CYRILLIC_EXTENDED_A,
            SUPPLEMENTAL_PUNCTUATION,
            CJK_RADICALS_SUPPLEMENT,
            KANGXI_RADICALS,
            null,
            IDEOGRAPHIC_DESCRIPTION_CHARACTERS,
            CJK_SYMBOLS_AND_PUNCTUATION,
            HIRAGANA,
            KATAKANA,
            BOPOMOFO,
            HANGUL_COMPATIBILITY_JAMO,
            KANBUN,
            BOPOMOFO_EXTENDED,
            CJK_STROKES,
            KATAKANA_PHONETIC_EXTENSIONS,
            ENCLOSED_CJK_LETTERS_AND_MONTHS,
            CJK_COMPATIBILITY,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
            YIJING_HEXAGRAM_SYMBOLS,
            CJK_UNIFIED_IDEOGRAPHS,
            YI_SYLLABLES,
            YI_RADICALS,
            LISU,
            VAI,
            CYRILLIC_EXTENDED_B,
            BAMUM,
            MODIFIER_TONE_LETTERS,
            LATIN_EXTENDED_D,
            SYLOTI_NAGRI,
            COMMON_INDIC_NUMBER_FORMS,
            PHAGS_PA,
            SAURASHTRA,
            DEVANAGARI_EXTENDED,
            KAYAH_LI,
            REJANG,
            HANGUL_JAMO_EXTENDED_A,
            JAVANESE,
            null,
            CHAM,
            MYANMAR_EXTENDED_A,
            TAI_VIET,
            MEETEI_MAYEK_EXTENSIONS,
            ETHIOPIC_EXTENDED_A,
            null,
            MEETEI_MAYEK,
            HANGUL_SYLLABLES,
            HANGUL_JAMO_EXTENDED_B,
            HIGH_SURROGATES,
            HIGH_PRIVATE_USE_SURROGATES,
            LOW_SURROGATES,
            PRIVATE_USE_AREA,
            CJK_COMPATIBILITY_IDEOGRAPHS,
            ALPHABETIC_PRESENTATION_FORMS,
            ARABIC_PRESENTATION_FORMS_A,
            VARIATION_SELECTORS,
            VERTICAL_FORMS,
            COMBINING_HALF_MARKS,
            CJK_COMPATIBILITY_FORMS,
            SMALL_FORM_VARIANTS,
            ARABIC_PRESENTATION_FORMS_B,
            HALFWIDTH_AND_FULLWIDTH_FORMS,
            SPECIALS,
            LINEAR_B_SYLLABARY,
            LINEAR_B_IDEOGRAMS,
            AEGEAN_NUMBERS,
            ANCIENT_GREEK_NUMBERS,
            ANCIENT_SYMBOLS,
            PHAISTOS_DISC,
            null,
            LYCIAN,
            CARIAN,
            null,
            OLD_ITALIC,
            GOTHIC,
            null,
            UGARITIC,
            OLD_PERSIAN,
            null,
            DESERET,
            SHAVIAN,
            OSMANYA,
            null,
            CYPRIOT_SYLLABARY,
            IMPERIAL_ARAMAIC,
            null,
            PHOENICIAN,
            LYDIAN,
            null,
            MEROITIC_HIEROGLYPHS,
            MEROITIC_CURSIVE,
            KHAROSHTHI,
            OLD_SOUTH_ARABIAN,
            null,
            AVESTAN,
            INSCRIPTIONAL_PARTHIAN,
            INSCRIPTIONAL_PAHLAVI,
            null,
            OLD_TURKIC,
            null,
            RUMI_NUMERAL_SYMBOLS,
            null,
            BRAHMI,
            KAITHI,
            SORA_SOMPENG,
            CHAKMA,
            null,
            SHARADA,
            null,
            TAKRI,
            null,
            CUNEIFORM,
            CUNEIFORM_NUMBERS_AND_PUNCTUATION,
            null,
            EGYPTIAN_HIEROGLYPHS,
            null,
            BAMUM_SUPPLEMENT,
            null,
            MIAO,
            null,
            KANA_SUPPLEMENT,
            null,
            BYZANTINE_MUSICAL_SYMBOLS,
            MUSICAL_SYMBOLS,
            ANCIENT_GREEK_MUSICAL_NOTATION,
            null,
            TAI_XUAN_JING_SYMBOLS,
            COUNTING_ROD_NUMERALS,
            null,
            MATHEMATICAL_ALPHANUMERIC_SYMBOLS,
            null,
            ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS,
            null,
            MAHJONG_TILES,
            DOMINO_TILES,
            PLAYING_CARDS,
            ENCLOSED_ALPHANUMERIC_SUPPLEMENT,
            ENCLOSED_IDEOGRAPHIC_SUPPLEMENT,
            MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS,
            EMOTICONS,
            null,
            TRANSPORT_AND_MAP_SYMBOLS,
            ALCHEMICAL_SYMBOLS,
            null,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B,
            null,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D,
            null,
            CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT,
            null,
            TAGS,
            null,
            VARIATION_SELECTORS_SUPPLEMENT,
            null,
            SUPPLEMENTARY_PRIVATE_USE_AREA_A,
            SUPPLEMENTARY_PRIVATE_USE_AREA_B
        };


        /**
         * Returns the object representing the Unicode block containing the
         * given character, or {@code null} if the character is not a
         * member of a defined block.
         *
         * <p><b>Note:</b> This method cannot handle
         * <a href="Character.html#supplementary"> supplementary
         * characters</a>.
         * To support all Unicode characters, including
         * supplementary characters, use the {@link #of(int)} method.
         *
         * @param c The character in question
         * @return The {@code UnicodeBlock} instance representing the
         *         Unicode block of which this character is a member, or
         *         {@code null} if the character is not a member of any
         *         Unicode block
         */
        public static UnicodeBlock of(char c) {
            return of((int)c);
        }

        /**
         * Returns the object representing the Unicode block
         * containing the given character (Unicode code point), or
         * {@code null} if the character is not a member of a
         * defined block.
         *
         * @param codePoint the character (Unicode code point) in question.
         * @return The {@code UnicodeBlock} instance representing the
         *         Unicode block of which this character is a member, or
         *         {@code null} if the character is not a member of any
         *         Unicode block
         * @exception IllegalArgumentException if the specified
         *            {@code codePoint} is an invalid Unicode code point.
         * @see Character#isValidCodePoint(int)
         * @since 1.5
         */
        public static UnicodeBlock of(int codePoint) {
            if (!isValidCodePoint(codePoint)) {
                throw new IllegalArgumentException();
            }

            int top, bottom, current;
            bottom = 0;
            top = blockStarts.length;
            current = top/2;

            // Binary search for the last entry of blockStarts that is <= codePoint.
            // invariant: top > current >= bottom && codePoint >= blockStarts[bottom]
            while (top - bottom > 1) {
                if (codePoint >= blockStarts[current]) {
                    bottom = current;
                } else {
                    top = current;
                }
                current = (top + bottom) / 2;
            }
            return blocks[current];
        }

        /**
         * Returns the UnicodeBlock with the given name. Block
         * names are determined by The Unicode Standard. The file
         * Blocks-&lt;version&gt;.txt defines blocks for a particular
         * version of the standard. The {@link Character} class specifies
         * the version of the standard that it supports.
         * <p>
         * This method accepts block names in the following forms:
         * <ol>
         * <li> Canonical block names as defined by the Unicode Standard.
         * For example, the standard defines a "Basic Latin" block. Therefore, this
         * method accepts "Basic Latin" as a valid block name. The documentation of
         * each UnicodeBlock provides the canonical name.
         * <li>Canonical block names with all spaces removed. For example, "BasicLatin"
         * is a valid block name for the "Basic Latin" block.
         * <li>The text representation of each constant UnicodeBlock identifier.
         * For example, this method will return the {@link #BASIC_LATIN} block if
         * provided with the "BASIC_LATIN" name. This form replaces all spaces and
         * hyphens in the canonical name with underscores.
         * </ol>
         * Finally, character case is ignored for all of the valid block name forms.
         * For example, "BASIC_LATIN" and "basic_latin" are both valid block names.
         * The en_US locale's case mapping rules are used to provide case-insensitive
         * string comparisons for block name validation.
         * <p>
         * If the Unicode Standard changes block names, both the previous and
         * current names will be accepted.
         *
         * @param blockName A {@code UnicodeBlock} name.
         * @return The {@code UnicodeBlock} instance identified
         *         by {@code blockName}
         * @throws IllegalArgumentException if {@code blockName} is an
         *         invalid name
         * @throws NullPointerException if {@code blockName} is null
         * @since 1.5
         */
        public static final UnicodeBlock forName(String blockName) {
            UnicodeBlock block = map.get(blockName.toUpperCase(Locale.US));
            if (block == null) {
                throw new IllegalArgumentException();
            }
            return block;
        }
    }


    /**
     * A family of character subsets representing the character scripts
     * defined in the <a href="http://www.unicode.org/reports/tr24/">
     * <i>Unicode Standard
     * Annex #24: Script Names</i></a>. Every Unicode
     * character is assigned to a single Unicode script, either a specific
     * script, such as {@link Character.UnicodeScript#LATIN Latin}, or
     * one of the following three special values,
     * {@link Character.UnicodeScript#INHERITED Inherited},
     * {@link Character.UnicodeScript#COMMON Common} or
     * {@link Character.UnicodeScript#UNKNOWN Unknown}.
     *
     * @since 1.7
     */
    public static enum UnicodeScript {
        /**
         * Unicode script "Common".
         */
        COMMON,

        /**
         * Unicode script "Latin".
         */
        LATIN,

        /**
         * Unicode script "Greek".
         */
        GREEK,

        /**
         * Unicode script "Cyrillic".
         */
        CYRILLIC,

        /**
         * Unicode script "Armenian".
         */
        ARMENIAN,

        /**
         * Unicode script "Hebrew".
         */
        HEBREW,

        /**
         * Unicode script "Arabic".
         */
        ARABIC,

        /**
         * Unicode script "Syriac".
         */
        SYRIAC,

        /**
         * Unicode script "Thaana".
         */
        THAANA,

        /**
         * Unicode script "Devanagari".
         */
        DEVANAGARI,

        /**
         * Unicode script "Bengali".
         */
        BENGALI,

        /**
         * Unicode script "Gurmukhi".
         */
        GURMUKHI,

        /**
         * Unicode script "Gujarati".
         */
        GUJARATI,

        /**
         * Unicode script "Oriya".
         */
        ORIYA,

        /**
         * Unicode script "Tamil".
         */
        TAMIL,

        /**
         * Unicode script "Telugu".
         */
        TELUGU,

        /**
         * Unicode script "Kannada".
         */
        KANNADA,

        /**
         * Unicode script "Malayalam".
         */
        MALAYALAM,

        /**
         * Unicode script "Sinhala".
         */
        SINHALA,

        /**
         * Unicode script "Thai".
         */
        THAI,

        /**
         * Unicode script "Lao".
         */
        LAO,

        /**
         * Unicode script "Tibetan".
         */
        TIBETAN,

        /**
         * Unicode script "Myanmar".
         */
        MYANMAR,

        /**
         * Unicode script "Georgian".
         */
        GEORGIAN,

        /**
         * Unicode script "Hangul".
         */
        HANGUL,

        /**
         * Unicode script "Ethiopic".
         */
        ETHIOPIC,

        /**
         * Unicode script "Cherokee".
         */
        CHEROKEE,

        /**
         * Unicode script "Canadian_Aboriginal".
         */
        CANADIAN_ABORIGINAL,

        /**
         * Unicode script "Ogham".
         */
        OGHAM,

        /**
         * Unicode script "Runic".
         */
        RUNIC,

        /**
         * Unicode script "Khmer".
         */
        KHMER,

        /**
         * Unicode script "Mongolian".
         */
        MONGOLIAN,

        /**
         * Unicode script "Hiragana".
         */
        HIRAGANA,

        /**
         * Unicode script "Katakana".
         */
        KATAKANA,

        /**
         * Unicode script "Bopomofo".
         */
        BOPOMOFO,

        /**
         * Unicode script "Han".
         */
        HAN,

        /**
         * Unicode script "Yi".
         */
        YI,

        /**
         * Unicode script "Old_Italic".
         */
        OLD_ITALIC,

        /**
         * Unicode script "Gothic".
         */
        GOTHIC,

        /**
         * Unicode script "Deseret".
         */
        DESERET,

        /**
         * Unicode script "Inherited".
         */
        INHERITED,

        /**
         * Unicode script "Tagalog".
         */
        TAGALOG,

        /**
         * Unicode script "Hanunoo".
         */
        HANUNOO,

        /**
         * Unicode script "Buhid".
         */
        BUHID,

        /**
         * Unicode script "Tagbanwa".
         */
        TAGBANWA,

        /**
         * Unicode script "Limbu".
         */
        LIMBU,

        /**
         * Unicode script "Tai_Le".
         */
        TAI_LE,

        /**
         * Unicode script "Linear_B".
         */
        LINEAR_B,

        /**
         * Unicode script "Ugaritic".
         */
        UGARITIC,

        /**
         * Unicode script "Shavian".
         */
        SHAVIAN,

        /**
         * Unicode script "Osmanya".
         */
        OSMANYA,

        /**
         * Unicode script "Cypriot".
         */
        CYPRIOT,

        /**
         * Unicode script "Braille".
         */
        BRAILLE,

        /**
         * Unicode script "Buginese".
         */
        BUGINESE,

        /**
         * Unicode script "Coptic".
         */
        COPTIC,

        /**
         * Unicode script "New_Tai_Lue".
         */
        NEW_TAI_LUE,

        /**
         * Unicode script "Glagolitic".
         */
        GLAGOLITIC,

        /**
         * Unicode script "Tifinagh".
         */
        TIFINAGH,

        /**
Unicode script "Syloti_Nagri".
         */
        SYLOTI_NAGRI,

        /**
         * Unicode script "Old_Persian".
         */
        OLD_PERSIAN,

        /**
         * Unicode script "Kharoshthi".
         */
        KHAROSHTHI,

        /**
         * Unicode script "Balinese".
         */
        BALINESE,

        /**
         * Unicode script "Cuneiform".
         */
        CUNEIFORM,

        /**
         * Unicode script "Phoenician".
         */
        PHOENICIAN,

        /**
         * Unicode script "Phags_Pa".
         */
        PHAGS_PA,

        /**
         * Unicode script "Nko".
         */
        NKO,

        /**
         * Unicode script "Sundanese".
         */
        SUNDANESE,

        /**
         * Unicode script "Batak".
         */
        BATAK,

        /**
         * Unicode script "Lepcha".
         */
        LEPCHA,

        /**
         * Unicode script "Ol_Chiki".
         */
        OL_CHIKI,

        /**
         * Unicode script "Vai".
         */
        VAI,

        /**
         * Unicode script "Saurashtra".
         */
        SAURASHTRA,

        /**
         * Unicode script "Kayah_Li".
         */
        KAYAH_LI,

        /**
         * Unicode script "Rejang".
         */
        REJANG,

        /**
         * Unicode script "Lycian".
         */
        LYCIAN,

        /**
         * Unicode script "Carian".
         */
        CARIAN,

        /**
         * Unicode script "Lydian".
         */
        LYDIAN,

        /**
         * Unicode script "Cham".
         */
        CHAM,

        /**
         * Unicode script "Tai_Tham".
         */
        TAI_THAM,

        /**
         * Unicode script "Tai_Viet".
         */
        TAI_VIET,

        /**
         * Unicode script "Avestan".
         */
        AVESTAN,

        /**
         * Unicode script "Egyptian_Hieroglyphs".
         */
        EGYPTIAN_HIEROGLYPHS,

        /**
         * Unicode script "Samaritan".
         */
        SAMARITAN,

        /**
         * Unicode script "Mandaic".
         */
        MANDAIC,

        /**
         * Unicode script "Lisu".
         */
        LISU,

        /**
         * Unicode script "Bamum".
         */
        BAMUM,

        /**
         * Unicode script "Javanese".
         */
        JAVANESE,

        /**
         * Unicode script "Meetei_Mayek".
         */
        MEETEI_MAYEK,

        /**
         * Unicode script "Imperial_Aramaic".
         */
        IMPERIAL_ARAMAIC,

        /**
         * Unicode script "Old_South_Arabian".
         */
        OLD_SOUTH_ARABIAN,

        /**
         * Unicode script "Inscriptional_Parthian".
         */
        INSCRIPTIONAL_PARTHIAN,

        /**
         * Unicode script "Inscriptional_Pahlavi".
         */
        INSCRIPTIONAL_PAHLAVI,

        /**
         * Unicode script "Old_Turkic".
         */
        OLD_TURKIC,

        /**
         * Unicode script "Brahmi".
         */
        BRAHMI,

        /**
         * Unicode script "Kaithi".
         */
        KAITHI,

        /**
         * Unicode script "Meroitic_Hieroglyphs".
         */
        MEROITIC_HIEROGLYPHS,

        /**
         * Unicode script "Meroitic_Cursive".
         */
        MEROITIC_CURSIVE,

        /**
         * Unicode script "Sora_Sompeng".
         */
        SORA_SOMPENG,

        /**
         * Unicode script "Chakma".
         */
        CHAKMA,

        /**
         * Unicode script "Sharada".
         */
        SHARADA,

        /**
         * Unicode script "Takri".
         */
        TAKRI,

        /**
         * Unicode script "Miao".
         */
        MIAO,

        /**
         * Unicode script "Unknown".
         */
        UNKNOWN;

        private static final int[] scriptStarts = {
            0x0000,   // 0000..0040; COMMON
            0x0041,   // 0041..005A; LATIN
            0x005B,   // 005B..0060; COMMON
            0x0061,   // 0061..007A; LATIN
            0x007B,   // 007B..00A9; COMMON
            0x00AA,   // 00AA..00AA; LATIN
            0x00AB,   // 00AB..00B9; COMMON
            0x00BA,   // 00BA..00BA; LATIN
            0x00BB,   // 00BB..00BF; COMMON
            0x00C0,   // 00C0..00D6; LATIN
            0x00D7,   // 00D7..00D7; COMMON
            0x00D8,   // 00D8..00F6; LATIN
            0x00F7,   // 00F7..00F7; COMMON
            0x00F8,   // 00F8..02B8; LATIN
            0x02B9,   // 02B9..02DF; COMMON
            0x02E0,   // 02E0..02E4; LATIN
            0x02E5,   // 02E5..02E9; COMMON
            0x02EA,   // 02EA..02EB; BOPOMOFO
            0x02EC,   // 02EC..02FF; COMMON
            0x0300,   // 0300..036F; INHERITED
            0x0370,   // 0370..0373; GREEK
            0x0374,   // 0374..0374; COMMON
            0x0375,   // 0375..037D; GREEK
            0x037E,   // 037E..0383; COMMON
            0x0384,   // 0384..0384; GREEK
            0x0385,   // 0385..0385; COMMON
            0x0386,   // 0386..0386; GREEK
            0x0387,   // 0387..0387; COMMON
            0x0388,   // 0388..03E1; GREEK
            0x03E2,   // 03E2..03EF; COPTIC
            0x03F0,   // 03F0..03FF; GREEK
            0x0400,   // 0400..0484; CYRILLIC
            0x0485,   // 0485..0486; INHERITED
            0x0487,   // 0487..0530; CYRILLIC
            0x0531,   // 0531..0588; ARMENIAN
            0x0589,   // 0589..0589; COMMON
            0x058A,   // 058A..0590; ARMENIAN
            0x0591,   // 0591..05FF; HEBREW
            0x0600,   // 0600..060B; ARABIC
            0x060C,   // 060C..060C; COMMON
            0x060D,   // 060D..061A; ARABIC
            0x061B,   // 061B..061D; COMMON
            0x061E,   // 061E..061E; ARABIC
            0x061F,   // 061F..061F; COMMON
            0x0620,   // 0620..063F; ARABIC
            0x0640,   // 0640..0640; COMMON
            0x0641,   // 0641..064A; ARABIC
            0x064B,   // 064B..0655; INHERITED
            0x0656,   // 0656..065F; ARABIC
            0x0660,   // 0660..0669; COMMON
            0x066A,   // 066A..066F; ARABIC
            0x0670,   // 0670..0670; INHERITED
            0x0671,   // 0671..06DC; ARABIC
            0x06DD,   // 06DD..06DD; COMMON
            0x06DE,   // 06DE..06FF; ARABIC
            0x0700,   // 0700..074F; SYRIAC
            0x0750,   // 0750..077F; ARABIC
            0x0780,   // 0780..07BF; THAANA
            0x07C0,   // 07C0..07FF; NKO
            0x0800,   // 0800..083F; SAMARITAN
            0x0840,   // 0840..089F; MANDAIC
            0x08A0,   // 08A0..08FF; ARABIC
            0x0900,   // 0900..0950; DEVANAGARI
            0x0951,   // 0951..0952; INHERITED
            0x0953,   // 0953..0963; DEVANAGARI
            0x0964,   // 0964..0965; COMMON
            0x0966,   // 0966..0980; DEVANAGARI
            0x0981,   // 0981..0A00; BENGALI
            0x0A01,   // 0A01..0A80; GURMUKHI
            0x0A81,   // 0A81..0B00; GUJARATI
            0x0B01,   // 0B01..0B81; ORIYA
            0x0B82,   // 0B82..0C00; TAMIL
            0x0C01,   // 0C01..0C81; TELUGU
            0x0C82,   // 0C82..0CF0; KANNADA
            0x0D02,   // 0D02..0D81; MALAYALAM
            0x0D82,   // 0D82..0E00; SINHALA
            0x0E01,   // 0E01..0E3E; THAI
            0x0E3F,   // 0E3F..0E3F; COMMON
            0x0E40,   // 0E40..0E80; THAI
            0x0E81,   // 0E81..0EFF; LAO
            0x0F00,   // 0F00..0FD4; TIBETAN
            0x0FD5,   // 0FD5..0FD8; COMMON
            0x0FD9,   // 0FD9..0FFF; TIBETAN
            0x1000,   // 1000..109F; MYANMAR
            0x10A0,   // 10A0..10FA; GEORGIAN
            0x10FB,   // 10FB..10FB; COMMON
            0x10FC,   // 10FC..10FF; GEORGIAN
            0x1100,   // 1100..11FF; HANGUL
            0x1200,   // 1200..139F; ETHIOPIC
            0x13A0,   // 13A0..13FF; CHEROKEE
            0x1400,   // 1400..167F; CANADIAN_ABORIGINAL
            0x1680,   // 1680..169F; OGHAM
            0x16A0,   // 16A0..16EA; RUNIC
            0x16EB,   // 16EB..16ED; COMMON
            0x16EE,   // 16EE..16FF; RUNIC
            0x1700,   // 1700..171F; TAGALOG
            0x1720,   // 1720..1734; HANUNOO
            0x1735,   // 1735..173F; COMMON
            0x1740,   // 1740..175F; BUHID
            0x1760,   // 1760..177F; TAGBANWA
            0x1780,   // 1780..17FF; KHMER
            0x1800,   // 1800..1801; MONGOLIAN
            0x1802,   // 1802..1803; COMMON
            0x1804,   // 1804..1804; MONGOLIAN
            0x1805,   // 1805..1805; COMMON
            0x1806,   // 1806..18AF; MONGOLIAN
            0x18B0,   // 18B0..18FF; CANADIAN_ABORIGINAL
            0x1900,   // 1900..194F; LIMBU
            0x1950,   // 1950..197F; TAI_LE
            0x1980,   // 1980..19DF; NEW_TAI_LUE
            0x19E0,   // 19E0..19FF; KHMER
            0x1A00,   // 1A00..1A1F; BUGINESE
            0x1A20,   // 1A20..1AFF; TAI_THAM
            0x1B00,   // 1B00..1B7F; BALINESE
            0x1B80,   // 1B80..1BBF; SUNDANESE
            0x1BC0,   // 1BC0..1BFF; BATAK
            0x1C00,   // 1C00..1C4F; LEPCHA
            0x1C50,   // 1C50..1CBF; OL_CHIKI
            0x1CC0,   // 1CC0..1CCF; SUNDANESE
            0x1CD0,   // 1CD0..1CD2; INHERITED
            0x1CD3,   // 1CD3..1CD3; COMMON
            0x1CD4,   // 1CD4..1CE0; INHERITED
            0x1CE1,   // 1CE1..1CE1; COMMON
            0x1CE2,   // 1CE2..1CE8; INHERITED
            0x1CE9,   // 1CE9..1CEC; COMMON
            0x1CED,   // 1CED..1CED; INHERITED
            0x1CEE,   // 1CEE..1CF3; COMMON
            0x1CF4,   // 1CF4..1CF4; INHERITED
            0x1CF5,   // 1CF5..1CFF; COMMON
            0x1D00,   // 1D00..1D25; LATIN
            0x1D26,   // 1D26..1D2A; GREEK
            0x1D2B,   // 1D2B..1D2B; CYRILLIC
            0x1D2C,   // 1D2C..1D5C; LATIN
            0x1D5D,   // 1D5D..1D61; GREEK
            0x1D62,   // 1D62..1D65; LATIN
            0x1D66,   // 1D66..1D6A; GREEK
            0x1D6B,   // 1D6B..1D77; LATIN
            0x1D78,   // 1D78..1D78; CYRILLIC
            0x1D79,   // 1D79..1DBE; LATIN
            0x1DBF,   // 1DBF..1DBF; GREEK
            0x1DC0,   // 1DC0..1DFF; INHERITED
            0x1E00,   // 1E00..1EFF; LATIN
            0x1F00,   // 1F00..1FFF; GREEK
            0x2000,   // 2000..200B; COMMON
            0x200C,   // 200C..200D; INHERITED
            0x200E,   // 200E..2070; COMMON
            0x2071,   // 2071..2073; LATIN
            0x2074,   // 2074..207E; COMMON
            0x207F,   // 207F..207F; LATIN
            0x2080,   // 2080..208F; COMMON
            0x2090,   // 2090..209F; LATIN
            0x20A0,   // 20A0..20CF; COMMON
            0x20D0,   // 20D0..20FF; INHERITED
            0x2100,   // 2100..2125; COMMON
            0x2126,   // 2126..2126; GREEK
            0x2127,   // 2127..2129; COMMON
            0x212A,   // 212A..212B; LATIN
            0x212C,   // 212C..2131; COMMON
            0x2132,   // 2132..2132; LATIN
            0x2133,   // 2133..214D; COMMON
            0x214E,   // 214E..214E; LATIN
            0x214F,   // 214F..215F; COMMON
            0x2160,   // 2160..2188; LATIN
            0x2189,   // 2189..27FF; COMMON
            0x2800,   // 2800..28FF; BRAILLE
            0x2900,   // 2900..2BFF; COMMON
            0x2C00,   // 2C00..2C5F; GLAGOLITIC
            0x2C60,   // 2C60..2C7F; LATIN
            0x2C80,   // 2C80..2CFF; COPTIC
            0x2D00,   // 2D00..2D2F; GEORGIAN
            0x2D30,   // 2D30..2D7F; TIFINAGH
            0x2D80,   // 2D80..2DDF; ETHIOPIC
            0x2DE0,   // 2DE0..2DFF; CYRILLIC
            0x2E00,   // 2E00..2E7F; COMMON
            0x2E80,   // 2E80..2FEF; HAN
            0x2FF0,   // 2FF0..3004; COMMON
            0x3005,   // 3005..3005; HAN
            0x3006,   // 3006..3006; COMMON
            0x3007,   // 3007..3007; HAN
            0x3008,   // 3008..3020; COMMON
            0x3021,   // 3021..3029; HAN
            0x302A,   // 302A..302D; INHERITED
            0x302E,   // 302E..302F; HANGUL
            0x3030,   // 3030..3037; COMMON
            0x3038,   // 3038..303B; HAN
            0x303C,   // 303C..3040; COMMON
            0x3041,   // 3041..3098; HIRAGANA
            0x3099,   // 3099..309A; INHERITED
            0x309B,   // 309B..309C; COMMON
            0x309D,   // 309D..309F; HIRAGANA
            0x30A0,   // 30A0..30A0; COMMON
            0x30A1,   // 30A1..30FA; KATAKANA
            0x30FB,   // 30FB..30FC; COMMON
            0x30FD,   // 30FD..3104; KATAKANA
            0x3105,   // 3105..3130; BOPOMOFO
            0x3131,   // 3131..318F; HANGUL
            0x3190,   // 3190..319F; COMMON
            0x31A0,   // 31A0..31BF; BOPOMOFO
            0x31C0,   // 31C0..31EF; COMMON
            0x31F0,   // 31F0..31FF; KATAKANA
            0x3200,   // 3200..321F; HANGUL
            0x3220,   // 3220..325F; COMMON
            0x3260,   // 3260..327E; HANGUL
            0x327F,   // 327F..32CF; COMMON
            0x32D0,   // 32D0..32FE; KATAKANA
            0x32FF,   // 32FF..32FF; COMMON
            0x3300,   // 3300..3357; KATAKANA
            0x3358,   // 3358..33FF; COMMON
            0x3400,   // 3400..4DBF; HAN
            0x4DC0,   // 4DC0..4DFF; COMMON
            0x4E00,   // 4E00..9FFF; HAN
            0xA000,   // A000..A4CF; YI
            0xA4D0,   // A4D0..A4FF; LISU
            0xA500,   // A500..A63F; VAI
            0xA640,   // A640..A69F; CYRILLIC
            0xA6A0,   // A6A0..A6FF; BAMUM
            0xA700,   // A700..A721; COMMON
            0xA722,   // A722..A787; LATIN
            0xA788,   // A788..A78A; COMMON
            0xA78B,   // A78B..A7FF; LATIN
            0xA800,   // A800..A82F; SYLOTI_NAGRI
            0xA830,   // A830..A83F; COMMON
            0xA840,   // A840..A87F; PHAGS_PA
            0xA880,   // A880..A8DF; SAURASHTRA
            0xA8E0,   // A8E0..A8FF; DEVANAGARI
            0xA900,   // A900..A92F; KAYAH_LI
            0xA930,   // A930..A95F; REJANG
            0xA960,   // A960..A97F; HANGUL
            0xA980,   // A980..A9FF; JAVANESE
            0xAA00,   // AA00..AA5F; CHAM
            0xAA60,   // AA60..AA7F; MYANMAR
            0xAA80,   // AA80..AADF; TAI_VIET
            0xAAE0,   // AAE0..AB00; MEETEI_MAYEK
            0xAB01,   // AB01..ABBF; ETHIOPIC
            0xABC0,   // ABC0..ABFF; MEETEI_MAYEK
            0xAC00,   // AC00..D7FB; HANGUL
            0xD7FC,   // D7FC..F8FF; UNKNOWN
            0xF900,   // F900..FAFF; HAN
            0xFB00,   // FB00..FB12; LATIN
            0xFB13,   // FB13..FB1C; ARMENIAN
            0xFB1D,   // FB1D..FB4F; HEBREW
            0xFB50,   // FB50..FD3D; ARABIC
            0xFD3E,   // FD3E..FD4F; COMMON
            0xFD50,   // FD50..FDFC; ARABIC
            0xFDFD,   // FDFD..FDFF; COMMON
            0xFE00,   // FE00..FE0F; INHERITED
            0xFE10,   // FE10..FE1F; COMMON
            0xFE20,   // FE20..FE2F; INHERITED
            0xFE30,   // FE30..FE6F; COMMON
            0xFE70,   // FE70..FEFE; ARABIC
            0xFEFF,   // FEFF..FF20; COMMON
            0xFF21,   // FF21..FF3A; LATIN
            0xFF3B,   // FF3B..FF40; COMMON
            0xFF41,   // FF41..FF5A; LATIN
            0xFF5B,   // FF5B..FF65; COMMON
            0xFF66,   // FF66..FF6F; KATAKANA
            0xFF70,   // FF70..FF70; COMMON
            0xFF71,   // FF71..FF9D; KATAKANA
            0xFF9E,   // FF9E..FF9F; COMMON
            0xFFA0,   // FFA0..FFDF; HANGUL
            0xFFE0,   // FFE0..FFFF; COMMON
            0x10000,  // 10000..100FF; LINEAR_B
            0x10100,  // 10100..1013F; COMMON
            0x10140,  // 10140..1018F; GREEK
            0x10190,  // 10190..101FC; COMMON
            0x101FD,  // 101FD..1027F; INHERITED
            0x10280,  // 10280..1029F; LYCIAN
            0x102A0,  // 102A0..102FF; CARIAN
            0x10300,  // 10300..1032F; OLD_ITALIC
            0x10330,  // 10330..1037F; GOTHIC
            0x10380,  // 10380..1039F; UGARITIC
            0x103A0,  // 103A0..103FF; OLD_PERSIAN
            0x10400,  // 10400..1044F; DESERET
            0x10450,  // 10450..1047F; SHAVIAN
            0x10480,  // 10480..107FF; OSMANYA
            0x10800,  // 10800..1083F; CYPRIOT
            0x10840,  // 10840..108FF; IMPERIAL_ARAMAIC
            0x10900,  // 10900..1091F; PHOENICIAN
            0x10920,  // 10920..1097F; LYDIAN
            0x10980,  // 10980..1099F; MEROITIC_HIEROGLYPHS
            0x109A0,  // 109A0..109FF; MEROITIC_CURSIVE
            0x10A00,  // 10A00..10A5F; KHAROSHTHI
            0x10A60,  // 10A60..10AFF; OLD_SOUTH_ARABIAN
            0x10B00,  // 10B00..10B3F; AVESTAN
            0x10B40,  // 10B40..10B5F; INSCRIPTIONAL_PARTHIAN
            0x10B60,  // 10B60..10BFF; INSCRIPTIONAL_PAHLAVI
            0x10C00,  // 10C00..10E5F; OLD_TURKIC
            0x10E60,  // 10E60..10FFF; ARABIC
            0x11000,  // 11000..1107F; BRAHMI
            0x11080,  // 11080..110CF; KAITHI
            0x110D0,  // 110D0..110FF; SORA_SOMPENG
            0x11100,  // 11100..1117F; CHAKMA
            0x11180,  // 11180..1167F; SHARADA
            0x11680,  // 11680..116CF; TAKRI
            0x12000,  // 12000..12FFF; CUNEIFORM
            0x13000,  // 13000..167FF; EGYPTIAN_HIEROGLYPHS
            0x16800,  // 16800..16A38; BAMUM
            0x16F00,  // 16F00..16F9F; MIAO
            0x1B000,  // 1B000..1B000; KATAKANA
            0x1B001,  // 1B001..1CFFF; HIRAGANA
            0x1D000,  // 1D000..1D166; COMMON
            0x1D167,  // 1D167..1D169; INHERITED
            0x1D16A,  // 1D16A..1D17A; COMMON
            0x1D17B,  // 1D17B..1D182; INHERITED
            0x1D183,  // 1D183..1D184; COMMON
            0x1D185,  // 1D185..1D18B; INHERITED
            0x1D18C,  // 1D18C..1D1A9; COMMON
            0x1D1AA,  // 1D1AA..1D1AD; INHERITED
            0x1D1AE,  // 1D1AE..1D1FF; COMMON
            0x1D200,  // 1D200..1D2FF; GREEK
            0x1D300,  // 1D300..1EDFF; COMMON
            0x1EE00,  // 1EE00..1EFFF; ARABIC
            0x1F000,  // 1F000..1F1FF; COMMON
            0x1F200,  // 1F200..1F200; HIRAGANA
            0x1F201,  // 1F201..1FFFF; COMMON
            0x20000,  // 20000..E0000; HAN
            0xE0001,  // E0001..E00FF; COMMON
            0xE0100,  // E0100..E01EF; INHERITED
            0xE01F0   // E01F0..10FFFF;
            // UNKNOWN
        };

        private static final UnicodeScript[] scripts = {
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            BOPOMOFO,
            COMMON,
            INHERITED,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COPTIC,
            GREEK,
            CYRILLIC,
            INHERITED,
            CYRILLIC,
            ARMENIAN,
            COMMON,
            ARMENIAN,
            HEBREW,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            INHERITED,
            ARABIC,
            COMMON,
            ARABIC,
            INHERITED,
            ARABIC,
            COMMON,
            ARABIC,
            SYRIAC,
            ARABIC,
            THAANA,
            NKO,
            SAMARITAN,
            MANDAIC,
            ARABIC,
            DEVANAGARI,
            INHERITED,
            DEVANAGARI,
            COMMON,
            DEVANAGARI,
            BENGALI,
            GURMUKHI,
            GUJARATI,
            ORIYA,
            TAMIL,
            TELUGU,
            KANNADA,
            MALAYALAM,
            SINHALA,
            THAI,
            COMMON,
            THAI,
            LAO,
            TIBETAN,
            COMMON,
            TIBETAN,
            MYANMAR,
            GEORGIAN,
            COMMON,
            GEORGIAN,
            HANGUL,
            ETHIOPIC,
            CHEROKEE,
            CANADIAN_ABORIGINAL,
            OGHAM,
            RUNIC,
            COMMON,
            RUNIC,
            TAGALOG,
            HANUNOO,
            COMMON,
            BUHID,
            TAGBANWA,
            KHMER,
            MONGOLIAN,
            COMMON,
            MONGOLIAN,
            COMMON,
            MONGOLIAN,
            CANADIAN_ABORIGINAL,
            LIMBU,
            TAI_LE,
            NEW_TAI_LUE,
            KHMER,
            BUGINESE,
            TAI_THAM,
            BALINESE,
            SUNDANESE,
            BATAK,
            LEPCHA,
            OL_CHIKI,
            SUNDANESE,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            LATIN,
            GREEK,
            CYRILLIC,
            LATIN,
            GREEK,
            LATIN,
            GREEK,
            LATIN,
            CYRILLIC,
            LATIN,
            GREEK,
            INHERITED,
            LATIN,
            GREEK,
            COMMON,
            INHERITED,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            INHERITED,
            COMMON,
            GREEK,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            BRAILLE,
            COMMON,
            GLAGOLITIC,
            LATIN,
            COPTIC,
            GEORGIAN,
            TIFINAGH,
            ETHIOPIC,
            CYRILLIC,
            COMMON,
            HAN,
            COMMON,
            HAN,
            COMMON,
            HAN,
            COMMON,
            HAN,
            INHERITED,
            HANGUL,
            COMMON,
            HAN,
            COMMON,
            HIRAGANA,
            INHERITED,
            COMMON,
            HIRAGANA,
            COMMON,
            KATAKANA,
            COMMON,
            KATAKANA,
            BOPOMOFO,
            HANGUL,
            COMMON,
            BOPOMOFO,
            COMMON,
            KATAKANA,
            HANGUL,
            COMMON,
            HANGUL,
            COMMON,
            KATAKANA,  // 32D0..32FE
            COMMON,    // 32FF
            KATAKANA,  // 3300..3357
            COMMON,
            HAN,
            COMMON,
            HAN,
            YI,
            LISU,
            VAI,
            CYRILLIC,
            BAMUM,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            SYLOTI_NAGRI,
            COMMON,
            PHAGS_PA,
            SAURASHTRA,
            DEVANAGARI,
            KAYAH_LI,
            REJANG,
            HANGUL,
            JAVANESE,
            CHAM,
            MYANMAR,
            TAI_VIET,
            MEETEI_MAYEK,
            ETHIOPIC,
            MEETEI_MAYEK,
            HANGUL,
            UNKNOWN,
            HAN,
            LATIN,
            ARMENIAN,
            HEBREW,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            ARABIC,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            KATAKANA,
            COMMON,
            KATAKANA,
            COMMON,
            HANGUL,
            COMMON,
            LINEAR_B,
            COMMON,
            GREEK,
            COMMON,
            INHERITED,
            LYCIAN,
            CARIAN,
            OLD_ITALIC,
            GOTHIC,
            UGARITIC,
            OLD_PERSIAN,
            DESERET,
            SHAVIAN,
            OSMANYA,
            CYPRIOT,
            IMPERIAL_ARAMAIC,
            PHOENICIAN,
            LYDIAN,
            MEROITIC_HIEROGLYPHS,
            MEROITIC_CURSIVE,
            KHAROSHTHI,
            OLD_SOUTH_ARABIAN,
            AVESTAN,
            INSCRIPTIONAL_PARTHIAN,
            INSCRIPTIONAL_PAHLAVI,
            OLD_TURKIC,
            ARABIC,
            BRAHMI,
            KAITHI,
            SORA_SOMPENG,
            CHAKMA,
            SHARADA,
            TAKRI,
            CUNEIFORM,
            EGYPTIAN_HIEROGLYPHS,
            BAMUM,
            MIAO,
            KATAKANA,
            HIRAGANA,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            GREEK,
            COMMON,
            ARABIC,
            COMMON,
            HIRAGANA,
            COMMON,
            HAN,
            COMMON,
            INHERITED,
            UNKNOWN
        };

        private static final HashMap<String, Character.UnicodeScript>
aliases;
        static {
            aliases = new HashMap<>(128);
            aliases.put("ARAB", ARABIC);
            aliases.put("ARMI", IMPERIAL_ARAMAIC);
            aliases.put("ARMN", ARMENIAN);
            aliases.put("AVST", AVESTAN);
            aliases.put("BALI", BALINESE);
            aliases.put("BAMU", BAMUM);
            aliases.put("BATK", BATAK);
            aliases.put("BENG", BENGALI);
            aliases.put("BOPO", BOPOMOFO);
            aliases.put("BRAI", BRAILLE);
            aliases.put("BRAH", BRAHMI);
            aliases.put("BUGI", BUGINESE);
            aliases.put("BUHD", BUHID);
            aliases.put("CAKM", CHAKMA);
            aliases.put("CANS", CANADIAN_ABORIGINAL);
            aliases.put("CARI", CARIAN);
            aliases.put("CHAM", CHAM);
            aliases.put("CHER", CHEROKEE);
            aliases.put("COPT", COPTIC);
            aliases.put("CPRT", CYPRIOT);
            aliases.put("CYRL", CYRILLIC);
            aliases.put("DEVA", DEVANAGARI);
            aliases.put("DSRT", DESERET);
            aliases.put("EGYP", EGYPTIAN_HIEROGLYPHS);
            aliases.put("ETHI", ETHIOPIC);
            aliases.put("GEOR", GEORGIAN);
            aliases.put("GLAG", GLAGOLITIC);
            aliases.put("GOTH", GOTHIC);
            aliases.put("GREK", GREEK);
            aliases.put("GUJR", GUJARATI);
            aliases.put("GURU", GURMUKHI);
            aliases.put("HANG", HANGUL);
            aliases.put("HANI", HAN);
            aliases.put("HANO", HANUNOO);
            aliases.put("HEBR", HEBREW);
            aliases.put("HIRA", HIRAGANA);
            // it appears we don't have the KATAKANA_OR_HIRAGANA
            //aliases.put("HRKT", KATAKANA_OR_HIRAGANA);
            aliases.put("ITAL", OLD_ITALIC);
            aliases.put("JAVA", JAVANESE);
            aliases.put("KALI", KAYAH_LI);
            aliases.put("KANA", KATAKANA);
            aliases.put("KHAR", KHAROSHTHI);
            aliases.put("KHMR", KHMER);
            aliases.put("KNDA", KANNADA);
            aliases.put("KTHI", KAITHI);
            aliases.put("LANA", TAI_THAM);
            aliases.put("LAOO", LAO);
            aliases.put("LATN", LATIN);
            aliases.put("LEPC", LEPCHA);
            aliases.put("LIMB", LIMBU);
            aliases.put("LINB", LINEAR_B);
            aliases.put("LISU", LISU);
            aliases.put("LYCI", LYCIAN);
            aliases.put("LYDI", LYDIAN);
            aliases.put("MAND", MANDAIC);
            aliases.put("MERC",
MEROITIC_CURSIVE);
            aliases.put("MERO", MEROITIC_HIEROGLYPHS);
            aliases.put("MLYM", MALAYALAM);
            aliases.put("MONG", MONGOLIAN);
            aliases.put("MTEI", MEETEI_MAYEK);
            aliases.put("MYMR", MYANMAR);
            aliases.put("NKOO", NKO);
            aliases.put("OGAM", OGHAM);
            aliases.put("OLCK", OL_CHIKI);
            aliases.put("ORKH", OLD_TURKIC);
            aliases.put("ORYA", ORIYA);
            aliases.put("OSMA", OSMANYA);
            aliases.put("PHAG", PHAGS_PA);
            aliases.put("PLRD", MIAO);
            aliases.put("PHLI", INSCRIPTIONAL_PAHLAVI);
            aliases.put("PHNX", PHOENICIAN);
            aliases.put("PRTI", INSCRIPTIONAL_PARTHIAN);
            aliases.put("RJNG", REJANG);
            aliases.put("RUNR", RUNIC);
            aliases.put("SAMR", SAMARITAN);
            aliases.put("SARB", OLD_SOUTH_ARABIAN);
            aliases.put("SAUR", SAURASHTRA);
            aliases.put("SHAW", SHAVIAN);
            aliases.put("SHRD", SHARADA);
            aliases.put("SINH", SINHALA);
            aliases.put("SORA", SORA_SOMPENG);
            aliases.put("SUND", SUNDANESE);
            aliases.put("SYLO", SYLOTI_NAGRI);
            aliases.put("SYRC", SYRIAC);
            aliases.put("TAGB", TAGBANWA);
            aliases.put("TALE", TAI_LE);
            aliases.put("TAKR", TAKRI);
            aliases.put("TALU", NEW_TAI_LUE);
            aliases.put("TAML", TAMIL);
            aliases.put("TAVT", TAI_VIET);
            aliases.put("TELU", TELUGU);
            aliases.put("TFNG", TIFINAGH);
            aliases.put("TGLG", TAGALOG);
            aliases.put("THAA", THAANA);
            aliases.put("THAI", THAI);
            aliases.put("TIBT", TIBETAN);
            aliases.put("UGAR", UGARITIC);
            aliases.put("VAII", VAI);
            aliases.put("XPEO", OLD_PERSIAN);
            aliases.put("XSUX", CUNEIFORM);
            aliases.put("YIII", YI);
            aliases.put("ZINH", INHERITED);
            aliases.put("ZYYY", COMMON);
            aliases.put("ZZZZ", UNKNOWN);
        }

        /**
         * Returns the enum constant representing the Unicode script to which
         * the given character (Unicode code point) is assigned.
         *
         * @param   codePoint the character (Unicode code point) in question.
         * @return  The {@code UnicodeScript} constant representing the
         *          Unicode script to which this character is assigned.
         *
         * @exception IllegalArgumentException if the specified
         * {@code codePoint} is an invalid Unicode code point.
         * @see Character#isValidCodePoint(int)
         *
         */
        public static UnicodeScript of(int codePoint) {
            if (!isValidCodePoint(codePoint))
                throw new IllegalArgumentException();
            int type = getType(codePoint);
            // leave SURROGATE and PRIVATE_USE for table lookup
            if (type == UNASSIGNED)
                return UNKNOWN;
            int index = Arrays.binarySearch(scriptStarts, codePoint);
            if (index < 0)
                index = -index - 2;
            return scripts[index];
        }

        /**
         * Returns the UnicodeScript constant with the given Unicode script
         * name or the script name alias. Script names and their aliases are
         * determined by The Unicode Standard. The files Scripts&lt;version&gt;.txt
         * and PropertyValueAliases&lt;version&gt;.txt define script names
         * and the script name aliases for a particular version of the
         * standard. The {@link Character} class specifies the version of
         * the standard that it supports.
         * <p>
         * Character case is ignored for all of the valid script names.
         * The en_US locale's case mapping rules are used to provide
         * case-insensitive string comparisons for script name validation.
         *
         * @param scriptName A {@code UnicodeScript} name.
         * @return The {@code UnicodeScript} constant identified
         *         by {@code scriptName}
         * @throws IllegalArgumentException if {@code scriptName} is an
         *         invalid name
         * @throws NullPointerException if {@code scriptName} is null
         */
        public static final UnicodeScript forName(String scriptName) {
            scriptName = scriptName.toUpperCase(Locale.ENGLISH);
                                 //.replace(' ', '_'));
            UnicodeScript sc = aliases.get(scriptName);
            if (sc != null)
                return sc;
            return valueOf(scriptName);
        }
    }

    /**
     * The value of the {@code Character}.
     *
     * @serial
     */
    private final char value;

    /** use serialVersionUID from JDK 1.0.2 for
interoperability */
    private static final long serialVersionUID = 3786198910865385080L;

    /**
     * Constructs a newly allocated {@code Character} object that
     * represents the specified {@code char} value.
     *
     * @param  value   the value to be represented by the
     *                  {@code Character} object.
     */
    public Character(char value) {
        this.value = value;
    }

    private static class CharacterCache {
        private CharacterCache(){}

        static final Character cache[] = new Character[127 + 1];

        static {
            for (int i = 0; i < cache.length; i++)
                cache[i] = new Character((char)i);
        }
    }

    /**
     * Returns a <tt>Character</tt> instance representing the specified
     * <tt>char</tt> value.
     * If a new <tt>Character</tt> instance is not required, this method
     * should generally be used in preference to the constructor
     * {@link #Character(char)}, as this method is likely to yield
     * significantly better space and time performance by caching
     * frequently requested values.
     *
     * This method will always cache values in the range {@code
     * '\u005Cu0000'} to {@code '\u005Cu007F'}, inclusive, and may
     * cache other values outside of this range.
     *
     * @param  c a char value.
     * @return a <tt>Character</tt> instance representing <tt>c</tt>.
     * @since  1.5
     */
    public static Character valueOf(char c) {
        if (c <= 127) { // must cache
            return CharacterCache.cache[(int)c];
        }
        return new Character(c);
    }

    /**
     * Returns the value of this {@code Character} object.
     * @return  the primitive {@code char} value represented by
     *          this object.
     */
    public char charValue() {
        return value;
    }

    /**
     * Returns a hash code for this {@code Character}; equal to the result
     * of invoking {@code charValue()}.
     *
     * @return a hash code value for this {@code Character}
     */
    @Override
    public int hashCode() {
        return Character.hashCode(value);
    }

    /**
     * Returns a hash code for a
{@code char} value; compatible with
     * {@code Character.hashCode()}.
     *
     * @since 1.8
     *
     * @param value The {@code char} for which to return a hash code.
     * @return a hash code value for a {@code char} value.
     */
    public static int hashCode(char value) {
        return (int)value;
    }

    /**
     * Compares this object against the specified object.
     * The result is {@code true} if and only if the argument is not
     * {@code null} and is a {@code Character} object that
     * represents the same {@code char} value as this object.
     *
     * @param   obj   the object to compare with.
     * @return  {@code true} if the objects are the same;
     *          {@code false} otherwise.
     */
    public boolean equals(Object obj) {
        if (obj instanceof Character) {
            return value == ((Character)obj).charValue();
        }
        return false;
    }

    /**
     * Returns a {@code String} object representing this
     * {@code Character}'s value.  The result is a string of
     * length 1 whose sole component is the primitive
     * {@code char} value represented by this
     * {@code Character} object.
     *
     * @return  a string representation of this object.
     */
    public String toString() {
        return String.valueOf(value);
    }

    /**
     * Returns a {@code String} object representing the
     * specified {@code char}.
The result is a string of length
     * 1 consisting solely of the specified {@code char}.
     *
     * @param c the {@code char} to be converted
     * @return the string representation of the specified {@code char}
     * @since 1.4
     */
    public static String toString(char c) {
        return String.valueOf(c);
    }

    /**
     * Determines whether the specified code point is a valid
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point value</a>.
     *
     * @param  codePoint the Unicode code point to be tested
     * @return {@code true} if the specified code point value is between
     *         {@link #MIN_CODE_POINT} and
     *         {@link #MAX_CODE_POINT} inclusive;
     *         {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isValidCodePoint(int codePoint) {
        // Optimized form of:
        //     codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT
        int plane = codePoint >>> 16;
        return plane < ((MAX_CODE_POINT + 1) >>> 16);
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is in the <a href="#BMP">Basic Multilingual Plane (BMP)</a>.
     * Such code points can be represented using a single {@code char}.
     *
     * @param  codePoint the character (Unicode code point) to be tested
     * @return {@code true} if the specified code point is between
     *         {@link #MIN_VALUE} and {@link #MAX_VALUE} inclusive;
     *         {@code false} otherwise.
     * @since  1.7
     */
    public static boolean isBmpCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
        // Optimized form of:
        //     codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
        // We consistently use logical shift (>>>) to facilitate
        // additional runtime optimizations.
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is in the <a href="#supplementary">supplementary character</a> range.
     *
     * @param  codePoint the character (Unicode code point) to be tested
     * @return {@code true} if the specified code point is
between
     *         {@link #MIN_SUPPLEMENTARY_CODE_POINT} and
     *         {@link #MAX_CODE_POINT} inclusive;
     *         {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isSupplementaryCodePoint(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <  MAX_CODE_POINT + 1;
    }

    /**
     * Determines if the given {@code char} value is a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * (also known as <i>leading-surrogate code unit</i>).
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_HIGH_SURROGATE} and
     *         {@link #MAX_HIGH_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @see    Character#isLowSurrogate(char)
     * @see    Character.UnicodeBlock#of(int)
     * @since  1.5
     */
    public static boolean isHighSurrogate(char ch) {
        // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
        return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
    }

    /**
     * Determines if the given {@code char} value is a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * (also known as <i>trailing-surrogate code unit</i>).
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_LOW_SURROGATE} and
     *         {@link #MAX_LOW_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @see    Character#isHighSurrogate(char)
     * @since
1.5
     */
    public static boolean isLowSurrogate(char ch) {
        return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);
    }

    /**
     * Determines if the given {@code char} value is a Unicode
     * <i>surrogate code unit</i>.
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * <p>A char value is a surrogate code unit if and only if it is either
     * a {@linkplain #isLowSurrogate(char) low-surrogate code unit} or
     * a {@linkplain #isHighSurrogate(char) high-surrogate code unit}.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_SURROGATE} and
     *         {@link #MAX_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @since  1.7
     */
    public static boolean isSurrogate(char ch) {
        return ch >= MIN_SURROGATE && ch < (MAX_SURROGATE + 1);
    }

    /**
     * Determines whether the specified pair of {@code char}
     * values is a valid
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * Unicode surrogate pair</a>.
     *
     * <p>This method is equivalent to the expression:
     * <blockquote><pre>{@code
     * isHighSurrogate(high) && isLowSurrogate(low)
     * }</pre></blockquote>
     *
     * @param  high the high-surrogate code value to be tested
     * @param  low the low-surrogate code value to be tested
     * @return {@code true} if the specified high and
     * low-surrogate code values represent a valid surrogate pair;
     * {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isSurrogatePair(char high, char low) {
        return isHighSurrogate(high) && isLowSurrogate(low);
    }

    /**
     * Determines the number of {@code char} values needed to
     * represent the specified character (Unicode code point).
If the
     * specified character is equal to or greater than 0x10000, then
     * the method returns 2. Otherwise, the method returns 1.
     *
     * <p>This method doesn't validate the specified character to be a
     * valid Unicode code point. The caller must validate the
     * character value using {@link #isValidCodePoint(int) isValidCodePoint}
     * if necessary.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  2 if the character is a valid supplementary character; 1 otherwise.
     * @see     Character#isSupplementaryCodePoint(int)
     * @since   1.5
     */
    public static int charCount(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT ? 2 : 1;
    }

    /**
     * Converts the specified surrogate pair to its supplementary code
     * point value. This method does not validate the specified
     * surrogate pair. The caller must validate it using {@link
     * #isSurrogatePair(char, char) isSurrogatePair} if necessary.
     *
     * @param  high the high-surrogate code unit
     * @param  low the low-surrogate code unit
     * @return the supplementary code point composed from the
     *         specified surrogate pair.
     * @since  1.5
     */
    public static int toCodePoint(char high, char low) {
        // Optimized form of:
        // return ((high - MIN_HIGH_SURROGATE) << 10)
        //         + (low - MIN_LOW_SURROGATE)
        //         + MIN_SUPPLEMENTARY_CODE_POINT;
        return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                       - (MIN_HIGH_SURROGATE << 10)
                                       - MIN_LOW_SURROGATE);
    }

    /**
     * Returns the code point at the given index of the
     * {@code CharSequence}. If the {@code char} value at
     * the given index in the {@code CharSequence} is in the
     * high-surrogate range, the following index is less than the
     * length of the {@code CharSequence}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise, the {@code char} value at the given index is returned.
     *
     * @param seq a sequence of {@code char} values (Unicode code
     *     units)
     * @param index the index to the {@code char} values (Unicode
     *     code units) in {@code seq} to be converted
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the value
     *     {@code index} is negative or not less than
     *     {@link CharSequence#length() seq.length()}.
     * @since 1.5
     */
    public static int codePointAt(CharSequence seq, int index) {
        char c1 = seq.charAt(index);
        if (isHighSurrogate(c1) && ++index < seq.length()) {
            char c2 = seq.charAt(index);
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }

    /**
     * Returns the code point at the given index of the
     * {@code char} array. If the {@code char} value at
     * the given index in the {@code char} array is in the
     * high-surrogate range, the following index is less than the
     * length of the {@code char} array, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise, the {@code char} value at the given index is returned.
     *
     * @param a the {@code char} array
     * @param index the index to the {@code char} values (Unicode
     *     code units) in the {@code char} array to be converted
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the value
     *     {@code index} is negative or not less than
     *     the length of the {@code char} array.
     * @since 1.5
     */
    public static int codePointAt(char[] a, int index) {
        return codePointAtImpl(a, index, a.length);
    }

    /**
     * Returns the code point at the given index of the
     * {@code char} array, where only array elements with
     * {@code index} less than {@code limit} can be used. If
     * the {@code char} value at the given index in the
     * {@code char} array is in the high-surrogate range, the
     * following index is less than the {@code limit}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise, the {@code char} value at the given index is returned.
     *
     * @param a the {@code char} array
     * @param index the index to the {@code char} values (Unicode
     *     code units) in the {@code char} array to be converted
     * @param limit the index after the last array element that
     *     can be used in the {@code char} array
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     *     argument is negative or not less than the {@code limit}
     *     argument, or if the {@code limit} argument is negative or
     *     greater than the length of the {@code char} array.
     * @since 1.5
     */
    public static int codePointAt(char[] a, int index, int limit) {
        if (index >= limit || limit < 0 || limit > a.length) {
            throw new IndexOutOfBoundsException();
        }
        return codePointAtImpl(a, index, limit);
    }

    // throws ArrayIndexOutOfBoundsException if index out of bounds
    static int codePointAtImpl(char[] a, int index, int limit) {
        char c1 = a[index];
        if (isHighSurrogate(c1) && ++index < limit) {
            char c2 = a[index];
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code CharSequence}. If the {@code char} value at
     * {@code (index - 1)} in the {@code CharSequence} is in
     * the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index - 2)}
     * in the {@code CharSequence} is in the
     * high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise, the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param seq the {@code CharSequence} instance
     * @param index the index following the code point that should be returned
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     *     argument is less than 1 or greater than {@link
     *     CharSequence#length() seq.length()}.
     * @since 1.5
     */
    public static int codePointBefore(CharSequence seq, int index) {
        char c2 = seq.charAt(--index);
        if (isLowSurrogate(c2) && index > 0) {
            char c1 = seq.charAt(--index);
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code char} array. If the {@code char} value at
     * {@code (index - 1)} in the {@code char} array is in
     * the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index - 2)}
     * in the {@code char} array is in the
     * high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise, the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param a the {@code char} array
     * @param index the index following the code point that should be returned
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     *     argument is less than 1 or greater than the length of the
     *     {@code char} array
     * @since 1.5
     */
    public static int codePointBefore(char[] a, int index) {
        return codePointBeforeImpl(a, index, 0);
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code char} array, where only array elements with
     * {@code index} greater than or equal to {@code start}
     * can be used. If the {@code char} value at {@code (index - 1)}
     * in the {@code char} array is in the
     * low-surrogate range, {@code (index - 2)} is not less than
     * {@code start}, and the {@code char} value at
     * {@code (index - 2)} in the {@code char} array is in
     * the high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise, the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param a the {@code char} array
     * @param index the index following the code point that should be returned
     * @param start the index of the first array element in the
     *     {@code char} array
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     *     argument is not greater than the {@code start} argument or
     *     is greater than the length of the {@code char} array, or
     *     if the {@code start} argument is negative or not less than
     *     the length of the {@code char} array.
     * @since 1.5
     */
    public static int codePointBefore(char[] a, int index, int start) {
        if (index <= start || start < 0 || start >= a.length) {
            throw new IndexOutOfBoundsException();
        }
        return codePointBeforeImpl(a, index, start);
    }

    // throws ArrayIndexOutOfBoundsException if index-1 out of bounds
    static int codePointBeforeImpl(char[] a, int index, int start) {
        char c2 = a[--index];
        if (isLowSurrogate(c2) && index > start) {
            char c1 = a[--index];
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
    }

    /**
     * Returns the leading surrogate (a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * high surrogate code unit</a>) of the
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * surrogate pair</a>
     * representing the specified supplementary character (Unicode
     * code point) in the UTF-16 encoding.
     * If the specified character is not a
     * <a href="Character.html#supplementary">supplementary character</a>,
     * an unspecified {@code char} is returned.
     *
     * <p>If
     * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}
     * is {@code true}, then
     * {@link #isHighSurrogate isHighSurrogate}{@code (highSurrogate(x))} and
     * {@link #toCodePoint toCodePoint}{@code (highSurrogate(x), }{@link #lowSurrogate lowSurrogate}{@code (x)) == x}
     * are also always {@code true}.
     *
     * @param codePoint a supplementary character (Unicode code point)
     * @return the leading surrogate code unit used to represent the
     *     character in the UTF-16 encoding
     * @since 1.7
     */
    public static char highSurrogate(int codePoint) {
        return (char) ((codePoint >>> 10)
            + (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
    }

    /**
     * Returns the trailing surrogate (a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * low surrogate code unit</a>) of the
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * surrogate pair</a>
     * representing the specified supplementary character (Unicode
     * code point) in the UTF-16 encoding.
     * If the specified character is not a
     * <a href="Character.html#supplementary">supplementary character</a>,
     * an unspecified {@code char} is returned.
     *
     * <p>If
     * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}
     * is {@code true}, then
     * {@link #isLowSurrogate isLowSurrogate}{@code (lowSurrogate(x))} and
     * {@link #toCodePoint toCodePoint}{@code (}{@link #highSurrogate highSurrogate}{@code (x), lowSurrogate(x)) == x}
     * are also always {@code true}.
     *
     * @param codePoint a supplementary character (Unicode code point)
     * @return the trailing surrogate code unit used to represent the
     *     character in the UTF-16 encoding
     * @since 1.7
     */
    public static char lowSurrogate(int codePoint) {
        return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);
    }

    /**
     * Converts the specified character (Unicode code point) to its
     * UTF-16 representation. If the specified code point is a BMP
     * (Basic Multilingual Plane or Plane 0) value, the same value is
     * stored in {@code dst[dstIndex]}, and 1 is returned.
     * If the specified code point is a supplementary character, its
     * surrogate values are stored in {@code dst[dstIndex]}
     * (high-surrogate) and {@code dst[dstIndex+1]}
     * (low-surrogate), and 2 is returned.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @param dst an array of {@code char} in which the
     *     {@code codePoint}'s UTF-16 value is stored.
     * @param dstIndex the start index into the {@code dst}
     *     array where the converted value is stored.
     * @return 1 if the code point is a BMP code point, 2 if the
     *     code point is a supplementary code point.
     * @exception IllegalArgumentException if the specified
     *     {@code codePoint} is not a valid Unicode code point.
     * @exception NullPointerException if the specified {@code dst} is null.
     * @exception IndexOutOfBoundsException if {@code dstIndex}
     *     is negative or not less than {@code dst.length}, or if
     *     {@code dst} at {@code dstIndex} doesn't have enough
     *     array element(s) to store the resulting {@code char}
     *     value(s). (If {@code dstIndex} is equal to
     *     {@code dst.length-1} and the specified
     *     {@code codePoint} is a supplementary character, the
     *     high-surrogate value is not stored in
     *     {@code dst[dstIndex]}.)
     * @since 1.5
     */
    public static int toChars(int codePoint, char[] dst, int dstIndex) {
        if (isBmpCodePoint(codePoint)) {
            dst[dstIndex] = (char) codePoint;
            return 1;
        } else if (isValidCodePoint(codePoint)) {
            toSurrogates(codePoint, dst, dstIndex);
            return 2;
        } else {
            throw new IllegalArgumentException();
        }
    }

    /**
     * Converts the specified character (Unicode code point) to its
     * UTF-16 representation stored in a {@code char} array. If
     * the specified code point is a BMP (Basic Multilingual Plane or
     * Plane 0) value, the resulting {@code char} array has
     * the same value as {@code codePoint}.
     * If the specified code point is a supplementary code point, the resulting
     * {@code char} array has the corresponding surrogate pair.
     *
     * @param codePoint a Unicode code point
     * @return a {@code char} array having
     *     {@code codePoint}'s UTF-16 representation.
     * @exception IllegalArgumentException if the specified
     *     {@code codePoint} is not a valid Unicode code point.
     * @since 1.5
     */
    public static char[] toChars(int codePoint) {
        if (isBmpCodePoint(codePoint)) {
            return new char[] { (char) codePoint };
        } else if (isValidCodePoint(codePoint)) {
            char[] result = new char[2];
            toSurrogates(codePoint, result, 0);
            return result;
        } else {
            throw new IllegalArgumentException();
        }
    }

    static void toSurrogates(int codePoint, char[] dst, int index) {
        // We write elements "backwards" to guarantee all-or-nothing
        dst[index+1] = lowSurrogate(codePoint);
        dst[index] = highSurrogate(codePoint);
    }

    /**
     * Returns the number of Unicode code points in the text range of
     * the specified char sequence. The text range begins at the
     * specified {@code beginIndex} and extends to the
     * {@code char} at index {@code endIndex - 1}. Thus the
     * length (in {@code char}s) of the text range is
     * {@code endIndex-beginIndex}.
     * Unpaired surrogates within the text range count as one code point each.
     *
     * @param seq the char sequence
     * @param beginIndex the index to the first {@code char} of
     *     the text range.
     * @param endIndex the index after the last {@code char} of
     *     the text range.
     * @return the number of Unicode code points in the specified text
     *     range
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the
     *     {@code beginIndex} is negative, or {@code endIndex}
     *     is larger than the length of the given sequence, or
     *     {@code beginIndex} is larger than {@code endIndex}.
     * @since 1.5
     */
    public static int codePointCount(CharSequence seq, int beginIndex, int endIndex) {
        int length = seq.length();
        if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        int n = endIndex - beginIndex;
        for (int i = beginIndex; i < endIndex; ) {
            if (isHighSurrogate(seq.charAt(i++)) && i < endIndex &&
                isLowSurrogate(seq.charAt(i))) {
                n--;
                i++;
            }
        }
        return n;
    }

    /**
     * Returns the number of Unicode code points in a subarray of the
     * {@code char} array argument. The {@code offset}
     * argument is the index of the first {@code char} of the
     * subarray and the {@code count} argument specifies the
     * length of the subarray in {@code char}s.
     * Unpaired surrogates within the subarray count as one code point each.
     *
     * @param a the {@code char} array
     * @param offset the index of the first {@code char} in the
     *     given {@code char} array
     * @param count the length of the subarray in {@code char}s
     * @return the number of Unicode code points in the specified subarray
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if {@code offset} or
     *     {@code count} is negative, or if {@code offset + count}
     *     is larger than the length of the given array.
     * @since 1.5
     */
    public static int codePointCount(char[] a, int offset, int count) {
        if (count > a.length - offset || offset < 0 || count < 0) {
            throw new IndexOutOfBoundsException();
        }
        return codePointCountImpl(a, offset, count);
    }

    static int codePointCountImpl(char[] a, int offset, int count) {
        int endIndex = offset + count;
        int n = count;
        for (int i = offset; i < endIndex; ) {
            if (isHighSurrogate(a[i++]) && i < endIndex &&
                isLowSurrogate(a[i])) {
                n--;
                i++;
            }
        }
        return n;
    }

    /**
     * Returns the index within the given char sequence that is offset
     * from the given {@code index} by {@code codePointOffset}
     * code points.
     * Unpaired surrogates within the text range given by
     * {@code index} and {@code codePointOffset} count as
     * one code point each.
     *
     * @param seq the char sequence
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within the char sequence
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if {@code index}
     *     is negative or larger than the length of the char sequence,
     *     or if {@code codePointOffset} is positive and the
     *     subsequence starting with {@code index} has fewer than
     *     {@code codePointOffset} code points, or if
     *     {@code codePointOffset} is negative and the subsequence
     *     before {@code index} has fewer than the absolute value
     *     of {@code codePointOffset} code points.
     * @since 1.5
     */
    public static int offsetByCodePoints(CharSequence seq, int index,
                                         int codePointOffset) {
        int length = seq.length();
        if (index < 0 || index > length) {
            throw new IndexOutOfBoundsException();
        }

        int x = index;
        if (codePointOffset >= 0) {
            int i;
            for (i = 0; x < length && i < codePointOffset; i++) {
                if (isHighSurrogate(seq.charAt(x++)) && x < length &&
                    isLowSurrogate(seq.charAt(x))) {
                    x++;
                }
            }
            if (i < codePointOffset) {
                throw new IndexOutOfBoundsException();
            }
        } else {
            int i;
            for (i = codePointOffset; x > 0 && i < 0; i++) {
                if (isLowSurrogate(seq.charAt(--x)) && x > 0 &&
                    isHighSurrogate(seq.charAt(x-1))) {
                    x--;
                }
            }
            if (i < 0) {
                throw new IndexOutOfBoundsException();
            }
        }
        return x;
    }

    /**
     * Returns the index within the given {@code char} subarray
     * that is offset from the given {@code index} by
     * {@code codePointOffset} code points. The
     * {@code start} and {@code count} arguments specify a
     * subarray of the {@code char} array.
     * Unpaired surrogates within the text range given by {@code index} and
     * {@code codePointOffset} count as one code point each.
     *
     * @param a the {@code char} array
     * @param start the index of the first {@code char} of the
     *     subarray
     * @param count the length of the subarray in {@code char}s
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within the subarray
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException
     *     if {@code start} or {@code count} is negative,
     *     or if {@code start + count} is larger than the length of
     *     the given array,
     *     or if {@code index} is less than {@code start} or
     *     larger than {@code start + count},
     *     or if {@code codePointOffset} is positive and the text range
     *     starting with {@code index} and ending with {@code start + count - 1}
     *     has fewer than {@code codePointOffset} code
     *     points,
     *     or if {@code codePointOffset} is negative and the text range
     *     starting with {@code start} and ending with {@code index - 1}
     *     has fewer than the absolute value of
     *     {@code codePointOffset} code points.
     * @since 1.5
     */
    public static int offsetByCodePoints(char[] a, int start, int count,
                                         int index, int codePointOffset) {
        if (count > a.length-start || start < 0 || count < 0
            || index < start || index > start+count) {
            throw new IndexOutOfBoundsException();
        }
        return offsetByCodePointsImpl(a, start, count, index, codePointOffset);
    }

    static int offsetByCodePointsImpl(char[] a, int start, int count,
                                      int index, int codePointOffset) {
        int x = index;
        if (codePointOffset >= 0) {
            int limit = start + count;
            int i;
            for (i = 0; x < limit && i < codePointOffset; i++) {
                if (isHighSurrogate(a[x++]) && x < limit &&
                    isLowSurrogate(a[x])) {
                    x++;
                }
            }
            if (i < codePointOffset) {
                throw new IndexOutOfBoundsException();
            }
        } else
        {
            int i;
            for (i = codePointOffset; x > start && i < 0; i++) {
                if (isLowSurrogate(a[--x]) && x > start &&
                    isHighSurrogate(a[x-1])) {
                    x--;
                }
            }
            if (i < 0) {
                throw new IndexOutOfBoundsException();
            }
        }
        return x;
    }

    /**
     * Determines if the specified character is a lowercase character.
     * <p>
     * A character is lowercase if its general category type, provided
     * by {@code Character.getType(ch)}, is
     * {@code LOWERCASE_LETTER}, or it has contributory property
     * Other_Lowercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of lowercase characters:
     * <blockquote><pre>
     * a b c d e f g h i j k l m n o p q r s t u v w x y z
     * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
     * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE'
     * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
     * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
     * </pre></blockquote>
     * <p> Many other Unicode characters are lowercase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support all Unicode characters, including supplementary characters, use
     * the {@link #isLowerCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is lowercase;
     *     {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#toLowerCase(char)
     * @see Character#getType(char)
     */
    public static boolean isLowerCase(char ch) {
        return isLowerCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a
     * lowercase character.
     * <p>
     * A character is lowercase if its general category type, provided
     * by {@link Character#getType getType(codePoint)}, is
     * {@code LOWERCASE_LETTER}, or it has contributory property
     * Other_Lowercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of lowercase characters:
     * <blockquote><pre>
     * a b c d e f g h i j k l m n o p q r s t u v w x y z
     * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
     * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE'
     * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
     * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
     * </pre></blockquote>
     * <p> Many other Unicode characters are lowercase too.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is lowercase;
     *     {@code false} otherwise.
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#toLowerCase(int)
     * @see Character#getType(int)
     * @since 1.5
     */
    public static boolean isLowerCase(int codePoint) {
        return getType(codePoint) == Character.LOWERCASE_LETTER ||
               CharacterData.of(codePoint).isOtherLowercase(codePoint);
    }

    /**
     * Determines if the specified character is an uppercase character.
     * <p>
     * A character is uppercase if its general category type, provided by
     * {@code Character.getType(ch)}, is {@code UPPERCASE_LETTER},
     * or it has contributory property Other_Uppercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of uppercase characters:
     * <blockquote><pre>
     * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
     * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC' '\u00CD' '\u00CE' '\u00CF'
     * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
     * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
     * </pre></blockquote>
     * <p> Many other Unicode characters are uppercase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUpperCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is uppercase;
     *     {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#toUpperCase(char)
     * @see Character#getType(char)
     * @since 1.0
     */
    public static boolean isUpperCase(char ch) {
        return isUpperCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is an uppercase character.
     * <p>
     * A character is uppercase if its general category type, provided by
     * {@link Character#getType(int) getType(codePoint)}, is {@code UPPERCASE_LETTER},
     * or it has contributory property Other_Uppercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of uppercase characters:
     * <blockquote><pre>
     * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
     * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC'
     * '\u00CD' '\u00CE' '\u00CF'
     * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
     * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
     * </pre></blockquote>
     * <p> Many other Unicode characters are uppercase too.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is uppercase;
     *     {@code false} otherwise.
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#toUpperCase(int)
     * @see Character#getType(int)
     * @since 1.5
     */
    public static boolean isUpperCase(int codePoint) {
        return getType(codePoint) == Character.UPPERCASE_LETTER ||
               CharacterData.of(codePoint).isOtherUppercase(codePoint);
    }

    /**
     * Determines if the specified character is a titlecase character.
     * <p>
     * A character is a titlecase character if its general
     * category type, provided by {@code Character.getType(ch)},
     * is {@code TITLECASE_LETTER}.
     * <p>
     * Some characters look like pairs of Latin letters. For example, there
     * is an uppercase letter that looks like "LJ" and has a corresponding
     * lowercase letter that looks like "lj". A third form, which looks like "Lj",
     * is the appropriate form to use when rendering a word in lowercase
     * with initial capitals, as for a book title.
     * <p>
     * These are some of the Unicode characters for which this method returns
     * {@code true}:
     * <ul>
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}
     * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}
     * </ul>
     * <p> Many other Unicode characters are titlecase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support all Unicode characters, including supplementary characters, use
     * the {@link #isTitleCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is titlecase;
     *     {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isUpperCase(char)
     * @see Character#toTitleCase(char)
     * @see Character#getType(char)
     * @since 1.0.2
     */
    public static boolean isTitleCase(char ch) {
        return isTitleCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a titlecase character.
     * <p>
     * A character is a titlecase character if its general
     * category type, provided by {@link Character#getType(int) getType(codePoint)},
     * is {@code TITLECASE_LETTER}.
     * <p>
     * Some characters look like pairs of Latin letters. For example, there
     * is an uppercase letter that looks like "LJ" and has a corresponding
     * lowercase letter that looks like "lj". A third form, which looks like "Lj",
     * is the appropriate form to use when rendering a word in lowercase
     * with initial capitals, as for a book title.
     * <p>
     * These are some of the Unicode characters for which this method returns
     * {@code true}:
     * <ul>
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}
     * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}
     * </ul>
     * <p> Many other Unicode characters are titlecase too.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is titlecase;
     *     {@code false} otherwise.
     * @see Character#isLowerCase(int)
     * @see Character#isUpperCase(int)
     * @see Character#toTitleCase(int)
     * @see Character#getType(int)
     * @since 1.5
     */
    public static boolean isTitleCase(int codePoint) {
        return
            getType(codePoint) == Character.TITLECASE_LETTER;
    }

    /**
     * Determines if the specified character is a digit.
     * <p>
     * A character is a digit if its general category type, provided
     * by {@code Character.getType(ch)}, is
     * {@code DECIMAL_DIGIT_NUMBER}.
     * <p>
     * Some Unicode character ranges that contain digits:
     * <ul>
     * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},
     *     ISO-LATIN-1 digits ({@code '0'} through {@code '9'})
     * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},
     *     Arabic-Indic digits
     * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},
     *     Extended Arabic-Indic digits
     * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},
     *     Devanagari digits
     * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},
     *     Fullwidth digits
     * </ul>
     *
     * Many other character ranges contain digits as well.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support all Unicode characters, including supplementary characters, use
     * the {@link #isDigit(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a digit;
     *     {@code false} otherwise.
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Character#getType(char)
     */
    public static boolean isDigit(char ch) {
        return isDigit((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a digit.
     * <p>
     * A character is a digit if its general category type, provided
     * by {@link Character#getType(int) getType(codePoint)}, is
     * {@code DECIMAL_DIGIT_NUMBER}.
     * <p>
     * Some Unicode character ranges that contain digits:
     * <ul>
     * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},
     *     ISO-LATIN-1 digits ({@code '0'} through {@code '9'})
     * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},
     *     Arabic-Indic digits
     * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},
     *     Extended Arabic-Indic digits
     * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},
     *     Devanagari digits
     * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},
     *     Fullwidth digits
     * </ul>
     *
     * Many other character ranges contain digits as well.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a digit;
     *     {@code false} otherwise.
     * @see Character#forDigit(int, int)
     * @see Character#getType(int)
     * @since 1.5
     */
    public static boolean isDigit(int codePoint) {
        return getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
    }

    /**
     * Determines if a character is defined in Unicode.
     * <p>
     * A character is defined if at least one of the following is true:
     * <ul>
     * <li>It has an entry in the UnicodeData file.
     * <li>It has a value in a range defined by the UnicodeData file.
     * </ul>
     *
<p><b>Note:</b> This method cannot handle <a5698* href="#supplementary"> supplementary characters</a>. To support5699* all Unicode characters, including supplementary characters, use5700* the {@link #isDefined(int)} method.5701*5702* @param ch the character to be tested5703* @return {@code true} if the character has a defined meaning5704* in Unicode; {@code false} otherwise.5705* @see Character#isDigit(char)5706* @see Character#isLetter(char)5707* @see Character#isLetterOrDigit(char)5708* @see Character#isLowerCase(char)5709* @see Character#isTitleCase(char)5710* @see Character#isUpperCase(char)5711* @since 1.0.25712*/5713public static boolean isDefined(char ch) {5714return isDefined((int)ch);5715}57165717/**5718* Determines if a character (Unicode code point) is defined in Unicode.5719* <p>5720* A character is defined if at least one of the following is true:5721* <ul>5722* <li>It has an entry in the UnicodeData file.5723* <li>It has a value in a range defined by the UnicodeData file.5724* </ul>5725*5726* @param codePoint the character (Unicode code point) to be tested.5727* @return {@code true} if the character has a defined meaning5728* in Unicode; {@code false} otherwise.5729* @see Character#isDigit(int)5730* @see Character#isLetter(int)5731* @see Character#isLetterOrDigit(int)5732* @see Character#isLowerCase(int)5733* @see Character#isTitleCase(int)5734* @see Character#isUpperCase(int)5735* @since 1.55736*/5737public static boolean isDefined(int codePoint) {5738return getType(codePoint) != Character.UNASSIGNED;5739}57405741/**5742* Determines if the specified character is a letter.5743* <p>5744* A character is considered to be a letter if its general5745* category type, provided by {@code Character.getType(ch)},5746* is any of the following:5747* <ul>5748* <li> {@code UPPERCASE_LETTER}5749* <li> {@code LOWERCASE_LETTER}5750* <li> {@code TITLECASE_LETTER}5751* <li> {@code MODIFIER_LETTER}5752* <li> {@code OTHER_LETTER}5753* </ul>5754*5755* Not all letters have 
case. Many characters are5756* letters but are neither uppercase nor lowercase nor titlecase.5757*5758* <p><b>Note:</b> This method cannot handle <a5759* href="#supplementary"> supplementary characters</a>. To support5760* all Unicode characters, including supplementary characters, use5761* the {@link #isLetter(int)} method.5762*5763* @param ch the character to be tested.5764* @return {@code true} if the character is a letter;5765* {@code false} otherwise.5766* @see Character#isDigit(char)5767* @see Character#isJavaIdentifierStart(char)5768* @see Character#isJavaLetter(char)5769* @see Character#isJavaLetterOrDigit(char)5770* @see Character#isLetterOrDigit(char)5771* @see Character#isLowerCase(char)5772* @see Character#isTitleCase(char)5773* @see Character#isUnicodeIdentifierStart(char)5774* @see Character#isUpperCase(char)5775*/5776public static boolean isLetter(char ch) {5777return isLetter((int)ch);5778}57795780/**5781* Determines if the specified character (Unicode code point) is a letter.5782* <p>5783* A character is considered to be a letter if its general5784* category type, provided by {@link Character#getType(int) getType(codePoint)},5785* is any of the following:5786* <ul>5787* <li> {@code UPPERCASE_LETTER}5788* <li> {@code LOWERCASE_LETTER}5789* <li> {@code TITLECASE_LETTER}5790* <li> {@code MODIFIER_LETTER}5791* <li> {@code OTHER_LETTER}5792* </ul>5793*5794* Not all letters have case. 
     * Many characters are
     * letters but are neither uppercase nor lowercase nor titlecase.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a letter;
     *         {@code false} otherwise.
     * @see Character#isDigit(int)
     * @see Character#isJavaIdentifierStart(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#isUnicodeIdentifierStart(int)
     * @see Character#isUpperCase(int)
     * @since 1.5
     */
    public static boolean isLetter(int codePoint) {
        return ((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is a letter or digit.
     * <p>
     * A character is considered to be a letter or digit if either
     * {@code Character.isLetter(char ch)} or
     * {@code Character.isDigit(char ch)} returns
     * {@code true} for the character.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLetterOrDigit(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a letter or digit;
     *         {@code false} otherwise.
     * @see Character#isDigit(char)
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isJavaLetter(char)
     * @see Character#isJavaLetterOrDigit(char)
     * @see Character#isLetter(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @since 1.0.2
     */
    public static boolean isLetterOrDigit(char ch) {
        return isLetterOrDigit((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a letter or digit.
     * <p>
     * A character is considered to be a letter or digit if either
     * {@link #isLetter(int) isLetter(codePoint)} or
     * {@link #isDigit(int) isDigit(codePoint)} returns
     * {@code true} for the character.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a letter or digit;
     *         {@code false} otherwise.
     * @see Character#isDigit(int)
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isLetter(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @since 1.5
     */
    public static boolean isLetterOrDigit(int codePoint) {
        return ((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER) |
            (1 << Character.DECIMAL_DIGIT_NUMBER)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is permissible as the first
     * character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may start a Java
     *         identifier; {@code false} otherwise.
     * @see Character#isJavaLetterOrDigit(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isLetter(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @since 1.02
     * @deprecated Replaced by isJavaIdentifierStart(char).
     */
    @Deprecated
    public static boolean isJavaLetter(char ch) {
        return isJavaIdentifierStart(ch);
    }

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if and only if any
     * of the following conditions are true:
     * <ul>
     * <li> it is a letter
     * <li> it is a currency symbol (such as {@code '$'})
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for the character.
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may be part of a
     *         Java identifier; {@code false} otherwise.
     * @see Character#isJavaLetter(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isLetter(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @see Character#isIdentifierIgnorable(char)
     * @since 1.02
     * @deprecated Replaced by isJavaIdentifierPart(char).
     */
    @Deprecated
    public static boolean isJavaLetterOrDigit(char ch) {
        return isJavaIdentifierPart(ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is an alphabet.
     * <p>
     * A character is considered to be alphabetic if its general category type,
     * provided by {@link Character#getType(int) getType(codePoint)}, is any of
     * the following:
     * <ul>
     * <li> <code>UPPERCASE_LETTER</code>
     * <li> <code>LOWERCASE_LETTER</code>
     * <li> <code>TITLECASE_LETTER</code>
     * <li> <code>MODIFIER_LETTER</code>
     * <li> <code>OTHER_LETTER</code>
     * <li> <code>LETTER_NUMBER</code>
     * </ul>
     * or it has contributory property Other_Alphabetic as defined by the
     * Unicode Standard.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return <code>true</code> if the character is a Unicode alphabet
     *         character, <code>false</code> otherwise.
     * @since 1.7
     */
    public static boolean isAlphabetic(int codePoint) {
        return (((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER) |
            (1 << Character.LETTER_NUMBER)) >> getType(codePoint)) & 1) != 0) ||
            CharacterData.of(codePoint).isOtherAlphabetic(codePoint);
    }

    /**
     * Determines if the specified character (Unicode code point) is a CJKV
     * (Chinese, Japanese, Korean and Vietnamese) ideograph, as defined by
     * the Unicode Standard.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return <code>true</code> if the character is a Unicode ideograph
     *         character, <code>false</code> otherwise.
     * @since 1.7
     */
    public static boolean isIdeographic(int codePoint) {
        return CharacterData.of(codePoint).isIdeographic(codePoint);
    }

    /**
     * Determines if the specified character is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isJavaIdentifierStart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may start a Java identifier;
     *         {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isLetter(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since 1.1
     */
    public static boolean isJavaIdentifierStart(char ch) {
        return isJavaIdentifierStart((int)ch);
    }

    /**
     * Determines if the character (Unicode code point) is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     *      returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     *      returns {@code LETTER_NUMBER}
     * <li> the referenced character is a currency symbol (such as {@code '$'})
     * <li> the referenced character is a connecting punctuation character
     *      (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may start a Java identifier;
     *         {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isLetter(int)
     * @see Character#isUnicodeIdentifierStart(int)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since 1.5
     */
    public static boolean isJavaIdentifierStart(int codePoint) {
        return CharacterData.of(codePoint).isJavaIdentifierStart(codePoint);
    }

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * conditions are true:
     * <ul>
     * <li> it is a letter
     * <li> it is a currency symbol (such as {@code '$'})
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for the character
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isJavaIdentifierPart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may be part of a
     *         Java identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since 1.1
     */
    public static boolean isJavaIdentifierPart(char ch) {
        return isJavaIdentifierPart((int)ch);
    }

    /**
     * Determines if the character (Unicode code point) may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * conditions are true:
     * <ul>
     * <li> it is a letter
     * <li> it is a currency symbol (such as {@code '$'})
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@link #isIdentifierIgnorable(int)
     *      isIdentifierIgnorable(codePoint)} returns {@code true} for
     *      the code point
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may be part of a
     *         Java identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(int)
     * @see Character#isJavaIdentifierStart(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since 1.5
     */
    public static boolean isJavaIdentifierPart(int codePoint) {
        return CharacterData.of(codePoint).isJavaIdentifierPart(codePoint);
    }

    /**
     * Determines if the specified character is permissible as the
     * first character in a Unicode identifier.
     * <p>
     * A character may start a Unicode identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns
     *      {@code LETTER_NUMBER}.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUnicodeIdentifierStart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may start a Unicode
     *         identifier; {@code false} otherwise.
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isLetter(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @since 1.1
     */
    public static boolean isUnicodeIdentifierStart(char ch) {
        return isUnicodeIdentifierStart((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is permissible as the
     * first character in a Unicode identifier.
     * <p>
     * A character may start a Unicode identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     *      returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     *      returns {@code LETTER_NUMBER}.
     * </ul>
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may start a Unicode
     *         identifier; {@code false} otherwise.
     * @see Character#isJavaIdentifierStart(int)
     * @see Character#isLetter(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @since 1.5
     */
    public static boolean isUnicodeIdentifierStart(int codePoint) {
        return CharacterData.of(codePoint).isUnicodeIdentifierStart(codePoint);
    }

    /**
     * Determines if the specified character may be part of a Unicode
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Unicode identifier if and only if
     * one of the following statements is true:
     * <ul>
     * <li> it is a letter
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for this character.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUnicodeIdentifierPart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may be part of a
     *         Unicode identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(char)
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @since 1.1
     */
    public static boolean isUnicodeIdentifierPart(char ch) {
        return isUnicodeIdentifierPart((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) may be part of a Unicode
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Unicode identifier if and only if
     * one of the following statements is true:
     * <ul>
     * <li> it is a letter
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for this character.
     * </ul>
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may be part of a
     *         Unicode identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(int)
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isUnicodeIdentifierStart(int)
     * @since 1.5
     */
    public static boolean isUnicodeIdentifierPart(int codePoint) {
        return CharacterData.of(codePoint).isUnicodeIdentifierPart(codePoint);
    }

    /**
     * Determines if the specified character should be regarded as
     * an ignorable character in a Java identifier or a Unicode identifier.
     * <p>
     * The following Unicode characters are ignorable in a Java identifier
     * or a Unicode identifier:
     * <ul>
     * <li>ISO control characters that are not whitespace
     * <ul>
     * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'}
     * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'}
     * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'}
     * </ul>
     *
     * <li>all characters that have the {@code FORMAT} general
     * category value
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isIdentifierIgnorable(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is an ignorable control
     *         character that may be part of a Java or Unicode identifier;
     *         {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @since 1.1
     */
    public static boolean isIdentifierIgnorable(char ch) {
        return isIdentifierIgnorable((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) should be regarded as
     * an ignorable character in a Java identifier or a Unicode identifier.
     * <p>
     * The following Unicode characters are ignorable in a Java identifier
     * or a Unicode identifier:
     * <ul>
     * <li>ISO control characters that are not whitespace
     * <ul>
     * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'}
     * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'}
     * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'}
     * </ul>
     *
     * <li>all characters that have the {@code FORMAT} general
     * category value
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is an ignorable control
     *         character that may be part of a Java or Unicode identifier;
     *         {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @since 1.5
     */
    public static boolean isIdentifierIgnorable(int codePoint) {
        return CharacterData.of(codePoint).isIdentifierIgnorable(codePoint);
    }

    /**
     * Converts the character argument to lowercase using case
     * mapping information from the UnicodeData file.
     * <p>
     * Note that
     * {@code Character.isLowerCase(Character.toLowerCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toLowerCase()} should be used to map
     * characters to lowercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toLowerCase(int)} method.
     *
     * @param ch the character to be converted.
     * @return the lowercase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isLowerCase(char)
     * @see String#toLowerCase()
     */
    public static char toLowerCase(char ch) {
        return (char)toLowerCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to
     * lowercase using case mapping information from the UnicodeData
     * file.
     *
     * <p> Note that
     * {@code Character.isLowerCase(Character.toLowerCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toLowerCase()} should be used to map
     * characters to lowercase.
     * {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the lowercase equivalent of the character (Unicode code
     *         point), if any; otherwise, the character itself.
     * @see Character#isLowerCase(int)
     * @see String#toLowerCase()
     *
     * @since 1.5
     */
    public static int toLowerCase(int codePoint) {
        return CharacterData.of(codePoint).toLowerCase(codePoint);
    }

    /**
     * Converts the character argument to uppercase using case mapping
     * information from the UnicodeData file.
     * <p>
     * Note that
     * {@code Character.isUpperCase(Character.toUpperCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toUpperCase()} should be used to map
     * characters to uppercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toUpperCase(int)} method.
     *
     * @param ch the character to be converted.
     * @return the uppercase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isUpperCase(char)
     * @see String#toUpperCase()
     */
    public static char toUpperCase(char ch) {
        return (char)toUpperCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to
     * uppercase using case mapping information from the UnicodeData
     * file.
     *
     * <p>Note that
     * {@code Character.isUpperCase(Character.toUpperCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toUpperCase()} should be used to map
     * characters to uppercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the uppercase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isUpperCase(int)
     * @see String#toUpperCase()
     *
     * @since 1.5
     */
    public static int toUpperCase(int codePoint) {
        return CharacterData.of(codePoint).toUpperCase(codePoint);
    }

    /**
     * Converts the character argument to titlecase using case mapping
     * information from the UnicodeData file. If a character has no
     * explicit titlecase mapping and is not itself a titlecase char
     * according to UnicodeData, then the uppercase mapping is
     * returned as an equivalent titlecase mapping.
     * If the
     * {@code char} argument is already a titlecase
     * {@code char}, the same {@code char} value will be
     * returned.
     * <p>
     * Note that
     * {@code Character.isTitleCase(Character.toTitleCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toTitleCase(int)} method.
     *
     * @param ch the character to be converted.
     * @return the titlecase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isTitleCase(char)
     * @see Character#toLowerCase(char)
     * @see Character#toUpperCase(char)
     * @since 1.0.2
     */
    public static char toTitleCase(char ch) {
        return (char)toTitleCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to titlecase using case mapping
     * information from the UnicodeData file. If a character has no
     * explicit titlecase mapping and is not itself a titlecase char
     * according to UnicodeData, then the uppercase mapping is
     * returned as an equivalent titlecase mapping.
     * If the
     * character argument is already a titlecase
     * character, the same character value will be
     * returned.
     *
     * <p>Note that
     * {@code Character.isTitleCase(Character.toTitleCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the titlecase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isTitleCase(int)
     * @see Character#toLowerCase(int)
     * @see Character#toUpperCase(int)
     * @since 1.5
     */
    public static int toTitleCase(int codePoint) {
        return CharacterData.of(codePoint).toTitleCase(codePoint);
    }

    /**
     * Returns the numeric value of the character {@code ch} in the
     * specified radix.
     * <p>
     * If the radix is not in the range {@code MIN_RADIX} ≤
     * {@code radix} ≤ {@code MAX_RADIX} or if the
     * value of {@code ch} is not a valid digit in the specified
     * radix, {@code -1} is returned.
     * A character is a valid digit
     * if at least one of the following is true:
     * <ul>
     * <li>The method {@code isDigit} is {@code true} of the character
     *     and the Unicode decimal digit value of the character (or its
     *     single-character decomposition) is less than the specified radix.
     *     In this case the decimal digit value is returned.
     * <li>The character is one of the uppercase Latin letters
     *     {@code 'A'} through {@code 'Z'} and its code is less than
     *     {@code radix + 'A' - 10}.
     *     In this case, {@code ch - 'A' + 10}
     *     is returned.
     * <li>The character is one of the lowercase Latin letters
     *     {@code 'a'} through {@code 'z'} and its code is less than
     *     {@code radix + 'a' - 10}.
     *     In this case, {@code ch - 'a' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth uppercase Latin letters A
     *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF21' - 10}.
     *     In this case, {@code ch - '\u005CuFF21' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth lowercase Latin letters a
     *     ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF41' - 10}.
     *     In this case, {@code ch - '\u005CuFF41' + 10}
     *     is returned.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #digit(int, int)} method.
     *
     * @param ch the character to be converted.
     * @param radix the radix.
     * @return the numeric value represented by the character in the
     *         specified radix.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(char)
     */
    public static int digit(char ch, int radix) {
        return digit((int)ch, radix);
    }

    /**
     * Returns the numeric value of the specified character (Unicode
     * code point) in the specified radix.
     *
     * <p>If the radix is not in the range {@code MIN_RADIX} ≤
     * {@code radix} ≤ {@code MAX_RADIX} or if the
     * character is not a valid digit in the specified
     * radix, {@code -1} is returned. A character is a valid digit
     * if at least one of the following is true:
     * <ul>
     * <li>The method {@link #isDigit(int) isDigit(codePoint)} returns {@code true} for the character
     *     and the Unicode decimal digit value of the character (or its
     *     single-character decomposition) is less than the specified radix.
     *     In this case the decimal digit value is returned.
     * <li>The character is one of the uppercase Latin letters
     *     {@code 'A'} through {@code 'Z'} and its code is less than
     *     {@code radix + 'A' - 10}.
     *     In this case, {@code codePoint - 'A' + 10}
     *     is returned.
     * <li>The character is one of the lowercase Latin letters
     *     {@code 'a'} through {@code 'z'} and its code is less than
     *     {@code radix + 'a' - 10}.
     *     In this case, {@code codePoint - 'a' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth uppercase Latin letters A
     *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF21' - 10}.
     *     In this case,
     *     {@code codePoint - '\u005CuFF21' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth lowercase Latin letters a
     *     ({@code '\u005CuFF41'}) through z
     *     ({@code '\u005CuFF5A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF41' - 10}.
     *     In this case,
     *     {@code codePoint - '\u005CuFF41' + 10}
     *     is returned.
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @param radix the radix.
     * @return the numeric value represented by the character in the
     *         specified radix.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(int)
     * @since 1.5
     */
    public static int digit(int codePoint, int radix) {
        return CharacterData.of(codePoint).digit(codePoint, radix);
    }

    /**
     * Returns the {@code int} value that the specified Unicode
     * character represents. For example, the character
     * {@code '\u005Cu216C'} (the Roman numeral fifty) will return
     * an {@code int} with a value of 50.
     * <p>
     * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through
     * {@code '\u005Cu005A'}), lowercase
     * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and
     * full width variant ({@code '\u005CuFF21'} through
     * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through
     * {@code '\u005CuFF5A'}) forms have numeric values from 10
     * through 35. This is independent of the Unicode specification,
     * which does not assign numeric values to these {@code char}
     * values.
     * <p>
     * If the character does not have a numeric value, then -1 is returned.
     * If the character has a numeric value that cannot be represented as a
     * nonnegative integer (for example, a fractional value), then -2
     * is returned.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getNumericValue(int)} method.
     *
     * @param ch the character to be converted.
     * @return the numeric value of the character, as a nonnegative {@code int}
     *         value; -2 if the character has a numeric value that is not a
     *         nonnegative integer; -1 if the character has no numeric value.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(char)
     * @since 1.1
     */
    public static int getNumericValue(char ch) {
        return getNumericValue((int)ch);
    }

    /**
     * Returns the {@code int} value that the specified
     * character (Unicode code point) represents. For example, the character
     * {@code '\u005Cu216C'} (the Roman numeral fifty) will return
     * an {@code int} with a value of 50.
     * <p>
     * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through
     * {@code '\u005Cu005A'}), lowercase
     * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and
     * full width variant ({@code '\u005CuFF21'} through
     * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through
     * {@code '\u005CuFF5A'}) forms have numeric values from 10
     * through 35.
     * This is independent of the Unicode specification,
     * which does not assign numeric values to these {@code char}
     * values.
     * <p>
     * If the character does not have a numeric value, then -1 is returned.
     * If the character has a numeric value that cannot be represented as a
     * nonnegative integer (for example, a fractional value), then -2
     * is returned.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the numeric value of the character, as a nonnegative {@code int}
     *         value; -2 if the character has a numeric value that is not a
     *         nonnegative integer; -1 if the character has no numeric value.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(int)
     * @since 1.5
     */
    public static int getNumericValue(int codePoint) {
        return CharacterData.of(codePoint).getNumericValue(codePoint);
    }

    /**
     * Determines if the specified character is ISO-LATIN-1 white space.
     * This method returns {@code true} for the following five
     * characters only:
     * <table summary="truechars">
     * <tr><td>{@code '\t'}</td> <td>{@code U+0009}</td>
     *     <td>{@code HORIZONTAL TABULATION}</td></tr>
     * <tr><td>{@code '\n'}</td> <td>{@code U+000A}</td>
     *     <td>{@code NEW LINE}</td></tr>
     * <tr><td>{@code '\f'}</td> <td>{@code U+000C}</td>
     *     <td>{@code FORM FEED}</td></tr>
     * <tr><td>{@code '\r'}</td> <td>{@code U+000D}</td>
     *     <td>{@code CARRIAGE RETURN}</td></tr>
     * <tr><td>{@code ' '}</td> <td>{@code U+0020}</td>
     *     <td>{@code SPACE}</td></tr>
     * </table>
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is ISO-LATIN-1 white
     *         space; {@code false} otherwise.
     * @see Character#isSpaceChar(char)
     * @see Character#isWhitespace(char)
     * @deprecated Replaced by isWhitespace(char).
     */
    @Deprecated
    public static boolean isSpace(char ch) {
        return (ch <= 0x0020) &&
            (((((1L << 0x0009) |
            (1L << 0x000A) |
            (1L << 0x000C)
            | (1L << 0x000D) |
            (1L << 0x0020)) >> ch) & 1L) != 0);
    }


    /**
     * Determines if the specified character is a Unicode space character.
     * A character is considered to be a space character if and only if
     * it is specified to be a space character by the Unicode Standard. This
     * method returns true if the character's general category type is any of
     * the following:
     * <ul>
     * <li> {@code SPACE_SEPARATOR}
     * <li> {@code LINE_SEPARATOR}
     * <li> {@code PARAGRAPH_SEPARATOR}
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isSpaceChar(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a space character;
     *         {@code false} otherwise.
     * @see Character#isWhitespace(char)
     * @since 1.1
     */
    public static boolean isSpaceChar(char ch) {
        return isSpaceChar((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a
     * Unicode space character. A character is considered to be a
     * space character if and only if it is specified to be a space
     * character by the Unicode Standard.
     * This method returns true if
     * the character's general category type is any of the following:
     *
     * <ul>
     * <li> {@link #SPACE_SEPARATOR}
     * <li> {@link #LINE_SEPARATOR}
     * <li> {@link #PARAGRAPH_SEPARATOR}
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a space character;
     *         {@code false} otherwise.
     * @see Character#isWhitespace(int)
     * @since 1.5
     */
    public static boolean isSpaceChar(int codePoint) {
        return ((((1 << Character.SPACE_SEPARATOR) |
                  (1 << Character.LINE_SEPARATOR) |
                  (1 << Character.PARAGRAPH_SEPARATOR)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is white space according to Java.
     * A character is a Java whitespace character if and only if it satisfies
     * one of the following criteria:
     * <ul>
     * <li> It is a Unicode space character ({@code SPACE_SEPARATOR},
     *      {@code LINE_SEPARATOR}, or {@code PARAGRAPH_SEPARATOR})
     *      but is not also a non-breaking space ({@code '\u005Cu00A0'},
     *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).
     * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.
     * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.
     * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.
     * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.
     * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.
     * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.
     * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.
     * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.
     * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isWhitespace(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a Java whitespace
     *         character; {@code false} otherwise.
     * @see Character#isSpaceChar(char)
     * @since 1.1
     */
    public static boolean isWhitespace(char ch) {
        return isWhitespace((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is
     * white space according to Java. A character is a Java
     * whitespace character if and only if it satisfies one of the
     * following criteria:
     * <ul>
     * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR},
     *      {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR})
     *      but is not also a non-breaking space ({@code '\u005Cu00A0'},
     *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).
     * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.
     * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.
     * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.
     * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.
     * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.
     * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.
     * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.
     * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.
     * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.
     * </ul>
     * <p>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a Java whitespace
     *         character; {@code false} otherwise.
     * @see Character#isSpaceChar(int)
     * @since 1.5
     */
    public static boolean isWhitespace(int codePoint) {
        return CharacterData.of(codePoint).isWhitespace(codePoint);
    }

    /**
     * Determines if the specified character is an ISO control
     * character.
     * A character is considered to be an ISO control
     * character if its code is in the range {@code '\u005Cu0000'}
     * through {@code '\u005Cu001F'} or in the range
     * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isISOControl(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is an ISO control character;
     *         {@code false} otherwise.
     *
     * @see Character#isSpaceChar(char)
     * @see Character#isWhitespace(char)
     * @since 1.1
     */
    public static boolean isISOControl(char ch) {
        return isISOControl((int)ch);
    }

    /**
     * Determines if the referenced character (Unicode code point) is an ISO control
     * character. A character is considered to be an ISO control
     * character if its code is in the range {@code '\u005Cu0000'}
     * through {@code '\u005Cu001F'} or in the range
     * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is an ISO control character;
     *         {@code false} otherwise.
     * @see Character#isSpaceChar(int)
     * @see Character#isWhitespace(int)
     * @since 1.5
     */
    public static boolean isISOControl(int codePoint) {
        // Optimized form of:
        //     (codePoint >= 0x00 && codePoint <= 0x1F) ||
        //     (codePoint >= 0x7F && codePoint <= 0x9F);
        return codePoint <= 0x9F &&
            (codePoint >= 0x7F || (codePoint >>> 5 == 0));
    }

    /**
     * Returns a value indicating a character's general category.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getType(int)} method.
     *
     * @param ch the character to be tested.
     * @return a value of type {@code int} representing the
     *         character's general category.
     * @see Character#COMBINING_SPACING_MARK
     * @see Character#CONNECTOR_PUNCTUATION
     * @see Character#CONTROL
     * @see Character#CURRENCY_SYMBOL
     * @see Character#DASH_PUNCTUATION
     * @see Character#DECIMAL_DIGIT_NUMBER
     * @see Character#ENCLOSING_MARK
     * @see Character#END_PUNCTUATION
     * @see Character#FINAL_QUOTE_PUNCTUATION
     * @see Character#FORMAT
     * @see Character#INITIAL_QUOTE_PUNCTUATION
     * @see Character#LETTER_NUMBER
     * @see Character#LINE_SEPARATOR
     * @see Character#LOWERCASE_LETTER
     * @see Character#MATH_SYMBOL
     * @see Character#MODIFIER_LETTER
     * @see Character#MODIFIER_SYMBOL
     * @see Character#NON_SPACING_MARK
     * @see Character#OTHER_LETTER
     * @see Character#OTHER_NUMBER
     * @see Character#OTHER_PUNCTUATION
     * @see Character#OTHER_SYMBOL
     * @see Character#PARAGRAPH_SEPARATOR
     * @see Character#PRIVATE_USE
     * @see Character#SPACE_SEPARATOR
     * @see Character#START_PUNCTUATION
     * @see Character#SURROGATE
     * @see Character#TITLECASE_LETTER
     * @see Character#UNASSIGNED
     * @see Character#UPPERCASE_LETTER
     * @since 1.1
     */
    public static int getType(char ch) {
        return getType((int)ch);
    }

    /**
     * Returns a value indicating a character's general category.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return a value of type {@code int} representing the
     *         character's general category.
     * @see Character#COMBINING_SPACING_MARK COMBINING_SPACING_MARK
     * @see Character#CONNECTOR_PUNCTUATION CONNECTOR_PUNCTUATION
     * @see Character#CONTROL CONTROL
     * @see Character#CURRENCY_SYMBOL CURRENCY_SYMBOL
     * @see Character#DASH_PUNCTUATION DASH_PUNCTUATION
     * @see Character#DECIMAL_DIGIT_NUMBER
     *      DECIMAL_DIGIT_NUMBER
     * @see Character#ENCLOSING_MARK ENCLOSING_MARK
     * @see Character#END_PUNCTUATION END_PUNCTUATION
     * @see Character#FINAL_QUOTE_PUNCTUATION FINAL_QUOTE_PUNCTUATION
     * @see Character#FORMAT FORMAT
     * @see Character#INITIAL_QUOTE_PUNCTUATION INITIAL_QUOTE_PUNCTUATION
     * @see Character#LETTER_NUMBER LETTER_NUMBER
     * @see Character#LINE_SEPARATOR LINE_SEPARATOR
     * @see Character#LOWERCASE_LETTER LOWERCASE_LETTER
     * @see Character#MATH_SYMBOL MATH_SYMBOL
     * @see Character#MODIFIER_LETTER MODIFIER_LETTER
     * @see Character#MODIFIER_SYMBOL MODIFIER_SYMBOL
     * @see Character#NON_SPACING_MARK NON_SPACING_MARK
     * @see Character#OTHER_LETTER OTHER_LETTER
     * @see Character#OTHER_NUMBER OTHER_NUMBER
     * @see Character#OTHER_PUNCTUATION OTHER_PUNCTUATION
     * @see Character#OTHER_SYMBOL OTHER_SYMBOL
     * @see Character#PARAGRAPH_SEPARATOR PARAGRAPH_SEPARATOR
     * @see Character#PRIVATE_USE PRIVATE_USE
     * @see Character#SPACE_SEPARATOR SPACE_SEPARATOR
     * @see Character#START_PUNCTUATION START_PUNCTUATION
     * @see Character#SURROGATE SURROGATE
     * @see Character#TITLECASE_LETTER TITLECASE_LETTER
     * @see Character#UNASSIGNED UNASSIGNED
     * @see Character#UPPERCASE_LETTER UPPERCASE_LETTER
     * @since 1.5
     */
    public static int getType(int codePoint) {
        return CharacterData.of(codePoint).getType(codePoint);
    }

    /**
     * Determines the character representation for a specific digit in
     * the specified radix. If the value of {@code radix} is not a
     * valid radix, or the value of {@code digit} is not a valid
     * digit in the specified radix, the null character
     * ({@code '\u005Cu0000'}) is returned.
     * <p>
     * The {@code radix} argument is valid if it is greater than or
     * equal to {@code MIN_RADIX} and less than or equal to
     * {@code MAX_RADIX}. The {@code digit} argument is valid if
     * {@code 0 <= digit < radix}.
     * <p>
     * If the digit is less than 10, then
     * {@code '0' + digit} is returned.
     * Otherwise, the value
     * {@code 'a' + digit - 10} is returned.
     *
     * @param digit the number to convert to a character.
     * @param radix the radix.
     * @return the {@code char} representation of the specified digit
     *         in the specified radix.
     * @see Character#MIN_RADIX
     * @see Character#MAX_RADIX
     * @see Character#digit(char, int)
     */
    public static char forDigit(int digit, int radix) {
        if ((digit >= radix) || (digit < 0)) {
            return '\0';
        }
        if ((radix < Character.MIN_RADIX) || (radix > Character.MAX_RADIX)) {
            return '\0';
        }
        if (digit < 10) {
            return (char)('0' + digit);
        }
        return (char)('a' - 10 + digit);
    }

    /**
     * Returns the Unicode directionality property for the given
     * character. Character directionality is used to calculate the
     * visual ordering of text. The directionality value of undefined
     * {@code char} values is {@code DIRECTIONALITY_UNDEFINED}.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getDirectionality(int)} method.
     *
     * @param ch {@code char} for which the directionality property
     *        is requested.
     * @return the directionality property of the {@code char} value.
     *
     * @see Character#DIRECTIONALITY_UNDEFINED
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR
     * @see Character#DIRECTIONALITY_ARABIC_NUMBER
     * @see Character#DIRECTIONALITY_COMMON_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_NONSPACING_MARK
     * @see Character#DIRECTIONALITY_BOUNDARY_NEUTRAL
     * @see Character#DIRECTIONALITY_PARAGRAPH_SEPARATOR
     * @see Character#DIRECTIONALITY_SEGMENT_SEPARATOR
     * @see Character#DIRECTIONALITY_WHITESPACE
     * @see Character#DIRECTIONALITY_OTHER_NEUTRALS
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
     * @see Character#DIRECTIONALITY_POP_DIRECTIONAL_FORMAT
     * @since 1.4
     */
    public static byte getDirectionality(char ch) {
        return getDirectionality((int)ch);
    }

    /**
     * Returns the Unicode directionality property for the given
     * character (Unicode code point). Character directionality is
     * used to calculate the visual ordering of text.
     * The
     * directionality value of an undefined character is {@link
     * #DIRECTIONALITY_UNDEFINED}.
     *
     * @param codePoint the character (Unicode code point) for which
     *        the directionality property is requested.
     * @return the directionality property of the character.
     *
     * @see Character#DIRECTIONALITY_UNDEFINED DIRECTIONALITY_UNDEFINED
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT DIRECTIONALITY_LEFT_TO_RIGHT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT DIRECTIONALITY_RIGHT_TO_LEFT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER DIRECTIONALITY_EUROPEAN_NUMBER
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR
     * @see Character#DIRECTIONALITY_ARABIC_NUMBER DIRECTIONALITY_ARABIC_NUMBER
     * @see Character#DIRECTIONALITY_COMMON_NUMBER_SEPARATOR DIRECTIONALITY_COMMON_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_NONSPACING_MARK DIRECTIONALITY_NONSPACING_MARK
     * @see Character#DIRECTIONALITY_BOUNDARY_NEUTRAL DIRECTIONALITY_BOUNDARY_NEUTRAL
     * @see Character#DIRECTIONALITY_PARAGRAPH_SEPARATOR DIRECTIONALITY_PARAGRAPH_SEPARATOR
     * @see Character#DIRECTIONALITY_SEGMENT_SEPARATOR DIRECTIONALITY_SEGMENT_SEPARATOR
     * @see Character#DIRECTIONALITY_WHITESPACE DIRECTIONALITY_WHITESPACE
     * @see Character#DIRECTIONALITY_OTHER_NEUTRALS DIRECTIONALITY_OTHER_NEUTRALS
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
     * @see
     *      Character#DIRECTIONALITY_POP_DIRECTIONAL_FORMAT DIRECTIONALITY_POP_DIRECTIONAL_FORMAT
     * @since 1.5
     */
    public static byte getDirectionality(int codePoint) {
        return CharacterData.of(codePoint).getDirectionality(codePoint);
    }

    /**
     * Determines whether the character is mirrored according to the
     * Unicode specification. Mirrored characters should have their
     * glyphs horizontally mirrored when displayed in text that is
     * right-to-left. For example, {@code '\u005Cu0028'} LEFT
     * PARENTHESIS is semantically defined to be an <i>opening
     * parenthesis</i>. This will appear as a "(" in text that is
     * left-to-right but as a ")" in text that is right-to-left.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isMirrored(int)} method.
     *
     * @param ch {@code char} for which the mirrored property is requested
     * @return {@code true} if the char is mirrored, {@code false}
     *         if the {@code char} is not mirrored or is not defined.
     * @since 1.4
     */
    public static boolean isMirrored(char ch) {
        return isMirrored((int)ch);
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is mirrored according to the Unicode specification. Mirrored
     * characters should have their glyphs horizontally mirrored when
     * displayed in text that is right-to-left. For example,
     * {@code '\u005Cu0028'} LEFT PARENTHESIS is semantically
     * defined to be an <i>opening parenthesis</i>.
     * This will appear
     * as a "(" in text that is left-to-right but as a ")" in text
     * that is right-to-left.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is mirrored, {@code false}
     *         if the character is not mirrored or is not defined.
     * @since 1.5
     */
    public static boolean isMirrored(int codePoint) {
        return CharacterData.of(codePoint).isMirrored(codePoint);
    }

    /**
     * Compares two {@code Character} objects numerically.
     *
     * @param anotherCharacter the {@code Character} to be compared.
     *
     * @return the value {@code 0} if the argument {@code Character}
     *         is equal to this {@code Character}; a value less than
     *         {@code 0} if this {@code Character} is numerically less
     *         than the {@code Character} argument; and a value greater than
     *         {@code 0} if this {@code Character} is numerically greater
     *         than the {@code Character} argument (unsigned comparison).
     *         Note that this is strictly a numerical comparison; it is not
     *         locale-dependent.
     * @since 1.2
     */
    public int compareTo(Character anotherCharacter) {
        return compare(this.value, anotherCharacter.value);
    }

    /**
     * Compares two {@code char} values numerically.
     * The value returned is identical to what would be returned by:
     * <pre>
     *    Character.valueOf(x).compareTo(Character.valueOf(y))
     * </pre>
     *
     * @param x the first {@code char} to compare
     * @param y the second {@code char} to compare
     * @return the value {@code 0} if {@code x == y};
     *         a value less than {@code 0} if {@code x < y}; and
     *         a value greater than {@code 0} if {@code x > y}
     * @since 1.7
     */
    public static int compare(char x, char y) {
        return x - y;
    }

    /**
     * Converts the character (Unicode code point) argument to uppercase using
     * information from the UnicodeData file.
     * <p>
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return either the uppercase equivalent of the character, if
     *         any, or an error flag ({@code Character.ERROR})
     *         that indicates that a 1:M {@code char} mapping exists.
     * @see Character#isLowerCase(char)
     * @see Character#isUpperCase(char)
     * @see Character#toLowerCase(char)
     * @see Character#toTitleCase(char)
     * @since 1.4
     */
    static int toUpperCaseEx(int codePoint) {
        assert isValidCodePoint(codePoint);
        return CharacterData.of(codePoint).toUpperCaseEx(codePoint);
    }

    /**
     * Converts the character (Unicode code point) argument to uppercase using case
     * mapping information from the SpecialCasing file in the Unicode
     * specification. If a character has no explicit uppercase
     * mapping, then the {@code char} itself is returned in the
     * {@code char[]}.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return a {@code char[]} with the uppercased character.
     * @since 1.4
     */
    static char[] toUpperCaseCharArray(int codePoint) {
        // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
        assert isBmpCodePoint(codePoint);
        return CharacterData.of(codePoint).toUpperCaseCharArray(codePoint);
    }

    /**
     * The number of bits used to represent a <tt>char</tt> value in unsigned
     * binary form, constant {@code 16}.
     *
     * @since 1.5
     */
    public static final int SIZE = 16;

    /**
     * The number of bytes used to represent a {@code char} value in unsigned
     * binary form.
     *
     * @since 1.8
     */
    public static final int BYTES = SIZE / Byte.SIZE;

    /**
     * Returns the value obtained by reversing the order of the bytes in the
     * specified <tt>char</tt> value.
     *
     * @param ch The {@code char} of which to reverse the byte order.
     * @return the value obtained by reversing (or, equivalently, swapping)
     *         the bytes in the specified <tt>char</tt> value.
     * @since 1.5
     */
    public static char reverseBytes(char ch) {
        return (char) (((ch &
            0xFF00) >> 8) | (ch << 8));
    }

    /**
     * Returns the Unicode name of the specified character
     * {@code codePoint}, or null if the code point is
     * {@link #UNASSIGNED unassigned}.
     * <p>
     * Note: if the specified character is not assigned a name by
     * the <i>UnicodeData</i> file (part of the Unicode Character
     * Database maintained by the Unicode Consortium), the returned
     * name is the same as the result of the expression:
     *
     * <blockquote>{@code
     *     Character.UnicodeBlock.of(codePoint).toString().replace('_', ' ')
     *     + " "
     *     + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
     *
     * }</blockquote>
     *
     * @param codePoint the character (Unicode code point)
     *
     * @return the Unicode name of the specified character, or null if
     *         the code point is unassigned.
     *
     * @exception IllegalArgumentException if the specified
     *            {@code codePoint} is not a valid Unicode
     *            code point.
     *
     * @since 1.7
     */
    public static String getName(int codePoint) {
        if (!isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException();
        }
        String name = CharacterName.get(codePoint);
        if (name != null)
            return name;
        if (getType(codePoint) == UNASSIGNED)
            return null;
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        if (block != null)
            return block.toString().replace('_', ' ') + " "
                   + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
        // should never come here
        return Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
    }
}
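The radix and whitespace rules documented above can be exercised from client code. The following is a minimal sketch, not part of the JDK source: the class `CharacterRadixDemo` and its helpers `parseUnsigned` and `format` are hypothetical names introduced only for illustration. It uses only the public `Character` methods described in this file (`digit(int, int)`, `forDigit`, `isWhitespace`, `isSpaceChar`, `getNumericValue`, `reverseBytes`) and assumes inputs small enough that `long` arithmetic does not overflow.

```java
public class CharacterRadixDemo {

    /**
     * Hypothetical helper: parses an unsigned number in the given radix
     * with Character.digit(int, int). Returns -1 for an empty string, an
     * out-of-range radix, or any character that is not a valid digit.
     * Overflow is deliberately not checked in this sketch.
     */
    public static long parseUnsigned(String s, int radix) {
        if (s.isEmpty() || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            return -1;
        }
        long value = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);            // handles supplementary characters
            int d = Character.digit(cp, radix);   // -1 if cp is not a digit in this radix
            if (d < 0) {
                return -1;
            }
            value = value * radix + d;
            i += Character.charCount(cp);
        }
        return value;
    }

    /**
     * Hypothetical helper: formats a nonnegative value in the given radix
     * with Character.forDigit, which yields lowercase letters for digits >= 10.
     */
    public static String format(long value, int radix) {
        if (value < 0 || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("unsupported value or radix");
        }
        if (value == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append(Character.forDigit((int) (value % radix), radix));
            value /= radix;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(parseUnsigned("ff", 16));   // 255
        System.out.println(parseUnsigned("FF", 16));   // 255
        System.out.println(parseUnsigned("g", 16));    // -1
        System.out.println(format(255, 16));           // ff

        // A non-breaking space is a Unicode space character but not Java whitespace.
        System.out.println(Character.isSpaceChar('\u00A0'));   // true
        System.out.println(Character.isWhitespace('\u00A0'));  // false

        // A tab is Java whitespace but not a Unicode space character.
        System.out.println(Character.isSpaceChar('\t'));       // false
        System.out.println(Character.isWhitespace('\t'));      // true

        // Letters have numeric values 10 through 35, independent of Unicode.
        System.out.println(Character.getNumericValue('z'));    // 35

        // reverseBytes swaps the two bytes of a char.
        System.out.println(Integer.toHexString(Character.reverseBytes('\u1234'))); // 3412
    }
}
```

Using the `int` code point overload of `digit` keeps the parser correct for digits outside the Basic Multilingual Plane, which the `char` overloads cannot handle.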