Path: blob/jdk8u272-b10-aarch32-20201026/jdk/src/share/classes/java/lang/Character.java
/*
 * Copyright (c) 2002, 2019, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

package java.lang;

import java.util.Arrays;
import java.util.Map;
import java.util.HashMap;
import java.util.Locale;

/**
 * The {@code Character} class wraps a value of the primitive
 * type {@code char} in an object.
 * An object of class
 * {@code Character} contains a single field whose type is
 * {@code char}.
 * <p>
 * In addition, this class provides a large number of static methods for
 * determining a character's category (lowercase letter, digit, etc.)
 * and for converting characters from uppercase to lowercase and vice
 * versa.
 *
 * <h3><a id="conformance">Unicode Conformance</a></h3>
 * <p>
 * The fields and methods of class {@code Character} are defined in terms
 * of character information from the Unicode Standard, specifically the
 * <i>UnicodeData</i> file that is part of the Unicode Character Database.
 * This file specifies properties including name and category for every
 * assigned Unicode code point or character range. The file is available
 * from the Unicode Consortium at
 * <a href="http://www.unicode.org">http://www.unicode.org</a>.
 * <p>
 * The Java SE 8 Platform uses character information from version 6.2
 * of the Unicode Standard, with two extensions. First, the Java SE 8 Platform
 * allows an implementation of class {@code Character} to use the Japanese Era
 * code point, {@code U+32FF}, from the first version of the Unicode Standard
 * after 6.2 that assigns the code point. Second, in recognition of the fact
 * that new currencies appear frequently, the Java SE 8 Platform allows an
 * implementation of class {@code Character} to use the Currency Symbols
 * block from version 10.0 of the Unicode Standard.
 * Consequently, the
 * behavior of fields and methods of class {@code Character} may vary across
 * implementations of the Java SE 8 Platform when processing the aforementioned
 * code points (outside of version 6.2), except for the following methods
 * that define Java identifiers:
 * {@link #isJavaIdentifierStart(int)}, {@link #isJavaIdentifierStart(char)},
 * {@link #isJavaIdentifierPart(int)}, and {@link #isJavaIdentifierPart(char)}.
 * Code points in Java identifiers must be drawn from version 6.2 of
 * the Unicode Standard.
 *
 * <h3><a name="unicode">Unicode Character Representations</a></h3>
 *
 * <p>The {@code char} data type (and therefore the value that a
 * {@code Character} object encapsulates) are based on the
 * original Unicode specification, which defined characters as
 * fixed-width 16-bit entities. The Unicode Standard has since been
 * changed to allow for characters whose representation requires more
 * than 16 bits. The range of legal <em>code point</em>s is now
 * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>.
 * (Refer to the <a
 * href="http://www.unicode.org/reports/tr27/#notation"><i>
 * definition</i></a> of the U+<i>n</i> notation in the Unicode
 * Standard.)
 *
 * <p><a name="BMP">The set of characters from U+0000 to U+FFFF</a> is
 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
 * <a name="supplementary">Characters</a> whose code points are greater
 * than U+FFFF are called <em>supplementary character</em>s. The Java
 * platform uses the UTF-16 representation in {@code char} arrays and
 * in the {@code String} and {@code StringBuffer} classes.
 * In this representation, supplementary characters are represented as a pair
 * of {@code char} values, the first from the <em>high-surrogates</em>
 * range, (\uD800-\uDBFF), the second from the
 * <em>low-surrogates</em> range (\uDC00-\uDFFF).
 *
 * <p>A {@code char} value, therefore, represents Basic
 * Multilingual Plane (BMP) code points, including the surrogate
 * code points, or code units of the UTF-16 encoding. An
 * {@code int} value represents all Unicode code points,
 * including supplementary code points. The lower (least significant)
 * 21 bits of {@code int} are used to represent Unicode code
 * points and the upper (most significant) 11 bits must be zero.
 * Unless otherwise specified, the behavior with respect to
 * supplementary characters and surrogate {@code char} values is
 * as follows:
 *
 * <ul>
 * <li>The methods that only accept a {@code char} value cannot support
 * supplementary characters. They treat {@code char} values from the
 * surrogate ranges as undefined characters. For example,
 * {@code Character.isLetter('\u005CuD840')} returns {@code false}, even though
 * this specific value if followed by any low-surrogate value in a string
 * would represent a letter.
 *
 * <li>The methods that accept an {@code int} value support all
 * Unicode characters, including supplementary characters. For
 * example, {@code Character.isLetter(0x2F81A)} returns
 * {@code true} because the code point value represents a letter
 * (a CJK ideograph).
 * </ul>
 *
 * <p>In the Java SE API documentation, <em>Unicode code point</em> is
 * used for character values in the range between U+0000 and U+10FFFF,
 * and <em>Unicode code unit</em> is used for 16-bit
 * {@code char} values that are code units of the <em>UTF-16</em>
 * encoding.
 * For more information on Unicode terminology, refer to the
 * <a href="http://www.unicode.org/glossary/">Unicode Glossary</a>.
 *
 * @author Lee Boynton
 * @author Guy Steele
 * @author Akira Tanaka
 * @author Martin Buchholz
 * @author Ulf Zibis
 * @since 1.0
 */
public final
class Character implements java.io.Serializable, Comparable<Character> {
    /**
     * The minimum radix available for conversion to and from strings.
     * The constant value of this field is the smallest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)
     */
    public static final int MIN_RADIX = 2;

    /**
     * The maximum radix available for conversion to and from strings.
     * The constant value of this field is the largest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)
     */
    public static final int MAX_RADIX = 36;

    /**
     * The constant value of this field is the smallest value of type
     * {@code char}, {@code '\u005Cu0000'}.
     *
     * @since 1.0.2
     */
    public static final char MIN_VALUE = '\u0000';

    /**
     * The constant value of this field is the largest value of type
     * {@code char}, {@code '\u005CuFFFF'}.
     *
     * @since 1.0.2
     */
    public static final char MAX_VALUE = '\uFFFF';

    /**
     * The {@code Class} instance representing the primitive type
     * {@code char}.
     *
     * @since 1.1
     */
    @SuppressWarnings("unchecked")
    public static final Class<Character> TYPE = (Class<Character>) Class.getPrimitiveClass("char");

    /*
     * Normative general types
     */

    /*
     * General character types
     */

    /**
     * General category "Cn" in the Unicode specification.
     * @since 1.1
     */
    public static final byte UNASSIGNED = 0;

    /**
     * General category "Lu" in the Unicode specification.
     * @since 1.1
     */
    public static final byte UPPERCASE_LETTER = 1;

    /**
     * General category "Ll" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LOWERCASE_LETTER = 2;

    /**
     * General category "Lt" in the Unicode specification.
     * @since 1.1
     */
    public static final byte TITLECASE_LETTER = 3;

    /**
     * General category "Lm" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MODIFIER_LETTER = 4;

    /**
     * General category "Lo" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_LETTER = 5;

    /**
     * General category "Mn" in the Unicode specification.
     * @since 1.1
     */
    public static final byte NON_SPACING_MARK = 6;

    /**
     * General category "Me" in the Unicode specification.
     * @since 1.1
     */
    public static final byte ENCLOSING_MARK = 7;

    /**
     * General category "Mc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte COMBINING_SPACING_MARK = 8;

    /**
     * General category "Nd" in the Unicode specification.
     * @since 1.1
     */
    public static final byte DECIMAL_DIGIT_NUMBER = 9;

    /**
     * General category "Nl" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LETTER_NUMBER = 10;

    /**
     * General category "No" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_NUMBER = 11;

    /**
     * General category "Zs" in the Unicode specification.
     * @since 1.1
     */
    public static final byte SPACE_SEPARATOR = 12;

    /**
     * General category "Zl" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LINE_SEPARATOR = 13;

    /**
     * General category "Zp" in the Unicode specification.
     * @since 1.1
     */
    public static final byte PARAGRAPH_SEPARATOR = 14;

    /**
     * General category "Cc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CONTROL = 15;

    /**
     * General category "Cf" in the Unicode specification.
     * @since 1.1
     */
    public static final byte FORMAT = 16;

    /**
     * General category "Co" in the Unicode specification.
     * @since 1.1
     */
    public static final byte PRIVATE_USE = 18;

    /**
     * General category "Cs" in the Unicode specification.
     * @since 1.1
     */
    public static final byte SURROGATE = 19;

    /**
     * General category "Pd" in the Unicode specification.
     * @since 1.1
     */
    public static final byte DASH_PUNCTUATION = 20;

    /**
     * General category "Ps" in the Unicode specification.
     * @since 1.1
     */
    public static final byte START_PUNCTUATION = 21;

    /**
     * General category "Pe" in the Unicode specification.
     * @since 1.1
     */
    public static final byte END_PUNCTUATION = 22;

    /**
     * General category "Pc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CONNECTOR_PUNCTUATION = 23;

    /**
     * General category "Po" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_PUNCTUATION = 24;

    /**
     * General category "Sm" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MATH_SYMBOL = 25;

    /**
     * General category "Sc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CURRENCY_SYMBOL = 26;

    /**
     * General category "Sk" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MODIFIER_SYMBOL = 27;

    /**
     * General category "So" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_SYMBOL = 28;

    /**
     * General category "Pi" in the Unicode specification.
     * @since 1.4
     */
    public static final byte INITIAL_QUOTE_PUNCTUATION = 29;

    /**
     * General category "Pf" in the Unicode specification.
     * @since 1.4
     */
    public static final byte FINAL_QUOTE_PUNCTUATION = 30;

    /**
     * Error flag. Use int (code point) to avoid confusion with U+FFFF.
     */
    static final int ERROR = 0xFFFFFFFF;


    /**
     * Undefined bidirectional character type. Undefined {@code char}
     * values have undefined directionality in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_UNDEFINED = -1;

    /**
     * Strong bidirectional character type "L" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT = 0;

    /**
     * Strong bidirectional character type "R" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT = 1;

    /**
     * Strong bidirectional character type "AL" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC = 2;

    /**
     * Weak bidirectional character type "EN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER = 3;

    /**
     * Weak bidirectional character type "ES" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR = 4;

    /**
     * Weak bidirectional character type "ET" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR = 5;

    /**
     * Weak bidirectional character type "AN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_ARABIC_NUMBER = 6;

    /**
     * Weak bidirectional character type "CS" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_COMMON_NUMBER_SEPARATOR = 7;

    /**
     * Weak bidirectional character type "NSM" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_NONSPACING_MARK = 8;

    /**
     * Weak bidirectional character type "BN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_BOUNDARY_NEUTRAL = 9;

    /**
     * Neutral bidirectional character type "B" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_PARAGRAPH_SEPARATOR = 10;

    /**
     * Neutral bidirectional character type "S" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_SEGMENT_SEPARATOR = 11;

    /**
     * Neutral bidirectional character type "WS" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_WHITESPACE = 12;

    /**
     * Neutral bidirectional character type "ON" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_OTHER_NEUTRALS = 13;

    /**
     * Strong bidirectional character type "LRE" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING = 14;

    /**
     * Strong bidirectional character type "LRO" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE = 15;

    /**
     * Strong bidirectional character type "RLE" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING = 16;

    /**
     * Strong bidirectional character type "RLO" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE = 17;

    /**
     * Weak bidirectional character type "PDF" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_POP_DIRECTIONAL_FORMAT = 18;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuD800'}.
     * A high-surrogate is also
     * known as a <i>leading-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MIN_HIGH_SURROGATE = '\uD800';

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDBFF'}.
     * A high-surrogate is also known as a <i>leading-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MAX_HIGH_SURROGATE = '\uDBFF';

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDC00'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MIN_LOW_SURROGATE = '\uDC00';

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MAX_LOW_SURROGATE = '\uDFFF';

    /**
     * The minimum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuD800'}.
     *
     * @since 1.5
     */
    public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;

    /**
     * The maximum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     *
     * @since 1.5
     */
    public static final char MAX_SURROGATE = MAX_LOW_SURROGATE;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#supplementary_code_point">
     * Unicode supplementary code point</a>, constant {@code U+10000}.
     *
     * @since 1.5
     */
    public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point</a>,
     * constant {@code U+0000}.
     *
     * @since 1.5
     */
    public static final int MIN_CODE_POINT = 0x000000;

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point</a>, constant {@code U+10FFFF}.
     *
     * @since 1.5
     */
    public static final int MAX_CODE_POINT = 0X10FFFF;


    /**
     * Instances of this class represent particular subsets of the Unicode
     * character set. The only family of subsets defined in the
     * {@code Character} class is {@link Character.UnicodeBlock}.
     * Other portions of the Java API may define other subsets for their
     * own purposes.
     *
     * @since 1.2
     */
    public static class Subset {

        private String name;

        /**
         * Constructs a new {@code Subset} instance.
         *
         * @param name The name of this subset
         * @exception NullPointerException if name is {@code null}
         */
        protected Subset(String name) {
            if (name == null) {
                throw new NullPointerException("name");
            }
            this.name = name;
        }

        /**
         * Compares two {@code Subset} objects for equality.
         * This method returns {@code true} if and only if
         * {@code this} and the argument refer to the same
         * object; since this method is {@code final}, this
         * guarantee holds for all subclasses.
         */
        public final boolean equals(Object obj) {
            return (this == obj);
        }

        /**
         * Returns the standard hash code as defined by the
         * {@link Object#hashCode} method. This method
         * is {@code final} in order to ensure that the
         * {@code equals} and {@code hashCode} methods will
         * be consistent in all subclasses.
         */
        public final int hashCode() {
            return super.hashCode();
        }

        /**
         * Returns the name of this subset.
         */
        public final String toString() {
            return name;
        }
    }

    // See http://www.unicode.org/Public/UNIDATA/Blocks.txt
    // for the latest specification of Unicode Blocks.

    /**
     * A family of character subsets representing the character blocks in the
     * Unicode specification.
     * Character blocks generally define characters
     * used for a specific script or purpose. A character is contained by
     * at most one Unicode block.
     *
     * @since 1.2
     */
    public static final class UnicodeBlock extends Subset {

        private static Map<String, UnicodeBlock> map = new HashMap<>(256);

        /**
         * Creates a UnicodeBlock with the given identifier name.
         * This name must be the same as the block identifier.
         */
        private UnicodeBlock(String idName) {
            super(idName);
            map.put(idName, this);
        }

        /**
         * Creates a UnicodeBlock with the given identifier name and
         * alias name.
         */
        private UnicodeBlock(String idName, String alias) {
            this(idName);
            map.put(alias, this);
        }

        /**
         * Creates a UnicodeBlock with the given identifier name and
         * alias names.
         */
        private UnicodeBlock(String idName, String... aliases) {
            this(idName);
            for (String alias : aliases)
                map.put(alias, this);
        }

        /**
         * Constant for the "Basic Latin" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BASIC_LATIN =
            new UnicodeBlock("BASIC_LATIN",
                             "BASIC LATIN",
                             "BASICLATIN");

        /**
         * Constant for the "Latin-1 Supplement" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_1_SUPPLEMENT =
            new UnicodeBlock("LATIN_1_SUPPLEMENT",
                             "LATIN-1 SUPPLEMENT",
                             "LATIN-1SUPPLEMENT");

        /**
         * Constant for the "Latin Extended-A" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_A =
            new UnicodeBlock("LATIN_EXTENDED_A",
                             "LATIN EXTENDED-A",
                             "LATINEXTENDED-A");

        /**
         * Constant for the "Latin Extended-B" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_B =
            new UnicodeBlock("LATIN_EXTENDED_B",
                             "LATIN EXTENDED-B",
                             "LATINEXTENDED-B");

        /**
         * Constant for the "IPA Extensions" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock IPA_EXTENSIONS =
            new UnicodeBlock("IPA_EXTENSIONS",
                             "IPA EXTENSIONS",
                             "IPAEXTENSIONS");

        /**
         * Constant for the "Spacing Modifier Letters" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SPACING_MODIFIER_LETTERS =
            new UnicodeBlock("SPACING_MODIFIER_LETTERS",
                             "SPACING MODIFIER LETTERS",
                             "SPACINGMODIFIERLETTERS");

        /**
         * Constant for the "Combining Diacritical Marks" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS =
            new UnicodeBlock("COMBINING_DIACRITICAL_MARKS",
                             "COMBINING DIACRITICAL MARKS",
                             "COMBININGDIACRITICALMARKS");

        /**
         * Constant for the "Greek and Coptic" Unicode character block.
         * <p>
         * This block was previously known as the "Greek" block.
         *
         * @since 1.2
         */
        public static final UnicodeBlock GREEK =
            new UnicodeBlock("GREEK",
                             "GREEK AND COPTIC",
                             "GREEKANDCOPTIC");

        /**
         * Constant for the "Cyrillic" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CYRILLIC =
            new UnicodeBlock("CYRILLIC");

        /**
         * Constant for the "Armenian" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARMENIAN =
            new UnicodeBlock("ARMENIAN");

        /**
         * Constant for the "Hebrew" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HEBREW =
            new UnicodeBlock("HEBREW");

        /**
         * Constant for the "Arabic" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC =
            new UnicodeBlock("ARABIC");

        /**
         * Constant for the "Devanagari" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock DEVANAGARI =
            new UnicodeBlock("DEVANAGARI");

        /**
         * Constant for the "Bengali" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BENGALI =
            new UnicodeBlock("BENGALI");

        /**
         * Constant for the "Gurmukhi" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GURMUKHI =
            new UnicodeBlock("GURMUKHI");

        /**
         * Constant for the "Gujarati" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GUJARATI =
            new UnicodeBlock("GUJARATI");

        /**
         * Constant for the "Oriya" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ORIYA =
            new UnicodeBlock("ORIYA");

        /**
         * Constant for the "Tamil" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TAMIL =
            new UnicodeBlock("TAMIL");

        /**
         * Constant for the "Telugu" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TELUGU =
            new UnicodeBlock("TELUGU");

        /**
         * Constant for the "Kannada" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KANNADA =
            new UnicodeBlock("KANNADA");

        /**
         * Constant for the "Malayalam" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MALAYALAM =
            new UnicodeBlock("MALAYALAM");

        /**
         * Constant for the "Thai" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock THAI =
            new UnicodeBlock("THAI");

        /**
         * Constant for the "Lao" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LAO =
            new UnicodeBlock("LAO");

        /**
         * Constant for the "Tibetan" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TIBETAN =
            new UnicodeBlock("TIBETAN");

        /**
         * Constant for the "Georgian" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GEORGIAN =
            new UnicodeBlock("GEORGIAN");

        /**
         * Constant for the "Hangul Jamo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_JAMO =
            new UnicodeBlock("HANGUL_JAMO",
                             "HANGUL JAMO",
                             "HANGULJAMO");

        /**
         * Constant for the "Latin Extended Additional" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_ADDITIONAL =
            new UnicodeBlock("LATIN_EXTENDED_ADDITIONAL",
                             "LATIN EXTENDED ADDITIONAL",
                             "LATINEXTENDEDADDITIONAL");

        /**
         * Constant for the "Greek Extended" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GREEK_EXTENDED =
            new UnicodeBlock("GREEK_EXTENDED",
                             "GREEK EXTENDED",
                             "GREEKEXTENDED");

        /**
         * Constant for the "General Punctuation" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GENERAL_PUNCTUATION =
            new UnicodeBlock("GENERAL_PUNCTUATION",
                             "GENERAL PUNCTUATION",
                             "GENERALPUNCTUATION");

        /**
         * Constant for the "Superscripts and Subscripts" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock SUPERSCRIPTS_AND_SUBSCRIPTS =
            new UnicodeBlock("SUPERSCRIPTS_AND_SUBSCRIPTS",
                             "SUPERSCRIPTS AND SUBSCRIPTS",
                             "SUPERSCRIPTSANDSUBSCRIPTS");

        /**
         * Constant for the "Currency Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CURRENCY_SYMBOLS =
            new UnicodeBlock("CURRENCY_SYMBOLS",
                             "CURRENCY SYMBOLS",
                             "CURRENCYSYMBOLS");

        /**
         * Constant for the "Combining Diacritical Marks for Symbols" Unicode
         * character block.
         * <p>
         * This block was previously known as "Combining Marks for Symbols".
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_MARKS_FOR_SYMBOLS =
            new UnicodeBlock("COMBINING_MARKS_FOR_SYMBOLS",
                             "COMBINING DIACRITICAL MARKS FOR SYMBOLS",
                             "COMBININGDIACRITICALMARKSFORSYMBOLS",
                             "COMBINING MARKS FOR SYMBOLS",
                             "COMBININGMARKSFORSYMBOLS");

        /**
         * Constant for the "Letterlike Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LETTERLIKE_SYMBOLS =
            new UnicodeBlock("LETTERLIKE_SYMBOLS",
                             "LETTERLIKE SYMBOLS",
                             "LETTERLIKESYMBOLS");

        /**
         * Constant for the "Number Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock NUMBER_FORMS =
            new UnicodeBlock("NUMBER_FORMS",
                             "NUMBER FORMS",
                             "NUMBERFORMS");

        /**
         * Constant for the "Arrows" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARROWS =
            new UnicodeBlock("ARROWS");

        /**
         * Constant for the "Mathematical Operators" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MATHEMATICAL_OPERATORS =
            new UnicodeBlock("MATHEMATICAL_OPERATORS",
                             "MATHEMATICAL OPERATORS",
                             "MATHEMATICALOPERATORS");

        /**
         * Constant for the "Miscellaneous Technical" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MISCELLANEOUS_TECHNICAL =
            new UnicodeBlock("MISCELLANEOUS_TECHNICAL",
                             "MISCELLANEOUS TECHNICAL",
                             "MISCELLANEOUSTECHNICAL");

        /**
         * Constant for the "Control Pictures" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CONTROL_PICTURES =
            new UnicodeBlock("CONTROL_PICTURES",
                             "CONTROL PICTURES",
                             "CONTROLPICTURES");

        /**
         * Constant for the "Optical Character Recognition" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock OPTICAL_CHARACTER_RECOGNITION =
            new UnicodeBlock("OPTICAL_CHARACTER_RECOGNITION",
                             "OPTICAL CHARACTER RECOGNITION",
                             "OPTICALCHARACTERRECOGNITION");

        /**
         * Constant for the "Enclosed Alphanumerics" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ENCLOSED_ALPHANUMERICS =
            new UnicodeBlock("ENCLOSED_ALPHANUMERICS",
                             "ENCLOSED ALPHANUMERICS",
                             "ENCLOSEDALPHANUMERICS");

        /**
         * Constant for the "Box Drawing" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BOX_DRAWING =
            new UnicodeBlock("BOX_DRAWING",
                             "BOX DRAWING",
                             "BOXDRAWING");

        /**
         * Constant for the "Block Elements" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BLOCK_ELEMENTS =
            new UnicodeBlock("BLOCK_ELEMENTS",
                             "BLOCK ELEMENTS",
                             "BLOCKELEMENTS");

        /**
         * Constant for the "Geometric Shapes" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GEOMETRIC_SHAPES =
            new UnicodeBlock("GEOMETRIC_SHAPES",
                             "GEOMETRIC SHAPES",
                             "GEOMETRICSHAPES");

        /**
         * Constant for the "Miscellaneous Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS =
            new UnicodeBlock("MISCELLANEOUS_SYMBOLS",
                             "MISCELLANEOUS SYMBOLS",
                             "MISCELLANEOUSSYMBOLS");

        /**
         * Constant for the "Dingbats" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock DINGBATS =
            new UnicodeBlock("DINGBATS");

        /**
         * Constant for the "CJK Symbols and Punctuation" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_SYMBOLS_AND_PUNCTUATION =
            new UnicodeBlock("CJK_SYMBOLS_AND_PUNCTUATION",
                             "CJK SYMBOLS AND PUNCTUATION",
                             "CJKSYMBOLSANDPUNCTUATION");

        /**
         * Constant for the "Hiragana" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HIRAGANA =
            new UnicodeBlock("HIRAGANA");

        /**
         * Constant for the "Katakana" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KATAKANA =
            new UnicodeBlock("KATAKANA");

        /**
         * Constant for the "Bopomofo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BOPOMOFO =
            new UnicodeBlock("BOPOMOFO");

        /**
         * Constant for the "Hangul Compatibility Jamo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_COMPATIBILITY_JAMO =
            new UnicodeBlock("HANGUL_COMPATIBILITY_JAMO",
                             "HANGUL COMPATIBILITY JAMO",
                             "HANGULCOMPATIBILITYJAMO");

        /**
         * Constant for the "Kanbun" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KANBUN =
            new UnicodeBlock("KANBUN");

        /**
         * Constant for the "Enclosed CJK Letters and Months" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ENCLOSED_CJK_LETTERS_AND_MONTHS =
            new UnicodeBlock("ENCLOSED_CJK_LETTERS_AND_MONTHS",
                             "ENCLOSED CJK LETTERS AND MONTHS",
                             "ENCLOSEDCJKLETTERSANDMONTHS");

        /**
         * Constant for the "CJK Compatibility" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY =
            new UnicodeBlock("CJK_COMPATIBILITY",
                             "CJK COMPATIBILITY",
                             "CJKCOMPATIBILITY");

        /**
         * Constant for the "CJK Unified Ideographs" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS",
                             "CJK UNIFIED IDEOGRAPHS",
                             "CJKUNIFIEDIDEOGRAPHS");

        /**
         * Constant for the "Hangul Syllables" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_SYLLABLES =
            new UnicodeBlock("HANGUL_SYLLABLES",
                             "HANGUL SYLLABLES",
                             "HANGULSYLLABLES");

        /**
         * Constant for the "Private Use Area" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock PRIVATE_USE_AREA =
            new UnicodeBlock("PRIVATE_USE_AREA",
                             "PRIVATE USE AREA",
                             "PRIVATEUSEAREA");

        /**
         * Constant for the "CJK Compatibility Ideographs" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS =
            new UnicodeBlock("CJK_COMPATIBILITY_IDEOGRAPHS",
                             "CJK COMPATIBILITY IDEOGRAPHS",
                             "CJKCOMPATIBILITYIDEOGRAPHS");

        /**
         * Constant for the "Alphabetic Presentation Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ALPHABETIC_PRESENTATION_FORMS =
            new UnicodeBlock("ALPHABETIC_PRESENTATION_FORMS",
                             "ALPHABETIC PRESENTATION FORMS",
                             "ALPHABETICPRESENTATIONFORMS");

        /**
         * Constant for the "Arabic Presentation Forms-A" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_A =
            new UnicodeBlock("ARABIC_PRESENTATION_FORMS_A",
                             "ARABIC PRESENTATION FORMS-A",
                             "ARABICPRESENTATIONFORMS-A");

        /**
         * Constant for the "Combining Half Marks" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_HALF_MARKS =
            new UnicodeBlock("COMBINING_HALF_MARKS",
                             "COMBINING HALF MARKS",
                             "COMBININGHALFMARKS");

        /**
         * Constant for the "CJK Compatibility Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_FORMS =
            new UnicodeBlock("CJK_COMPATIBILITY_FORMS",
                             "CJK COMPATIBILITY FORMS",
                             "CJKCOMPATIBILITYFORMS");

        /**
         * Constant for the "Small Form Variants" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SMALL_FORM_VARIANTS =
            new UnicodeBlock("SMALL_FORM_VARIANTS",
                             "SMALL FORM VARIANTS",
                             "SMALLFORMVARIANTS");

        /**
         * Constant for the "Arabic Presentation Forms-B" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_B =
            new UnicodeBlock("ARABIC_PRESENTATION_FORMS_B",
                             "ARABIC PRESENTATION FORMS-B",
                             "ARABICPRESENTATIONFORMS-B");

        /**
         * Constant for the "Halfwidth and Fullwidth Forms" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock HALFWIDTH_AND_FULLWIDTH_FORMS =
            new UnicodeBlock("HALFWIDTH_AND_FULLWIDTH_FORMS",
                             "HALFWIDTH AND FULLWIDTH FORMS",
                             "HALFWIDTHANDFULLWIDTHFORMS");

        /**
         * Constant for the "Specials" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SPECIALS =
            new UnicodeBlock("SPECIALS");

        /**
         * @deprecated As of J2SE 5, use {@link #HIGH_SURROGATES},
         *             {@link #HIGH_PRIVATE_USE_SURROGATES}, and
         *             {@link #LOW_SURROGATES}.
         *             These new constants match
         *             the block definitions of the Unicode Standard.
         *             The {@link #of(char)} and {@link #of(int)} methods
         *             return the new constants, not SURROGATES_AREA.
         */
        @Deprecated
        public static final UnicodeBlock SURROGATES_AREA =
            new UnicodeBlock("SURROGATES_AREA");

        /**
         * Constant for the "Syriac" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock SYRIAC =
            new UnicodeBlock("SYRIAC");

        /**
         * Constant for the "Thaana" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock THAANA =
            new UnicodeBlock("THAANA");

        /**
         * Constant for the "Sinhala" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock SINHALA =
            new UnicodeBlock("SINHALA");

        /**
         * Constant for the "Myanmar" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock MYANMAR =
            new UnicodeBlock("MYANMAR");

        /**
         * Constant for the "Ethiopic" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock ETHIOPIC =
            new UnicodeBlock("ETHIOPIC");

        /**
         * Constant for the "Cherokee" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CHEROKEE =
            new UnicodeBlock("CHEROKEE");

        /**
         * Constant for the "Unified Canadian Aboriginal Syllabics" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS =
            new UnicodeBlock("UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS",
                             "UNIFIED CANADIAN ABORIGINAL SYLLABICS",
                             "UNIFIEDCANADIANABORIGINALSYLLABICS");

        /**
         * Constant for the "Ogham" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock OGHAM =
            new UnicodeBlock("OGHAM");

        /**
         * Constant for the "Runic" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock RUNIC =
            new UnicodeBlock("RUNIC");

        /**
         * Constant for the "Khmer" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock KHMER =
            new UnicodeBlock("KHMER");

        /**
         * Constant for the "Mongolian" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock MONGOLIAN =
            new UnicodeBlock("MONGOLIAN");

        /**
         * Constant for the "Braille Patterns" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock BRAILLE_PATTERNS =
            new UnicodeBlock("BRAILLE_PATTERNS",
                             "BRAILLE PATTERNS",
                             "BRAILLEPATTERNS");

        /**
         * Constant for the "CJK Radicals Supplement" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CJK_RADICALS_SUPPLEMENT =
            new UnicodeBlock("CJK_RADICALS_SUPPLEMENT",
                             "CJK RADICALS SUPPLEMENT",
                             "CJKRADICALSSUPPLEMENT");

        /**
         * Constant for the "Kangxi Radicals" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock KANGXI_RADICALS =
            new UnicodeBlock("KANGXI_RADICALS",
                             "KANGXI RADICALS",
                             "KANGXIRADICALS");

        /**
         * Constant for the "Ideographic Description Characters" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock IDEOGRAPHIC_DESCRIPTION_CHARACTERS =
            new UnicodeBlock("IDEOGRAPHIC_DESCRIPTION_CHARACTERS",
                             "IDEOGRAPHIC DESCRIPTION CHARACTERS",
                             "IDEOGRAPHICDESCRIPTIONCHARACTERS");

        /**
         * Constant for the "Bopomofo Extended" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock BOPOMOFO_EXTENDED =
            new UnicodeBlock("BOPOMOFO_EXTENDED",
                             "BOPOMOFO EXTENDED",
                             "BOPOMOFOEXTENDED");

        /**
         * Constant for the "CJK Unified Ideographs Extension A" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION A",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIONA");

        /**
         * Constant for
         * the "Yi Syllables" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock YI_SYLLABLES =
            new UnicodeBlock("YI_SYLLABLES",
                             "YI SYLLABLES",
                             "YISYLLABLES");

        /**
         * Constant for the "Yi Radicals" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock YI_RADICALS =
            new UnicodeBlock("YI_RADICALS",
                             "YI RADICALS",
                             "YIRADICALS");

        /**
         * Constant for the "Cyrillic Supplementary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CYRILLIC_SUPPLEMENTARY =
            new UnicodeBlock("CYRILLIC_SUPPLEMENTARY",
                             "CYRILLIC SUPPLEMENTARY",
                             "CYRILLICSUPPLEMENTARY",
                             "CYRILLIC SUPPLEMENT",
                             "CYRILLICSUPPLEMENT");

        /**
         * Constant for the "Tagalog" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGALOG =
            new UnicodeBlock("TAGALOG");

        /**
         * Constant for the "Hanunoo" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock HANUNOO =
            new UnicodeBlock("HANUNOO");

        /**
         * Constant for the "Buhid" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock BUHID =
            new UnicodeBlock("BUHID");

        /**
         * Constant for the "Tagbanwa" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGBANWA =
            new UnicodeBlock("TAGBANWA");

        /**
         * Constant for the "Limbu" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LIMBU =
            new UnicodeBlock("LIMBU");

        /**
         * Constant for the "Tai Le" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAI_LE =
            new UnicodeBlock("TAI_LE",
                             "TAI LE",
                             "TAILE");

        /**
         * Constant for the "Khmer Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock KHMER_SYMBOLS =
            new UnicodeBlock("KHMER_SYMBOLS",
                             "KHMER SYMBOLS",
                             "KHMERSYMBOLS");

        /**
         * Constant
         * for the "Phonetic Extensions" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock PHONETIC_EXTENSIONS =
            new UnicodeBlock("PHONETIC_EXTENSIONS",
                             "PHONETIC EXTENSIONS",
                             "PHONETICEXTENSIONS");

        /**
         * Constant for the "Miscellaneous Mathematical Symbols-A" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A =
            new UnicodeBlock("MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A",
                             "MISCELLANEOUS MATHEMATICAL SYMBOLS-A",
                             "MISCELLANEOUSMATHEMATICALSYMBOLS-A");

        /**
         * Constant for the "Supplemental Arrows-A" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_ARROWS_A =
            new UnicodeBlock("SUPPLEMENTAL_ARROWS_A",
                             "SUPPLEMENTAL ARROWS-A",
                             "SUPPLEMENTALARROWS-A");

        /**
         * Constant for the "Supplemental Arrows-B" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_ARROWS_B =
            new UnicodeBlock("SUPPLEMENTAL_ARROWS_B",
                             "SUPPLEMENTAL ARROWS-B",
                             "SUPPLEMENTALARROWS-B");

        /**
         * Constant for the "Miscellaneous Mathematical Symbols-B" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B =
            new UnicodeBlock("MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B",
                             "MISCELLANEOUS MATHEMATICAL SYMBOLS-B",
                             "MISCELLANEOUSMATHEMATICALSYMBOLS-B");

        /**
         * Constant for the "Supplemental Mathematical Operators" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_MATHEMATICAL_OPERATORS =
            new UnicodeBlock("SUPPLEMENTAL_MATHEMATICAL_OPERATORS",
                             "SUPPLEMENTAL MATHEMATICAL OPERATORS",
                             "SUPPLEMENTALMATHEMATICALOPERATORS");

        /**
         * Constant for the "Miscellaneous Symbols and Arrows" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_AND_ARROWS =
            new UnicodeBlock("MISCELLANEOUS_SYMBOLS_AND_ARROWS",
                             "MISCELLANEOUS SYMBOLS AND ARROWS",
                             "MISCELLANEOUSSYMBOLSANDARROWS");

        /**
         * Constant for the "Katakana Phonetic Extensions" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock KATAKANA_PHONETIC_EXTENSIONS =
            new UnicodeBlock("KATAKANA_PHONETIC_EXTENSIONS",
                             "KATAKANA PHONETIC EXTENSIONS",
                             "KATAKANAPHONETICEXTENSIONS");

        /**
         * Constant for the "Yijing Hexagram Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock YIJING_HEXAGRAM_SYMBOLS =
            new UnicodeBlock("YIJING_HEXAGRAM_SYMBOLS",
                             "YIJING HEXAGRAM SYMBOLS",
                             "YIJINGHEXAGRAMSYMBOLS");

        /**
         * Constant for the "Variation Selectors" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock VARIATION_SELECTORS =
            new UnicodeBlock("VARIATION_SELECTORS",
                             "VARIATION SELECTORS",
                             "VARIATIONSELECTORS");

        /**
         * Constant for the "Linear B Syllabary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LINEAR_B_SYLLABARY =
            new UnicodeBlock("LINEAR_B_SYLLABARY",
                             "LINEAR B SYLLABARY",
                             "LINEARBSYLLABARY");

        /**
         * Constant for the "Linear B Ideograms" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LINEAR_B_IDEOGRAMS =
            new UnicodeBlock("LINEAR_B_IDEOGRAMS",
                             "LINEAR B IDEOGRAMS",
                             "LINEARBIDEOGRAMS");

        /**
         * Constant for the "Aegean Numbers" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock AEGEAN_NUMBERS =
            new UnicodeBlock("AEGEAN_NUMBERS",
                             "AEGEAN NUMBERS",
                             "AEGEANNUMBERS");

        /**
         * Constant for the "Old Italic" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock OLD_ITALIC =
            new UnicodeBlock("OLD_ITALIC",
                             "OLD ITALIC",
                             "OLDITALIC");

        /**
         * Constant for the "Gothic" Unicode character block.
         * @since 1.5
         */
        public
        static final UnicodeBlock GOTHIC =
            new UnicodeBlock("GOTHIC");

        /**
         * Constant for the "Ugaritic" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock UGARITIC =
            new UnicodeBlock("UGARITIC");

        /**
         * Constant for the "Deseret" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock DESERET =
            new UnicodeBlock("DESERET");

        /**
         * Constant for the "Shavian" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SHAVIAN =
            new UnicodeBlock("SHAVIAN");

        /**
         * Constant for the "Osmanya" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock OSMANYA =
            new UnicodeBlock("OSMANYA");

        /**
         * Constant for the "Cypriot Syllabary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CYPRIOT_SYLLABARY =
            new UnicodeBlock("CYPRIOT_SYLLABARY",
                             "CYPRIOT SYLLABARY",
                             "CYPRIOTSYLLABARY");

        /**
         * Constant for the "Byzantine Musical Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock BYZANTINE_MUSICAL_SYMBOLS =
            new UnicodeBlock("BYZANTINE_MUSICAL_SYMBOLS",
                             "BYZANTINE MUSICAL SYMBOLS",
                             "BYZANTINEMUSICALSYMBOLS");

        /**
         * Constant for the "Musical Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock MUSICAL_SYMBOLS =
            new UnicodeBlock("MUSICAL_SYMBOLS",
                             "MUSICAL SYMBOLS",
                             "MUSICALSYMBOLS");

        /**
         * Constant for the "Tai Xuan Jing Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAI_XUAN_JING_SYMBOLS =
            new UnicodeBlock("TAI_XUAN_JING_SYMBOLS",
                             "TAI XUAN JING SYMBOLS",
                             "TAIXUANJINGSYMBOLS");

        /**
         * Constant for the "Mathematical Alphanumeric Symbols" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock MATHEMATICAL_ALPHANUMERIC_SYMBOLS =
            new UnicodeBlock("MATHEMATICAL_ALPHANUMERIC_SYMBOLS",
                             "MATHEMATICAL ALPHANUMERIC SYMBOLS",
                             "MATHEMATICALALPHANUMERICSYMBOLS");

        /**
         * Constant for the "CJK Unified Ideographs Extension B" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION B",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIONB");

        /**
         * Constant for the "CJK Compatibility Ideographs Supplement" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT =
            new UnicodeBlock("CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT",
                             "CJK COMPATIBILITY IDEOGRAPHS SUPPLEMENT",
                             "CJKCOMPATIBILITYIDEOGRAPHSSUPPLEMENT");

        /**
         * Constant for the "Tags" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGS =
            new UnicodeBlock("TAGS");

        /**
         * Constant for the "Variation Selectors Supplement" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock VARIATION_SELECTORS_SUPPLEMENT =
            new UnicodeBlock("VARIATION_SELECTORS_SUPPLEMENT",
                             "VARIATION SELECTORS SUPPLEMENT",
                             "VARIATIONSELECTORSSUPPLEMENT");

        /**
         * Constant for the "Supplementary Private Use Area-A" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_A =
            new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_A",
                             "SUPPLEMENTARY PRIVATE USE AREA-A",
                             "SUPPLEMENTARYPRIVATEUSEAREA-A");

        /**
         * Constant for the "Supplementary Private Use Area-B" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_B =
            new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_B",
                             "SUPPLEMENTARY PRIVATE USE AREA-B",
                             "SUPPLEMENTARYPRIVATEUSEAREA-B");

        /**
         * Constant for the "High Surrogates" Unicode
         * character block.
         * This block represents codepoint values in the high surrogate
         * range: U+D800 through U+DB7F
         *
         * @since 1.5
         */
        public static final UnicodeBlock HIGH_SURROGATES =
            new UnicodeBlock("HIGH_SURROGATES",
                             "HIGH SURROGATES",
                             "HIGHSURROGATES");

        /**
         * Constant for the "High Private Use Surrogates" Unicode character
         * block.
         * This block represents codepoint values in the private use high
         * surrogate range: U+DB80 through U+DBFF
         *
         * @since 1.5
         */
        public static final UnicodeBlock HIGH_PRIVATE_USE_SURROGATES =
            new UnicodeBlock("HIGH_PRIVATE_USE_SURROGATES",
                             "HIGH PRIVATE USE SURROGATES",
                             "HIGHPRIVATEUSESURROGATES");

        /**
         * Constant for the "Low Surrogates" Unicode character block.
         * This block represents codepoint values in the low surrogate
         * range: U+DC00 through U+DFFF
         *
         * @since 1.5
         */
        public static final UnicodeBlock LOW_SURROGATES =
            new UnicodeBlock("LOW_SURROGATES",
                             "LOW SURROGATES",
                             "LOWSURROGATES");

        /**
         * Constant for the "Arabic Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ARABIC_SUPPLEMENT =
            new UnicodeBlock("ARABIC_SUPPLEMENT",
                             "ARABIC SUPPLEMENT",
                             "ARABICSUPPLEMENT");

        /**
         * Constant for the "NKo" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock NKO =
            new UnicodeBlock("NKO");

        /**
         * Constant for the "Samaritan" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SAMARITAN =
            new UnicodeBlock("SAMARITAN");

        /**
         * Constant for the "Mandaic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MANDAIC =
            new UnicodeBlock("MANDAIC");

        /**
         * Constant for the "Ethiopic Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_SUPPLEMENT =
            new UnicodeBlock("ETHIOPIC_SUPPLEMENT",
                             "ETHIOPIC SUPPLEMENT",
                             "ETHIOPICSUPPLEMENT");

        /**
         * Constant for the "Unified Canadian Aboriginal Syllabics Extended"
         * Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED =
            new UnicodeBlock("UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED",
                             "UNIFIED CANADIAN ABORIGINAL SYLLABICS EXTENDED",
                             "UNIFIEDCANADIANABORIGINALSYLLABICSEXTENDED");

        /**
         * Constant for the "New Tai Lue" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock NEW_TAI_LUE =
            new UnicodeBlock("NEW_TAI_LUE",
                             "NEW TAI LUE",
                             "NEWTAILUE");

        /**
         * Constant for the "Buginese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BUGINESE =
            new UnicodeBlock("BUGINESE");

        /**
         * Constant for the "Tai Tham" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TAI_THAM =
            new UnicodeBlock("TAI_THAM",
                             "TAI THAM",
                             "TAITHAM");

        /**
         * Constant for the "Balinese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BALINESE =
            new UnicodeBlock("BALINESE");

        /**
         * Constant for the "Sundanese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SUNDANESE =
            new UnicodeBlock("SUNDANESE");

        /**
         * Constant for the "Batak" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BATAK =
            new UnicodeBlock("BATAK");

        /**
         * Constant for the "Lepcha" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LEPCHA =
            new UnicodeBlock("LEPCHA");

        /**
         * Constant for the "Ol Chiki" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OL_CHIKI =
            new UnicodeBlock("OL_CHIKI",
                             "OL CHIKI",
                             "OLCHIKI");

        /**
         * Constant for the "Vedic Extensions" Unicode character block.
         * @since 1.7
         */
        public
        static final UnicodeBlock VEDIC_EXTENSIONS =
            new UnicodeBlock("VEDIC_EXTENSIONS",
                             "VEDIC EXTENSIONS",
                             "VEDICEXTENSIONS");

        /**
         * Constant for the "Phonetic Extensions Supplement" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock PHONETIC_EXTENSIONS_SUPPLEMENT =
            new UnicodeBlock("PHONETIC_EXTENSIONS_SUPPLEMENT",
                             "PHONETIC EXTENSIONS SUPPLEMENT",
                             "PHONETICEXTENSIONSSUPPLEMENT");

        /**
         * Constant for the "Combining Diacritical Marks Supplement" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS_SUPPLEMENT =
            new UnicodeBlock("COMBINING_DIACRITICAL_MARKS_SUPPLEMENT",
                             "COMBINING DIACRITICAL MARKS SUPPLEMENT",
                             "COMBININGDIACRITICALMARKSSUPPLEMENT");

        /**
         * Constant for the "Glagolitic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock GLAGOLITIC =
            new UnicodeBlock("GLAGOLITIC");

        /**
         * Constant for the "Latin Extended-C" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LATIN_EXTENDED_C =
            new UnicodeBlock("LATIN_EXTENDED_C",
                             "LATIN EXTENDED-C",
                             "LATINEXTENDED-C");

        /**
         * Constant for the "Coptic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock COPTIC =
            new UnicodeBlock("COPTIC");

        /**
         * Constant for the "Georgian Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock GEORGIAN_SUPPLEMENT =
            new UnicodeBlock("GEORGIAN_SUPPLEMENT",
                             "GEORGIAN SUPPLEMENT",
                             "GEORGIANSUPPLEMENT");

        /**
         * Constant for the "Tifinagh" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TIFINAGH =
            new UnicodeBlock("TIFINAGH");

        /**
         * Constant for the "Ethiopic Extended" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_EXTENDED =
            new UnicodeBlock("ETHIOPIC_EXTENDED",
                             "ETHIOPIC EXTENDED",
                             "ETHIOPICEXTENDED");

        /**
         * Constant for the "Cyrillic Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CYRILLIC_EXTENDED_A =
            new UnicodeBlock("CYRILLIC_EXTENDED_A",
                             "CYRILLIC EXTENDED-A",
                             "CYRILLICEXTENDED-A");

        /**
         * Constant for the "Supplemental Punctuation" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SUPPLEMENTAL_PUNCTUATION =
            new UnicodeBlock("SUPPLEMENTAL_PUNCTUATION",
                             "SUPPLEMENTAL PUNCTUATION",
                             "SUPPLEMENTALPUNCTUATION");

        /**
         * Constant for the "CJK Strokes" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CJK_STROKES =
            new UnicodeBlock("CJK_STROKES",
                             "CJK STROKES",
                             "CJKSTROKES");

        /**
         * Constant for the "Lisu" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LISU =
            new UnicodeBlock("LISU");

        /**
         * Constant for the "Vai" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock VAI =
            new UnicodeBlock("VAI");

        /**
         * Constant for the "Cyrillic Extended-B" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CYRILLIC_EXTENDED_B =
            new UnicodeBlock("CYRILLIC_EXTENDED_B",
                             "CYRILLIC EXTENDED-B",
                             "CYRILLICEXTENDED-B");

        /**
         * Constant for the "Bamum" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BAMUM =
            new UnicodeBlock("BAMUM");

        /**
         * Constant for the "Modifier Tone Letters" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MODIFIER_TONE_LETTERS =
            new UnicodeBlock("MODIFIER_TONE_LETTERS",
                             "MODIFIER TONE LETTERS",
                             "MODIFIERTONELETTERS");

        /**
         * Constant for the "Latin Extended-D" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LATIN_EXTENDED_D =
            new UnicodeBlock("LATIN_EXTENDED_D",
                             "LATIN EXTENDED-D",
                             "LATINEXTENDED-D");

        /**
         * Constant for the "Syloti Nagri" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SYLOTI_NAGRI =
            new UnicodeBlock("SYLOTI_NAGRI",
                             "SYLOTI NAGRI",
                             "SYLOTINAGRI");

        /**
         * Constant for the "Common Indic Number Forms" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock COMMON_INDIC_NUMBER_FORMS =
            new UnicodeBlock("COMMON_INDIC_NUMBER_FORMS",
                             "COMMON INDIC NUMBER FORMS",
                             "COMMONINDICNUMBERFORMS");

        /**
         * Constant for the "Phags-pa" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PHAGS_PA =
            new UnicodeBlock("PHAGS_PA",
                             "PHAGS-PA");

        /**
         * Constant for the "Saurashtra" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SAURASHTRA =
            new UnicodeBlock("SAURASHTRA");

        /**
         * Constant for the "Devanagari Extended" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock DEVANAGARI_EXTENDED =
            new UnicodeBlock("DEVANAGARI_EXTENDED",
                             "DEVANAGARI EXTENDED",
                             "DEVANAGARIEXTENDED");

        /**
         * Constant for the "Kayah Li" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KAYAH_LI =
            new UnicodeBlock("KAYAH_LI",
                             "KAYAH LI",
                             "KAYAHLI");

        /**
         * Constant for the "Rejang" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock REJANG =
            new UnicodeBlock("REJANG");

        /**
         * Constant for the "Hangul Jamo Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock HANGUL_JAMO_EXTENDED_A =
            new UnicodeBlock("HANGUL_JAMO_EXTENDED_A",
                             "HANGUL JAMO EXTENDED-A",
                             "HANGULJAMOEXTENDED-A");

        /**
         * Constant for the "Javanese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock JAVANESE =
            new UnicodeBlock("JAVANESE");

        /**
         * Constant for the "Cham" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CHAM =
            new UnicodeBlock("CHAM");

        /**
         * Constant for the "Myanmar Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MYANMAR_EXTENDED_A =
            new UnicodeBlock("MYANMAR_EXTENDED_A",
                             "MYANMAR EXTENDED-A",
                             "MYANMAREXTENDED-A");

        /**
         * Constant for the "Tai Viet" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TAI_VIET =
            new UnicodeBlock("TAI_VIET",
                             "TAI VIET",
                             "TAIVIET");

        /**
         * Constant for the "Ethiopic Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_EXTENDED_A =
            new UnicodeBlock("ETHIOPIC_EXTENDED_A",
                             "ETHIOPIC EXTENDED-A",
                             "ETHIOPICEXTENDED-A");

        /**
         * Constant for the "Meetei Mayek" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MEETEI_MAYEK =
            new UnicodeBlock("MEETEI_MAYEK",
                             "MEETEI MAYEK",
                             "MEETEIMAYEK");

        /**
         * Constant for the "Hangul Jamo Extended-B" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock HANGUL_JAMO_EXTENDED_B =
            new UnicodeBlock("HANGUL_JAMO_EXTENDED_B",
                             "HANGUL JAMO EXTENDED-B",
                             "HANGULJAMOEXTENDED-B");

        /**
         * Constant for the "Vertical Forms" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock VERTICAL_FORMS =
            new UnicodeBlock("VERTICAL_FORMS",
                             "VERTICAL FORMS",
                             "VERTICALFORMS");

        /**
         * Constant for the "Ancient Greek Numbers" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ANCIENT_GREEK_NUMBERS =
            new UnicodeBlock("ANCIENT_GREEK_NUMBERS",
                             "ANCIENT GREEK NUMBERS",
                             "ANCIENTGREEKNUMBERS");

        /**
         * Constant for the "Ancient Symbols" Unicode character block.
         * @since 1.7
         */
        public static final
UnicodeBlock ANCIENT_SYMBOLS =2184new UnicodeBlock("ANCIENT_SYMBOLS",2185"ANCIENT SYMBOLS",2186"ANCIENTSYMBOLS");21872188/**2189* Constant for the "Phaistos Disc" Unicode character block.2190* @since 1.72191*/2192public static final UnicodeBlock PHAISTOS_DISC =2193new UnicodeBlock("PHAISTOS_DISC",2194"PHAISTOS DISC",2195"PHAISTOSDISC");21962197/**2198* Constant for the "Lycian" Unicode character block.2199* @since 1.72200*/2201public static final UnicodeBlock LYCIAN =2202new UnicodeBlock("LYCIAN");22032204/**2205* Constant for the "Carian" Unicode character block.2206* @since 1.72207*/2208public static final UnicodeBlock CARIAN =2209new UnicodeBlock("CARIAN");22102211/**2212* Constant for the "Old Persian" Unicode character block.2213* @since 1.72214*/2215public static final UnicodeBlock OLD_PERSIAN =2216new UnicodeBlock("OLD_PERSIAN",2217"OLD PERSIAN",2218"OLDPERSIAN");22192220/**2221* Constant for the "Imperial Aramaic" Unicode character block.2222* @since 1.72223*/2224public static final UnicodeBlock IMPERIAL_ARAMAIC =2225new UnicodeBlock("IMPERIAL_ARAMAIC",2226"IMPERIAL ARAMAIC",2227"IMPERIALARAMAIC");22282229/**2230* Constant for the "Phoenician" Unicode character block.2231* @since 1.72232*/2233public static final UnicodeBlock PHOENICIAN =2234new UnicodeBlock("PHOENICIAN");22352236/**2237* Constant for the "Lydian" Unicode character block.2238* @since 1.72239*/2240public static final UnicodeBlock LYDIAN =2241new UnicodeBlock("LYDIAN");22422243/**2244* Constant for the "Kharoshthi" Unicode character block.2245* @since 1.72246*/2247public static final UnicodeBlock KHAROSHTHI =2248new UnicodeBlock("KHAROSHTHI");22492250/**2251* Constant for the "Old South Arabian" Unicode character block.2252* @since 1.72253*/2254public static final UnicodeBlock OLD_SOUTH_ARABIAN =2255new UnicodeBlock("OLD_SOUTH_ARABIAN",2256"OLD SOUTH ARABIAN",2257"OLDSOUTHARABIAN");22582259/**2260* Constant for the "Avestan" Unicode character block.2261* @since 1.72262*/2263public static final 
UnicodeBlock AVESTAN =
            new UnicodeBlock("AVESTAN");

        /**
         * Constant for the "Inscriptional Parthian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock INSCRIPTIONAL_PARTHIAN =
            new UnicodeBlock("INSCRIPTIONAL_PARTHIAN",
                             "INSCRIPTIONAL PARTHIAN",
                             "INSCRIPTIONALPARTHIAN");

        /**
         * Constant for the "Inscriptional Pahlavi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock INSCRIPTIONAL_PAHLAVI =
            new UnicodeBlock("INSCRIPTIONAL_PAHLAVI",
                             "INSCRIPTIONAL PAHLAVI",
                             "INSCRIPTIONALPAHLAVI");

        /**
         * Constant for the "Old Turkic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OLD_TURKIC =
            new UnicodeBlock("OLD_TURKIC",
                             "OLD TURKIC",
                             "OLDTURKIC");

        /**
         * Constant for the "Rumi Numeral Symbols" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock RUMI_NUMERAL_SYMBOLS =
            new UnicodeBlock("RUMI_NUMERAL_SYMBOLS",
                             "RUMI NUMERAL SYMBOLS",
                             "RUMINUMERALSYMBOLS");

        /**
         * Constant for the "Brahmi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BRAHMI =
            new UnicodeBlock("BRAHMI");

        /**
         * Constant for the "Kaithi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KAITHI =
            new UnicodeBlock("KAITHI");

        /**
         * Constant for the "Cuneiform" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CUNEIFORM =
            new UnicodeBlock("CUNEIFORM");

        /**
         * Constant for the "Cuneiform Numbers and Punctuation" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock CUNEIFORM_NUMBERS_AND_PUNCTUATION =
            new UnicodeBlock("CUNEIFORM_NUMBERS_AND_PUNCTUATION",
                             "CUNEIFORM NUMBERS AND PUNCTUATION",
                             "CUNEIFORMNUMBERSANDPUNCTUATION");

        /**
         * Constant for the "Egyptian Hieroglyphs" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock EGYPTIAN_HIEROGLYPHS =
            new UnicodeBlock("EGYPTIAN_HIEROGLYPHS",
                             "EGYPTIAN HIEROGLYPHS",
                             "EGYPTIANHIEROGLYPHS");

        /**
         * Constant for the "Bamum Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BAMUM_SUPPLEMENT =
            new UnicodeBlock("BAMUM_SUPPLEMENT",
                             "BAMUM SUPPLEMENT",
                             "BAMUMSUPPLEMENT");

        /**
         * Constant for the "Kana Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KANA_SUPPLEMENT =
            new UnicodeBlock("KANA_SUPPLEMENT",
                             "KANA SUPPLEMENT",
                             "KANASUPPLEMENT");

        /**
         * Constant for the "Ancient Greek Musical Notation" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock ANCIENT_GREEK_MUSICAL_NOTATION =
            new UnicodeBlock("ANCIENT_GREEK_MUSICAL_NOTATION",
                             "ANCIENT GREEK MUSICAL NOTATION",
                             "ANCIENTGREEKMUSICALNOTATION");

        /**
         * Constant for the "Counting Rod Numerals" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock COUNTING_ROD_NUMERALS =
            new UnicodeBlock("COUNTING_ROD_NUMERALS",
                             "COUNTING ROD NUMERALS",
                             "COUNTINGRODNUMERALS");

        /**
         * Constant for the "Mahjong Tiles" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MAHJONG_TILES =
            new UnicodeBlock("MAHJONG_TILES",
                             "MAHJONG TILES",
                             "MAHJONGTILES");

        /**
         * Constant for the "Domino Tiles" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock DOMINO_TILES =
            new UnicodeBlock("DOMINO_TILES",
                             "DOMINO TILES",
                             "DOMINOTILES");

        /**
         * Constant for the "Playing Cards" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PLAYING_CARDS =
            new UnicodeBlock("PLAYING_CARDS",
                             "PLAYING CARDS",
                             "PLAYINGCARDS");

        /**
         * Constant for the "Enclosed Alphanumeric Supplement" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock ENCLOSED_ALPHANUMERIC_SUPPLEMENT =
            new UnicodeBlock("ENCLOSED_ALPHANUMERIC_SUPPLEMENT",
                             "ENCLOSED ALPHANUMERIC SUPPLEMENT",
                             "ENCLOSEDALPHANUMERICSUPPLEMENT");

        /**
         * Constant for the "Enclosed Ideographic Supplement" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock ENCLOSED_IDEOGRAPHIC_SUPPLEMENT =
            new UnicodeBlock("ENCLOSED_IDEOGRAPHIC_SUPPLEMENT",
                             "ENCLOSED IDEOGRAPHIC SUPPLEMENT",
                             "ENCLOSEDIDEOGRAPHICSUPPLEMENT");

        /**
         * Constant for the "Miscellaneous Symbols And Pictographs" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS =
            new UnicodeBlock("MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS",
                             "MISCELLANEOUS SYMBOLS AND PICTOGRAPHS",
                             "MISCELLANEOUSSYMBOLSANDPICTOGRAPHS");

        /**
         * Constant for the "Emoticons" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock EMOTICONS =
            new UnicodeBlock("EMOTICONS");

        /**
         * Constant for the "Transport And Map Symbols" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TRANSPORT_AND_MAP_SYMBOLS =
            new UnicodeBlock("TRANSPORT_AND_MAP_SYMBOLS",
                             "TRANSPORT AND MAP SYMBOLS",
                             "TRANSPORTANDMAPSYMBOLS");

        /**
         * Constant for the "Alchemical Symbols" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ALCHEMICAL_SYMBOLS =
            new UnicodeBlock("ALCHEMICAL_SYMBOLS",
                             "ALCHEMICAL SYMBOLS",
                             "ALCHEMICALSYMBOLS");

        /**
         * Constant for the "CJK Unified Ideographs Extension C" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION C",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIONC");

        /**
         * Constant for the "CJK Unified Ideographs Extension D"
         * Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION D",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIOND");

        /**
         * Constant for the "Arabic Extended-A" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock ARABIC_EXTENDED_A =
            new UnicodeBlock("ARABIC_EXTENDED_A",
                             "ARABIC EXTENDED-A",
                             "ARABICEXTENDED-A");

        /**
         * Constant for the "Sundanese Supplement" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock SUNDANESE_SUPPLEMENT =
            new UnicodeBlock("SUNDANESE_SUPPLEMENT",
                             "SUNDANESE SUPPLEMENT",
                             "SUNDANESESUPPLEMENT");

        /**
         * Constant for the "Meetei Mayek Extensions" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MEETEI_MAYEK_EXTENSIONS =
            new UnicodeBlock("MEETEI_MAYEK_EXTENSIONS",
                             "MEETEI MAYEK EXTENSIONS",
                             "MEETEIMAYEKEXTENSIONS");

        /**
         * Constant for the "Meroitic Hieroglyphs" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MEROITIC_HIEROGLYPHS =
            new UnicodeBlock("MEROITIC_HIEROGLYPHS",
                             "MEROITIC HIEROGLYPHS",
                             "MEROITICHIEROGLYPHS");

        /**
         * Constant for the "Meroitic Cursive" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MEROITIC_CURSIVE =
            new UnicodeBlock("MEROITIC_CURSIVE",
                             "MEROITIC CURSIVE",
                             "MEROITICCURSIVE");

        /**
         * Constant for the "Sora Sompeng" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock SORA_SOMPENG =
            new UnicodeBlock("SORA_SOMPENG",
                             "SORA SOMPENG",
                             "SORASOMPENG");

        /**
         * Constant for the "Chakma" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock CHAKMA =
            new UnicodeBlock("CHAKMA");

        /**
         * Constant for the "Sharada" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock SHARADA =
            new UnicodeBlock("SHARADA");

        /**
         * Constant for the "Takri" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock TAKRI =
            new UnicodeBlock("TAKRI");

        /**
         * Constant for the "Miao" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MIAO =
            new UnicodeBlock("MIAO");

        /**
         * Constant for the "Arabic Mathematical Alphabetic Symbols" Unicode
         * character block.
         * @since 1.8
         */
        public static final UnicodeBlock ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS =
            new UnicodeBlock("ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS",
                             "ARABIC MATHEMATICAL ALPHABETIC SYMBOLS",
                             "ARABICMATHEMATICALALPHABETICSYMBOLS");

        private static final int blockStarts[] = {
            0x0000, // 0000..007F; Basic Latin
            0x0080, // 0080..00FF; Latin-1 Supplement
            0x0100, // 0100..017F; Latin Extended-A
            0x0180, // 0180..024F; Latin Extended-B
            0x0250, // 0250..02AF; IPA Extensions
            0x02B0, // 02B0..02FF; Spacing Modifier Letters
            0x0300, // 0300..036F; Combining Diacritical Marks
            0x0370, // 0370..03FF; Greek and Coptic
            0x0400, // 0400..04FF; Cyrillic
            0x0500, // 0500..052F; Cyrillic Supplement
            0x0530, // 0530..058F; Armenian
            0x0590, // 0590..05FF; Hebrew
            0x0600, // 0600..06FF; Arabic
            0x0700, // 0700..074F; Syriac
            0x0750, // 0750..077F; Arabic Supplement
            0x0780, // 0780..07BF; Thaana
            0x07C0, // 07C0..07FF; NKo
            0x0800, // 0800..083F; Samaritan
            0x0840, // 0840..085F; Mandaic
            0x0860, // unassigned
            0x08A0, // 08A0..08FF; Arabic Extended-A
            0x0900, // 0900..097F; Devanagari
            0x0980, // 0980..09FF; Bengali
            0x0A00, // 0A00..0A7F; Gurmukhi
            0x0A80, // 0A80..0AFF; Gujarati
            0x0B00, // 0B00..0B7F; Oriya
            0x0B80, // 0B80..0BFF; Tamil
            0x0C00, // 0C00..0C7F; Telugu
            0x0C80, // 0C80..0CFF; Kannada
            0x0D00, // 0D00..0D7F; Malayalam
            0x0D80, // 0D80..0DFF; Sinhala
            0x0E00, // 0E00..0E7F; Thai
            0x0E80, // 0E80..0EFF; Lao
            0x0F00, // 0F00..0FFF; Tibetan
            0x1000, // 1000..109F; Myanmar
            0x10A0, // 10A0..10FF; Georgian
            0x1100, // 1100..11FF; Hangul Jamo
            0x1200, // 1200..137F; Ethiopic
            0x1380, // 1380..139F; Ethiopic Supplement
            0x13A0, // 13A0..13FF; Cherokee
            0x1400, // 1400..167F; Unified Canadian Aboriginal Syllabics
            0x1680, // 1680..169F; Ogham
            0x16A0, // 16A0..16FF; Runic
            0x1700, // 1700..171F; Tagalog
            0x1720, // 1720..173F; Hanunoo
            0x1740, // 1740..175F; Buhid
            0x1760, // 1760..177F; Tagbanwa
            0x1780, // 1780..17FF; Khmer
            0x1800, // 1800..18AF; Mongolian
            0x18B0, // 18B0..18FF; Unified Canadian Aboriginal Syllabics Extended
            0x1900, // 1900..194F; Limbu
            0x1950, // 1950..197F; Tai Le
            0x1980, // 1980..19DF; New Tai Lue
            0x19E0, // 19E0..19FF; Khmer Symbols
            0x1A00, // 1A00..1A1F; Buginese
            0x1A20, // 1A20..1AAF; Tai Tham
            0x1AB0, // unassigned
            0x1B00, // 1B00..1B7F; Balinese
            0x1B80, // 1B80..1BBF; Sundanese
            0x1BC0, // 1BC0..1BFF; Batak
            0x1C00, // 1C00..1C4F; Lepcha
            0x1C50, // 1C50..1C7F; Ol Chiki
            0x1C80, // unassigned
            0x1CC0, // 1CC0..1CCF; Sundanese Supplement
            0x1CD0, // 1CD0..1CFF; Vedic Extensions
            0x1D00, // 1D00..1D7F; Phonetic Extensions
            0x1D80, // 1D80..1DBF; Phonetic Extensions Supplement
            0x1DC0, // 1DC0..1DFF; Combining Diacritical Marks Supplement
            0x1E00, // 1E00..1EFF; Latin Extended Additional
            0x1F00, // 1F00..1FFF; Greek Extended
            0x2000, // 2000..206F; General Punctuation
            0x2070, // 2070..209F; Superscripts and Subscripts
            0x20A0, // 20A0..20CF; Currency Symbols
            0x20D0, // 20D0..20FF; Combining Diacritical Marks for Symbols
            0x2100, // 2100..214F; Letterlike Symbols
            0x2150, // 2150..218F; Number Forms
            0x2190, // 2190..21FF; Arrows
            0x2200, // 2200..22FF; Mathematical Operators
            0x2300, // 2300..23FF; Miscellaneous Technical
            0x2400, // 2400..243F; Control Pictures
            0x2440, // 2440..245F; Optical Character Recognition
            0x2460, // 2460..24FF; Enclosed Alphanumerics
            0x2500, // 2500..257F; Box Drawing
            0x2580, // 2580..259F; Block Elements
            0x25A0, // 25A0..25FF; Geometric Shapes
            0x2600, // 2600..26FF; Miscellaneous Symbols
            0x2700, // 2700..27BF; Dingbats
            0x27C0, // 27C0..27EF; Miscellaneous Mathematical Symbols-A
            0x27F0, // 27F0..27FF; Supplemental Arrows-A
            0x2800, // 2800..28FF; Braille Patterns
            0x2900, // 2900..297F; Supplemental Arrows-B
            0x2980, // 2980..29FF; Miscellaneous Mathematical Symbols-B
            0x2A00, // 2A00..2AFF; Supplemental Mathematical Operators
            0x2B00, // 2B00..2BFF; Miscellaneous Symbols and Arrows
            0x2C00, // 2C00..2C5F; Glagolitic
            0x2C60, // 2C60..2C7F; Latin Extended-C
            0x2C80, // 2C80..2CFF; Coptic
            0x2D00, // 2D00..2D2F; Georgian Supplement
            0x2D30, // 2D30..2D7F; Tifinagh
            0x2D80, // 2D80..2DDF; Ethiopic Extended
            0x2DE0, // 2DE0..2DFF; Cyrillic Extended-A
            0x2E00, // 2E00..2E7F; Supplemental Punctuation
            0x2E80, // 2E80..2EFF; CJK Radicals Supplement
            0x2F00, // 2F00..2FDF; Kangxi Radicals
            0x2FE0, // unassigned
            0x2FF0, // 2FF0..2FFF; Ideographic Description Characters
            0x3000, // 3000..303F; CJK Symbols and Punctuation
            0x3040, // 3040..309F; Hiragana
            0x30A0, // 30A0..30FF; Katakana
            0x3100, // 3100..312F; Bopomofo
            0x3130, // 3130..318F; Hangul Compatibility Jamo
            0x3190, // 3190..319F; Kanbun
            0x31A0, // 31A0..31BF; Bopomofo Extended
            0x31C0, // 31C0..31EF; CJK Strokes
            0x31F0, // 31F0..31FF; Katakana Phonetic Extensions
            0x3200, // 3200..32FF; Enclosed CJK Letters and Months
            0x3300, // 3300..33FF; CJK Compatibility
            0x3400, // 3400..4DBF; CJK Unified Ideographs Extension A
            0x4DC0, // 4DC0..4DFF; Yijing Hexagram Symbols
            0x4E00, // 4E00..9FFF; CJK Unified Ideographs
            0xA000, // A000..A48F; Yi Syllables
            0xA490, // A490..A4CF; Yi Radicals
            0xA4D0, // A4D0..A4FF; Lisu
            0xA500, // A500..A63F; Vai
            0xA640, // A640..A69F; Cyrillic Extended-B
            0xA6A0, // A6A0..A6FF; Bamum
            0xA700, // A700..A71F; Modifier Tone Letters
            0xA720, // A720..A7FF; Latin Extended-D
            0xA800, // A800..A82F; Syloti Nagri
            0xA830, // A830..A83F; Common Indic Number Forms
            0xA840, // A840..A87F; Phags-pa
            0xA880, // A880..A8DF; Saurashtra
            0xA8E0, // A8E0..A8FF; Devanagari Extended
            0xA900, // A900..A92F; Kayah Li
            0xA930, // A930..A95F; Rejang
            0xA960, // A960..A97F; Hangul Jamo Extended-A
            0xA980, // A980..A9DF; Javanese
            0xA9E0, // unassigned
            0xAA00, // AA00..AA5F; Cham
            0xAA60, // AA60..AA7F; Myanmar Extended-A
            0xAA80, // AA80..AADF; Tai Viet
            0xAAE0, // AAE0..AAFF; Meetei Mayek Extensions
            0xAB00, // AB00..AB2F; Ethiopic Extended-A
            0xAB30, // unassigned
            0xABC0, // ABC0..ABFF; Meetei Mayek
            0xAC00, // AC00..D7AF; Hangul Syllables
            0xD7B0, // D7B0..D7FF; Hangul Jamo Extended-B
            0xD800, // D800..DB7F; High Surrogates
            0xDB80, // DB80..DBFF; High Private Use Surrogates
            0xDC00, // DC00..DFFF; Low Surrogates
            0xE000, // E000..F8FF; Private Use Area
            0xF900, // F900..FAFF; CJK Compatibility Ideographs
            0xFB00, // FB00..FB4F; Alphabetic Presentation Forms
            0xFB50, // FB50..FDFF; Arabic Presentation Forms-A
            0xFE00, // FE00..FE0F; Variation Selectors
            0xFE10, // FE10..FE1F; Vertical Forms
            0xFE20, // FE20..FE2F; Combining Half Marks
            0xFE30, // FE30..FE4F; CJK Compatibility Forms
            0xFE50, // FE50..FE6F; Small Form Variants
            0xFE70, // FE70..FEFF; Arabic Presentation Forms-B
            0xFF00, // FF00..FFEF; Halfwidth and Fullwidth Forms
            0xFFF0, // FFF0..FFFF; Specials
            0x10000, // 10000..1007F; Linear B Syllabary
            0x10080, // 10080..100FF; Linear B Ideograms
            0x10100, // 10100..1013F; Aegean Numbers
            0x10140, // 10140..1018F; Ancient Greek Numbers
            0x10190, // 10190..101CF; Ancient Symbols
            0x101D0, // 101D0..101FF; Phaistos Disc
            0x10200, // unassigned
            0x10280, // 10280..1029F; Lycian
            0x102A0, // 102A0..102DF; Carian
            0x102E0, // unassigned
            0x10300, // 10300..1032F; Old Italic
            0x10330, // 10330..1034F; Gothic
            0x10350, // unassigned
            0x10380, // 10380..1039F; Ugaritic
            0x103A0, // 103A0..103DF; Old Persian
            0x103E0, // unassigned
            0x10400, // 10400..1044F; Deseret
            0x10450, // 10450..1047F; Shavian
            0x10480, // 10480..104AF; Osmanya
            0x104B0, // unassigned
            0x10800, // 10800..1083F; Cypriot Syllabary
            0x10840, // 10840..1085F; Imperial Aramaic
            0x10860, // unassigned
            0x10900, // 10900..1091F; Phoenician
            0x10920, // 10920..1093F; Lydian
            0x10940, // unassigned
            0x10980, // 10980..1099F; Meroitic Hieroglyphs
            0x109A0, // 109A0..109FF; Meroitic Cursive
            0x10A00, // 10A00..10A5F; Kharoshthi
            0x10A60, // 10A60..10A7F; Old South Arabian
            0x10A80, // unassigned
            0x10B00, // 10B00..10B3F; Avestan
            0x10B40, // 10B40..10B5F; Inscriptional Parthian
            0x10B60, // 10B60..10B7F; Inscriptional Pahlavi
            0x10B80, // unassigned
            0x10C00, // 10C00..10C4F; Old Turkic
            0x10C50, // unassigned
            0x10E60, // 10E60..10E7F; Rumi Numeral Symbols
            0x10E80, // unassigned
            0x11000, // 11000..1107F; Brahmi
            0x11080, // 11080..110CF; Kaithi
            0x110D0, // 110D0..110FF; Sora Sompeng
            0x11100, // 11100..1114F; Chakma
            0x11150, // unassigned
            0x11180, // 11180..111DF; Sharada
            0x111E0, // unassigned
            0x11680, // 11680..116CF; Takri
            0x116D0, // unassigned
            0x12000, // 12000..123FF; Cuneiform
            0x12400, // 12400..1247F; Cuneiform Numbers and Punctuation
            0x12480, // unassigned
            0x13000, // 13000..1342F; Egyptian Hieroglyphs
            0x13430, // unassigned
            0x16800, // 16800..16A3F; Bamum Supplement
            0x16A40, // unassigned
            0x16F00, // 16F00..16F9F; Miao
            0x16FA0, // unassigned
            0x1B000, // 1B000..1B0FF; Kana Supplement
            0x1B100, // unassigned
            0x1D000, // 1D000..1D0FF; Byzantine Musical Symbols
            0x1D100, // 1D100..1D1FF; Musical Symbols
            0x1D200, // 1D200..1D24F; Ancient Greek Musical Notation
            0x1D250, // unassigned
            0x1D300, // 1D300..1D35F; Tai Xuan Jing Symbols
            0x1D360, // 1D360..1D37F; Counting Rod Numerals
            0x1D380, // unassigned
            0x1D400, // 1D400..1D7FF; Mathematical Alphanumeric Symbols
            0x1D800, // unassigned
            0x1EE00, // 1EE00..1EEFF; Arabic Mathematical Alphabetic Symbols
            0x1EF00, // unassigned
            0x1F000, // 1F000..1F02F; Mahjong Tiles
            0x1F030, // 1F030..1F09F; Domino Tiles
            0x1F0A0, // 1F0A0..1F0FF; Playing Cards
            0x1F100, // 1F100..1F1FF; Enclosed Alphanumeric Supplement
            0x1F200, // 1F200..1F2FF; Enclosed Ideographic Supplement
            0x1F300, // 1F300..1F5FF; Miscellaneous Symbols And Pictographs
            0x1F600, // 1F600..1F64F; Emoticons
            0x1F650, // unassigned
            0x1F680, // 1F680..1F6FF; Transport And Map Symbols
            0x1F700, // 1F700..1F77F; Alchemical Symbols
            0x1F780, // unassigned
            0x20000, // 20000..2A6DF; CJK Unified Ideographs Extension B
            0x2A6E0, // unassigned
            0x2A700, // 2A700..2B73F; CJK Unified Ideographs Extension C
            0x2B740, // 2B740..2B81F; CJK Unified Ideographs Extension D
            0x2B820, // unassigned
            0x2F800, // 2F800..2FA1F; CJK Compatibility Ideographs Supplement
            0x2FA20, // unassigned
            0xE0000, // E0000..E007F; Tags
            0xE0080, // unassigned
            0xE0100, // E0100..E01EF; Variation Selectors Supplement
            0xE01F0, // unassigned
            0xF0000, // F0000..FFFFF; Supplementary Private Use Area-A
            0x100000 // 100000..10FFFF; Supplementary Private Use Area-B
        };

        private static final UnicodeBlock[] blocks =
        {
            BASIC_LATIN,
            LATIN_1_SUPPLEMENT,
            LATIN_EXTENDED_A,
            LATIN_EXTENDED_B,
            IPA_EXTENSIONS,
            SPACING_MODIFIER_LETTERS,
            COMBINING_DIACRITICAL_MARKS,
            GREEK,
            CYRILLIC,
            CYRILLIC_SUPPLEMENTARY,
            ARMENIAN,
            HEBREW,
            ARABIC,
            SYRIAC,
            ARABIC_SUPPLEMENT,
            THAANA,
            NKO,
            SAMARITAN,
            MANDAIC,
            null,
            ARABIC_EXTENDED_A,
            DEVANAGARI,
            BENGALI,
            GURMUKHI,
            GUJARATI,
            ORIYA,
            TAMIL,
            TELUGU,
            KANNADA,
            MALAYALAM,
            SINHALA,
            THAI,
            LAO,
            TIBETAN,
            MYANMAR,
            GEORGIAN,
            HANGUL_JAMO,
            ETHIOPIC,
            ETHIOPIC_SUPPLEMENT,
            CHEROKEE,
            UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS,
            OGHAM,
            RUNIC,
            TAGALOG,
            HANUNOO,
            BUHID,
            TAGBANWA,
            KHMER,
            MONGOLIAN,
            UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED,
            LIMBU,
            TAI_LE,
            NEW_TAI_LUE,
            KHMER_SYMBOLS,
            BUGINESE,
            TAI_THAM,
            null,
            BALINESE,
            SUNDANESE,
            BATAK,
            LEPCHA,
            OL_CHIKI,
            null,
            SUNDANESE_SUPPLEMENT,
            VEDIC_EXTENSIONS,
            PHONETIC_EXTENSIONS,
            PHONETIC_EXTENSIONS_SUPPLEMENT,
            COMBINING_DIACRITICAL_MARKS_SUPPLEMENT,
            LATIN_EXTENDED_ADDITIONAL,
            GREEK_EXTENDED,
            GENERAL_PUNCTUATION,
            SUPERSCRIPTS_AND_SUBSCRIPTS,
            CURRENCY_SYMBOLS,
            COMBINING_MARKS_FOR_SYMBOLS,
            LETTERLIKE_SYMBOLS,
            NUMBER_FORMS,
            ARROWS,
            MATHEMATICAL_OPERATORS,
            MISCELLANEOUS_TECHNICAL,
            CONTROL_PICTURES,
            OPTICAL_CHARACTER_RECOGNITION,
            ENCLOSED_ALPHANUMERICS,
            BOX_DRAWING,
            BLOCK_ELEMENTS,
            GEOMETRIC_SHAPES,
            MISCELLANEOUS_SYMBOLS,
            DINGBATS,
            MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A,
            SUPPLEMENTAL_ARROWS_A,
            BRAILLE_PATTERNS,
            SUPPLEMENTAL_ARROWS_B,
            MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B,
            SUPPLEMENTAL_MATHEMATICAL_OPERATORS,
            MISCELLANEOUS_SYMBOLS_AND_ARROWS,
            GLAGOLITIC,
            LATIN_EXTENDED_C,
            COPTIC,
            GEORGIAN_SUPPLEMENT,
            TIFINAGH,
            ETHIOPIC_EXTENDED,
            CYRILLIC_EXTENDED_A,
            SUPPLEMENTAL_PUNCTUATION,
            CJK_RADICALS_SUPPLEMENT,
            KANGXI_RADICALS,
            null,
            IDEOGRAPHIC_DESCRIPTION_CHARACTERS,
            CJK_SYMBOLS_AND_PUNCTUATION,
            HIRAGANA,
            KATAKANA,
            BOPOMOFO,
            HANGUL_COMPATIBILITY_JAMO,
            KANBUN,
            BOPOMOFO_EXTENDED,
            CJK_STROKES,
            KATAKANA_PHONETIC_EXTENSIONS,
            ENCLOSED_CJK_LETTERS_AND_MONTHS,
            CJK_COMPATIBILITY,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
            YIJING_HEXAGRAM_SYMBOLS,
            CJK_UNIFIED_IDEOGRAPHS,
            YI_SYLLABLES,
            YI_RADICALS,
            LISU,
            VAI,
            CYRILLIC_EXTENDED_B,
            BAMUM,
            MODIFIER_TONE_LETTERS,
            LATIN_EXTENDED_D,
            SYLOTI_NAGRI,
            COMMON_INDIC_NUMBER_FORMS,
            PHAGS_PA,
            SAURASHTRA,
            DEVANAGARI_EXTENDED,
            KAYAH_LI,
            REJANG,
            HANGUL_JAMO_EXTENDED_A,
            JAVANESE,
            null,
            CHAM,
            MYANMAR_EXTEN­DED_A,
            TAI_VIET,
            MEETEI_MAYEK_EXTENSIONS,
            ETHIOPIC_EXTENDED_A,
            null,
            MEETEI_MAYEK,
            HANGUL_SYLLABLES,
            HANGUL_JAMO_EXTENDED_B,
            HIGH_SURROGATES,
            HIGH_PRIVATE_USE_SURROGATES,
            LOW_SURROGATES,
            PRIVATE_USE_AREA,
            CJK_COMPATIBILITY_IDEOGRAPHS,
            ALPHABETIC_PRESENTATION_FORMS,
            ARABIC_PRESENTATION_FORMS_A,
            VARIATION_SELECTORS,
            VERTICAL_FORMS,
            COMBINING_HALF_MARKS,
            CJK_COMPATIBILITY_FORMS,
            SMALL_FORM_VARIANTS,
            ARABIC_PRESENTATION_FORMS_B,
            HALFWIDTH_AND_FULLWIDTH_FORMS,
            SPECIALS,
            LINEAR_B_SYLLABARY,
            LINEAR_B_IDEOGRAMS,
            AEGEAN_NUMBERS,
            ANCIENT_GREEK_NUMBERS,
            ANCIENT_SYMBOLS,
            PHAISTOS_DISC,
            null,
            LYCIAN,
            CARIAN,
            null,
            OLD_ITALIC,
            GOTHIC,
            null,
            UGARITIC,
            OLD_PERSIAN,
            null,
            DESERET,
            SHAVIAN,
            OSMANYA,
            null,
            CYPRIOT_SYLLABARY,
            IMPERIAL_ARAMAIC,
            null,
            PHOENICIAN,
            LYDIAN,
            null,
            MEROITIC_HIEROGLYPHS,
            MEROITIC_CURSIVE,
            KHAROSHTHI,
            OLD_SOUTH_ARABIAN,
            null,
            AVESTAN,
            INSCRIPTIONAL_PARTHIAN,
            INSCRIPTIONAL_PAHLAVI,
            null,
            OLD_TURKIC,
            null,
            RUMI_NUMERAL_SYMBOLS,
            null,
            BRAHMI,
            KAITHI,
            SORA_SOMPENG,
            CHAKMA,
            null,
            SHARADA,
            null,
            TAKRI,
            null,
            CUNEIFORM,
            CUNEIFORM_NUMBERS_AND_PUNCTUATION,
            null,
            EGYPTIAN_HIEROGLYPHS,
            null,
            BAMUM_SUPPLEMENT,
            null,
            MIAO,
            null,
            KANA_SUPPLEMENT,
            null,
            BYZANTINE_MUSICAL_SYMBOLS,
            MUSICAL_SYMBOLS,
            ANCIENT_GREEK_MUSICAL_NOTATION,
            null,
            TAI_XUAN_JING_SYMBOLS,
            COUNTING_ROD_NUMERALS,
            null,
            MATHEMATICAL_ALPHANUMERIC_SYMBOLS,
            null,
            ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS,
            null,
            MAHJONG_TILES,
            DOMINO_TILES,
            PLAYING_CARDS,
            ENCLOSED_ALPHANUMERIC_SUPPLEMENT,
            ENCLOSED_IDEOGRAPHIC_SUPPLEMENT,
            MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS,
            EMOTICONS,
            null,
            TRANSPORT_AND_MAP_SYMBOLS,
            ALCHEMICAL_SYMBOLS,
            null,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B,
            null,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D,
            null,
            CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT,
            null,
            TAGS,
            null,
            VARIATION_SELECTORS_SUPPLEMENT,
            null,
            SUPPLEMENTARY_PRIVATE_USE_AREA_A,
            SUPPLEMENTARY_PRIVATE_USE_AREA_B
        };


        /**
         * Returns the object representing the Unicode block containing the
         * given character, or {@code null} if the character is not a
         * member of a defined block.
         *
         * <p><b>Note:</b> This method cannot handle
         * <a href="Character.html#supplementary"> supplementary
         * characters</a>.
         * To support all Unicode characters, including
         * supplementary characters, use the {@link #of(int)} method.
         *
         * @param c The character in question
         * @return The {@code UnicodeBlock} instance representing the
         * Unicode block of which this character is a member, or
         * {@code null} if the character is not a member of any
         * Unicode block
         */
        public static UnicodeBlock of(char c) {
            return of((int)c);
        }

        /**
         * Returns the object representing the Unicode block
         * containing the given character (Unicode code point), or
         * {@code null} if the character is not a member of a
         * defined block.
         *
         * @param codePoint the character (Unicode code point) in question.
         * @return The {@code UnicodeBlock} instance representing the
         * Unicode block of which this character is a member, or
         * {@code null} if the character is not a member of any
         * Unicode block
         * @exception IllegalArgumentException if the specified
         * {@code codePoint} is an invalid Unicode code point.
         * @see Character#isValidCodePoint(int)
         * @since 1.5
         */
        public static UnicodeBlock of(int codePoint) {
            if (!isValidCodePoint(codePoint)) {
                throw new IllegalArgumentException();
            }

            // Binary search for the last entry of blockStarts that is
            // less than or equal to codePoint.
            int top, bottom, current;
            bottom = 0;
            top = blockStarts.length;
            current = top/2;

            // invariant: top > current >= bottom && codePoint >= blockStarts[bottom]
            while (top - bottom > 1) {
                if (codePoint >= blockStarts[current]) {
                    bottom = current;
                } else {
                    top = current;
                }
                current = (top + bottom) / 2;
            }
            // blocks[] runs parallel to blockStarts[]; unassigned ranges hold null.
            return blocks[current];
        }

        /**
         * Returns the UnicodeBlock with the given name. Block
         * names are determined by The Unicode Standard. The file
         * Blocks-&lt;version&gt;.txt defines blocks for a particular
         * version of the standard.
         * The {@link Character} class specifies
         * the version of the standard that it supports.
         * <p>
         * This method accepts block names in the following forms:
         * <ol>
         * <li> Canonical block names as defined by the Unicode Standard.
         * For example, the standard defines a "Basic Latin" block. Therefore, this
         * method accepts "Basic Latin" as a valid block name. The documentation of
         * each UnicodeBlock provides the canonical name.
         * <li>Canonical block names with all spaces removed. For example, "BasicLatin"
         * is a valid block name for the "Basic Latin" block.
         * <li>The text representation of each constant UnicodeBlock identifier.
         * For example, this method will return the {@link #BASIC_LATIN} block if
         * provided with the "BASIC_LATIN" name. This form replaces all spaces and
         * hyphens in the canonical name with underscores.
         * </ol>
         * Finally, character case is ignored for all of the valid block name forms.
         * For example, "BASIC_LATIN" and "basic_latin" are both valid block names.
         * The en_US locale's case mapping rules are used to provide case-insensitive
         * string comparisons for block name validation.
         * <p>
         * If the Unicode Standard changes block names, both the previous and
         * current names will be accepted.
         *
         * @param blockName A {@code UnicodeBlock} name.
         * @return The {@code UnicodeBlock} instance identified
         * by {@code blockName}
         * @throws IllegalArgumentException if {@code blockName} is an
         * invalid name
         * @throws NullPointerException if {@code blockName} is null
         * @since 1.5
         */
        public static final UnicodeBlock forName(String blockName) {
            UnicodeBlock block = map.get(blockName.toUpperCase(Locale.US));
            if (block == null) {
                throw new IllegalArgumentException();
            }
            return block;
        }
    }


    /**
     * A family of character subsets representing the character scripts
     * defined in the <a href="http://www.unicode.org/reports/tr24/">
     * <i>Unicode Standard
     * Annex #24: Script Names</i></a>. Every Unicode
     * character is assigned to a single Unicode script, either a specific
     * script, such as {@link Character.UnicodeScript#LATIN Latin}, or
     * one of the following three special values,
     * {@link Character.UnicodeScript#INHERITED Inherited},
     * {@link Character.UnicodeScript#COMMON Common} or
     * {@link Character.UnicodeScript#UNKNOWN Unknown}.
     *
     * @since 1.7
     */
    public static enum UnicodeScript {
        /**
         * Unicode script "Common".
         */
        COMMON,

        /**
         * Unicode script "Latin".
         */
        LATIN,

        /**
         * Unicode script "Greek".
         */
        GREEK,

        /**
         * Unicode script "Cyrillic".
         */
        CYRILLIC,

        /**
         * Unicode script "Armenian".
         */
        ARMENIAN,

        /**
         * Unicode script "Hebrew".
         */
        HEBREW,

        /**
         * Unicode script "Arabic".
         */
        ARABIC,

        /**
         * Unicode script "Syriac".
         */
        SYRIAC,

        /**
         * Unicode script "Thaana".
         */
        THAANA,

        /**
         * Unicode script "Devanagari".
         */
        DEVANAGARI,

        /**
         * Unicode script "Bengali".
         */
        BENGALI,

        /**
         * Unicode script "Gurmukhi".
         */
        GURMUKHI,

        /**
         * Unicode script "Gujarati".
         */
        GUJARATI,

        /**
         * Unicode script "Oriya".
         */
        ORIYA,

        /**
         * Unicode script "Tamil".
         */
        TAMIL,

        /**
         * Unicode script "Telugu".
         */
        TELUGU,

        /**
         * Unicode script "Kannada".
         */
        KANNADA,

        /**
         * Unicode script "Malayalam".
         */
        MALAYALAM,

        /**
         * Unicode script "Sinhala".
         */
        SINHALA,

        /**
         * Unicode script "Thai".
         */
        THAI,

        /**
         * Unicode script "Lao".
         */
        LAO,

        /**
         * Unicode script "Tibetan".
         */
        TIBETAN,

        /**
         * Unicode script "Myanmar".
         */
        MYANMAR,

        /**
         * Unicode script "Georgian".
         */
        GEORGIAN,

        /**
         * Unicode script "Hangul".
         */
        HANGUL,

        /**
         * Unicode script "Ethiopic".
         */
        ETHIOPIC,

        /**
         * Unicode script "Cherokee".
         */
        CHEROKEE,

        /**
         * Unicode script "Canadian_Aboriginal".
         */
        CANADIAN_ABORIGINAL,

        /**
         * Unicode script "Ogham".
         */
        OGHAM,

        /**
         * Unicode script "Runic".
         */
        RUNIC,

        /**
         * Unicode script "Khmer".
         */
        KHMER,

        /**
         * Unicode script "Mongolian".
         */
        MONGOLIAN,

        /**
         * Unicode script "Hiragana".
         */
        HIRAGANA,

        /**
         * Unicode script "Katakana".
         */
        KATAKANA,

        /**
         * Unicode script "Bopomofo".
         */
        BOPOMOFO,

        /**
         * Unicode script "Han".
         */
        HAN,

        /**
         * Unicode script "Yi".
         */
        YI,

        /**
         * Unicode script "Old_Italic".
         */
        OLD_ITALIC,

        /**
         * Unicode script "Gothic".
         */
        GOTHIC,

        /**
         * Unicode script "Deseret".
         */
        DESERET,

        /**
         * Unicode script "Inherited".
         */
        INHERITED,

        /**
         * Unicode script "Tagalog".
         */
        TAGALOG,

        /**
         * Unicode script "Hanunoo".
         */
        HANUNOO,

        /**
         * Unicode script "Buhid".
         */
        BUHID,

        /**
         * Unicode script "Tagbanwa".
         */
        TAGBANWA,

        /**
         * Unicode script "Limbu".
         */
        LIMBU,

        /**
         * Unicode script "Tai_Le".
         */
        TAI_LE,

        /**
         * Unicode script "Linear_B".
         */
        LINEAR_B,

        /**
         * Unicode script "Ugaritic".
         */
        UGARITIC,

        /**
         * Unicode script "Shavian".
         */
        SHAVIAN,

        /**
         * Unicode script "Osmanya".
         */
        OSMANYA,

        /**
         * Unicode script "Cypriot".
         */
        CYPRIOT,

        /**
         * Unicode script "Braille".
         */
        BRAILLE,

        /**
         * Unicode script "Buginese".
         */
        BUGINESE,

        /**
         * Unicode script "Coptic".
         */
        COPTIC,

        /**
         * Unicode script "New_Tai_Lue".
         */
        NEW_TAI_LUE,

        /**
         * Unicode script "Glagolitic".
         */
        GLAGOLITIC,

        /**
         * Unicode script "Tifinagh".
         */
        TIFINAGH,

        /**
         * Unicode script "Syloti_Nagri".
         */
        SYLOTI_NAGRI,

        /**
         * Unicode script "Old_Persian".
         */
        OLD_PERSIAN,

        /**
         * Unicode script "Kharoshthi".
         */
        KHAROSHTHI,

        /**
         * Unicode script "Balinese".
         */
        BALINESE,

        /**
         * Unicode script "Cuneiform".
         */
        CUNEIFORM,

        /**
         * Unicode script "Phoenician".
         */
        PHOENICIAN,

        /**
         * Unicode script "Phags_Pa".
         */
        PHAGS_PA,

        /**
         * Unicode script "Nko".
         */
        NKO,

        /**
         * Unicode script "Sundanese".
         */
        SUNDANESE,

        /**
         * Unicode script "Batak".
         */
        BATAK,

        /**
         * Unicode script "Lepcha".
         */
        LEPCHA,

        /**
         * Unicode script "Ol_Chiki".
         */
        OL_CHIKI,

        /**
         * Unicode script "Vai".
         */
        VAI,

        /**
         * Unicode script "Saurashtra".
         */
        SAURASHTRA,

        /**
         * Unicode script "Kayah_Li".
         */
        KAYAH_LI,

        /**
         * Unicode script "Rejang".
         */
        REJANG,

        /**
         * Unicode script "Lycian".
         */
        LYCIAN,

        /**
         * Unicode script "Carian".
         */
        CARIAN,

        /**
         * Unicode script "Lydian".
         */
        LYDIAN,

        /**
         * Unicode script "Cham".
         */
        CHAM,

        /**
         * Unicode script "Tai_Tham".
         */
        TAI_THAM,

        /**
         * Unicode script "Tai_Viet".
         */
        TAI_VIET,

        /**
         * Unicode script "Avestan".
         */
        AVESTAN,

        /**
         * Unicode script "Egyptian_Hieroglyphs".
         */
        EGYPTIAN_HIEROGLYPHS,

        /**
         * Unicode script "Samaritan".
         */
        SAMARITAN,

        /**
         * Unicode script "Mandaic".
         */
        MANDAIC,

        /**
         * Unicode script "Lisu".
         */
        LISU,

        /**
         * Unicode script "Bamum".
         */
        BAMUM,

        /**
         * Unicode script "Javanese".
         */
        JAVANESE,

        /**
         * Unicode script "Meetei_Mayek".
         */
        MEETEI_MAYEK,

        /**
         * Unicode script "Imperial_Aramaic".
         */
        IMPERIAL_ARAMAIC,

        /**
         * Unicode script "Old_South_Arabian".
         */
        OLD_SOUTH_ARABIAN,

        /**
         * Unicode script "Inscriptional_Parthian".
         */
        INSCRIPTIONAL_PARTHIAN,

        /**
         * Unicode script "Inscriptional_Pahlavi".
         */
        INSCRIPTIONAL_PAHLAVI,

        /**
         * Unicode script "Old_Turkic".
         */
        OLD_TURKIC,

        /**
         * Unicode script "Brahmi".
         */
        BRAHMI,

        /**
         * Unicode script "Kaithi".
         */
        KAITHI,

        /**
         * Unicode script "Meroitic_Hieroglyphs".
         */
        MEROITIC_HIEROGLYPHS,

        /**
         * Unicode script "Meroitic_Cursive".
         */
        MEROITIC_CURSIVE,

        /**
         * Unicode script "Sora_Sompeng".
         */
        SORA_SOMPENG,

        /**
         * Unicode script "Chakma".
         */
        CHAKMA,

        /**
         * Unicode script "Sharada".
         */
        SHARADA,

        /**
         * Unicode script "Takri".
         */
        TAKRI,

        /**
         * Unicode script "Miao".
         */
        MIAO,

        /**
         * Unicode script "Unknown".
         */
        UNKNOWN;

        private static final int[] scriptStarts = {
            0x0000,   // 0000..0040; COMMON
            0x0041,   // 0041..005A; LATIN
            0x005B,   // 005B..0060; COMMON
            0x0061,   // 0061..007A; LATIN
            0x007B,   // 007B..00A9; COMMON
            0x00AA,   // 00AA..00AA; LATIN
            0x00AB,   // 00AB..00B9; COMMON
            0x00BA,   // 00BA..00BA; LATIN
            0x00BB,   // 00BB..00BF; COMMON
            0x00C0,   // 00C0..00D6; LATIN
            0x00D7,   // 00D7..00D7; COMMON
            0x00D8,   // 00D8..00F6; LATIN
            0x00F7,   // 00F7..00F7; COMMON
            0x00F8,   // 00F8..02B8; LATIN
            0x02B9,   // 02B9..02DF; COMMON
            0x02E0,   // 02E0..02E4; LATIN
            0x02E5,   // 02E5..02E9; COMMON
            0x02EA,   // 02EA..02EB; BOPOMOFO
            0x02EC,   // 02EC..02FF; COMMON
            0x0300,   // 0300..036F; INHERITED
            0x0370,   // 0370..0373; GREEK
            0x0374,   // 0374..0374; COMMON
            0x0375,   // 0375..037D; GREEK
            0x037E,   // 037E..0383; COMMON
            0x0384,   // 0384..0384; GREEK
            0x0385,   // 0385..0385; COMMON
            0x0386,   // 0386..0386; GREEK
            0x0387,   // 0387..0387; COMMON
            0x0388,   // 0388..03E1; GREEK
            0x03E2,   // 03E2..03EF; COPTIC
            0x03F0,   // 03F0..03FF; GREEK
            0x0400,   // 0400..0484; CYRILLIC
            0x0485,   // 0485..0486; INHERITED
            0x0487,   // 0487..0530; CYRILLIC
            0x0531,   // 0531..0588; ARMENIAN
            0x0589,   // 0589..0589; COMMON
            0x058A,   // 058A..0590; ARMENIAN
            0x0591,   // 0591..05FF; HEBREW
            0x0600,   // 0600..060B; ARABIC
            0x060C,   // 060C..060C; COMMON
            0x060D,   // 060D..061A; ARABIC
            0x061B,   // 061B..061D; COMMON
            0x061E,   // 061E..061E; ARABIC
            0x061F,   // 061F..061F; COMMON
            0x0620,   // 0620..063F; ARABIC
            0x0640,   // 0640..0640; COMMON
            0x0641,   // 0641..064A; ARABIC
            0x064B,   // 064B..0655; INHERITED
            0x0656,   // 0656..065F; ARABIC
            0x0660,   // 0660..0669; COMMON
            0x066A,   // 066A..066F; ARABIC
            0x0670,   // 0670..0670; INHERITED
            0x0671,   // 0671..06DC; ARABIC
            0x06DD,   // 06DD..06DD; COMMON
            0x06DE,   // 06DE..06FF; ARABIC
            0x0700,   // 0700..074F; SYRIAC
            0x0750,   // 0750..077F; ARABIC
            0x0780,   // 0780..07BF; THAANA
            0x07C0,   // 07C0..07FF; NKO
            0x0800,   // 0800..083F; SAMARITAN
            0x0840,   // 0840..089F; MANDAIC
            0x08A0,   // 08A0..08FF; ARABIC
            0x0900,   // 0900..0950; DEVANAGARI
            0x0951,   // 0951..0952; INHERITED
            0x0953,   // 0953..0963; DEVANAGARI
            0x0964,   // 0964..0965; COMMON
            0x0966,   // 0966..0980; DEVANAGARI
            0x0981,   // 0981..0A00; BENGALI
            0x0A01,   // 0A01..0A80; GURMUKHI
            0x0A81,   // 0A81..0B00; GUJARATI
            0x0B01,   // 0B01..0B81; ORIYA
            0x0B82,   // 0B82..0C00; TAMIL
            0x0C01,   // 0C01..0C81; TELUGU
            0x0C82,   // 0C82..0CF0; KANNADA
            0x0D02,   // 0D02..0D81; MALAYALAM
            0x0D82,   // 0D82..0E00; SINHALA
            0x0E01,   // 0E01..0E3E; THAI
            0x0E3F,   // 0E3F..0E3F; COMMON
            0x0E40,   // 0E40..0E80; THAI
            0x0E81,   // 0E81..0EFF; LAO
            0x0F00,   // 0F00..0FD4; TIBETAN
            0x0FD5,   // 0FD5..0FD8; COMMON
            0x0FD9,   // 0FD9..0FFF; TIBETAN
            0x1000,   // 1000..109F; MYANMAR
            0x10A0,   // 10A0..10FA; GEORGIAN
            0x10FB,   // 10FB..10FB; COMMON
            0x10FC,   // 10FC..10FF; GEORGIAN
            0x1100,   // 1100..11FF; HANGUL
            0x1200,   // 1200..139F; ETHIOPIC
            0x13A0,   // 13A0..13FF; CHEROKEE
            0x1400,   // 1400..167F; CANADIAN_ABORIGINAL
            0x1680,   // 1680..169F; OGHAM
            0x16A0,   // 16A0..16EA; RUNIC
            0x16EB,   // 16EB..16ED; COMMON
            0x16EE,   // 16EE..16FF; RUNIC
            0x1700,   // 1700..171F; TAGALOG
            0x1720,   // 1720..1734; HANUNOO
            0x1735,   // 1735..173F; COMMON
            0x1740,   // 1740..175F; BUHID
            0x1760,   // 1760..177F; TAGBANWA
            0x1780,   // 1780..17FF; KHMER
            0x1800,   // 1800..1801; MONGOLIAN
            0x1802,   // 1802..1803; COMMON
            0x1804,   // 1804..1804; MONGOLIAN
            0x1805,   // 1805..1805; COMMON
            0x1806,   // 1806..18AF; MONGOLIAN
            0x18B0,   // 18B0..18FF; CANADIAN_ABORIGINAL
            0x1900,   // 1900..194F; LIMBU
            0x1950,   // 1950..197F; TAI_LE
            0x1980,   // 1980..19DF; NEW_TAI_LUE
            0x19E0,   // 19E0..19FF; KHMER
            0x1A00,   // 1A00..1A1F; BUGINESE
            0x1A20,   // 1A20..1AFF; TAI_THAM
            0x1B00,   // 1B00..1B7F; BALINESE
            0x1B80,   // 1B80..1BBF; SUNDANESE
            0x1BC0,   // 1BC0..1BFF; BATAK
            0x1C00,   // 1C00..1C4F; LEPCHA
            0x1C50,   // 1C50..1CBF; OL_CHIKI
            0x1CC0,   // 1CC0..1CCF; SUNDANESE
            0x1CD0,   // 1CD0..1CD2; INHERITED
            0x1CD3,   // 1CD3..1CD3; COMMON
            0x1CD4,   // 1CD4..1CE0; INHERITED
            0x1CE1,   // 1CE1..1CE1; COMMON
            0x1CE2,   // 1CE2..1CE8; INHERITED
            0x1CE9,   // 1CE9..1CEC; COMMON
            0x1CED,   // 1CED..1CED; INHERITED
            0x1CEE,   // 1CEE..1CF3; COMMON
            0x1CF4,   // 1CF4..1CF4; INHERITED
            0x1CF5,   // 1CF5..1CFF; COMMON
            0x1D00,   // 1D00..1D25; LATIN
            0x1D26,   // 1D26..1D2A; GREEK
            0x1D2B,   // 1D2B..1D2B; CYRILLIC
            0x1D2C,   // 1D2C..1D5C; LATIN
            0x1D5D,   // 1D5D..1D61; GREEK
            0x1D62,   // 1D62..1D65; LATIN
            0x1D66,   // 1D66..1D6A; GREEK
            0x1D6B,   // 1D6B..1D77; LATIN
            0x1D78,   // 1D78..1D78; CYRILLIC
            0x1D79,   // 1D79..1DBE; LATIN
            0x1DBF,   // 1DBF..1DBF; GREEK
            0x1DC0,   // 1DC0..1DFF; INHERITED
            0x1E00,   // 1E00..1EFF; LATIN
            0x1F00,   // 1F00..1FFF; GREEK
            0x2000,   // 2000..200B; COMMON
            0x200C,   // 200C..200D; INHERITED
            0x200E,   // 200E..2070; COMMON
            0x2071,   // 2071..2073; LATIN
            0x2074,   // 2074..207E; COMMON
            0x207F,   // 207F..207F; LATIN
            0x2080,   // 2080..208F; COMMON
            0x2090,   // 2090..209F; LATIN
            0x20A0,   // 20A0..20CF; COMMON
            0x20D0,   // 20D0..20FF; INHERITED
            0x2100,   // 2100..2125; COMMON
            0x2126,   // 2126..2126; GREEK
            0x2127,   // 2127..2129; COMMON
            0x212A,   // 212A..212B; LATIN
            0x212C,   // 212C..2131; COMMON
            0x2132,   // 2132..2132; LATIN
            0x2133,   // 2133..214D; COMMON
            0x214E,   // 214E..214E; LATIN
            0x214F,   // 214F..215F; COMMON
            0x2160,   // 2160..2188; LATIN
            0x2189,   // 2189..27FF; COMMON
            0x2800,   // 2800..28FF; BRAILLE
            0x2900,   // 2900..2BFF; COMMON
            0x2C00,   // 2C00..2C5F; GLAGOLITIC
            0x2C60,   // 2C60..2C7F; LATIN
            0x2C80,   // 2C80..2CFF; COPTIC
            0x2D00,   // 2D00..2D2F; GEORGIAN
            0x2D30,   // 2D30..2D7F; TIFINAGH
            0x2D80,   // 2D80..2DDF; ETHIOPIC
            0x2DE0,   // 2DE0..2DFF; CYRILLIC
            0x2E00,   // 2E00..2E7F; COMMON
            0x2E80,   // 2E80..2FEF; HAN
            0x2FF0,   // 2FF0..3004; COMMON
            0x3005,   // 3005..3005; HAN
            0x3006,   // 3006..3006; COMMON
            0x3007,   // 3007..3007; HAN
            0x3008,   // 3008..3020; COMMON
            0x3021,   // 3021..3029; HAN
            0x302A,   // 302A..302D; INHERITED
            0x302E,   // 302E..302F; HANGUL
            0x3030,   // 3030..3037; COMMON
            0x3038,   // 3038..303B; HAN
            0x303C,   // 303C..3040; COMMON
            0x3041,   // 3041..3098; HIRAGANA
            0x3099,   // 3099..309A; INHERITED
            0x309B,   // 309B..309C; COMMON
            0x309D,   // 309D..309F; HIRAGANA
            0x30A0,   // 30A0..30A0; COMMON
            0x30A1,   // 30A1..30FA; KATAKANA
            0x30FB,   // 30FB..30FC; COMMON
            0x30FD,   // 30FD..3104; KATAKANA
            0x3105,   // 3105..3130; BOPOMOFO
            0x3131,   // 3131..318F; HANGUL
            0x3190,   // 3190..319F; COMMON
            0x31A0,   // 31A0..31BF; BOPOMOFO
            0x31C0,   // 31C0..31EF; COMMON
            0x31F0,   // 31F0..31FF; KATAKANA
            0x3200,   // 3200..321F; HANGUL
            0x3220,   // 3220..325F; COMMON
            0x3260,   // 3260..327E; HANGUL
            0x327F,   // 327F..32CF; COMMON
            0x32D0,   // 32D0..32FE; KATAKANA
            0x32FF,   // 32FF..32FF; COMMON
            0x3300,   // 3300..3357; KATAKANA
            0x3358,   // 3358..33FF; COMMON
            0x3400,   // 3400..4DBF; HAN
            0x4DC0,   // 4DC0..4DFF; COMMON
            0x4E00,   // 4E00..9FFF; HAN
            0xA000,   // A000..A4CF; YI
            0xA4D0,   // A4D0..A4FF; LISU
            0xA500,   // A500..A63F; VAI
            0xA640,   // A640..A69F; CYRILLIC
            0xA6A0,   // A6A0..A6FF; BAMUM
            0xA700,   // A700..A721; COMMON
            0xA722,   // A722..A787; LATIN
            0xA788,   // A788..A78A; COMMON
            0xA78B,   // A78B..A7FF; LATIN
            0xA800,   // A800..A82F; SYLOTI_NAGRI
            0xA830,   // A830..A83F; COMMON
            0xA840,   // A840..A87F; PHAGS_PA
            0xA880,   // A880..A8DF; SAURASHTRA
            0xA8E0,   // A8E0..A8FF; DEVANAGARI
            0xA900,   // A900..A92F; KAYAH_LI
            0xA930,   // A930..A95F; REJANG
            0xA960,   // A960..A97F; HANGUL
            0xA980,   // A980..A9FF; JAVANESE
            0xAA00,   // AA00..AA5F; CHAM
            0xAA60,   // AA60..AA7F; MYANMAR
            0xAA80,   // AA80..AADF; TAI_VIET
            0xAAE0,   // AAE0..AB00; MEETEI_MAYEK
            0xAB01,   // AB01..ABBF; ETHIOPIC
            0xABC0,   // ABC0..ABFF; MEETEI_MAYEK
            0xAC00,   // AC00..D7FB; HANGUL
            0xD7FC,   // D7FC..F8FF; UNKNOWN
            0xF900,   // F900..FAFF; HAN
            0xFB00,   // FB00..FB12; LATIN
            0xFB13,   // FB13..FB1C; ARMENIAN
            0xFB1D,   // FB1D..FB4F; HEBREW
            0xFB50,   // FB50..FD3D; ARABIC
            0xFD3E,   // FD3E..FD4F; COMMON
            0xFD50,   // FD50..FDFC; ARABIC
            0xFDFD,   // FDFD..FDFF; COMMON
            0xFE00,   // FE00..FE0F; INHERITED
            0xFE10,   // FE10..FE1F; COMMON
            0xFE20,   // FE20..FE2F; INHERITED
            0xFE30,   // FE30..FE6F; COMMON
            0xFE70,   // FE70..FEFE; ARABIC
            0xFEFF,   // FEFF..FF20; COMMON
            0xFF21,   // FF21..FF3A; LATIN
            0xFF3B,   // FF3B..FF40; COMMON
            0xFF41,   // FF41..FF5A; LATIN
            0xFF5B,   // FF5B..FF65; COMMON
            0xFF66,   // FF66..FF6F; KATAKANA
            0xFF70,   // FF70..FF70; COMMON
            0xFF71,   // FF71..FF9D; KATAKANA
            0xFF9E,   // FF9E..FF9F; COMMON
            0xFFA0,   // FFA0..FFDF; HANGUL
            0xFFE0,   // FFE0..FFFF; COMMON
            0x10000,  // 10000..100FF; LINEAR_B
            0x10100,  // 10100..1013F; COMMON
            0x10140,  // 10140..1018F; GREEK
            0x10190,  // 10190..101FC; COMMON
            0x101FD,  // 101FD..1027F; INHERITED
            0x10280,  // 10280..1029F; LYCIAN
            0x102A0,  // 102A0..102FF; CARIAN
            0x10300,  // 10300..1032F; OLD_ITALIC
            0x10330,  // 10330..1037F; GOTHIC
            0x10380,  // 10380..1039F; UGARITIC
            0x103A0,  // 103A0..103FF; OLD_PERSIAN
            0x10400,  // 10400..1044F; DESERET
            0x10450,  // 10450..1047F; SHAVIAN
            0x10480,  // 10480..107FF; OSMANYA
            0x10800,  // 10800..1083F; CYPRIOT
            0x10840,  // 10840..108FF; IMPERIAL_ARAMAIC
            0x10900,  // 10900..1091F; PHOENICIAN
            0x10920,  // 10920..1097F; LYDIAN
            0x10980,  // 10980..1099F; MEROITIC_HIEROGLYPHS
            0x109A0,  // 109A0..109FF; MEROITIC_CURSIVE
            0x10A00,  // 10A00..10A5F; KHAROSHTHI
            0x10A60,  // 10A60..10AFF; OLD_SOUTH_ARABIAN
            0x10B00,  // 10B00..10B3F; AVESTAN
            0x10B40,  // 10B40..10B5F; INSCRIPTIONAL_PARTHIAN
            0x10B60,  // 10B60..10BFF; INSCRIPTIONAL_PAHLAVI
            0x10C00,  // 10C00..10E5F; OLD_TURKIC
            0x10E60,  // 10E60..10FFF; ARABIC
            0x11000,  // 11000..1107F; BRAHMI
            0x11080,  // 11080..110CF; KAITHI
            0x110D0,  // 110D0..110FF; SORA_SOMPENG
            0x11100,  // 11100..1117F; CHAKMA
            0x11180,  // 11180..1167F; SHARADA
            0x11680,  // 11680..116CF; TAKRI
            0x12000,  // 12000..12FFF; CUNEIFORM
            0x13000,  // 13000..167FF; EGYPTIAN_HIEROGLYPHS
            0x16800,  // 16800..16A38; BAMUM
            0x16F00,  // 16F00..16F9F; MIAO
            0x1B000,  // 1B000..1B000; KATAKANA
            0x1B001,  // 1B001..1CFFF; HIRAGANA
            0x1D000,  // 1D000..1D166; COMMON
            0x1D167,  // 1D167..1D169; INHERITED
            0x1D16A,  // 1D16A..1D17A; COMMON
            0x1D17B,  // 1D17B..1D182; INHERITED
            0x1D183,  // 1D183..1D184; COMMON
            0x1D185,  // 1D185..1D18B; INHERITED
            0x1D18C,  // 1D18C..1D1A9; COMMON
            0x1D1AA,  // 1D1AA..1D1AD; INHERITED
            0x1D1AE,  // 1D1AE..1D1FF; COMMON
            0x1D200,  // 1D200..1D2FF; GREEK
            0x1D300,  // 1D300..1EDFF; COMMON
            0x1EE00,  // 1EE00..1EFFF; ARABIC
            0x1F000,  // 1F000..1F1FF; COMMON
            0x1F200,  // 1F200..1F200; HIRAGANA
            0x1F201,  // 1F201..1FFFF; COMMON
            0x20000,  // 20000..E0000; HAN
            0xE0001,  // E0001..E00FF; COMMON
            0xE0100,  // E0100..E01EF; INHERITED
            0xE01F0   // E01F0..10FFFF; UNKNOWN

        };

        private static final UnicodeScript[] scripts = {
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            BOPOMOFO,
            COMMON,
            INHERITED,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COPTIC,
            GREEK,
            CYRILLIC,
            INHERITED,
            CYRILLIC,
            ARMENIAN,
            COMMON,
            ARMENIAN,
            HEBREW,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            INHERITED,
            ARABIC,
            COMMON,
            ARABIC,
            INHERITED,
            ARABIC,
            COMMON,
            ARABIC,
            SYRIAC,
            ARABIC,
            THAANA,
            NKO,
            SAMARITAN,
            MANDAIC,
            ARABIC,
            DEVANAGARI,
            INHERITED,
            DEVANAGARI,
            COMMON,
            DEVANAGARI,
            BENGALI,
            GURMUKHI,
            GUJARATI,
            ORIYA,
            TAMIL,
            TELUGU,
            KANNADA,
            MALAYALAM,
            SINHALA,
            THAI,
            COMMON,
            THAI,
            LAO,
            TIBETAN,
            COMMON,
            TIBETAN,
            MYANMAR,
            GEORGIAN,
            COMMON,
            GEORGIAN,
            HANGUL,
            ETHIOPIC,
            CHEROKEE,
            CANADIAN_ABORIGINAL,
            OGHAM,
            RUNIC,
            COMMON,
            RUNIC,
            TAGALOG,
            HANUNOO,
            COMMON,
            BUHID,
            TAGBANWA,
            KHMER,
            MONGOLIAN,
            COMMON,
            MONGOLIAN,
            COMMON,
            MONGOLIAN,
            CANADIAN_ABORIGINAL,
            LIMBU,
            TAI_LE,
            NEW_TAI_LUE,
            KHMER,
            BUGINESE,
            TAI_THAM,
            BALINESE,
            SUNDANESE,
            BATAK,
            LEPCHA,
            OL_CHIKI,
            SUNDANESE,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            LATIN,
            GREEK,
            CYRILLIC,
            LATIN,
            GREEK,
            LATIN,
            GREEK,
            LATIN,
            CYRILLIC,
            LATIN,
            GREEK,
            INHERITED,
            LATIN,
            GREEK,
            COMMON,
            INHERITED,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            INHERITED,
            COMMON,
            GREEK,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            BRAILLE,
            COMMON,
            GLAGOLITIC,
            LATIN,
            COPTIC,
            GEORGIAN,
            TIFINAGH,
            ETHIOPIC,
            CYRILLIC,
            COMMON,
            HAN,
            COMMON,
            HAN,
            COMMON,
            HAN,
            COMMON,
            HAN,
            INHERITED,
            HANGUL,
            COMMON,
            HAN,
            COMMON,
            HIRAGANA,
            INHERITED,
            COMMON,
            HIRAGANA,
            COMMON,
            KATAKANA,
            COMMON,
            KATAKANA,
            BOPOMOFO,
            HANGUL,
            COMMON,
            BOPOMOFO,
            COMMON,
            KATAKANA,
            HANGUL,
            COMMON,
            HANGUL,
            COMMON,
            KATAKANA,  // 32D0..32FE
            COMMON,    // 32FF
            KATAKANA,  // 3300..3357
            COMMON,
            HAN,
            COMMON,
            HAN,
            YI,
            LISU,
            VAI,
            CYRILLIC,
            BAMUM,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            SYLOTI_NAGRI,
            COMMON,
            PHAGS_PA,
            SAURASHTRA,
            DEVANAGARI,
            KAYAH_LI,
            REJANG,
            HANGUL,
            JAVANESE,
            CHAM,
            MYANMAR,
            TAI_VIET,
            MEETEI_MAYEK,
            ETHIOPIC,
            MEETEI_MAYEK,
            HANGUL,
            UNKNOWN,
            HAN,
            LATIN,
            ARMENIAN,
            HEBREW,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            ARABIC,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            KATAKANA,
            COMMON,
            KATAKANA,
            COMMON,
            HANGUL,
            COMMON,
            LINEAR_B,
            COMMON,
            GREEK,
            COMMON,
            INHERITED,
            LYCIAN,
            CARIAN,
            OLD_ITALIC,
            GOTHIC,
            UGARITIC,
            OLD_PERSIAN,
            DESERET,
            SHAVIAN,
            OSMANYA,
            CYPRIOT,
            IMPERIAL_ARAMAIC,
            PHOENICIAN,
            LYDIAN,
            MEROITIC_HIEROGLYPHS,
            MEROITIC_CURSIVE,
            KHAROSHTHI,
            OLD_SOUTH_ARABIAN,
            AVESTAN,
            INSCRIPTIONAL_PARTHIAN,
            INSCRIPTIONAL_PAHLAVI,
            OLD_TURKIC,
            ARABIC,
            BRAHMI,
            KAITHI,
            SORA_SOMPENG,
            CHAKMA,
            SHARADA,
            TAKRI,
            CUNEIFORM,
            EGYPTIAN_HIEROGLYPHS,
            BAMUM,
            MIAO,
            KATAKANA,
            HIRAGANA,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            GREEK,
            COMMON,
            ARABIC,
            COMMON,
            HIRAGANA,
            COMMON,
            HAN,
            COMMON,
            INHERITED,
            UNKNOWN
        };

        private static HashMap<String, Character.UnicodeScript> aliases;
        static
        {
            aliases = new HashMap<>(128);
            aliases.put("ARAB", ARABIC);
            aliases.put("ARMI", IMPERIAL_ARAMAIC);
            aliases.put("ARMN", ARMENIAN);
            aliases.put("AVST", AVESTAN);
            aliases.put("BALI", BALINESE);
            aliases.put("BAMU", BAMUM);
            aliases.put("BATK", BATAK);
            aliases.put("BENG", BENGALI);
            aliases.put("BOPO", BOPOMOFO);
            aliases.put("BRAI", BRAILLE);
            aliases.put("BRAH", BRAHMI);
            aliases.put("BUGI", BUGINESE);
            aliases.put("BUHD", BUHID);
            aliases.put("CAKM", CHAKMA);
            aliases.put("CANS", CANADIAN_ABORIGINAL);
            aliases.put("CARI", CARIAN);
            aliases.put("CHAM", CHAM);
            aliases.put("CHER", CHEROKEE);
            aliases.put("COPT", COPTIC);
            aliases.put("CPRT", CYPRIOT);
            aliases.put("CYRL", CYRILLIC);
            aliases.put("DEVA", DEVANAGARI);
            aliases.put("DSRT", DESERET);
            aliases.put("EGYP", EGYPTIAN_HIEROGLYPHS);
            aliases.put("ETHI", ETHIOPIC);
            aliases.put("GEOR", GEORGIAN);
            aliases.put("GLAG", GLAGOLITIC);
            aliases.put("GOTH", GOTHIC);
            aliases.put("GREK", GREEK);
            aliases.put("GUJR", GUJARATI);
            aliases.put("GURU", GURMUKHI);
            aliases.put("HANG", HANGUL);
            aliases.put("HANI", HAN);
            aliases.put("HANO", HANUNOO);
            aliases.put("HEBR", HEBREW);
            aliases.put("HIRA", HIRAGANA);
            // it appears we don't have the KATAKANA_OR_HIRAGANA
            //aliases.put("HRKT", KATAKANA_OR_HIRAGANA);
            aliases.put("ITAL", OLD_ITALIC);
            aliases.put("JAVA", JAVANESE);
            aliases.put("KALI", KAYAH_LI);
            aliases.put("KANA", KATAKANA);
            aliases.put("KHAR", KHAROSHTHI);
            aliases.put("KHMR", KHMER);
            aliases.put("KNDA", KANNADA);
            aliases.put("KTHI", KAITHI);
            aliases.put("LANA", TAI_THAM);
            aliases.put("LAOO", LAO);
            aliases.put("LATN", LATIN);
            aliases.put("LEPC", LEPCHA);
            aliases.put("LIMB", LIMBU);
            aliases.put("LINB", LINEAR_B);
            aliases.put("LISU", LISU);
            aliases.put("LYCI", LYCIAN);
            aliases.put("LYDI", LYDIAN);
            aliases.put("MAND", MANDAIC);
            aliases.put("MERC", MEROITIC_CURSIVE);
            aliases.put("MERO", MEROITIC_HIEROGLYPHS);
            aliases.put("MLYM", MALAYALAM);
            aliases.put("MONG", MONGOLIAN);
            aliases.put("MTEI", MEETEI_MAYEK);
            aliases.put("MYMR", MYANMAR);
            aliases.put("NKOO", NKO);
            aliases.put("OGAM", OGHAM);
            aliases.put("OLCK", OL_CHIKI);
            aliases.put("ORKH", OLD_TURKIC);
            aliases.put("ORYA", ORIYA);
            aliases.put("OSMA", OSMANYA);
            aliases.put("PHAG", PHAGS_PA);
            aliases.put("PLRD", MIAO);
            aliases.put("PHLI", INSCRIPTIONAL_PAHLAVI);
            aliases.put("PHNX", PHOENICIAN);
            aliases.put("PRTI", INSCRIPTIONAL_PARTHIAN);
            aliases.put("RJNG", REJANG);
            aliases.put("RUNR", RUNIC);
            aliases.put("SAMR", SAMARITAN);
            aliases.put("SARB", OLD_SOUTH_ARABIAN);
            aliases.put("SAUR", SAURASHTRA);
            aliases.put("SHAW", SHAVIAN);
            aliases.put("SHRD", SHARADA);
            aliases.put("SINH", SINHALA);
            aliases.put("SORA", SORA_SOMPENG);
            aliases.put("SUND", SUNDANESE);
            aliases.put("SYLO", SYLOTI_NAGRI);
            aliases.put("SYRC", SYRIAC);
            aliases.put("TAGB", TAGBANWA);
            aliases.put("TALE", TAI_LE);
            aliases.put("TAKR", TAKRI);
            aliases.put("TALU", NEW_TAI_LUE);
            aliases.put("TAML", TAMIL);
            aliases.put("TAVT", TAI_VIET);
            aliases.put("TELU", TELUGU);
            aliases.put("TFNG", TIFINAGH);
            aliases.put("TGLG", TAGALOG);
            aliases.put("THAA", THAANA);
            aliases.put("THAI", THAI);
            aliases.put("TIBT", TIBETAN);
            aliases.put("UGAR", UGARITIC);
            aliases.put("VAII", VAI);
            aliases.put("XPEO", OLD_PERSIAN);
            aliases.put("XSUX", CUNEIFORM);
            aliases.put("YIII", YI);
            aliases.put("ZINH", INHERITED);
            aliases.put("ZYYY", COMMON);
            aliases.put("ZZZZ", UNKNOWN);
        }

        /**
         * Returns the enum constant representing the Unicode script to which
         * the given character (Unicode code point) is assigned.
         *
         * @param   codePoint the character (Unicode code point) in question.
         * @return  The {@code UnicodeScript} constant representing the
         *          Unicode script to which this character is assigned.
         *
         * @exception IllegalArgumentException if the specified
         * {@code codePoint} is an invalid Unicode code point.
         * @see Character#isValidCodePoint(int)
         *
         */
        public static UnicodeScript of(int codePoint) {
            if (!isValidCodePoint(codePoint))
                throw new IllegalArgumentException();
            int type = getType(codePoint);
            // leave SURROGATE and PRIVATE_USE for table lookup
            if (type == UNASSIGNED)
                return UNKNOWN;
            int index = Arrays.binarySearch(scriptStarts, codePoint);
            if (index < 0)
                index = -index - 2;
            return scripts[index];
        }

        /**
         * Returns the UnicodeScript constant with the given Unicode script
         * name or the script name alias. Script names and their aliases are
         * determined by The Unicode Standard. The files Scripts&lt;version&gt;.txt
         * and PropertyValueAliases&lt;version&gt;.txt define script names
         * and the script name aliases for a particular version of the
         * standard. The {@link Character} class specifies the version of
         * the standard that it supports.
         * <p>
         * Character case is ignored for all of the valid script names.
         * The en_US locale's case mapping rules are used to provide
         * case-insensitive string comparisons for script name validation.
         * <p>
         *
         * @param scriptName A {@code UnicodeScript} name.
         * @return The {@code UnicodeScript} constant identified
         *         by {@code scriptName}
         * @throws IllegalArgumentException if {@code scriptName} is an
         *         invalid name
         * @throws NullPointerException if {@code scriptName} is null
         */
        public static final UnicodeScript forName(String scriptName) {
            scriptName = scriptName.toUpperCase(Locale.ENGLISH);
                                 //.replace(' ', '_'));
            UnicodeScript sc = aliases.get(scriptName);
            if (sc != null)
                return sc;
            return valueOf(scriptName);
        }
    }

    /**
     * The value of the {@code Character}.
     *
     * @serial
     */
    private final char value;

    /** use serialVersionUID from JDK 1.0.2 for
     * interoperability */
    private static final long serialVersionUID = 3786198910865385080L;

    /**
     * Constructs a newly allocated {@code Character} object that
     * represents the specified {@code char} value.
     *
     * @param  value   the value to be represented by the
     *                  {@code Character} object.
     */
    public Character(char value) {
        this.value = value;
    }

    private static class CharacterCache {
        private CharacterCache(){}

        static final Character cache[] = new Character[127 + 1];

        static {
            for (int i = 0; i < cache.length; i++)
                cache[i] = new Character((char)i);
        }
    }

    /**
     * Returns a <tt>Character</tt> instance representing the specified
     * <tt>char</tt> value.
     * If a new <tt>Character</tt> instance is not required, this method
     * should generally be used in preference to the constructor
     * {@link #Character(char)}, as this method is likely to yield
     * significantly better space and time performance by caching
     * frequently requested values.
     *
     * This method will always cache values in the range {@code
     * '\u005Cu0000'} to {@code '\u005Cu007F'}, inclusive, and may
     * cache other values outside of this range.
     *
     * @param  c a char value.
     * @return a <tt>Character</tt> instance representing <tt>c</tt>.
     * @since  1.5
     */
    public static Character valueOf(char c) {
        if (c <= 127) { // must cache
            return CharacterCache.cache[(int)c];
        }
        return new Character(c);
    }

    /**
     * Returns the value of this {@code Character} object.
     * @return  the primitive {@code char} value represented by
     *          this object.
     */
    public char charValue() {
        return value;
    }

    /**
     * Returns a hash code for this {@code Character}; equal to the result
     * of invoking {@code charValue()}.
     *
     * @return a hash code value for this {@code Character}
     */
    @Override
    public int hashCode() {
        return Character.hashCode(value);
    }

    /**
     * Returns a hash code for a {@code char} value; compatible with
     * {@code Character.hashCode()}.
     *
     * @since 1.8
     *
     * @param value The {@code char} for which to return a hash code.
     * @return a hash code value for a {@code char} value.
     */
    public static int hashCode(char value) {
        return (int)value;
    }

    /**
     * Compares this object against the specified object.
     * The result is {@code true} if and only if the argument is not
     * {@code null} and is a {@code Character} object that
     * represents the same {@code char} value as this object.
     *
     * @param   obj   the object to compare with.
     * @return  {@code true} if the objects are the same;
     *          {@code false} otherwise.
     */
    public boolean equals(Object obj) {
        if (obj instanceof Character) {
            return value == ((Character)obj).charValue();
        }
        return false;
    }

    /**
     * Returns a {@code String} object representing this
     * {@code Character}'s value.  The result is a string of
     * length 1 whose sole component is the primitive
     * {@code char} value represented by this
     * {@code Character} object.
     *
     * @return  a string representation of this object.
     */
    public String toString() {
        char buf[] = {value};
        return String.valueOf(buf);
    }

    /**
     * Returns a {@code String} object representing the
     * specified {@code char}.  The result is a string of length
     * 1 consisting solely of the specified {@code char}.
     *
     * @param c the {@code char} to be converted
     * @return the string representation of the specified {@code char}
     * @since 1.4
     */
    public static String toString(char c) {
        return String.valueOf(c);
    }

    /**
     * Determines whether the specified code point is a valid
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point value</a>.
     *
     * @param  codePoint the Unicode code point to be tested
     * @return {@code true} if the specified code point value is between
     *         {@link #MIN_CODE_POINT} and
     *         {@link #MAX_CODE_POINT} inclusive;
     *         {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isValidCodePoint(int codePoint) {
        // Optimized form of:
        //     codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT
        int plane = codePoint >>> 16;
        return plane < ((MAX_CODE_POINT + 1) >>> 16);
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is in the <a href="#BMP">Basic Multilingual Plane (BMP)</a>.
     * Such code points can be represented using a single {@code char}.
     *
     * @param  codePoint the character (Unicode code point) to be tested
     * @return {@code true} if the specified code point is between
     *         {@link #MIN_VALUE} and {@link #MAX_VALUE} inclusive;
     *         {@code false} otherwise.
     * @since  1.7
     */
    public static boolean isBmpCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
        // Optimized form of:
        //     codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
        // We consistently use logical shift (>>>) to facilitate
        // additional runtime optimizations.
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is in the <a href="#supplementary">supplementary character</a> range.
     *
     * @param  codePoint the character (Unicode code point) to be tested
     * @return {@code true} if the specified code point is
     *         between {@link #MIN_SUPPLEMENTARY_CODE_POINT} and
     *         {@link #MAX_CODE_POINT} inclusive;
     *         {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isSupplementaryCodePoint(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <  MAX_CODE_POINT + 1;
    }

    /**
     * Determines if the given {@code char} value is a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * (also known as <i>leading-surrogate code unit</i>).
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_HIGH_SURROGATE} and
     *         {@link #MAX_HIGH_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @see    Character#isLowSurrogate(char)
     * @see    Character.UnicodeBlock#of(int)
     * @since  1.5
     */
    public static boolean isHighSurrogate(char ch) {
        // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
        return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
    }

    /**
     * Determines if the given {@code char} value is a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * (also known as <i>trailing-surrogate code unit</i>).
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_LOW_SURROGATE} and
     *         {@link #MAX_LOW_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @see    Character#isHighSurrogate(char)
     * @since  1.5
     */
    public static boolean isLowSurrogate(char ch) {
        return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);
    }

    /**
     * Determines if the given {@code char} value is a Unicode
     * <i>surrogate code unit</i>.
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * <p>A char value is a surrogate code unit if and only if it is either
     * a {@linkplain #isLowSurrogate(char) low-surrogate code unit} or
     * a {@linkplain #isHighSurrogate(char) high-surrogate code unit}.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_SURROGATE} and
     *         {@link #MAX_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @since  1.7
     */
    public static boolean isSurrogate(char ch) {
        return ch >= MIN_SURROGATE && ch < (MAX_SURROGATE + 1);
    }

    /**
     * Determines whether the specified pair of {@code char}
     * values is a valid
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * Unicode surrogate pair</a>.
     *
     * <p>This method is equivalent to the expression:
     * <blockquote><pre>{@code
     * isHighSurrogate(high) && isLowSurrogate(low)
     * }</pre></blockquote>
     *
     * @param  high the high-surrogate code value to be tested
     * @param  low the low-surrogate code value to be tested
     * @return {@code true} if the specified high and
     *         low-surrogate code values represent a valid surrogate pair;
     *         {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isSurrogatePair(char high, char low) {
        return isHighSurrogate(high) && isLowSurrogate(low);
    }

    /**
     * Determines the number of {@code char} values needed to
     * represent the specified character (Unicode code point). If the
     * specified character is equal to or greater than 0x10000, then
     * the method returns 2. Otherwise, the method returns 1.
     *
     * <p>This method doesn't validate the specified character to be a
     * valid Unicode code point. The caller must validate the
     * character value using {@link #isValidCodePoint(int) isValidCodePoint}
     * if necessary.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  2 if the character is a valid supplementary character; 1 otherwise.
     * @see     Character#isSupplementaryCodePoint(int)
     * @since   1.5
     */
    public static int charCount(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT ? 2 : 1;
    }

    /**
     * Converts the specified surrogate pair to its supplementary code
     * point value. This method does not validate the specified
     * surrogate pair. The caller must validate it using {@link
     * #isSurrogatePair(char, char) isSurrogatePair} if necessary.
     *
     * @param  high the high-surrogate code unit
     * @param  low the low-surrogate code unit
     * @return the supplementary code point composed from the
     *         specified surrogate pair.
     * @since  1.5
     */
    public static int toCodePoint(char high, char low) {
        // Optimized form of:
        // return ((high - MIN_HIGH_SURROGATE) << 10)
        //         + (low - MIN_LOW_SURROGATE)
        //         + MIN_SUPPLEMENTARY_CODE_POINT;
        return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                       - (MIN_HIGH_SURROGATE << 10)
                                       - MIN_LOW_SURROGATE);
    }

    /**
     * Returns the code point at the given index of the
     * {@code CharSequence}. If the {@code char} value at
     * the given index in the {@code CharSequence} is in the
     * high-surrogate range, the following index is less than the
     * length of the {@code CharSequence}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param seq a sequence of {@code char} values (Unicode code
     * units)
     * @param index the index to the {@code char} values (Unicode
     * code units) in {@code seq} to be converted
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the value
     * {@code index} is negative or not less than
     * {@link CharSequence#length() seq.length()}.
     * @since 1.5
     */
    public static int codePointAt(CharSequence seq, int index) {
        char c1 = seq.charAt(index);
        if (isHighSurrogate(c1) && ++index < seq.length()) {
            char c2 = seq.charAt(index);
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }

    /**
     * Returns the code point at the given index of the
     * {@code char} array. If the {@code char} value at
     * the given index in the {@code char} array is in the
     * high-surrogate range, the following index is less than the
     * length of the {@code char} array, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param a the {@code char} array
     * @param index the index to the {@code char} values (Unicode
     * code units) in the {@code char} array to be converted
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the value
     * {@code index} is negative or not less than
     * the length of the {@code char} array.
     * @since 1.5
     */
    public static int codePointAt(char[] a, int index) {
        return codePointAtImpl(a, index, a.length);
    }

    /**
     * Returns the code point at the given index of the
     * {@code char} array, where only array elements with
     * {@code index} less than {@code limit} can be used. If
     * the {@code char} value at the given index in the
     * {@code char} array is in the high-surrogate range, the
     * following index is less than the {@code limit}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param a the {@code char} array
     * @param index the index to the {@code char} values (Unicode
     * code units) in the {@code char} array to be converted
     * @param limit the index after the last array element that
     * can be used in the {@code char} array
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is negative or not less than the {@code limit}
     * argument, or if the {@code limit} argument is negative or
     * greater than the length of the {@code char} array.
     * @since 1.5
     */
    public static int codePointAt(char[] a, int index, int limit) {
        if (index >= limit || limit < 0 || limit > a.length) {
            throw new IndexOutOfBoundsException();
        }
        return codePointAtImpl(a, index, limit);
    }

    // throws ArrayIndexOutOfBoundsException if index out of bounds
    static int codePointAtImpl(char[] a, int index, int limit) {
        char c1 = a[index];
        if (isHighSurrogate(c1) && ++index < limit) {
            char c2 = a[index];
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code CharSequence}. If the {@code char} value at
     * {@code (index - 1)} in the {@code CharSequence} is in
     * the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index - 2)}
     * in the {@code CharSequence} is in the
     * high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param seq the {@code CharSequence} instance
     * @param index the index following the code point that should be returned
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is less than 1 or greater than {@link
     * CharSequence#length() seq.length()}.
     * @since 1.5
     */
    public static int codePointBefore(CharSequence seq, int index) {
        char c2 = seq.charAt(--index);
        if (isLowSurrogate(c2) && index > 0) {
            char c1 = seq.charAt(--index);
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code char} array. If the {@code char} value at
     * {@code (index - 1)} in the {@code char} array is in
     * the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index - 2)}
     * in the {@code char} array is in the
     * high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param a the {@code char} array
     * @param index the index following the code point that should be returned
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is less than 1 or greater than the length of the
     * {@code char} array
     * @since 1.5
     */
    public static int codePointBefore(char[] a, int index) {
        return codePointBeforeImpl(a, index, 0);
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code char} array, where only array elements with
     * {@code index} greater than or equal to {@code start}
     * can be used. If the {@code char} value at {@code (index - 1)}
     * in the {@code char} array is in the
     * low-surrogate range, {@code (index - 2)} is not less than
     * {@code start}, and the {@code char} value at
     * {@code (index - 2)} in the {@code char} array is in
     * the high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned.
     * Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param a the {@code char} array
     * @param index the index following the code point that should be returned
     * @param start the index of the first array element in the
     * {@code char} array
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is not greater than the {@code start} argument or
     * is greater than the length of the {@code char} array, or
     * if the {@code start} argument is negative or not less than
     * the length of the {@code char} array.
     * @since 1.5
     */
    public static int codePointBefore(char[] a, int index, int start) {
        if (index <= start || start < 0 || start >= a.length) {
            throw new IndexOutOfBoundsException();
        }
        return codePointBeforeImpl(a, index, start);
    }

    // throws ArrayIndexOutOfBoundsException if index-1 out of bounds
    static int codePointBeforeImpl(char[] a, int index, int start) {
        char c2 = a[--index];
        if (isLowSurrogate(c2) && index > start) {
            char c1 = a[--index];
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
    }

    /**
     * Returns the leading surrogate (a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * high surrogate code unit</a>) of the
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * surrogate pair</a>
     * representing the specified supplementary character (Unicode
     * code point) in the UTF-16 encoding.
     * If the specified character
     * is not a
     * <a href="Character.html#supplementary">supplementary character</a>,
     * an unspecified {@code char} is returned.
     *
     * <p>If
     * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}
     * is {@code true}, then
     * {@link #isHighSurrogate isHighSurrogate}{@code (highSurrogate(x))} and
     * {@link #toCodePoint toCodePoint}{@code (highSurrogate(x), }{@link #lowSurrogate lowSurrogate}{@code (x)) == x}
     * are also always {@code true}.
     *
     * @param codePoint a supplementary character (Unicode code point)
     * @return the leading surrogate code unit used to represent the
     * character in the UTF-16 encoding
     * @since 1.7
     */
    public static char highSurrogate(int codePoint) {
        return (char) ((codePoint >>> 10)
            + (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
    }

    /**
     * Returns the trailing surrogate (a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * low surrogate code unit</a>) of the
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * surrogate pair</a>
     * representing the specified supplementary character (Unicode
     * code point) in the UTF-16 encoding.
     * If the specified character
     * is not a
     * <a href="Character.html#supplementary">supplementary character</a>,
     * an unspecified {@code char} is returned.
     *
     * <p>If
     * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}
     * is {@code true}, then
     * {@link #isLowSurrogate isLowSurrogate}{@code (lowSurrogate(x))} and
     * {@link #toCodePoint toCodePoint}{@code (}{@link #highSurrogate highSurrogate}{@code (x), lowSurrogate(x)) == x}
     * are also always {@code true}.
     *
     * @param codePoint a supplementary character (Unicode code point)
     * @return the trailing surrogate code unit used to represent the
     * character in the UTF-16 encoding
     * @since 1.7
     */
    public static char lowSurrogate(int codePoint) {
        return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);
    }

    /**
     * Converts the specified character (Unicode code point) to its
     * UTF-16 representation. If the specified code point is a BMP
     * (Basic Multilingual Plane or Plane 0) value, the same value is
     * stored in {@code dst[dstIndex]}, and 1 is returned.
     * If the
     * specified code point is a supplementary character, its
     * surrogate values are stored in {@code dst[dstIndex]}
     * (high-surrogate) and {@code dst[dstIndex+1]}
     * (low-surrogate), and 2 is returned.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @param dst an array of {@code char} in which the
     * {@code codePoint}'s UTF-16 value is stored.
     * @param dstIndex the start index into the {@code dst}
     * array where the converted value is stored.
     * @return 1 if the code point is a BMP code point, 2 if the
     * code point is a supplementary code point.
     * @exception IllegalArgumentException if the specified
     * {@code codePoint} is not a valid Unicode code point.
     * @exception NullPointerException if the specified {@code dst} is null.
     * @exception IndexOutOfBoundsException if {@code dstIndex}
     * is negative or not less than {@code dst.length}, or if
     * {@code dst} at {@code dstIndex} doesn't have enough
     * array element(s) to store the resulting {@code char}
     * value(s). (If {@code dstIndex} is equal to
     * {@code dst.length-1} and the specified
     * {@code codePoint} is a supplementary character, the
     * high-surrogate value is not stored in
     * {@code dst[dstIndex]}.)
     * @since 1.5
     */
    public static int toChars(int codePoint, char[] dst, int dstIndex) {
        if (isBmpCodePoint(codePoint)) {
            dst[dstIndex] = (char) codePoint;
            return 1;
        } else if (isValidCodePoint(codePoint)) {
            toSurrogates(codePoint, dst, dstIndex);
            return 2;
        } else {
            throw new IllegalArgumentException();
        }
    }

    /**
     * Converts the specified character (Unicode code point) to its
     * UTF-16 representation stored in a {@code char} array. If
     * the specified code point is a BMP (Basic Multilingual Plane or
     * Plane 0) value, the resulting {@code char} array has
     * the same value as {@code codePoint}.
     * If the specified code
     * point is a supplementary code point, the resulting
     * {@code char} array has the corresponding surrogate pair.
     *
     * @param codePoint a Unicode code point
     * @return a {@code char} array having
     * {@code codePoint}'s UTF-16 representation.
     * @exception IllegalArgumentException if the specified
     * {@code codePoint} is not a valid Unicode code point.
     * @since 1.5
     */
    public static char[] toChars(int codePoint) {
        if (isBmpCodePoint(codePoint)) {
            return new char[] { (char) codePoint };
        } else if (isValidCodePoint(codePoint)) {
            char[] result = new char[2];
            toSurrogates(codePoint, result, 0);
            return result;
        } else {
            throw new IllegalArgumentException();
        }
    }

    static void toSurrogates(int codePoint, char[] dst, int index) {
        // We write elements "backwards" to guarantee all-or-nothing
        dst[index+1] = lowSurrogate(codePoint);
        dst[index] = highSurrogate(codePoint);
    }

    /**
     * Returns the number of Unicode code points in the text range of
     * the specified char sequence. The text range begins at the
     * specified {@code beginIndex} and extends to the
     * {@code char} at index {@code endIndex - 1}. Thus the
     * length (in {@code char}s) of the text range is
     * {@code endIndex-beginIndex}.
     * Unpaired surrogates within
     * the text range count as one code point each.
     *
     * @param seq the char sequence
     * @param beginIndex the index to the first {@code char} of
     * the text range.
     * @param endIndex the index after the last {@code char} of
     * the text range.
     * @return the number of Unicode code points in the specified text
     * range
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the
     * {@code beginIndex} is negative, or {@code endIndex}
     * is larger than the length of the given sequence, or
     * {@code beginIndex} is larger than {@code endIndex}.
     * @since 1.5
     */
    public static int codePointCount(CharSequence seq, int beginIndex, int endIndex) {
        int length = seq.length();
        if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        int n = endIndex - beginIndex;
        for (int i = beginIndex; i < endIndex; ) {
            if (isHighSurrogate(seq.charAt(i++)) && i < endIndex &&
                isLowSurrogate(seq.charAt(i))) {
                n--;
                i++;
            }
        }
        return n;
    }

    /**
     * Returns the number of Unicode code points in a subarray of the
     * {@code char} array argument. The {@code offset}
     * argument is the index of the first {@code char} of the
     * subarray and the {@code count} argument specifies the
     * length of the subarray in {@code char}s.
     * Unpaired
     * surrogates within the subarray count as one code point each.
     *
     * @param a the {@code char} array
     * @param offset the index of the first {@code char} in the
     * given {@code char} array
     * @param count the length of the subarray in {@code char}s
     * @return the number of Unicode code points in the specified subarray
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if {@code offset} or
     * {@code count} is negative, or if {@code offset +
     * count} is larger than the length of the given array.
     * @since 1.5
     */
    public static int codePointCount(char[] a, int offset, int count) {
        if (count > a.length - offset || offset < 0 || count < 0) {
            throw new IndexOutOfBoundsException();
        }
        return codePointCountImpl(a, offset, count);
    }

    static int codePointCountImpl(char[] a, int offset, int count) {
        int endIndex = offset + count;
        int n = count;
        for (int i = offset; i < endIndex; ) {
            if (isHighSurrogate(a[i++]) && i < endIndex &&
                isLowSurrogate(a[i])) {
                n--;
                i++;
            }
        }
        return n;
    }

    /**
     * Returns the index within the given char sequence that is offset
     * from the given {@code index} by {@code codePointOffset}
     * code points.
     * Unpaired surrogates within the text range given by
     * {@code index} and {@code codePointOffset} count as
     * one code point each.
     *
     * @param seq the char sequence
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within the char sequence
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if {@code index}
     * is negative or larger than the length of the char sequence,
     * or if {@code codePointOffset} is positive and the
     * subsequence starting with {@code index} has fewer than
     * {@code codePointOffset} code points, or if
     * {@code codePointOffset} is negative and the subsequence
     * before {@code index} has fewer than the absolute value
     * of {@code codePointOffset} code points.
     * @since 1.5
     */
    public static int offsetByCodePoints(CharSequence seq, int index,
                                         int codePointOffset) {
        int length = seq.length();
        if (index < 0 || index > length) {
            throw new IndexOutOfBoundsException();
        }

        int x = index;
        if (codePointOffset >= 0) {
            int i;
            for (i = 0; x < length && i < codePointOffset; i++) {
                if (isHighSurrogate(seq.charAt(x++)) && x < length &&
                    isLowSurrogate(seq.charAt(x))) {
                    x++;
                }
            }
            if (i < codePointOffset) {
                throw new IndexOutOfBoundsException();
            }
        } else {
            int i;
            for (i = codePointOffset; x > 0 && i < 0; i++) {
                if (isLowSurrogate(seq.charAt(--x)) && x > 0 &&
                    isHighSurrogate(seq.charAt(x-1))) {
                    x--;
                }
            }
            if (i < 0) {
                throw new IndexOutOfBoundsException();
            }
        }
        return x;
    }

    /**
     * Returns the index within the given {@code char} subarray
     * that is offset from the given {@code index} by
     * {@code codePointOffset} code points. The
     * {@code start} and {@code count} arguments specify a
     * subarray of the {@code char} array.
     * Unpaired surrogates
     * within the text range given by {@code index} and
     * {@code codePointOffset} count as one code point each.
     *
     * @param a the {@code char} array
     * @param start the index of the first {@code char} of the
     * subarray
     * @param count the length of the subarray in {@code char}s
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within the subarray
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException
     * if {@code start} or {@code count} is negative,
     * or if {@code start + count} is larger than the length of
     * the given array,
     * or if {@code index} is less than {@code start} or
     * larger than {@code start + count},
     * or if {@code codePointOffset} is positive and the text range
     * starting with {@code index} and ending with {@code start + count - 1}
     * has fewer than {@code codePointOffset} code
     * points,
     * or if {@code codePointOffset} is negative and the text range
     * starting with {@code start} and ending with {@code index - 1}
     * has fewer than the absolute value of
     * {@code codePointOffset} code points.
     * @since 1.5
     */
    public static int offsetByCodePoints(char[] a, int start, int count,
                                         int index, int codePointOffset) {
        if (count > a.length-start || start < 0 || count < 0
            || index < start || index > start+count) {
            throw new IndexOutOfBoundsException();
        }
        return offsetByCodePointsImpl(a, start, count, index, codePointOffset);
    }

    static int offsetByCodePointsImpl(char[] a, int start, int count,
                                      int index, int codePointOffset) {
        int x = index;
        if (codePointOffset >= 0) {
            int limit = start + count;
            int i;
            for (i = 0; x < limit && i < codePointOffset; i++) {
                if (isHighSurrogate(a[x++]) && x < limit &&
                    isLowSurrogate(a[x])) {
                    x++;
                }
            }
            if (i < codePointOffset) {
                throw new IndexOutOfBoundsException();
            }
        } else
        {
            int i;
            for (i = codePointOffset; x > start && i < 0; i++) {
                if (isLowSurrogate(a[--x]) && x > start &&
                    isHighSurrogate(a[x-1])) {
                    x--;
                }
            }
            if (i < 0) {
                throw new IndexOutOfBoundsException();
            }
        }
        return x;
    }

    /**
     * Determines if the specified character is a lowercase character.
     * <p>
     * A character is lowercase if its general category type, provided
     * by {@code Character.getType(ch)}, is
     * {@code LOWERCASE_LETTER}, or it has contributory property
     * Other_Lowercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of lowercase characters:
     * <blockquote><pre>
     * a b c d e f g h i j k l m n o p q r s t u v w x y z
     * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
     * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE'
     * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
     * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
     * </pre></blockquote>
     * <p> Many other Unicode characters are lowercase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLowerCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is lowercase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#toLowerCase(char)
     * @see Character#getType(char)
     */
    public static boolean isLowerCase(char ch) {
        return isLowerCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a
     * lowercase character.
     * <p>
     * A character is lowercase if its general category type, provided
     * by {@link Character#getType getType(codePoint)}, is
     * {@code LOWERCASE_LETTER}, or it has contributory property
     * Other_Lowercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of lowercase characters:
     * <blockquote><pre>
     * a b c d e f g h i j k l m n o p q r s t u v w x y z
     * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
     * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE'
     * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
     * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
     * </pre></blockquote>
     * <p> Many other Unicode characters are lowercase too.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is lowercase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#toLowerCase(int)
     * @see Character#getType(int)
     * @since 1.5
     */
    public static boolean isLowerCase(int codePoint) {
        return getType(codePoint) == Character.LOWERCASE_LETTER ||
            CharacterData.of(codePoint).isOtherLowercase(codePoint);
    }

    /**
     * Determines if the specified character is an uppercase character.
     * <p>
     * A character is uppercase if its general category type, provided by
     * {@code Character.getType(ch)}, is {@code UPPERCASE_LETTER},
     * or it has contributory property Other_Uppercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of uppercase characters:
     * <blockquote><pre>
     * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
     * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC' '\u00CD' '\u00CE' '\u00CF'
     * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
     * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
     * </pre></blockquote>
     * <p> Many other Unicode characters are uppercase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUpperCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is uppercase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#toUpperCase(char)
     * @see Character#getType(char)
     * @since 1.0
     */
    public static boolean isUpperCase(char ch) {
        return isUpperCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is an uppercase character.
     * <p>
     * A character is uppercase if its general category type, provided by
     * {@link Character#getType(int) getType(codePoint)}, is {@code UPPERCASE_LETTER},
     * or it has contributory property Other_Uppercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of uppercase characters:
     * <blockquote><pre>
     * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
     * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC'
     * '\u00CD' '\u00CE' '\u00CF'
     * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
     * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
     * </pre></blockquote>
     * <p> Many other Unicode characters are uppercase too.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is uppercase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#toUpperCase(int)
     * @see Character#getType(int)
     * @since 1.5
     */
    public static boolean isUpperCase(int codePoint) {
        return getType(codePoint) == Character.UPPERCASE_LETTER ||
            CharacterData.of(codePoint).isOtherUppercase(codePoint);
    }

    /**
     * Determines if the specified character is a titlecase character.
     * <p>
     * A character is a titlecase character if its general
     * category type, provided by {@code Character.getType(ch)},
     * is {@code TITLECASE_LETTER}.
     * <p>
     * Some characters look like pairs of Latin letters. For example, there
     * is an uppercase letter that looks like "LJ" and has a corresponding
     * lowercase letter that looks like "lj". A third form, which looks like "Lj",
     * is the appropriate form to use when rendering a word in lowercase
     * with initial capitals, as for a book title.
     * <p>
     * These are some of the Unicode characters for which this method returns
     * {@code true}:
     * <ul>
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}
     * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}
     * </ul>
     * <p> Many other Unicode characters are titlecase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isTitleCase(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is titlecase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(char)
     * @see Character#isUpperCase(char)
     * @see Character#toTitleCase(char)
     * @see Character#getType(char)
     * @since 1.0.2
     */
    public static boolean isTitleCase(char ch) {
        return isTitleCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a titlecase character.
     * <p>
     * A character is a titlecase character if its general
     * category type, provided by {@link Character#getType(int) getType(codePoint)},
     * is {@code TITLECASE_LETTER}.
     * <p>
     * Some characters look like pairs of Latin letters. For example, there
     * is an uppercase letter that looks like "LJ" and has a corresponding
     * lowercase letter that looks like "lj". A third form, which looks like "Lj",
     * is the appropriate form to use when rendering a word in lowercase
     * with initial capitals, as for a book title.
     * <p>
     * These are some of the Unicode characters for which this method returns
     * {@code true}:
     * <ul>
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}
     * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}
     * </ul>
     * <p> Many other Unicode characters are titlecase too.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is titlecase;
     * {@code false} otherwise.
     * @see Character#isLowerCase(int)
     * @see Character#isUpperCase(int)
     * @see Character#toTitleCase(int)
     * @see Character#getType(int)
     * @since 1.5
     */
    public static boolean isTitleCase(int codePoint) {
        return
            getType(codePoint) == Character.TITLECASE_LETTER;
    }

    /**
     * Determines if the specified character is a digit.
     * <p>
     * A character is a digit if its general category type, provided
     * by {@code Character.getType(ch)}, is
     * {@code DECIMAL_DIGIT_NUMBER}.
     * <p>
     * Some Unicode character ranges that contain digits:
     * <ul>
     * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},
     * ISO-LATIN-1 digits ({@code '0'} through {@code '9'})
     * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},
     * Arabic-Indic digits
     * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},
     * Extended Arabic-Indic digits
     * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},
     * Devanagari digits
     * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},
     * Fullwidth digits
     * </ul>
     *
     * Many other character ranges contain digits as well.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isDigit(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a digit;
     * {@code false} otherwise.
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Character#getType(char)
     */
    public static boolean isDigit(char ch) {
        return isDigit((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a digit.
     * <p>
     * A character is a digit if its general category type, provided
     * by {@link Character#getType(int) getType(codePoint)}, is
     * {@code DECIMAL_DIGIT_NUMBER}.
     * <p>
     * Some Unicode character ranges that contain digits:
     * <ul>
     * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},
     * ISO-LATIN-1 digits ({@code '0'} through {@code '9'})
     * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},
     * Arabic-Indic digits
     * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},
     * Extended Arabic-Indic digits
     * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},
     * Devanagari digits
     * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},
     * Fullwidth digits
     * </ul>
     *
     * Many other character ranges contain digits as well.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a digit;
     * {@code false} otherwise.
     * @see Character#forDigit(int, int)
     * @see Character#getType(int)
     * @since 1.5
     */
    public static boolean isDigit(int codePoint) {
        return getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
    }

    /**
     * Determines if a character is defined in Unicode.
     * <p>
     * A character is defined if at least one of the following is true:
     * <ul>
     * <li>It has an entry in the UnicodeData file.
     * <li>It has a value in a range defined by the UnicodeData file.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isDefined(int)} method.
     *
     * @param ch the character to be tested
     * @return {@code true} if the character has a defined meaning
     *         in Unicode; {@code false} otherwise.
     * @see Character#isDigit(char)
     * @see Character#isLetter(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#isUpperCase(char)
     * @since 1.0.2
     */
    public static boolean isDefined(char ch) {
        return isDefined((int)ch);
    }

    /**
     * Determines if a character (Unicode code point) is defined in Unicode.
     * <p>
     * A character is defined if at least one of the following is true:
     * <ul>
     * <li>It has an entry in the UnicodeData file.
     * <li>It has a value in a range defined by the UnicodeData file.
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character has a defined meaning
     *         in Unicode; {@code false} otherwise.
     * @see Character#isDigit(int)
     * @see Character#isLetter(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#isUpperCase(int)
     * @since 1.5
     */
    public static boolean isDefined(int codePoint) {
        return getType(codePoint) != Character.UNASSIGNED;
    }

    /**
     * Determines if the specified character is a letter.
     * <p>
     * A character is considered to be a letter if its general
     * category type, provided by {@code Character.getType(ch)},
     * is any of the following:
     * <ul>
     * <li> {@code UPPERCASE_LETTER}
     * <li> {@code LOWERCASE_LETTER}
     * <li> {@code TITLECASE_LETTER}
     * <li> {@code MODIFIER_LETTER}
     * <li> {@code OTHER_LETTER}
     * </ul>
     *
     * Not all letters have
     * case. Many characters are
     * letters but are neither uppercase nor lowercase nor titlecase.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLetter(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a letter;
     *         {@code false} otherwise.
     * @see Character#isDigit(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isJavaLetter(char)
     * @see Character#isJavaLetterOrDigit(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isLowerCase(char)
     * @see Character#isTitleCase(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @see Character#isUpperCase(char)
     */
    public static boolean isLetter(char ch) {
        return isLetter((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a letter.
     * <p>
     * A character is considered to be a letter if its general
     * category type, provided by {@link Character#getType(int) getType(codePoint)},
     * is any of the following:
     * <ul>
     * <li> {@code UPPERCASE_LETTER}
     * <li> {@code LOWERCASE_LETTER}
     * <li> {@code TITLECASE_LETTER}
     * <li> {@code MODIFIER_LETTER}
     * <li> {@code OTHER_LETTER}
     * </ul>
     *
     * Not all letters have case.
     * Many characters are
     * letters but are neither uppercase nor lowercase nor titlecase.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a letter;
     *         {@code false} otherwise.
     * @see Character#isDigit(int)
     * @see Character#isJavaIdentifierStart(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isLowerCase(int)
     * @see Character#isTitleCase(int)
     * @see Character#isUnicodeIdentifierStart(int)
     * @see Character#isUpperCase(int)
     * @since 1.5
     */
    public static boolean isLetter(int codePoint) {
        return ((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is a letter or digit.
     * <p>
     * A character is considered to be a letter or digit if either
     * {@code Character.isLetter(char ch)} or
     * {@code Character.isDigit(char ch)} returns
     * {@code true} for the character.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLetterOrDigit(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a letter or digit;
     *         {@code false} otherwise.
     * @see Character#isDigit(char)
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isJavaLetter(char)
     * @see Character#isJavaLetterOrDigit(char)
     * @see Character#isLetter(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @since 1.0.2
     */
    public static boolean isLetterOrDigit(char ch) {
        return isLetterOrDigit((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a letter or digit.
     * <p>
     * A character is considered to be a letter or digit if either
     * {@link #isLetter(int) isLetter(codePoint)} or
     * {@link #isDigit(int) isDigit(codePoint)} returns
     * {@code true} for the character.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a letter or digit;
     *         {@code false} otherwise.
     * @see Character#isDigit(int)
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isLetter(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @since 1.5
     */
    public static boolean isLetterOrDigit(int codePoint) {
        return ((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER) |
            (1 << Character.DECIMAL_DIGIT_NUMBER)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is permissible as the first
     * character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char)
     *      getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may start a Java
     *         identifier; {@code false} otherwise.
     * @see Character#isJavaLetterOrDigit(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isLetter(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @since 1.02
     * @deprecated Replaced by isJavaIdentifierStart(char).
     */
    @Deprecated
    public static boolean isJavaLetter(char ch) {
        return isJavaIdentifierStart(ch);
    }

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if and only if any
     * of the following conditions are true:
     * <ul>
     * <li> it is a letter
     * <li> it is a currency symbol (such as {@code '$'})
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for the character.
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may be part of a
     *         Java identifier; {@code false} otherwise.
     * @see Character#isJavaLetter(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see
     *      Character#isJavaIdentifierPart(char)
     * @see Character#isLetter(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @see Character#isIdentifierIgnorable(char)
     * @since 1.02
     * @deprecated Replaced by isJavaIdentifierPart(char).
     */
    @Deprecated
    public static boolean isJavaLetterOrDigit(char ch) {
        return isJavaIdentifierPart(ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is an alphabet.
     * <p>
     * A character is considered to be alphabetic if its general category type,
     * provided by {@link Character#getType(int) getType(codePoint)}, is any of
     * the following:
     * <ul>
     * <li> <code>UPPERCASE_LETTER</code>
     * <li> <code>LOWERCASE_LETTER</code>
     * <li> <code>TITLECASE_LETTER</code>
     * <li> <code>MODIFIER_LETTER</code>
     * <li> <code>OTHER_LETTER</code>
     * <li> <code>LETTER_NUMBER</code>
     * </ul>
     * or it has contributory property Other_Alphabetic as defined by the
     * Unicode Standard.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return <code>true</code> if the character is a Unicode alphabet
     *         character, <code>false</code> otherwise.
     * @since 1.7
     */
    public static boolean isAlphabetic(int codePoint) {
        return (((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER) |
            (1 << Character.LETTER_NUMBER)) >> getType(codePoint)) & 1) != 0) ||
            CharacterData.of(codePoint).isOtherAlphabetic(codePoint);
    }

    /**
     * Determines if the specified character (Unicode code point) is a CJKV
     * (Chinese, Japanese, Korean and Vietnamese) ideograph, as defined by
     * the Unicode Standard.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return <code>true</code> if the character is a Unicode ideograph
     *         character,
     *         <code>false</code> otherwise.
     * @since 1.7
     */
    public static boolean isIdeographic(int codePoint) {
        return CharacterData.of(codePoint).isIdeographic(codePoint);
    }

    /**
     * Determines if the specified character is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isJavaIdentifierStart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may start a Java identifier;
     *         {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isLetter(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since 1.1
     */
    public static boolean isJavaIdentifierStart(char ch) {
        return isJavaIdentifierStart((int)ch);
    }

    /**
     * Determines if the character (Unicode code point) is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     *      returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     *      returns {@code LETTER_NUMBER}
     * <li> the
     *      referenced character is a currency symbol (such as {@code '$'})
     * <li> the referenced character is a connecting punctuation character
     *      (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may start a Java identifier;
     *         {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isLetter(int)
     * @see Character#isUnicodeIdentifierStart(int)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since 1.5
     */
    public static boolean isJavaIdentifierStart(int codePoint) {
        return CharacterData.of(codePoint).isJavaIdentifierStart(codePoint);
    }

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * conditions are true:
     * <ul>
     * <li> it is a letter
     * <li> it is a currency symbol (such as {@code '$'})
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for the character
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isJavaIdentifierPart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may be part of a
     *         Java identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(char)
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since 1.1
     */
    public static boolean isJavaIdentifierPart(char ch) {
        return isJavaIdentifierPart((int)ch);
    }

    /**
     * Determines if the character (Unicode code point) may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * conditions are true:
     * <ul>
     * <li> it is a letter
     * <li> it is a currency symbol (such as {@code '$'})
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@link #isIdentifierIgnorable(int)
     *      isIdentifierIgnorable(codePoint)} returns {@code true} for
     *      the code point
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may be part of a
     *         Java identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(int)
     * @see Character#isJavaIdentifierStart(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since 1.5
     */
    public static boolean
            isJavaIdentifierPart(int codePoint) {
        return CharacterData.of(codePoint).isJavaIdentifierPart(codePoint);
    }

    /**
     * Determines if the specified character is permissible as the
     * first character in a Unicode identifier.
     * <p>
     * A character may start a Unicode identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns
     *      {@code LETTER_NUMBER}.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUnicodeIdentifierStart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may start a Unicode
     *         identifier; {@code false} otherwise.
     * @see Character#isJavaIdentifierStart(char)
     * @see Character#isLetter(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @since 1.1
     */
    public static boolean isUnicodeIdentifierStart(char ch) {
        return isUnicodeIdentifierStart((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is permissible as the
     * first character in a Unicode identifier.
     * <p>
     * A character may start a Unicode identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     *      returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     *      returns {@code LETTER_NUMBER}.
     * </ul>
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may start a Unicode
     *         identifier; {@code false} otherwise.
     * @see Character#isJavaIdentifierStart(int)
     * @see Character#isLetter(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @since 1.5
     */
    public static boolean
            isUnicodeIdentifierStart(int codePoint) {
        return CharacterData.of(codePoint).isUnicodeIdentifierStart(codePoint);
    }

    /**
     * Determines if the specified character may be part of a Unicode
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Unicode identifier if and only if
     * one of the following statements is true:
     * <ul>
     * <li> it is a letter
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for this character.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUnicodeIdentifierPart(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character may be part of a
     *         Unicode identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(char)
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isLetterOrDigit(char)
     * @see Character#isUnicodeIdentifierStart(char)
     * @since 1.1
     */
    public static boolean isUnicodeIdentifierPart(char ch) {
        return isUnicodeIdentifierPart((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) may be part of a Unicode
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Unicode identifier if and only if
     * one of the following statements is true:
     * <ul>
     * <li> it is a letter
     * <li> it is a connecting punctuation character (such as {@code '_'})
     * <li> it is a digit
     * <li> it is a numeric letter (such as a Roman numeral character)
     * <li> it is a combining mark
     * <li> it is a
     *      non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     *      {@code true} for this character.
     * </ul>
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character may be part of a
     *         Unicode identifier; {@code false} otherwise.
     * @see Character#isIdentifierIgnorable(int)
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isLetterOrDigit(int)
     * @see Character#isUnicodeIdentifierStart(int)
     * @since 1.5
     */
    public static boolean isUnicodeIdentifierPart(int codePoint) {
        return CharacterData.of(codePoint).isUnicodeIdentifierPart(codePoint);
    }

    /**
     * Determines if the specified character should be regarded as
     * an ignorable character in a Java identifier or a Unicode identifier.
     * <p>
     * The following Unicode characters are ignorable in a Java identifier
     * or a Unicode identifier:
     * <ul>
     * <li>ISO control characters that are not whitespace
     * <ul>
     * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'}
     * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'}
     * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'}
     * </ul>
     *
     * <li>all characters that have the {@code FORMAT} general
     * category value
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isIdentifierIgnorable(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is an ignorable control
     *         character that may be part of a Java or Unicode identifier;
     *         {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(char)
     * @see Character#isUnicodeIdentifierPart(char)
     * @since 1.1
     */
    public static boolean isIdentifierIgnorable(char ch) {
        return isIdentifierIgnorable((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) should be regarded as
     * an ignorable character in a Java identifier or a Unicode identifier.
     * <p>
     * The following Unicode characters are ignorable in a Java identifier
     * or a Unicode identifier:
     * <ul>
     * <li>ISO control characters that are not whitespace
     * <ul>
     * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'}
     * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'}
     * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'}
     * </ul>
     *
     * <li>all characters that have the {@code FORMAT} general
     * category value
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is an ignorable control
     *         character that may be part of a Java or Unicode identifier;
     *         {@code false} otherwise.
     * @see Character#isJavaIdentifierPart(int)
     * @see Character#isUnicodeIdentifierPart(int)
     * @since 1.5
     */
    public static boolean isIdentifierIgnorable(int codePoint) {
        return CharacterData.of(codePoint).isIdentifierIgnorable(codePoint);
    }

    /**
     * Converts the character argument to lowercase using case
     * mapping information from the UnicodeData file.
     * <p>
     * Note that
     * {@code Character.isLowerCase(Character.toLowerCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters,
     * particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toLowerCase()} should be used to map
     * characters to lowercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toLowerCase(int)} method.
     *
     * @param ch the character to be converted.
     * @return the lowercase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isLowerCase(char)
     * @see String#toLowerCase()
     */
    public static char toLowerCase(char ch) {
        return (char)toLowerCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to
     * lowercase using case mapping information from the UnicodeData
     * file.
     *
     * <p> Note that
     * {@code Character.isLowerCase(Character.toLowerCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toLowerCase()} should be used to map
     * characters to lowercase.
     * {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the lowercase equivalent of the character (Unicode code
     *         point), if any; otherwise, the character itself.
     * @see Character#isLowerCase(int)
     * @see String#toLowerCase()
     *
     * @since 1.5
     */
    public static int toLowerCase(int codePoint) {
        return CharacterData.of(codePoint).toLowerCase(codePoint);
    }

    /**
     * Converts the character argument to uppercase using case mapping
     * information from the UnicodeData file.
     * <p>
     * Note that
     * {@code Character.isUpperCase(Character.toUpperCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toUpperCase()} should be used to map
     * characters to uppercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toUpperCase(int)} method.
     *
     * @param ch the character to be converted.
     * @return the uppercase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isUpperCase(char)
     * @see String#toUpperCase()
     */
    public static char toUpperCase(char ch) {
        return (char)toUpperCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to
     * uppercase using case mapping information from the UnicodeData
     * file.
     *
     * <p>Note that
     * {@code Character.isUpperCase(Character.toUpperCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toUpperCase()} should be used to map
     * characters to uppercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the uppercase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isUpperCase(int)
     * @see String#toUpperCase()
     *
     * @since 1.5
     */
    public static int toUpperCase(int codePoint) {
        return CharacterData.of(codePoint).toUpperCase(codePoint);
    }

    /**
     * Converts the character argument to titlecase using case mapping
     * information from the UnicodeData file. If a character has no
     * explicit titlecase mapping and is not itself a titlecase char
     * according to UnicodeData, then the uppercase mapping is
     * returned as an equivalent titlecase mapping.
     * If the
     * {@code char} argument is already a titlecase
     * {@code char}, the same {@code char} value will be
     * returned.
     * <p>
     * Note that
     * {@code Character.isTitleCase(Character.toTitleCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toTitleCase(int)} method.
     *
     * @param ch the character to be converted.
     * @return the titlecase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isTitleCase(char)
     * @see Character#toLowerCase(char)
     * @see Character#toUpperCase(char)
     * @since 1.0.2
     */
    public static char toTitleCase(char ch) {
        return (char)toTitleCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to titlecase using case mapping
     * information from the UnicodeData file. If a character has no
     * explicit titlecase mapping and is not itself a titlecase char
     * according to UnicodeData, then the uppercase mapping is
     * returned as an equivalent titlecase mapping.
     * If the
     * character argument is already a titlecase
     * character, the same character value will be
     * returned.
     *
     * <p>Note that
     * {@code Character.isTitleCase(Character.toTitleCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the titlecase equivalent of the character, if any;
     *         otherwise, the character itself.
     * @see Character#isTitleCase(int)
     * @see Character#toLowerCase(int)
     * @see Character#toUpperCase(int)
     * @since 1.5
     */
    public static int toTitleCase(int codePoint) {
        return CharacterData.of(codePoint).toTitleCase(codePoint);
    }

    /**
     * Returns the numeric value of the character {@code ch} in the
     * specified radix.
     * <p>
     * If the radix is not in the range {@code MIN_RADIX} ≤
     * {@code radix} ≤ {@code MAX_RADIX} or if the
     * value of {@code ch} is not a valid digit in the specified
     * radix, {@code -1} is returned.
     * A character is a valid digit
     * if at least one of the following is true:
     * <ul>
     * <li>The method {@code isDigit} is {@code true} of the character
     *     and the Unicode decimal digit value of the character (or its
     *     single-character decomposition) is less than the specified radix.
     *     In this case the decimal digit value is returned.
     * <li>The character is one of the uppercase Latin letters
     *     {@code 'A'} through {@code 'Z'} and its code is less than
     *     {@code radix + 'A' - 10}.
     *     In this case, {@code ch - 'A' + 10}
     *     is returned.
     * <li>The character is one of the lowercase Latin letters
     *     {@code 'a'} through {@code 'z'} and its code is less than
     *     {@code radix + 'a' - 10}.
     *     In this case, {@code ch - 'a' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth uppercase Latin letters A
     *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF21' - 10}.
     *     In this case, {@code ch - '\u005CuFF21' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth lowercase Latin letters a
     *     ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF41' - 10}.
     *     In this case, {@code ch - '\u005CuFF41' + 10}
     *     is returned.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #digit(int, int)} method.
     *
     * @param ch the character to be converted.
     * @param radix the radix.
     * @return the numeric value represented by the character in the
     *         specified radix.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(char)
     */
    public static int digit(char ch, int radix) {
        return digit((int)ch, radix);
    }

    /**
     * Returns the numeric value of the specified character (Unicode
     * code point) in the specified radix.
     *
     * <p>If the radix is not in the range {@code MIN_RADIX} ≤
     * {@code radix} ≤ {@code MAX_RADIX} or if the
     * character is not a valid digit in the specified
     * radix, {@code -1} is returned. A character is a valid digit
     * if at least one of the following is true:
     * <ul>
     * <li>The method {@link #isDigit(int) isDigit(codePoint)} is {@code true} of the character
     *     and the Unicode decimal digit value of the character (or its
     *     single-character decomposition) is less than the specified radix.
     *     In this case the decimal digit value is returned.
     * <li>The character is one of the uppercase Latin letters
     *     {@code 'A'} through {@code 'Z'} and its code is less than
     *     {@code radix + 'A' - 10}.
     *     In this case, {@code codePoint - 'A' + 10}
     *     is returned.
     * <li>The character is one of the lowercase Latin letters
     *     {@code 'a'} through {@code 'z'} and its code is less than
     *     {@code radix + 'a' - 10}.
     *     In this case, {@code codePoint - 'a' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth uppercase Latin letters A
     *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF21' - 10}.
     *     In this case,
     *     {@code codePoint - '\u005CuFF21' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth lowercase Latin letters a
     *     ({@code '\u005CuFF41'}) through z
     *     ({@code '\u005CuFF5A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF41' - 10}.
     *     In this case,
     *     {@code codePoint - '\u005CuFF41' + 10}
     *     is returned.
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @param radix the radix.
     * @return the numeric value represented by the character in the
     *         specified radix.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(int)
     * @since 1.5
     */
    public static int digit(int codePoint, int radix) {
        return CharacterData.of(codePoint).digit(codePoint, radix);
    }

    /**
     * Returns the {@code int} value that the specified Unicode
     * character represents. For example, the character
     * {@code '\u005Cu216C'} (the Roman numeral fifty) will return
     * an {@code int} with a value of 50.
     * <p>
     * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through
     * {@code '\u005Cu005A'}), lowercase
     * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and
     * full width variant ({@code '\u005CuFF21'} through
     * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through
     * {@code '\u005CuFF5A'}) forms have numeric values from 10
     * through 35. This is independent of the Unicode specification,
     * which does not assign numeric values to these {@code char}
     * values.
     * <p>
     * If the character does not have a numeric value, then -1 is returned.
     * If the character has a numeric value that cannot be represented as a
     * nonnegative integer (for example, a fractional value), then -2
     * is returned.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getNumericValue(int)} method.
     *
     * @param ch the character to be converted.
     * @return the numeric value of the character, as a nonnegative {@code int}
     *         value; -2 if the character has a numeric value that is not a
     *         nonnegative integer; -1 if the character has no numeric value.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(char)
     * @since 1.1
     */
    public static int getNumericValue(char ch) {
        return getNumericValue((int)ch);
    }

    /**
     * Returns the {@code int} value that the specified
     * character (Unicode code point) represents. For example, the character
     * {@code '\u005Cu216C'} (the Roman numeral fifty) will return
     * an {@code int} with a value of 50.
     * <p>
     * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through
     * {@code '\u005Cu005A'}), lowercase
     * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and
     * full width variant ({@code '\u005CuFF21'} through
     * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through
     * {@code '\u005CuFF5A'}) forms have numeric values from 10
     * through 35.
     * This is independent of the Unicode specification,
     * which does not assign numeric values to these {@code char}
     * values.
     * <p>
     * If the character does not have a numeric value, then -1 is returned.
     * If the character has a numeric value that cannot be represented as a
     * nonnegative integer (for example, a fractional value), then -2
     * is returned.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return the numeric value of the character, as a nonnegative {@code int}
     *         value; -2 if the character has a numeric value that is not a
     *         nonnegative integer; -1 if the character has no numeric value.
     * @see Character#forDigit(int, int)
     * @see Character#isDigit(int)
     * @since 1.5
     */
    public static int getNumericValue(int codePoint) {
        return CharacterData.of(codePoint).getNumericValue(codePoint);
    }

    /**
     * Determines if the specified character is ISO-LATIN-1 white space.
     * This method returns {@code true} for the following five
     * characters only:
     * <table summary="truechars">
     * <tr><td>{@code '\t'}</td> <td>{@code U+0009}</td>
     *     <td>{@code HORIZONTAL TABULATION}</td></tr>
     * <tr><td>{@code '\n'}</td> <td>{@code U+000A}</td>
     *     <td>{@code NEW LINE}</td></tr>
     * <tr><td>{@code '\f'}</td> <td>{@code U+000C}</td>
     *     <td>{@code FORM FEED}</td></tr>
     * <tr><td>{@code '\r'}</td> <td>{@code U+000D}</td>
     *     <td>{@code CARRIAGE RETURN}</td></tr>
     * <tr><td>{@code ' '}</td> <td>{@code U+0020}</td>
     *     <td>{@code SPACE}</td></tr>
     * </table>
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is ISO-LATIN-1 white
     *         space; {@code false} otherwise.
     * @see Character#isSpaceChar(char)
     * @see Character#isWhitespace(char)
     * @deprecated Replaced by isWhitespace(char).
     */
    @Deprecated
    public static boolean isSpace(char ch) {
        return (ch <= 0x0020) &&
            (((((1L << 0x0009) |
            (1L << 0x000A) |
            (1L << 0x000C)
            | (1L << 0x000D) |
            (1L << 0x0020)) >> ch) & 1L) != 0);
    }


    /**
     * Determines if the specified character is a Unicode space character.
     * A character is considered to be a space character if and only if
     * it is specified to be a space character by the Unicode Standard. This
     * method returns true if the character's general category type is any of
     * the following:
     * <ul>
     * <li> {@code SPACE_SEPARATOR}
     * <li> {@code LINE_SEPARATOR}
     * <li> {@code PARAGRAPH_SEPARATOR}
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isSpaceChar(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a space character;
     *         {@code false} otherwise.
     * @see Character#isWhitespace(char)
     * @since 1.1
     */
    public static boolean isSpaceChar(char ch) {
        return isSpaceChar((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a
     * Unicode space character. A character is considered to be a
     * space character if and only if it is specified to be a space
     * character by the Unicode Standard.
     * This method returns true if
     * the character's general category type is any of the following:
     *
     * <ul>
     * <li> {@link #SPACE_SEPARATOR}
     * <li> {@link #LINE_SEPARATOR}
     * <li> {@link #PARAGRAPH_SEPARATOR}
     * </ul>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a space character;
     *         {@code false} otherwise.
     * @see Character#isWhitespace(int)
     * @since 1.5
     */
    public static boolean isSpaceChar(int codePoint) {
        return ((((1 << Character.SPACE_SEPARATOR) |
                  (1 << Character.LINE_SEPARATOR) |
                  (1 << Character.PARAGRAPH_SEPARATOR)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is white space according to Java.
     * A character is a Java whitespace character if and only if it satisfies
     * one of the following criteria:
     * <ul>
     * <li> It is a Unicode space character ({@code SPACE_SEPARATOR},
     *      {@code LINE_SEPARATOR}, or {@code PARAGRAPH_SEPARATOR})
     *      but is not also a non-breaking space ({@code '\u005Cu00A0'},
     *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).
     * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.
     * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.
     * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.
     * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.
     * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.
     * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.
     * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.
     * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.
     * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isWhitespace(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is a Java whitespace
     *         character; {@code false} otherwise.
     * @see Character#isSpaceChar(char)
     * @since 1.1
     */
    public static boolean isWhitespace(char ch) {
        return isWhitespace((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is
     * white space according to Java. A character is a Java
     * whitespace character if and only if it satisfies one of the
     * following criteria:
     * <ul>
     * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR},
     *      {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR})
     *      but is not also a non-breaking space ({@code '\u005Cu00A0'},
     *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).
     * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.
     * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.
     * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.
     * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.
     * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.
     * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.
     * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.
     * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.
     * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.
     * </ul>
     * <p>
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is a Java whitespace
     *         character; {@code false} otherwise.
     * @see Character#isSpaceChar(int)
     * @since 1.5
     */
    public static boolean isWhitespace(int codePoint) {
        return CharacterData.of(codePoint).isWhitespace(codePoint);
    }

    /**
     * Determines if the specified character is an ISO control
     * character.
     * A character is considered to be an ISO control
     * character if its code is in the range {@code '\u005Cu0000'}
     * through {@code '\u005Cu001F'} or in the range
     * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isISOControl(int)} method.
     *
     * @param ch the character to be tested.
     * @return {@code true} if the character is an ISO control character;
     *         {@code false} otherwise.
     *
     * @see Character#isSpaceChar(char)
     * @see Character#isWhitespace(char)
     * @since 1.1
     */
    public static boolean isISOControl(char ch) {
        return isISOControl((int)ch);
    }

    /**
     * Determines if the referenced character (Unicode code point) is an ISO control
     * character. A character is considered to be an ISO control
     * character if its code is in the range {@code '\u005Cu0000'}
     * through {@code '\u005Cu001F'} or in the range
     * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is an ISO control character;
     *         {@code false} otherwise.
     * @see Character#isSpaceChar(int)
     * @see Character#isWhitespace(int)
     * @since 1.5
     */
    public static boolean isISOControl(int codePoint) {
        // Optimized form of:
        //     (codePoint >= 0x00 && codePoint <= 0x1F) ||
        //     (codePoint >= 0x7F && codePoint <= 0x9F);
        return codePoint <= 0x9F &&
            (codePoint >= 0x7F || (codePoint >>> 5 == 0));
    }

    /**
     * Returns a value indicating a character's general category.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getType(int)} method.
     *
     * @param ch the character to be tested.
     * @return a value of type {@code int} representing the
     *         character's general category.
     * @see Character#COMBINING_SPACING_MARK
     * @see Character#CONNECTOR_PUNCTUATION
     * @see Character#CONTROL
     * @see Character#CURRENCY_SYMBOL
     * @see Character#DASH_PUNCTUATION
     * @see Character#DECIMAL_DIGIT_NUMBER
     * @see Character#ENCLOSING_MARK
     * @see Character#END_PUNCTUATION
     * @see Character#FINAL_QUOTE_PUNCTUATION
     * @see Character#FORMAT
     * @see Character#INITIAL_QUOTE_PUNCTUATION
     * @see Character#LETTER_NUMBER
     * @see Character#LINE_SEPARATOR
     * @see Character#LOWERCASE_LETTER
     * @see Character#MATH_SYMBOL
     * @see Character#MODIFIER_LETTER
     * @see Character#MODIFIER_SYMBOL
     * @see Character#NON_SPACING_MARK
     * @see Character#OTHER_LETTER
     * @see Character#OTHER_NUMBER
     * @see Character#OTHER_PUNCTUATION
     * @see Character#OTHER_SYMBOL
     * @see Character#PARAGRAPH_SEPARATOR
     * @see Character#PRIVATE_USE
     * @see Character#SPACE_SEPARATOR
     * @see Character#START_PUNCTUATION
     * @see Character#SURROGATE
     * @see Character#TITLECASE_LETTER
     * @see Character#UNASSIGNED
     * @see Character#UPPERCASE_LETTER
     * @since 1.1
     */
    public static int getType(char ch) {
        return getType((int)ch);
    }

    /**
     * Returns a value indicating a character's general category.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return a value of type {@code int} representing the
     *         character's general category.
     * @see Character#COMBINING_SPACING_MARK COMBINING_SPACING_MARK
     * @see Character#CONNECTOR_PUNCTUATION CONNECTOR_PUNCTUATION
     * @see Character#CONTROL CONTROL
     * @see Character#CURRENCY_SYMBOL CURRENCY_SYMBOL
     * @see Character#DASH_PUNCTUATION DASH_PUNCTUATION
     * @see Character#DECIMAL_DIGIT_NUMBER
     *      DECIMAL_DIGIT_NUMBER
     * @see Character#ENCLOSING_MARK ENCLOSING_MARK
     * @see Character#END_PUNCTUATION END_PUNCTUATION
     * @see Character#FINAL_QUOTE_PUNCTUATION FINAL_QUOTE_PUNCTUATION
     * @see Character#FORMAT FORMAT
     * @see Character#INITIAL_QUOTE_PUNCTUATION INITIAL_QUOTE_PUNCTUATION
     * @see Character#LETTER_NUMBER LETTER_NUMBER
     * @see Character#LINE_SEPARATOR LINE_SEPARATOR
     * @see Character#LOWERCASE_LETTER LOWERCASE_LETTER
     * @see Character#MATH_SYMBOL MATH_SYMBOL
     * @see Character#MODIFIER_LETTER MODIFIER_LETTER
     * @see Character#MODIFIER_SYMBOL MODIFIER_SYMBOL
     * @see Character#NON_SPACING_MARK NON_SPACING_MARK
     * @see Character#OTHER_LETTER OTHER_LETTER
     * @see Character#OTHER_NUMBER OTHER_NUMBER
     * @see Character#OTHER_PUNCTUATION OTHER_PUNCTUATION
     * @see Character#OTHER_SYMBOL OTHER_SYMBOL
     * @see Character#PARAGRAPH_SEPARATOR PARAGRAPH_SEPARATOR
     * @see Character#PRIVATE_USE PRIVATE_USE
     * @see Character#SPACE_SEPARATOR SPACE_SEPARATOR
     * @see Character#START_PUNCTUATION START_PUNCTUATION
     * @see Character#SURROGATE SURROGATE
     * @see Character#TITLECASE_LETTER TITLECASE_LETTER
     * @see Character#UNASSIGNED UNASSIGNED
     * @see Character#UPPERCASE_LETTER UPPERCASE_LETTER
     * @since 1.5
     */
    public static int getType(int codePoint) {
        return CharacterData.of(codePoint).getType(codePoint);
    }

    /**
     * Determines the character representation for a specific digit in
     * the specified radix. If the value of {@code radix} is not a
     * valid radix, or the value of {@code digit} is not a valid
     * digit in the specified radix, the null character
     * ({@code '\u005Cu0000'}) is returned.
     * <p>
     * The {@code radix} argument is valid if it is greater than or
     * equal to {@code MIN_RADIX} and less than or equal to
     * {@code MAX_RADIX}. The {@code digit} argument is valid if
     * {@code 0 <= digit < radix}.
     * <p>
     * If the digit is less than 10, then
     * {@code '0' + digit} is returned.
     * Otherwise, the value
     * {@code 'a' + digit - 10} is returned.
     *
     * @param digit the number to convert to a character.
     * @param radix the radix.
     * @return the {@code char} representation of the specified digit
     *         in the specified radix.
     * @see Character#MIN_RADIX
     * @see Character#MAX_RADIX
     * @see Character#digit(char, int)
     */
    public static char forDigit(int digit, int radix) {
        if ((digit >= radix) || (digit < 0)) {
            return '\0';
        }
        if ((radix < Character.MIN_RADIX) || (radix > Character.MAX_RADIX)) {
            return '\0';
        }
        if (digit < 10) {
            return (char)('0' + digit);
        }
        return (char)('a' - 10 + digit);
    }

    /**
     * Returns the Unicode directionality property for the given
     * character. Character directionality is used to calculate the
     * visual ordering of text. The directionality value of undefined
     * {@code char} values is {@code DIRECTIONALITY_UNDEFINED}.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getDirectionality(int)} method.
     *
     * @param ch {@code char} for which the directionality property
     *           is requested.
     * @return the directionality property of the {@code char} value.
     *
     * @see Character#DIRECTIONALITY_UNDEFINED
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR
     * @see Character#DIRECTIONALITY_ARABIC_NUMBER
     * @see Character#DIRECTIONALITY_COMMON_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_NONSPACING_MARK
     * @see Character#DIRECTIONALITY_BOUNDARY_NEUTRAL
     * @see Character#DIRECTIONALITY_PARAGRAPH_SEPARATOR
     * @see Character#DIRECTIONALITY_SEGMENT_SEPARATOR
     * @see Character#DIRECTIONALITY_WHITESPACE
     * @see Character#DIRECTIONALITY_OTHER_NEUTRALS
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
     * @see Character#DIRECTIONALITY_POP_DIRECTIONAL_FORMAT
     * @since 1.4
     */
    public static byte getDirectionality(char ch) {
        return getDirectionality((int)ch);
    }

    /**
     * Returns the Unicode directionality property for the given
     * character (Unicode code point). Character directionality is
     * used to calculate the visual ordering of text.
     * The
     * directionality value of an undefined character is {@link
     * #DIRECTIONALITY_UNDEFINED}.
     *
     * @param codePoint the character (Unicode code point) for which
     *                  the directionality property is requested.
     * @return the directionality property of the character.
     *
     * @see Character#DIRECTIONALITY_UNDEFINED DIRECTIONALITY_UNDEFINED
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT DIRECTIONALITY_LEFT_TO_RIGHT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT DIRECTIONALITY_RIGHT_TO_LEFT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER DIRECTIONALITY_EUROPEAN_NUMBER
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR
     * @see Character#DIRECTIONALITY_ARABIC_NUMBER DIRECTIONALITY_ARABIC_NUMBER
     * @see Character#DIRECTIONALITY_COMMON_NUMBER_SEPARATOR DIRECTIONALITY_COMMON_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_NONSPACING_MARK DIRECTIONALITY_NONSPACING_MARK
     * @see Character#DIRECTIONALITY_BOUNDARY_NEUTRAL DIRECTIONALITY_BOUNDARY_NEUTRAL
     * @see Character#DIRECTIONALITY_PARAGRAPH_SEPARATOR DIRECTIONALITY_PARAGRAPH_SEPARATOR
     * @see Character#DIRECTIONALITY_SEGMENT_SEPARATOR DIRECTIONALITY_SEGMENT_SEPARATOR
     * @see Character#DIRECTIONALITY_WHITESPACE DIRECTIONALITY_WHITESPACE
     * @see Character#DIRECTIONALITY_OTHER_NEUTRALS DIRECTIONALITY_OTHER_NEUTRALS
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
     * @see
     *      Character#DIRECTIONALITY_POP_DIRECTIONAL_FORMAT DIRECTIONALITY_POP_DIRECTIONAL_FORMAT
     * @since 1.5
     */
    public static byte getDirectionality(int codePoint) {
        return CharacterData.of(codePoint).getDirectionality(codePoint);
    }

    /**
     * Determines whether the character is mirrored according to the
     * Unicode specification. Mirrored characters should have their
     * glyphs horizontally mirrored when displayed in text that is
     * right-to-left. For example, {@code '\u005Cu0028'} LEFT
     * PARENTHESIS is semantically defined to be an <i>opening
     * parenthesis</i>. This will appear as a "(" in text that is
     * left-to-right but as a ")" in text that is right-to-left.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isMirrored(int)} method.
     *
     * @param ch {@code char} for which the mirrored property is requested
     * @return {@code true} if the char is mirrored, {@code false}
     *         if the {@code char} is not mirrored or is not defined.
     * @since 1.4
     */
    public static boolean isMirrored(char ch) {
        return isMirrored((int)ch);
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is mirrored according to the Unicode specification. Mirrored
     * characters should have their glyphs horizontally mirrored when
     * displayed in text that is right-to-left. For example,
     * {@code '\u005Cu0028'} LEFT PARENTHESIS is semantically
     * defined to be an <i>opening parenthesis</i>.
     * This will appear
     * as a "(" in text that is left-to-right but as a ")" in text
     * that is right-to-left.
     *
     * @param codePoint the character (Unicode code point) to be tested.
     * @return {@code true} if the character is mirrored, {@code false}
     *         if the character is not mirrored or is not defined.
     * @since 1.5
     */
    public static boolean isMirrored(int codePoint) {
        return CharacterData.of(codePoint).isMirrored(codePoint);
    }

    /**
     * Compares two {@code Character} objects numerically.
     *
     * @param anotherCharacter the {@code Character} to be compared.

     * @return the value {@code 0} if the argument {@code Character}
     *         is equal to this {@code Character}; a value less than
     *         {@code 0} if this {@code Character} is numerically less
     *         than the {@code Character} argument; and a value greater than
     *         {@code 0} if this {@code Character} is numerically greater
     *         than the {@code Character} argument (unsigned comparison).
     *         Note that this is strictly a numerical comparison; it is not
     *         locale-dependent.
     * @since 1.2
     */
    public int compareTo(Character anotherCharacter) {
        return compare(this.value, anotherCharacter.value);
    }

    /**
     * Compares two {@code char} values numerically.
     * The value returned is identical to what would be returned by:
     * <pre>
     *    Character.valueOf(x).compareTo(Character.valueOf(y))
     * </pre>
     *
     * @param x the first {@code char} to compare
     * @param y the second {@code char} to compare
     * @return the value {@code 0} if {@code x == y};
     *         a value less than {@code 0} if {@code x < y}; and
     *         a value greater than {@code 0} if {@code x > y}
     * @since 1.7
     */
    public static int compare(char x, char y) {
        return x - y;
    }

    /**
     * Converts the character (Unicode code point) argument to uppercase using
     * information from the UnicodeData file.
     * <p>
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return either the uppercase equivalent of the character, if
     *         any, or an error flag ({@code Character.ERROR})
     *         that indicates that a 1:M {@code char} mapping exists.
     * @see Character#isLowerCase(char)
     * @see Character#isUpperCase(char)
     * @see Character#toLowerCase(char)
     * @see Character#toTitleCase(char)
     * @since 1.4
     */
    static int toUpperCaseEx(int codePoint) {
        assert isValidCodePoint(codePoint);
        return CharacterData.of(codePoint).toUpperCaseEx(codePoint);
    }

    /**
     * Converts the character (Unicode code point) argument to uppercase using case
     * mapping information from the SpecialCasing file in the Unicode
     * specification. If a character has no explicit uppercase
     * mapping, then the {@code char} itself is returned in the
     * {@code char[]}.
     *
     * @param codePoint the character (Unicode code point) to be converted.
     * @return a {@code char[]} with the uppercased character.
     * @since 1.4
     */
    static char[] toUpperCaseCharArray(int codePoint) {
        // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
        assert isBmpCodePoint(codePoint);
        return CharacterData.of(codePoint).toUpperCaseCharArray(codePoint);
    }

    /**
     * The number of bits used to represent a <tt>char</tt> value in unsigned
     * binary form, constant {@code 16}.
     *
     * @since 1.5
     */
    public static final int SIZE = 16;

    /**
     * The number of bytes used to represent a {@code char} value in unsigned
     * binary form.
     *
     * @since 1.8
     */
    public static final int BYTES = SIZE / Byte.SIZE;

    /**
     * Returns the value obtained by reversing the order of the bytes in the
     * specified <tt>char</tt> value.
     *
     * @param ch The {@code char} of which to reverse the byte order.
     * @return the value obtained by reversing (or, equivalently, swapping)
     *         the bytes in the specified <tt>char</tt> value.
     * @since 1.5
     */
    public static char reverseBytes(char ch) {
        return (char) (((ch &
            0xFF00) >> 8) | (ch << 8));
    }

    /**
     * Returns the Unicode name of the specified character
     * {@code codePoint}, or null if the code point is
     * {@link #UNASSIGNED unassigned}.
     * <p>
     * Note: if the specified character is not assigned a name by
     * the <i>UnicodeData</i> file (part of the Unicode Character
     * Database maintained by the Unicode Consortium), the returned
     * name is the same as the result of the expression:
     *
     * <blockquote>{@code
     *     Character.UnicodeBlock.of(codePoint).toString().replace('_', ' ')
     *     + " "
     *     + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
     *
     * }</blockquote>
     *
     * @param codePoint the character (Unicode code point)
     *
     * @return the Unicode name of the specified character, or null if
     *         the code point is unassigned.
     *
     * @exception IllegalArgumentException if the specified
     *            {@code codePoint} is not a valid Unicode
     *            code point.
     *
     * @since 1.7
     */
    public static String getName(int codePoint) {
        if (!isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException();
        }
        String name = CharacterName.get(codePoint);
        if (name != null)
            return name;
        if (getType(codePoint) == UNASSIGNED)
            return null;
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        if (block != null)
            return block.toString().replace('_', ' ') + " "
                   + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
        // should never come here
        return Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
    }
}
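The Javadoc above describes `digit(int, int)` and `forDigit(int, int)` as inverse building blocks for radix conversion. The following is a minimal usage sketch, not part of `Character.java`: the class `CharacterSketch` and its helpers `parseUnsigned` and `formatUnsigned` are hypothetical names introduced only for illustration, and the sketch assumes non-negative values that fit in an `int`.

```java
// Illustrative sketch only; CharacterSketch and its helpers are hypothetical
// and do not exist in the JDK.
final class CharacterSketch {

    /** Parses a non-negative number by calling Character.digit(int, int) per code point. */
    static int parseUnsigned(String text, int radix) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        int result = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            // digit() returns -1 for an invalid digit or an out-of-range radix.
            int value = Character.digit(codePoint, radix);
            if (value < 0) {
                throw new IllegalArgumentException("not a digit in radix " + radix);
            }
            // The exact-arithmetic helpers throw ArithmeticException on overflow.
            result = Math.addExact(Math.multiplyExact(result, radix), value);
            i += Character.charCount(codePoint);
        }
        return result;
    }

    /** Formats a non-negative number by calling Character.forDigit(int, int) per digit. */
    static String formatUnsigned(int value, int radix) {
        if (value < 0 || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("unsupported value or radix");
        }
        StringBuilder sb = new StringBuilder();
        do {
            // value % radix is always in [0, radix), so forDigit never returns '\0' here.
            sb.append(Character.forDigit(value % radix, radix));
            value /= radix;
        } while (value > 0);
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(parseUnsigned("ff", 16));              // 255
        System.out.println(formatUnsigned(255, 16));              // ff
        // U+00A0 NO-BREAK SPACE is a SPACE_SEPARATOR but is excluded by isWhitespace.
        System.out.println(Character.isSpaceChar('\u00A0'));      // true
        System.out.println(Character.isWhitespace('\u00A0'));     // false
        // U+216C is the Roman numeral fifty used as the Javadoc example.
        System.out.println(Character.getNumericValue('\u216C'));  // 50
    }
}
```

The helpers iterate by code point rather than by `char`, so they go through the `int` overloads that the notes above recommend for supplementary characters.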