Path: blob/aarch64-shenandoah-jdk8u272-b10/jdk/src/share/classes/java/nio/charset/Charset.java
/*
 * Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

package java.nio.charset;

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.spi.CharsetProvider;
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.ServiceLoader;
import java.util.ServiceConfigurationError;
import java.util.SortedMap;
import java.util.TreeMap;
import sun.misc.ASCIICaseInsensitiveComparator;
import sun.nio.cs.StandardCharsets;
import sun.nio.cs.ThreadLocalCoders;
import sun.security.action.GetPropertyAction;


/**
 * A named mapping between sequences of sixteen-bit Unicode <a
 * href="../../lang/Character.html#unicode">code units</a> and sequences of
 * bytes. This class defines methods for creating decoders and encoders and
 * for retrieving the various names associated with a charset. Instances of
 * this class are immutable.
 *
 * <p> This class also defines static methods for testing whether a particular
 * charset is supported, for locating charset instances by name, and for
 * constructing a map that contains every charset for which support is
 * available in the current Java virtual machine. Support for new charsets can
 * be added via the service-provider interface defined in the {@link
 * java.nio.charset.spi.CharsetProvider} class.
 *
 * <p> All of the methods defined in this class are safe for use by multiple
 * concurrent threads.
 *
 *
 * <a name="names"></a><a name="charenc"></a>
 * <h2>Charset names</h2>
 *
 * <p> Charsets are named by strings composed of the following characters:
 *
 * <ul>
 *
 *   <li> The uppercase letters <tt>'A'</tt> through <tt>'Z'</tt>
 *        (<tt>'\u0041'</tt> through <tt>'\u005a'</tt>),
 *
 *   <li> The lowercase letters <tt>'a'</tt> through <tt>'z'</tt>
 *        (<tt>'\u0061'</tt> through <tt>'\u007a'</tt>),
 *
 *   <li> The digits <tt>'0'</tt> through <tt>'9'</tt>
 *        (<tt>'\u0030'</tt> through <tt>'\u0039'</tt>),
 *
 *   <li> The dash character <tt>'-'</tt>
 *        (<tt>'\u002d'</tt>, <small>HYPHEN-MINUS</small>),
 *
 *   <li> The plus character <tt>'+'</tt>
 *        (<tt>'\u002b'</tt>, <small>PLUS SIGN</small>),
 *
 *   <li> The period character <tt>'.'</tt>
 *        (<tt>'\u002e'</tt>, <small>FULL STOP</small>),
 *
 *   <li> The colon character <tt>':'</tt>
 *        (<tt>'\u003a'</tt>, <small>COLON</small>), and
 *
 *   <li> The underscore character <tt>'_'</tt>
 *        (<tt>'\u005f'</tt>, <small>LOW LINE</small>).
 *
 * </ul>
 *
 * A charset name must begin with either a letter or a digit. The empty string
 * is not a legal charset name. Charset names are not case-sensitive; that is,
 * case is always ignored when comparing charset names.
 * Charset names generally follow the conventions documented in <a
 * href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC 2278: IANA Charset
 * Registration Procedures</i></a>.
 *
 * <p> Every charset has a <i>canonical name</i> and may also have one or more
 * <i>aliases</i>. The canonical name is returned by the {@link #name() name} method
 * of this class. Canonical names are, by convention, usually in upper case.
 * The aliases of a charset are returned by the {@link #aliases() aliases}
 * method.
 *
 * <p><a name="hn">Some charsets have an <i>historical name</i> that is defined for
 * compatibility with previous versions of the Java platform.</a> A charset's
 * historical name is either its canonical name or one of its aliases. The
 * historical name is returned by the <tt>getEncoding()</tt> methods of the
 * {@link java.io.InputStreamReader#getEncoding InputStreamReader} and {@link
 * java.io.OutputStreamWriter#getEncoding OutputStreamWriter} classes.
 *
 * <p><a name="iana"> </a>If a charset listed in the <a
 * href="http://www.iana.org/assignments/character-sets"><i>IANA Charset
 * Registry</i></a> is supported by an implementation of the Java platform then
 * its canonical name must be the name listed in the registry. Many charsets
 * are given more than one name in the registry, in which case the registry
 * identifies one of the names as <i>MIME-preferred</i>. If a charset has more
 * than one registry name then its canonical name must be the MIME-preferred
 * name and the other names in the registry must be valid aliases. If a
 * supported charset is not listed in the IANA registry then its canonical name
 * must begin with one of the strings <tt>"X-"</tt> or <tt>"x-"</tt>.
 *
 * <p> The IANA charset registry does change over time, and so the canonical
 * name and the aliases of a particular charset may also change over time.
 * To ensure compatibility it is recommended that no alias ever be removed from a
 * charset, and that if the canonical name of a charset is changed then its
 * previous canonical name be made into an alias.
 *
 *
 * <h2>Standard charsets</h2>
 *
 *
 *
 * <p><a name="standard">Every implementation of the Java platform is required to support the
 * following standard charsets.</a> Consult the release documentation for your
 * implementation to see if any other charsets are supported. The behavior
 * of such optional charsets may differ between implementations.
 *
 * <blockquote><table width="80%" summary="Description of standard charsets">
 * <tr><th align="left">Charset</th><th align="left">Description</th></tr>
 * <tr><td valign=top><tt>US-ASCII</tt></td>
 *     <td>Seven-bit ASCII, a.k.a. <tt>ISO646-US</tt>,
 *         a.k.a. the Basic Latin block of the Unicode character set</td></tr>
 * <tr><td valign=top><tt>ISO-8859-1</tt></td>
 *     <td>ISO Latin Alphabet No. 1, a.k.a. <tt>ISO-LATIN-1</tt></td></tr>
 * <tr><td valign=top><tt>UTF-8</tt></td>
 *     <td>Eight-bit UCS Transformation Format</td></tr>
 * <tr><td valign=top><tt>UTF-16BE</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         big-endian byte order</td></tr>
 * <tr><td valign=top><tt>UTF-16LE</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         little-endian byte order</td></tr>
 * <tr><td valign=top><tt>UTF-16</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         byte order identified by an optional byte-order mark</td></tr>
 * </table></blockquote>
 *
 * <p> The <tt>UTF-8</tt> charset is specified by <a
 * href="http://www.ietf.org/rfc/rfc2279.txt"><i>RFC 2279</i></a>; the
 * transformation format upon which it is based is specified in
 * Amendment 2 of ISO 10646-1 and is also described in the <a
 * href="http://www.unicode.org/unicode/standard/standard.html"><i>Unicode
 * Standard</i></a>.
 *
 * <p> The <tt>UTF-16</tt> charsets are specified by <a
 * href="http://www.ietf.org/rfc/rfc2781.txt"><i>RFC 2781</i></a>; the
 * transformation formats upon which they are based are specified in
 * Amendment 1 of ISO 10646-1 and are also described in the <a
 * href="http://www.unicode.org/unicode/standard/standard.html"><i>Unicode
 * Standard</i></a>.
 *
 * <p> The <tt>UTF-16</tt> charsets use sixteen-bit quantities and are
 * therefore sensitive to byte order. In these encodings the byte order of a
 * stream may be indicated by an initial <i>byte-order mark</i> represented by
 * the Unicode character <tt>'\uFEFF'</tt>. Byte-order marks are handled
 * as follows:
 *
 * <ul>
 *
 *   <li><p> When decoding, the <tt>UTF-16BE</tt> and <tt>UTF-16LE</tt>
 *   charsets interpret the initial byte-order marks as a <small>ZERO-WIDTH
 *   NON-BREAKING SPACE</small>; when encoding, they do not write
 *   byte-order marks. </p></li>
 *
 *   <li><p> When decoding, the <tt>UTF-16</tt> charset interprets the
 *   byte-order mark at the beginning of the input stream to indicate the
 *   byte-order of the stream but defaults to big-endian if there is no
 *   byte-order mark; when encoding, it uses big-endian byte order and writes
 *   a big-endian byte-order mark. </p></li>
 *
 * </ul>
 *
 * In any case, byte order marks occurring after the first element of an
 * input sequence are not omitted since the same code is used to represent
 * <small>ZERO-WIDTH NON-BREAKING SPACE</small>.
 *
 * <p> Every instance of the Java virtual machine has a default charset, which
 * may or may not be one of the standard charsets. The default charset is
 * determined during virtual-machine startup and typically depends upon the
 * locale and charset being used by the underlying operating system.
 * </p>
 *
 * <p>The {@link StandardCharsets} class defines constants for each of the
 * standard charsets.
 *
 * <h2>Terminology</h2>
 *
 * <p> The name of this class is taken from the terms used in
 * <a href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC 2278</i></a>.
 * In that document a <i>charset</i> is defined as the combination of
 * one or more coded character sets and a character-encoding scheme.
 * (This definition is confusing; some other software systems define
 * <i>charset</i> as a synonym for <i>coded character set</i>.)
 *
 * <p> A <i>coded character set</i> is a mapping between a set of abstract
 * characters and a set of integers. US-ASCII, ISO 8859-1,
 * JIS X 0201, and Unicode are examples of coded character sets.
 *
 * <p> Some standards have defined a <i>character set</i> to be simply a
 * set of abstract characters without an associated assigned numbering.
 * An alphabet is an example of such a character set. However, the subtle
 * distinction between <i>character set</i> and <i>coded character set</i>
 * is rarely used in practice; the former has become a short form for the
 * latter, including in the Java API specification.
 *
 * <p> A <i>character-encoding scheme</i> is a mapping between one or more
 * coded character sets and a set of octet (eight-bit byte) sequences.
 * UTF-8, UTF-16, ISO 2022, and EUC are examples of
 * character-encoding schemes. Encoding schemes are often associated with
 * a particular coded character set; UTF-8, for example, is used only to
 * encode Unicode.
 * Some schemes, however, are associated with multiple
 * coded character sets; EUC, for example, can be used to encode
 * characters in a variety of Asian coded character sets.
 *
 * <p> When a coded character set is used exclusively with a single
 * character-encoding scheme then the corresponding charset is usually
 * named for the coded character set; otherwise a charset is usually named
 * for the encoding scheme and, possibly, the locale of the coded
 * character sets that it supports. Hence <tt>US-ASCII</tt> is both the
 * name of a coded character set and of the charset that encodes it, while
 * <tt>EUC-JP</tt> is the name of the charset that encodes the
 * JIS X 0201, JIS X 0208, and JIS X 0212
 * coded character sets for the Japanese language.
 *
 * <p> The native character encoding of the Java programming language is
 * UTF-16. A charset in the Java platform therefore defines a mapping
 * between sequences of sixteen-bit UTF-16 code units (that is, sequences
 * of chars) and sequences of bytes. </p>
 *
 *
 * @author Mark Reinhold
 * @author JSR-51 Expert Group
 * @since 1.4
 *
 * @see CharsetDecoder
 * @see CharsetEncoder
 * @see java.nio.charset.spi.CharsetProvider
 * @see java.lang.Character
 */

public abstract class Charset
    implements Comparable<Charset>
{

    /* -- Static methods -- */

    private static volatile String bugLevel = null;

    static boolean atBugLevel(String bl) {              // package-private
        String level = bugLevel;
        if (level == null) {
            if (!sun.misc.VM.isBooted())
                return false;
            bugLevel = level = AccessController.doPrivileged(
                new GetPropertyAction("sun.nio.cs.bugLevel", ""));
        }
        return level.equals(bl);
    }

    /**
     * Checks that the given string is a legal charset name.
     * </p>
     *
     * @param  s
     *         A purported charset name
     *
     * @throws  IllegalCharsetNameException
     *          If the given name is not a legal charset name
     */
    private static void checkName(String s) {
        int n = s.length();
        if (!atBugLevel("1.4")) {
            if (n == 0)
                throw new IllegalCharsetNameException(s);
        }
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') continue;
            if (c >= 'a' && c <= 'z') continue;
            if (c >= '0' && c <= '9') continue;
            if (c == '-' && i != 0) continue;
            if (c == '+' && i != 0) continue;
            if (c == ':' && i != 0) continue;
            if (c == '_' && i != 0) continue;
            if (c == '.' && i != 0) continue;
            throw new IllegalCharsetNameException(s);
        }
    }

    /* The standard set of charsets */
    private static CharsetProvider standardProvider = new StandardCharsets();

    // Cache of the most-recently-returned charsets,
    // along with the names that were used to find them
    //
    private static volatile Object[] cache1 = null; // "Level 1" cache
    private static volatile Object[] cache2 = null; // "Level 2" cache

    private static void cache(String charsetName, Charset cs) {
        cache2 = cache1;
        cache1 = new Object[] { charsetName, cs };
    }

    // Creates an iterator that walks over the available providers, ignoring
    // those whose lookup or instantiation causes a security exception to be
    // thrown.
    // Should be invoked with full privileges.
    //
    private static Iterator<CharsetProvider> providers() {
        return new Iterator<CharsetProvider>() {

            ClassLoader cl = ClassLoader.getSystemClassLoader();
            ServiceLoader<CharsetProvider> sl =
                ServiceLoader.load(CharsetProvider.class, cl);
            Iterator<CharsetProvider> i = sl.iterator();

            CharsetProvider next = null;

            private boolean getNext() {
                while (next == null) {
                    try {
                        if (!i.hasNext())
                            return false;
                        next = i.next();
                    } catch (ServiceConfigurationError sce) {
                        if (sce.getCause() instanceof SecurityException) {
                            // Ignore security exceptions
                            continue;
                        }
                        throw sce;
                    }
                }
                return true;
            }

            public boolean hasNext() {
                return getNext();
            }

            public CharsetProvider next() {
                if (!getNext())
                    throw new NoSuchElementException();
                CharsetProvider n = next;
                next = null;
                return n;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }

        };
    }

    // Thread-local gate to prevent recursive provider lookups
    private static ThreadLocal<ThreadLocal<?>> gate =
            new ThreadLocal<ThreadLocal<?>>();

    private static Charset lookupViaProviders(final String charsetName) {

        // The runtime startup sequence looks up standard charsets as a
        // consequence of the VM's invocation of System.initializeSystemClass
        // in order to, e.g., set system properties and encode filenames.
        // At that point the application class loader has not been initialized,
        // however, so we can't look for providers because doing so will cause
        // that loader to be prematurely initialized with incomplete
        // information.
        //
        if (!sun.misc.VM.isBooted())
            return null;

        if (gate.get() != null)
            // Avoid recursive provider lookups
            return null;
        try {
            gate.set(gate);

            return AccessController.doPrivileged(
                new PrivilegedAction<Charset>() {
                    public Charset run() {
                        for (Iterator<CharsetProvider> i = providers();
                             i.hasNext();) {
                            CharsetProvider cp = i.next();
                            Charset cs = cp.charsetForName(charsetName);
                            if (cs != null)
                                return cs;
                        }
                        return null;
                    }
                });

        } finally {
            gate.set(null);
        }
    }

    /* The extended set of charsets */
    private static class ExtendedProviderHolder {
        static final CharsetProvider extendedProvider = extendedProvider();
        // returns ExtendedProvider, if installed
        private static CharsetProvider extendedProvider() {
            return AccessController.doPrivileged(
                new PrivilegedAction<CharsetProvider>() {
                    public CharsetProvider run() {
                        try {
                            Class<?> epc
                                = Class.forName("sun.nio.cs.ext.ExtendedCharsets");
                            return (CharsetProvider)epc.newInstance();
                        } catch (ClassNotFoundException x) {
                            // Extended charsets not available
                            // (charsets.jar not present)
                        } catch (InstantiationException |
                                 IllegalAccessException x) {
                            throw new Error(x);
                        }
                        return null;
                    }
                });
        }
    }

    private static Charset lookupExtendedCharset(String charsetName) {
        CharsetProvider ecp = ExtendedProviderHolder.extendedProvider;
        return (ecp != null) ?
            ecp.charsetForName(charsetName) : null;
    }

    private static Charset lookup(String charsetName) {
        if (charsetName == null)
            throw new IllegalArgumentException("Null charset name");
        Object[] a;
        if ((a = cache1) != null && charsetName.equals(a[0]))
            return (Charset)a[1];
        // We expect most programs to use one Charset repeatedly.
        // We convey a hint to this effect to the VM by putting the
        // level 1 cache miss code in a separate method.
        return lookup2(charsetName);
    }

    private static Charset lookup2(String charsetName) {
        Object[] a;
        if ((a = cache2) != null && charsetName.equals(a[0])) {
            cache2 = cache1;
            cache1 = a;
            return (Charset)a[1];
        }
        Charset cs;
        if ((cs = standardProvider.charsetForName(charsetName)) != null ||
            (cs = lookupExtendedCharset(charsetName))           != null ||
            (cs = lookupViaProviders(charsetName))              != null)
        {
            cache(charsetName, cs);
            return cs;
        }

        /* Only need to check the name if we didn't find a charset for it */
        checkName(charsetName);
        return null;
    }

    /**
     * Tells whether the named charset is supported.
     *
     * @param  charsetName
     *         The name of the requested charset; may be either
     *         a canonical name or an alias
     *
     * @return  <tt>true</tt> if, and only if, support for the named charset
     *          is available in the current Java virtual machine
     *
     * @throws  IllegalCharsetNameException
     *          If the given charset name is illegal
     *
     * @throws  IllegalArgumentException
     *          If the given <tt>charsetName</tt> is null
     */
    public static boolean isSupported(String charsetName) {
        return (lookup(charsetName) != null);
    }

    /**
     * Returns a charset object for the named charset.
     *
     * @param  charsetName
     *         The name of the requested charset; may be either
     *         a canonical name or an alias
     *
     * @return  A charset object for the named charset
     *
     * @throws  IllegalCharsetNameException
     *          If the given charset name is illegal
     *
     * @throws  IllegalArgumentException
     *          If the given
     *          <tt>charsetName</tt> is null
     *
     * @throws  UnsupportedCharsetException
     *          If no support for the named charset is available
     *          in this instance of the Java virtual machine
     */
    public static Charset forName(String charsetName) {
        Charset cs = lookup(charsetName);
        if (cs != null)
            return cs;
        throw new UnsupportedCharsetException(charsetName);
    }

    // Fold charsets from the given iterator into the given map, ignoring
    // charsets whose names already have entries in the map.
    //
    private static void put(Iterator<Charset> i, Map<String,Charset> m) {
        while (i.hasNext()) {
            Charset cs = i.next();
            if (!m.containsKey(cs.name()))
                m.put(cs.name(), cs);
        }
    }

    /**
     * Constructs a sorted map from canonical charset names to charset objects.
     *
     * <p> The map returned by this method will have one entry for each charset
     * for which support is available in the current Java virtual machine. If
     * two or more supported charsets have the same canonical name then the
     * resulting map will contain just one of them; which one it will contain
     * is not specified. </p>
     *
     * <p> The invocation of this method, and the subsequent use of the
     * resulting map, may cause time-consuming disk or network I/O operations
     * to occur. This method is provided for applications that need to
     * enumerate all of the available charsets, for example to allow user
     * charset selection. This method is not used by the {@link #forName
     * forName} method, which instead employs an efficient incremental lookup
     * algorithm.
     *
     * <p> This method may return different results at different times if new
     * charset providers are dynamically made available to the current Java
     * virtual machine. In the absence of such changes, the charsets returned
     * by this method are exactly those that can be retrieved via the {@link
     * #forName forName} method.
     * </p>
     *
     * @return An immutable, case-insensitive map from canonical charset names
     *         to charset objects
     */
    public static SortedMap<String,Charset> availableCharsets() {
        return AccessController.doPrivileged(
            new PrivilegedAction<SortedMap<String,Charset>>() {
                public SortedMap<String,Charset> run() {
                    TreeMap<String,Charset> m =
                        new TreeMap<String,Charset>(
                            ASCIICaseInsensitiveComparator.CASE_INSENSITIVE_ORDER);
                    put(standardProvider.charsets(), m);
                    CharsetProvider ecp = ExtendedProviderHolder.extendedProvider;
                    if (ecp != null)
                        put(ecp.charsets(), m);
                    for (Iterator<CharsetProvider> i = providers(); i.hasNext();) {
                        CharsetProvider cp = i.next();
                        put(cp.charsets(), m);
                    }
                    return Collections.unmodifiableSortedMap(m);
                }
            });
    }

    private static volatile Charset defaultCharset;

    /**
     * Returns the default charset of this Java virtual machine.
     *
     * <p> The default charset is determined during virtual-machine startup and
     * typically depends upon the locale and charset of the underlying
     * operating system.
     *
     * @return  A charset object for the default charset
     *
     * @since 1.5
     */
    public static Charset defaultCharset() {
        if (defaultCharset == null) {
            synchronized (Charset.class) {
                String csn = AccessController.doPrivileged(
                    new GetPropertyAction("file.encoding"));
                Charset cs = lookup(csn);
                if (cs != null)
                    defaultCharset = cs;
                else
                    defaultCharset = forName("UTF-8");
            }
        }
        return defaultCharset;
    }


    /* -- Instance fields and methods -- */

    private final String name;          // tickles a bug in oldjavac
    private final String[] aliases;     // tickles a bug in oldjavac
    private Set<String> aliasSet = null;

    /**
     * Initializes a new charset with the given canonical name and alias
     * set.
     *
     * @param  canonicalName
     *         The canonical name of this charset
     *
     * @param  aliases
     *         An array of this charset's aliases, or null if it has no aliases
     *
     * @throws  IllegalCharsetNameException
     *          If the canonical name or any of the aliases are illegal
     */
    protected Charset(String canonicalName, String[] aliases) {
        checkName(canonicalName);
        String[] as = (aliases == null) ? new String[0] : aliases;
        for (int i = 0; i < as.length; i++)
            checkName(as[i]);
        this.name = canonicalName;
        this.aliases = as;
    }

    /**
     * Returns this charset's canonical name.
     *
     * @return  The canonical name of this charset
     */
    public final String name() {
        return name;
    }

    /**
     * Returns a set containing this charset's aliases.
     *
     * @return  An immutable set of this charset's aliases
     */
    public final Set<String> aliases() {
        if (aliasSet != null)
            return aliasSet;
        int n = aliases.length;
        HashSet<String> hs = new HashSet<String>(n);
        for (int i = 0; i < n; i++)
            hs.add(aliases[i]);
        aliasSet = Collections.unmodifiableSet(hs);
        return aliasSet;
    }

    /**
     * Returns this charset's human-readable name for the default locale.
     *
     * <p> The default implementation of this method simply returns this
     * charset's canonical name. Concrete subclasses of this class may
     * override this method in order to provide a localized display name. </p>
     *
     * @return  The display name of this charset in the default locale
     */
    public String displayName() {
        return name;
    }

    /**
     * Tells whether or not this charset is registered in the <a
     * href="http://www.iana.org/assignments/character-sets">IANA Charset
     * Registry</a>.
     *
     * @return  <tt>true</tt> if, and only if, this charset is known by its
     *          implementor to be registered with the IANA
     */
    public final boolean isRegistered() {
        return !name.startsWith("X-") && !name.startsWith("x-");
    }

    /**
     * Returns this charset's human-readable name for the given locale.
     *
     * <p> The default implementation of this method simply returns this
     * charset's canonical name.
     * Concrete subclasses of this class may
     * override this method in order to provide a localized display name. </p>
     *
     * @param  locale
     *         The locale for which the display name is to be retrieved
     *
     * @return  The display name of this charset in the given locale
     */
    public String displayName(Locale locale) {
        return name;
    }

    /**
     * Tells whether or not this charset contains the given charset.
     *
     * <p> A charset <i>C</i> is said to <i>contain</i> a charset <i>D</i> if,
     * and only if, every character representable in <i>D</i> is also
     * representable in <i>C</i>. If this relationship holds then it is
     * guaranteed that every string that can be encoded in <i>D</i> can also be
     * encoded in <i>C</i> without performing any replacements.
     *
     * <p> That <i>C</i> contains <i>D</i> does not imply that each character
     * representable in <i>C</i> by a particular byte sequence is represented
     * in <i>D</i> by the same byte sequence, although sometimes this is the
     * case.
     *
     * <p> Every charset contains itself.
     *
     * <p> This method computes an approximation of the containment relation:
     * If it returns <tt>true</tt> then the given charset is known to be
     * contained by this charset; if it returns <tt>false</tt>, however, then
     * it is not necessarily the case that the given charset is not contained
     * in this charset.
     *
     * @param  cs
     *         The given charset
     *
     * @return  <tt>true</tt> if the given charset is contained in this charset
     */
    public abstract boolean contains(Charset cs);

    /**
     * Constructs a new decoder for this charset.
     *
     * @return  A new decoder for this charset
     */
    public abstract CharsetDecoder newDecoder();

    /**
     * Constructs a new encoder for this charset.
     *
     * @return  A new encoder for this charset
     *
     * @throws  UnsupportedOperationException
     *          If this charset does not support encoding
     */
    public abstract CharsetEncoder newEncoder();

    /**
     * Tells whether or not this charset supports
     * encoding.
     *
     * <p> Nearly all charsets support encoding. The primary exceptions are
     * special-purpose <i>auto-detect</i> charsets whose decoders can determine
     * which of several possible encoding schemes is in use by examining the
     * input byte sequence. Such charsets do not support encoding because
     * there is no way to determine which encoding should be used on output.
     * Implementations of such charsets should override this method to return
     * <tt>false</tt>. </p>
     *
     * @return  <tt>true</tt> if, and only if, this charset supports encoding
     */
    public boolean canEncode() {
        return true;
    }

    /**
     * Convenience method that decodes bytes in this charset into Unicode
     * characters.
     *
     * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
     * same result as the expression
     *
     * <pre>
     *     cs.newDecoder()
     *       .onMalformedInput(CodingErrorAction.REPLACE)
     *       .onUnmappableCharacter(CodingErrorAction.REPLACE)
     *       .decode(bb); </pre>
     *
     * except that it is potentially more efficient because it can cache
     * decoders between successive invocations.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string. In order
     * to detect such sequences, use the {@link
     * CharsetDecoder#decode(java.nio.ByteBuffer)} method directly.
     * </p>
     *
     * @param  bb  The byte buffer to be decoded
     *
     * @return  A char buffer containing the decoded characters
     */
    public final CharBuffer decode(ByteBuffer bb) {
        try {
            return ThreadLocalCoders.decoderFor(this)
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .decode(bb);
        } catch (CharacterCodingException x) {
            throw new Error(x);         // Can't happen
        }
    }

    /**
     * Convenience method that encodes Unicode characters into bytes in this
     * charset.
     *
     * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
     * same result as the expression
     *
     * <pre>
     *     cs.newEncoder()
     *       .onMalformedInput(CodingErrorAction.REPLACE)
     *       .onUnmappableCharacter(CodingErrorAction.REPLACE)
     *       .encode(cb); </pre>
     *
     * except that it is potentially more efficient because it can cache
     * encoders between successive invocations.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement byte array. In order to
     * detect such sequences, use the {@link
     * CharsetEncoder#encode(java.nio.CharBuffer)} method directly.
     * </p>
     *
     * @param  cb  The char buffer to be encoded
     *
     * @return  A byte buffer containing the encoded characters
     */
    public final ByteBuffer encode(CharBuffer cb) {
        try {
            return ThreadLocalCoders.encoderFor(this)
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .encode(cb);
        } catch (CharacterCodingException x) {
            throw new Error(x);         // Can't happen
        }
    }

    /**
     * Convenience method that encodes a string into bytes in this charset.
     *
     * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
     * same result as the expression
     *
     * <pre>
     *     cs.encode(CharBuffer.wrap(str)); </pre>
     *
     * @param  str  The string to be encoded
     *
     * @return  A byte buffer containing the encoded characters
     */
    public final ByteBuffer encode(String str) {
        return encode(CharBuffer.wrap(str));
    }

    /**
     * Compares this charset to another.
     *
     * <p> Charsets are ordered by their canonical names, without regard to
     * case. </p>
     *
     * @param  that
     *         The charset to which this charset is to be compared
     *
     * @return A negative integer, zero, or a positive integer as this charset
     *         is less than, equal to, or greater than the specified charset
     */
    public final int compareTo(Charset that) {
        return (name().compareToIgnoreCase(that.name()));
    }

    /**
     * Computes a hashcode for this charset.
     *
     * @return  An integer hashcode
     */
    public final int hashCode() {
        return name().hashCode();
    }

    /**
     * Tells whether or not this object is equal to another.
     *
     * <p> Two charsets are equal if, and only if, they have the same canonical
     * names. A charset is never equal to any other type of object.
     * </p>
     *
     * @return  <tt>true</tt> if, and only if, this charset is equal to the
     *          given object
     */
    public final boolean equals(Object ob) {
        if (!(ob instanceof Charset))
            return false;
        if (this == ob)
            return true;
        return name.equals(((Charset)ob).name());
    }

    /**
     * Returns a string describing this charset.
     *
     * @return A string describing this charset
     */
    public final String toString() {
        return name();
    }

}
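The following small program is a usage sketch of several behaviours that the Javadoc above specifies. It covers case-insensitive alias lookup in forName, the case-insensitive map returned by availableCharsets, the UTF-16 byte-order-mark rules, and replacement on decoding. It uses only public java.nio.charset calls. The class name CharsetDemo and its helper methods are hypothetical illustrations, not part of the JDK. The sketch also assumes that the runtime's standard provider registers "utf8" as an alias of UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class (not part of the JDK): exercises documented
// Charset behaviour through the public java.nio API only.
public class CharsetDemo {

    // forName accepts a canonical name or an alias, ignoring case,
    // and name() always reports the canonical name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    // Copies the remaining bytes of a buffer into a fresh array.
    static byte[] toArray(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Charset.decode substitutes the decoder's replacement string for
    // malformed or unmappable input instead of throwing an exception.
    static String decode(byte[] bytes, Charset cs) {
        return cs.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // Alias lookup is case-insensitive: "utf8" is assumed to be an alias of UTF-8.
        System.out.println(canonicalName("utf8"));                      // UTF-8

        // availableCharsets() compares keys without regard to case.
        System.out.println(Charset.availableCharsets().containsKey("utf-8"));

        // Encoding with UTF-16 writes a big-endian byte-order mark first.
        byte[] utf16 = toArray(StandardCharsets.UTF_16.encode("A"));
        System.out.println(Arrays.toString(utf16));                     // [-2, -1, 0, 65]

        // Decoding with UTF-16 honours a little-endian byte-order mark.
        byte[] le = { (byte) 0xFF, (byte) 0xFE, 0x41, 0x00 };
        System.out.println(decode(le, StandardCharsets.UTF_16));        // A

        // UTF-16BE keeps an initial mark as the character U+FEFF.
        byte[] be = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x41 };
        System.out.println((int) decode(be, StandardCharsets.UTF_16BE).charAt(0)); // 65279

        // A lone 0xFF byte is malformed UTF-8 and is replaced by U+FFFD.
        byte[] bad = { 0x41, (byte) 0xFF, 0x42 };
        System.out.println(decode(bad, StandardCharsets.UTF_8).equals("A\uFFFDB"));
    }
}
```

Charset.decode and Charset.encode always substitute replacements. Code that must reject malformed input should therefore call newDecoder() or newEncoder() directly. A freshly created coder reports such errors by default, which matches the advice in the decode and encode Javadoc above.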