Encoding is the process of converting data from one form to another, usually to facilitate data transmission or storage. In Java, character encoding refers to converting characters (text) to bytes using a specified character set, such as UTF-8, UTF-16, or ISO-8859-1. Decoding is the reverse process of converting bytes back to characters.
In this tutorial, we'll explore how to perform character encoding and decoding using Java's Charset, CharsetEncoder, CharsetDecoder, and related classes.
The Charset class in the java.nio.charset package represents a character set that can be used for encoding and decoding. You can obtain a Charset instance using the static forName() method:
import java.nio.charset.Charset;

public class EncodingExample {
    public static void main(String[] args) {
        // Look up the charset by its canonical name
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println("Charset: " + utf8);
    }
}
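Charset.forName() throws an unchecked exception when the name is malformed or the charset is not available in the running JVM. The following is a minimal sketch of a defensive lookup; the class name CharsetLookupSketch and the helper lookupOrDefault are hypothetical names chosen for illustration, not part of the Java API.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

class CharsetLookupSketch {

    // Hypothetical helper: returns the named charset, or the fallback when the
    // name is syntactically illegal or the charset is not installed.
    static Charset lookupOrDefault(String name, Charset fallback) {
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // isSupported() answers the same question without throwing for unknown names
        System.out.println(Charset.isSupported("UTF-8"));

        // Charset names are case-insensitive
        System.out.println(lookupOrDefault("utf-8", StandardCharsets.ISO_8859_1));

        // Unknown name: the fallback is returned instead of an exception
        System.out.println(lookupOrDefault("no-such-charset", StandardCharsets.UTF_8));
    }
}
```

For the charsets that every Java platform must support (such as UTF-8), the constants in StandardCharsets avoid the lookup entirely.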
To encode characters (text) to bytes using a specific character set, you can use the encode() method of the Charset class:
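import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class EncodingExample {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        String text = "Hello, World!";
        ByteBuffer byteBuffer = utf8.encode(text);

        // Copy only the encoded bytes; the backing array returned by
        // array() may be larger than the encoded content.
        byte[] bytes = new byte[byteBuffer.remaining()];
        byteBuffer.get(bytes);

        // Print each byte as two hexadecimal digits
        for (byte b : bytes) {
            System.out.printf("%02X ", b);
        }
    }
}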
In the example above, we encoded the text string to bytes using the UTF-8 character set.
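Not every charset can represent every character. As a rough sketch, a CharsetEncoder can be asked up front whether a string is encodable in a given charset; the class name CanEncodeSketch and the helper canEncode are illustrative names only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CanEncodeSketch {

    // Creates a fresh encoder per call, because CharsetEncoder instances
    // are stateful and not safe to share between threads.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // "café €5" written with Unicode escapes so the source file's own
        // encoding does not affect the example
        String text = "caf\u00e9 \u20ac5";

        System.out.println(canEncode(StandardCharsets.US_ASCII, text));   // false
        System.out.println(canEncode(StandardCharsets.ISO_8859_1, text)); // false (no euro sign)
        System.out.println(canEncode(StandardCharsets.UTF_8, text));      // true
    }
}
```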
To decode bytes back to characters using a specific character set, you can use the decode() method of the Charset class:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class EncodingExample {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");

        // UTF-8 bytes for "Hello, World!" (72 is 'H')
        byte[] bytes = new byte[]{72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33};

        ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
        CharBuffer charBuffer = utf8.decode(byteBuffer);
        String text = charBuffer.toString();
        System.out.println("Decoded text: " + text);
    }
}
In the example above, we decoded the bytes array back to a String using the UTF-8 character set.
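Charset.decode() and the String constructor silently replace invalid byte sequences with the replacement character U+FFFD. When invalid input should be detected instead, a CharsetDecoder configured with CodingErrorAction.REPORT can be used. The sketch below assumes UTF-8 input; StrictDecodeSketch, decodeStrict, and isValidUtf8 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeSketch {

    // Decodes UTF-8 and throws instead of substituting U+FFFD for bad input.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Convenience wrapper: true when the bytes form valid UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] valid = {72, 105};            // "Hi"
        byte[] invalid = {72, (byte) 0xFF};  // 0xFF never occurs in valid UTF-8

        System.out.println(isValidUtf8(valid));   // true
        System.out.println(isValidUtf8(invalid)); // false

        // Lenient decoding keeps going and inserts U+FFFD for the bad byte
        String lenient = new String(invalid, StandardCharsets.UTF_8);
        System.out.println(lenient.equals("H\uFFFD")); // true
    }
}
```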
Java charset and encoding overview
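The Charset class in Java represents character encodings and provides methods such as forName() to obtain Charset instances. Constants for the charsets that every Java platform must support, such as StandardCharsets.UTF_8, are defined in the separate StandardCharsets class.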
Charset utf8Charset = Charset.forName("UTF-8");
UTF-8 encoding in Java
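UTF-8 is a widely used, variable-length character encoding that uses one to four bytes per code point and is backward compatible with ASCII. To encode and decode strings with it, pass StandardCharsets.UTF_8 to String.getBytes() and to the String constructor.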
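import java.nio.charset.StandardCharsets;

String text = "Hello, UTF-8!";
// Encode the string to UTF-8 bytes
byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
// Decode the bytes back to a string
String decodedText = new String(utf8Bytes, StandardCharsets.UTF_8);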
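To make the byte-level effect of UTF-8 visible, the following sketch prints the encoded bytes of a few characters in hexadecimal. The class name Utf8BytesSketch and the helper toHex are illustrative names only.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesSketch {

    // Returns the UTF-8 bytes of the text as space-separated hex pairs.
    static String toHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("A"));            // 41          (1 byte, same as ASCII)
        System.out.println(toHex("\u00e9"));       // C3 A9       (e with acute accent)
        System.out.println(toHex("\u20ac"));       // E2 82 AC    (euro sign)
        System.out.println(toHex("\uD83D\uDE00")); // F0 9F 98 80 (U+1F600, a surrogate pair in Java)
    }
}
```

The last case shows that the number of Java char values (two) and the number of UTF-8 bytes (four) can both differ from the number of characters a reader sees (one).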
Encoding and decoding base64 in Java
Base64 encoding is often used to represent binary data as ASCII text. Since Java 8, the java.util.Base64 class provides encoders and decoders for it.
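import java.nio.charset.StandardCharsets;
import java.util.Base64;

String originalText = "Base64 encoding in Java";
// Convert the text to bytes first, then Base64-encode the bytes
String encodedText = Base64.getEncoder().encodeToString(originalText.getBytes(StandardCharsets.UTF_8));
// Decode back to bytes, then to a string with the same charset
byte[] decodedBytes = Base64.getDecoder().decode(encodedText);
String decodedText = new String(decodedBytes, StandardCharsets.UTF_8);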
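The standard Base64 alphabet contains '+' and '/', which have special meaning in URLs and file names. As a small sketch of the alternative, java.util.Base64 also offers a URL-safe variant; the class name Base64VariantsSketch and its helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64VariantsSketch {

    // Standard alphabet, with '=' padding
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL-safe alphabet ('-' and '_' instead of '+' and '/'), padding omitted
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        // Two bytes chosen so that the output contains '+' and '/'
        byte[] data = {(byte) 0xFB, (byte) 0xFF};

        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8

        // The URL-safe decoder accepts input without padding
        byte[] decoded = Base64.getUrlDecoder().decode(urlSafe(data));
        System.out.println(Arrays.equals(data, decoded)); // true
    }
}
```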
Handling different character encodings in Java
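When dealing with different character encodings, specify the encoding explicitly. Methods such as String.getBytes() and new String(byte[]) without a charset argument use the platform default, so the same code can produce different results on different machines. The charset used for decoding must also match the one used for encoding.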
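import java.nio.charset.StandardCharsets;

String text = "Hello, Encoding!";
// Encode and decode with the same, explicitly named charset
byte[] iso8859Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
String decodedText = new String(iso8859Bytes, StandardCharsets.ISO_8859_1);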
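The following sketch illustrates two typical failure modes: decoding bytes with a charset other than the one used to encode them, and encoding a character the target charset cannot represent. MojibakeSketch and its helper methods are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeSketch {

    // Encodes as UTF-8 but decodes with a different charset (a deliberate mistake).
    static String decodeUtf8BytesAs(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    // Round trip through a charset that may not contain every character.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "café": the two UTF-8 bytes of U+00E9 are read as two Latin-1 characters
        String garbled = decodeUtf8BytesAs("caf\u00e9", StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("caf\u00c3\u00a9")); // true
        System.out.println(garbled.length());                  // 5, not 4

        // The euro sign is not in ISO-8859-1, so getBytes() substitutes '?'
        System.out.println(roundTrip("\u20ac", StandardCharsets.ISO_8859_1)); // ?
    }
}
```

The first problem is reversible as long as the garbled text is re-encoded with the wrong charset and decoded with the right one; the second loses information permanently.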
Java InputStreamReader and OutputStreamWriter encoding
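When working with byte streams, you can wrap them in an InputStreamReader or an OutputStreamWriter and pass the character encoding to the constructor. Without that argument, both classes fall back to the default charset.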
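import java.io.*;
import java.nio.charset.StandardCharsets;

// try-with-resources closes all four streams, even if an exception is thrown
try (FileInputStream fis = new FileInputStream("input.txt");
     InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
     FileOutputStream fos = new FileOutputStream("output.txt");
     OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8)) {
    // read characters from isr and write them to osw here
}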
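As a runnable illustration of the same idea, the sketch below writes a string to a temporary file as UTF-8 and reads it back. The class name StreamEncodingSketch, its methods, and the temporary file prefix are assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamEncodingSketch {

    // Writes the text to the file as UTF-8.
    static void write(Path file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Reads the first line of the file, decoding it as UTF-8.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    // Writes the text to a temporary file and reads it back.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("encoding-sketch", ".txt");
            try {
                write(file, text);
                return readFirstLine(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 \u20ac5";
        System.out.println(original.equals(roundTrip(original))); // true
    }
}
```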
Java encoding and decoding URL parameters
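Characters such as spaces, '&', '=', and non-ASCII letters have special meaning or are not allowed in a URL, so parameter values must be encoded before they are placed in a query string and decoded when they are read back. URLEncoder and URLDecoder implement the application/x-www-form-urlencoded format, in which a space becomes '+' and other reserved characters become percent escapes.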
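import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

String originalParam = "Hello, World!";
// The overloads that take a Charset require Java 10 or later
String encodedParam = URLEncoder.encode(originalParam, StandardCharsets.UTF_8);
String decodedParam = URLDecoder.decode(encodedParam, StandardCharsets.UTF_8);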
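On Java 8 or 9, the older overloads that take a charset name as a String can be used instead; they declare a checked UnsupportedEncodingException. The following is a minimal sketch under that assumption. UrlParamSketch, its helpers, and the example.com URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlParamSketch {

    // Encodes a single parameter value as UTF-8 form data.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every Java platform supports UTF-8
            throw new IllegalStateException(e);
        }
    }

    // Decodes a single parameter value that was encoded as UTF-8 form data.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello, World!")); // Hello%2C+World%21
        System.out.println(encode("a&b=c"));         // a%26b%3Dc

        // Encode only the value, never the whole URL
        String url = "https://example.com/search?q=" + encode("caf\u00e9 & tea");
        System.out.println(url); // https://example.com/search?q=caf%C3%A9+%26+tea

        System.out.println(decode("Hello%2C+World%21")); // Hello, World!
    }
}
```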
Default character encoding in Java
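Before Java 18, the default character encoding was derived from the operating system's locale settings, so it could differ from one machine to another. Since Java 18 (JEP 400), the default is UTF-8 unless it is overridden with the file.encoding system property. You can obtain the current default using Charset.defaultCharset().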
import java.nio.charset.Charset;

Charset defaultCharset = Charset.defaultCharset();
System.out.println("Default Charset: " + defaultCharset);
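To check whether code that relies on the default charset would behave the same as code that names UTF-8 explicitly, a quick comparison such as the following sketch can help. The class name DefaultCharsetSketch and the helper defaultMatchesUtf8 are illustrative only, and the printed values depend on the JVM and its configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetSketch {

    // True when getBytes() with no argument yields the same bytes as UTF-8
    // for this particular text.
    static boolean defaultMatchesUtf8(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Non-ASCII text exposes differences that plain ASCII text would hide
        System.out.println("Matches UTF-8:   " + defaultMatchesUtf8("caf\u00e9"));
    }
}
```

Passing the charset explicitly, as in the earlier examples, makes the result independent of these settings.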