Java Encoding

Encoding is the process of converting data from one form to another, usually to facilitate data transmission or storage. In Java, character encoding refers to converting characters (text) to bytes using a specified character set, such as UTF-8, UTF-16, or ISO-8859-1. Decoding is the reverse process of converting bytes back to characters.

In this tutorial, we'll explore how to perform character encoding and decoding using Java's Charset, CharsetEncoder, CharsetDecoder, and related classes.

  • Charset

The Charset class in the java.nio.charset package represents a character set that can be used for encoding and decoding. You can obtain a Charset instance using the forName() static method:

import java.nio.charset.Charset;

public class EncodingExample {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println("Charset: " + utf8);
    }
}
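forName() accepts a charset's canonical name or any of its registered aliases, compared case-insensitively. It throws UnsupportedCharsetException when the running JVM has no such charset. The sketch below shows both behaviors. The class and method names and the charset name x-no-such-charset are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

class CharsetLookupDemo {
    // Looks up a charset by canonical name or alias; matching ignores case.
    static Charset lookup(String name) {
        return Charset.forName(name);
    }

    // Reports whether this JVM supports the named charset (the name must be legal).
    static boolean supported(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // "utf8" is a registered alias of UTF-8, so it resolves to the same charset.
        Charset viaAlias = lookup("utf8");
        System.out.println(viaAlias.name());                         // UTF-8
        System.out.println(viaAlias.equals(StandardCharsets.UTF_8)); // true
        System.out.println("Aliases: " + viaAlias.aliases());

        // A legal but unknown name makes forName() throw.
        String unknown = "x-no-such-charset"; // hypothetical name for illustration
        System.out.println(supported(unknown)); // false
        try {
            lookup(unknown);
        } catch (UnsupportedCharsetException e) {
            System.out.println("Unsupported charset: " + e.getCharsetName());
        }
    }
}
```

Checking with isSupported() first avoids the exception when a charset name comes from user input or configuration.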
  • Encoding characters to bytes

To encode characters (text) to bytes using a specific character set, you can use the encode() method of the Charset class:

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class EncodingExample {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        String text = "Hello, World!";

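        ByteBuffer byteBuffer = utf8.encode(text);
        // Copy only the encoded bytes: array() returns the whole backing array,
        // which can be longer than the encoded data.
        byte[] bytes = new byte[byteBuffer.remaining()];
        byteBuffer.get(bytes);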

        for (byte b : bytes) {
            System.out.printf("%02X ", b);
        }
    }
}

In the example above, we encoded the text string to bytes using the UTF-8 character set.
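The number of bytes produced depends on the charset. This sketch encodes the same four-character string with several standard charsets. The class and method names are hypothetical. Java's UTF-16 encoder writes big-endian output preceded by a byte-order mark, which accounts for its two extra bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetByteCounts {
    // Encodes the text with the given charset and returns how many bytes result.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "Hi \u00E9"; // "Hi " followed by e-acute (U+00E9)
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1, // 1 byte per character
            StandardCharsets.UTF_8,      // e-acute needs 2 bytes
            StandardCharsets.UTF_16BE,   // 2 bytes per character here
            StandardCharsets.UTF_16      // adds a 2-byte byte-order mark
        };
        for (Charset cs : charsets) {
            System.out.println(cs + ": " + byteCount(text, cs) + " bytes");
        }
        // ISO-8859-1: 4 bytes, UTF-8: 5 bytes, UTF-16BE: 8 bytes, UTF-16: 10 bytes
    }
}
```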

  • Decoding bytes to characters

To decode bytes back to characters using a specific character set, you can use the decode() method of the Charset class:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class EncodingExample {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        // The UTF-8 (and ASCII) bytes of "Hello, World!"
        byte[] bytes = new byte[]{72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33};

        ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
        CharBuffer charBuffer = utf8.decode(byteBuffer);
        String text = charBuffer.toString();

        System.out.println("Decoded text: " + text);
    }
}

In the example above, we decoded the byte array back to a String using the UTF-8 character set.
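Charset.decode() never fails: malformed input is replaced with the replacement character U+FFFD. When bad input should be rejected instead, you can create a CharsetDecoder with newDecoder() and set its error actions to CodingErrorAction.REPORT. A fresh decoder already reports errors by default; the sketch sets the actions explicitly for clarity. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecoding {
    // Lenient: Charset.decode() substitutes U+FFFD for malformed input.
    static String decodeLenient(byte[] bytes) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Strict: a decoder set to REPORT throws instead of substituting.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // "Hi" followed by 0xFF, a byte that never appears in valid UTF-8
        byte[] bad = {72, 105, (byte) 0xFF};
        System.out.println(decodeLenient(bad).endsWith("\uFFFD")); // true
        try {
            decodeStrict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("Rejected: " + e);
        }
    }
}
```

The strict variant suits input that must be valid, such as data read from a protocol or file format that mandates UTF-8.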

  1. Java charset and encoding overview

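    The Charset class in Java represents a character encoding and provides static methods, such as forName() and defaultCharset(), for obtaining Charset instances. Constants for the standard charsets are defined in the separate StandardCharsets class (Java 7 and later).

    import java.nio.charset.Charset;
    Charset utf8Charset = Charset.forName("UTF-8");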
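    StandardCharsets holds a constant for each of the six charsets that every Java platform implementation is required to support. Using these constants avoids both string lookups and UnsupportedCharsetException. Charset.availableCharsets() lists everything the running JVM supports. A small sketch; the class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class StandardCharsetsDemo {
    // The six charsets every implementation of the Java platform must support.
    static final List<Charset> REQUIRED = Arrays.asList(
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE, StandardCharsets.UTF_16);

    // Number of charsets this JVM provides; varies by vendor and version.
    static int availableCount() {
        return Charset.availableCharsets().size();
    }

    public static void main(String[] args) {
        for (Charset cs : REQUIRED) {
            System.out.println(cs.name());
        }
        System.out.println("Charsets available in this JVM: " + availableCount());
    }
}
```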
    
  2. UTF-8 encoding in Java

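    UTF-8 is a widely used character encoding. It is variable-width, using one to four bytes per character, and ASCII-compatible: characters U+0000 to U+007F are encoded as the same single bytes as in ASCII. Encode a String with getBytes(StandardCharsets.UTF_8) and decode the bytes with the String(byte[], Charset) constructor.

    import java.nio.charset.StandardCharsets;

    String text = "Hello, UTF-8!";
    byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);           // encode
    String decodedText = new String(utf8Bytes, StandardCharsets.UTF_8); // decode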
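    Because UTF-8 is variable-width, the byte count can differ from String.length(), which counts UTF-16 code units rather than characters. The sketch below (hypothetical class and method names) encodes one character from each width class; the emoji U+1F600 occupies two Java chars but four UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

class Utf8Widths {
    // Number of bytes needed to encode the text in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself ASCII-only.
        // Samples: 'A', e-acute, euro sign, and the emoji U+1F600 (a surrogate pair).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: %d char(s), %d UTF-8 byte(s)%n",
                    s.codePointAt(0), s.length(), utf8Length(s));
        }
        // U+0041: 1 char(s), 1 UTF-8 byte(s)
        // U+00E9: 1 char(s), 2 UTF-8 byte(s)
        // U+20AC: 1 char(s), 3 UTF-8 byte(s)
        // U+1F600: 2 char(s), 4 UTF-8 byte(s)
    }
}
```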
    
  3. Encoding and decoding base64 in Java

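    Base64 is not a character set: it represents arbitrary binary data using 64 printable ASCII characters, so the data can pass through channels that only handle text. Java 8 and later provide the java.util.Base64 class for encoding and decoding. To Base64-encode a String, first convert it to bytes with an explicit charset.

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    String originalText = "Base64 encoding in Java";
    String encodedText = Base64.getEncoder().encodeToString(originalText.getBytes(StandardCharsets.UTF_8));
    byte[] decodedBytes = Base64.getDecoder().decode(encodedText); // back to the UTF-8 bytes
    String decodedText = new String(decodedBytes, StandardCharsets.UTF_8);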
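    Besides the basic encoder used above, java.util.Base64 offers a URL- and filename-safe variant, whose output can go into URLs and file names without further escaping, and a MIME variant for line-wrapped output. A minimal sketch comparing the basic and URL-safe encoders; the class name and sample bytes are illustrative.

```java
import java.util.Base64;

class Base64Variants {
    // Basic alphabet: uses '+' and '/', pads with '='.
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL- and filename-safe alphabet: '-' and '_' instead of '+' and '/', padding omitted.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFB, (byte) 0xFF}; // chosen to hit the two differing characters
        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8
        // The decoders accept input with or without padding.
        byte[] back = Base64.getUrlDecoder().decode(urlSafe(data));
        System.out.println(back.length); // 2
    }
}
```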
    
  4. Handling different character encodings in Java

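    Always specify the charset explicitly when converting between String and byte[]. The overloads without a charset argument use the platform default, which can differ between machines and JDK versions, so text encoded on one system may decode incorrectly on another. Also keep in mind that ISO-8859-1 covers only 256 characters: getBytes() silently replaces any character it cannot map with '?'.

    import java.nio.charset.StandardCharsets;

    String text = "Hello, Encoding!";
    byte[] iso8859Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
    String decodedText = new String(iso8859Bytes, StandardCharsets.ISO_8859_1);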
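    Two common failure modes are easy to reproduce: text containing characters the target charset cannot represent, and bytes decoded with a different charset than the one used to encode them. A sketch with hypothetical method names; it also uses CharsetEncoder.canEncode() to detect the first problem before any data is lost.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    // Round-trips text through ISO-8859-1; getBytes() replaces unmappable characters with '?'.
    static String latin1RoundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes as UTF-8 but decodes as ISO-8859-1: the classic "mojibake" mistake.
    static String mismatched(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Checks in advance whether every character can be represented in ISO-8859-1.
    static boolean fitsLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 \u20AC5"; // "cafe" with e-acute, a space, the euro sign, "5"
        System.out.println(fitsLatin1(text));                     // false: no euro sign in ISO-8859-1
        System.out.println(latin1RoundTrip(text).endsWith("?5")); // true: the euro sign was lost
        // The two UTF-8 bytes of e-acute (0xC3 0xA9) are read as two separate Latin-1 characters.
        System.out.println(mismatched("caf\u00E9").length());     // 5 instead of 4
    }
}
```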
    
  5. Java InputStreamReader and OutputStreamWriter encoding

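    InputStreamReader decodes the bytes read from an InputStream into characters, and OutputStreamWriter encodes characters into bytes for an OutputStream. Both accept the charset as a constructor argument; omitting it makes them fall back to the default charset.

    import java.io.*;
    import java.nio.charset.StandardCharsets;

    // try-with-resources closes both streams; the enclosing method must handle IOException
    try (InputStreamReader isr = new InputStreamReader(new FileInputStream("input.txt"), StandardCharsets.UTF_8);
         OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream("output.txt"), StandardCharsets.UTF_8)) {
        isr.transferTo(osw); // Java 10+: copies all characters from the reader to the writer
    }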
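    To see the reader and writer at work without touching the file system, the sketch below swaps the file streams for in-memory ByteArrayOutputStream and ByteArrayInputStream. The class and method names are hypothetical. The same string yields different bytes, and different text on the way back, depending on the charset passed in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamEncodingDemo {
    // Encodes text through an OutputStreamWriter and returns the bytes it produced.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the writer was flushed when it was closed
    }

    // Decodes bytes through an InputStreamReader using the given charset.
    static String read(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Gr\u00FC\u00DFe"; // German greeting with u-umlaut and sharp s
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                          // 7
        System.out.println(read(utf8, StandardCharsets.UTF_8).equals(text));      // true
        System.out.println(read(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```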
    
  6. Java encoding and decoding URL parameters

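    URL query parameters can contain only a limited set of characters, so spaces, punctuation such as '&' and '=', and non-ASCII characters must be percent-encoded. URLEncoder and URLDecoder implement the application/x-www-form-urlencoded format used by HTML forms, in which a space becomes '+'; the overloads that take a Charset were added in Java 10.

    import java.net.URLDecoder;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;
    String originalParam = "Hello, World!";
    String encodedParam = URLEncoder.encode(originalParam, StandardCharsets.UTF_8); // "Hello%2C+World%21"
    String decodedParam = URLDecoder.decode(encodedParam, StandardCharsets.UTF_8);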
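    When building a whole query string, each key and value should be encoded separately before the pieces are joined; otherwise an '&' or '=' inside a value would be read as a separator. A sketch assuming Java 10 or later; toQueryString is a hypothetical helper, not part of the JDK.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryStringDemo {
    // Hypothetical helper: builds "k1=v1&k2=v2", encoding every key and value on its own.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "Hello, World!");
        params.put("filter", "a&b=c");
        System.out.println(toQueryString(params)); // q=Hello%2C+World%21&filter=a%26b%3Dc
        System.out.println(URLDecoder.decode("a%26b%3Dc", StandardCharsets.UTF_8)); // a&b=c
    }
}
```

    Because URLEncoder follows the form-encoding rules, it fits query strings and form bodies; a URL path segment would normally need "%20" rather than "+" for a space.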
    
  7. Default character encoding in Java

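    The default charset is fixed when the JVM starts. Since JDK 18 (JEP 400) it is UTF-8 on every platform unless overridden, for example with -Dfile.encoding=COMPAT; earlier releases derived it from the operating system's locale settings, so it could be windows-1252 on many Western-locale Windows systems. You can obtain it with Charset.defaultCharset().

    import java.nio.charset.Charset;

    Charset defaultCharset = Charset.defaultCharset();
    System.out.println("Default Charset: " + defaultCharset);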
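    APIs that take no charset argument, such as String.getBytes() and new String(byte[]), silently use this default. The sketch below (hypothetical class and method names) confirms that and shows the explicit alternative, whose output does not depend on the platform. The default it prints depends on your JDK version and system settings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {
    // String.getBytes() with no argument encodes with the JVM's default charset.
    static boolean noArgUsesDefault(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        String text = "caf\u00E9"; // "cafe" with e-acute
        System.out.println(noArgUsesDefault(text)); // true
        // Passing the charset explicitly gives the same bytes on every platform.
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.UTF_8))); // [99, 97, 102, -61, -87]
    }
}
```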