Java Tutorial

How to Count Characters in Java

Java strings are sequences of UTF-16 code units, as in JavaScript. String.length() returns the number of code units, so any character outside the Basic Multilingual Plane (most emoji, for example) is stored as a surrogate pair and counts as 2. Use codePointCount() to count Unicode code points and BreakIterator to count user-perceived characters (grapheme clusters). To see a working browser-based version of the same logic, try our Character Counter; for the equivalent JavaScript pattern, see the JavaScript tutorial.

Method 1: String.length() (code units)

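Returns the number of UTF-16 code units. This equals the code point count for ASCII and any other text within the Basic Multilingual Plane (BMP), but each supplementary character, such as most emoji, is a surrogate pair and counts as 2.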

java
String text = "Hello, world!";
System.out.println(text.length()); // 13

System.out.println("😀".length());  // 2 — surrogate pair
System.out.println("中".length());   // 1 — BMP character
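
One quick way to tell whether length() is overcounting for a given string is to compare it with the code point count. The following is a minimal sketch; the class and method names are hypothetical and only illustrate the comparison.

```java
// Hypothetical helper; the names are illustrative, not part of any library.
final class SurrogateCheck {

    // True when the string contains at least one surrogate pair, i.e. when
    // length() (code units) is larger than the number of code points.
    static boolean hasSupplementary(String text) {
        return text.length() != text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        System.out.println(hasSupplementary("Hello"));            // false
        System.out.println(hasSupplementary("Hi \uD83D\uDE00"));  // true (\uD83D\uDE00 is 😀)
    }
}
```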

Method 2: codePointCount() (Unicode characters)

Counts actual Unicode code points, treating surrogate pairs as one.

java
String text = "Hi 😀!";
System.out.println(text.length());                              // 6
System.out.println(text.codePointCount(0, text.length()));      // 5
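
A common follow-up is enforcing a limit measured in code points. Cutting with substring() on a code unit index can split a surrogate pair, so the sketch below uses String.offsetByCodePoints() to find a safe cut position. The class name and the truncate method are hypothetical, and the sketch assumes maxCodePoints is not negative.

```java
// Hypothetical helper; the names are illustrative, not part of any library.
final class CodePointLimit {

    // Returns at most maxCodePoints code points from the start of text,
    // never cutting through a surrogate pair. Assumes maxCodePoints >= 0.
    static String truncate(String text, int maxCodePoints) {
        if (text.codePointCount(0, text.length()) <= maxCodePoints) {
            return text;
        }
        // Translate "maxCodePoints code points from index 0" into a code unit index.
        int end = text.offsetByCodePoints(0, maxCodePoints);
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        String text = "Hi \uD83D\uDE00!";                  // "Hi 😀!"
        System.out.println(truncate(text, 4).length());   // 5 code units: "Hi " plus the pair
    }
}
```

By contrast, text.substring(0, 4) on the same input would end with a lone high surrogate.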

Method 3: BreakIterator for grapheme clusters

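The closest the standard library gets to counting user-perceived characters (grapheme clusters). Results depend on the JDK version: BreakIterator follows extended grapheme cluster rules, including emoji joined by zero-width joiners, from JDK 20 onward, while older versions split such sequences.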

java
import java.text.BreakIterator;

public static int graphemeCount(String text) {
    BreakIterator iterator = BreakIterator.getCharacterInstance();
    iterator.setText(text);
    int count = 0;
    while (iterator.next() != BreakIterator.DONE) {
        count++;
    }
    return count;
}

// "👨‍👩‍👧‍👦" — family emoji
System.out.println(graphemeCount("👨‍👩‍👧‍👦"));  // 1 on JDK 20+; older JDKs split the sequence and print more
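
The same iterator can also enforce a limit measured in graphemes. The sketch below walks the boundaries and cuts at the last one that fits; the class and method names are hypothetical. Where the boundaries fall for emoji sequences still depends on the JDK version, so the example uses a combining accent.

```java
import java.text.BreakIterator;

// Hypothetical helper; the names are illustrative, not part of any library.
final class GraphemeLimit {

    // Returns at most maxGraphemes user-perceived characters from the start
    // of text, cutting only at boundaries reported by BreakIterator.
    static String truncate(String text, int maxGraphemes) {
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        int end = 0; // the first boundary is always index 0
        for (int i = 0; i < maxGraphemes; i++) {
            int next = iterator.next();
            if (next == BreakIterator.DONE) {
                return text; // fewer graphemes than the limit
            }
            end = next;
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
        String text = "e\u0301abc";
        System.out.println(truncate(text, 2).length()); // 3 code units: e, accent, a
    }
}
```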

Method 4: Byte counting (UTF-8)

For storage limits and API payload sizing, count the bytes after encoding rather than the characters.

java
import java.nio.charset.StandardCharsets;

String text = "Hello, 世界 😀";

int utf8Bytes = text.getBytes(StandardCharsets.UTF_8).length;
System.out.println(utf8Bytes);  // 18 (8 ASCII bytes + 2 × 3 for 世界 + 4 for 😀)

int utf16Bytes = text.getBytes(StandardCharsets.UTF_16LE).length;
System.out.println(utf16Bytes); // 24
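
When only the size matters, the UTF-8 length can also be computed without allocating a byte array, because the byte count of each code point follows from its value. This is a minimal sketch with hypothetical names; it assumes well-formed input, since getBytes() replaces an unpaired surrogate with a single '?' byte while this loop would count it as 3.

```java
// Hypothetical helper; the names are illustrative, not part of any library.
final class Utf8Length {

    // Number of bytes the text occupies in UTF-8, assuming no unpaired surrogates.
    static int utf8Length(String text) {
        int bytes = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                bytes += 1;        // ASCII
            } else if (cp < 0x800) {
                bytes += 2;        // e.g. Latin-1 supplement, Greek, Cyrillic
            } else if (cp < 0x10000) {
                bytes += 3;        // rest of the BMP, e.g. CJK
            } else {
                bytes += 4;        // supplementary planes, e.g. most emoji
            }
            i += Character.charCount(cp); // advance 1 or 2 code units
        }
        return bytes;
    }

    public static void main(String[] args) {
        // "Hello, 世界 😀" written with escapes
        System.out.println(utf8Length("Hello, \u4E16\u754C \uD83D\uDE00")); // 18
    }
}
```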

Method 5: Specific character counts

Common patterns for counting categories or specific characters.

java
String text = "Hello, world!";

// Characters without whitespace (chars() streams UTF-16 code units)
long noSpaces = text.chars().filter(c -> !Character.isWhitespace(c)).count(); // 12

// Letter count
long letters = text.chars().filter(Character::isLetter).count(); // 10

// Count occurrences of 'l'
long ls = text.chars().filter(c -> c == 'l').count(); // 3
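
The snippets above work on code units, which is fine for BMP text. For input that may contain supplementary characters, the same filters can run over codePoints() instead. The following sketch uses hypothetical class and method names to show the difference.

```java
// Hypothetical helper; the names are illustrative, not part of any library.
final class CodePointCounts {

    // Counts letters by code point, so supplementary letters are included.
    static long letters(String text) {
        return text.codePoints().filter(Character::isLetter).count();
    }

    // Counts occurrences of a single code point, e.g. 0x1F600 for 😀.
    static long occurrences(String text, int codePoint) {
        return text.codePoints().filter(cp -> cp == codePoint).count();
    }

    public static void main(String[] args) {
        // "a" followed by U+10400 DESERET CAPITAL LETTER LONG I (a surrogate pair)
        String text = "a\uD801\uDC00";
        System.out.println(text.chars().filter(Character::isLetter).count()); // 1
        System.out.println(letters(text));                                    // 2
    }
}
```

With chars(), each half of the surrogate pair is tested on its own and neither half is a letter, so the supplementary letter is missed.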

Common Pitfalls

String.length() on emoji is misleading

Emoji are usually surrogate pairs in UTF-16. "A".length() == 1; "😀".length() == 2. For user-facing character limits, use codePointCount() or BreakIterator instead.

Char arrays don't match Unicode

A char[] in Java holds UTF-16 code units, not Unicode characters. Iterating over text.toCharArray() visits the two halves of a surrogate pair as separate elements.
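
One way around this is to split the text by code point rather than by char. This is a minimal sketch with hypothetical names; it returns each code point as its own String so that surrogate pairs stay together.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the names are illustrative, not part of any library.
final class CodePointWalk {

    // Splits text into one String per code point, keeping surrogate pairs intact.
    static List<String> toCodePointStrings(String text) {
        return text.codePoints()
                   .mapToObj(cp -> new String(Character.toChars(cp)))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b";                            // "a😀b"
        System.out.println(text.toCharArray().length);            // 4 code units
        System.out.println(toCodePointStrings(text).size());      // 3 code points
    }
}
```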

Stream chars() returns ints

text.chars() returns an IntStream of UTF-16 code units widened to int, not code points. Use text.codePoints() if you need code points.

See a Working Character Counter

Our Character Counter is built using the patterns from this tutorial. Open the dev tools to inspect the live implementation.

FAQ

Why does String.length() return 2 for a single emoji?

String.length() counts UTF-16 code units. Emoji outside the Basic Multilingual Plane are stored as surrogate pairs, so each one counts as 2 code units. Use codePointCount() to count code points, or BreakIterator to count visible characters.
