JavaScript Tutorial

How to Count Characters in JavaScript

Counting characters in JavaScript is deceptively tricky. The simple str.length is accurate for ASCII but overcounts emoji (stored as two or more UTF-16 code units) and decomposed accented characters. This tutorial covers five methods, each with its own trade-offs. For a working live implementation, see our Character Counter; for the React version of this code, jump to the React tutorial.

Method 1: The basic .length property (code units)

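JavaScript strings are sequences of UTF-16 code units. str.length counts code units, not visible characters. It is fast, and it is accurate for plain ASCII and for any other text in which every character is a single code unit with no combining marks.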

javascript
const text = "Hello, world!";
console.log(text.length); // 13

// But: emojis are usually 2 code units
console.log("😀".length); // 2 — surprises beginners

// And combining characters break it
console.log("é".length); // 1 (precomposed) or 2 (decomposed)
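To see where the extra count comes from, it can help to print the individual code units. The sketch below uses a hypothetical helper, codeUnitsHex (not part of the original tutorial), built only on the standard charCodeAt method.

```javascript
// Hypothetical helper: list each UTF-16 code unit of a string as 4-digit hex.
function codeUnitsHex(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return units;
}

console.log(codeUnitsHex("A"));         // one unit: 0041
console.log(codeUnitsHex("\u{1F600}")); // 😀 is a surrogate pair: D83D, DE00
console.log(codeUnitsHex("e\u0301"));   // decomposed é: 0065 (letter), 0301 (combining accent)
```

Each entry in the returned array is one unit that .length counts, which is why a single emoji can contribute 2 to the total.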

Method 2: Code points (Array.from)

Use Array.from() (or spread syntax) to split a string into code points instead of code units. This counts each surrogate pair, such as most emoji, as one character, but it still counts combining marks and zero-width joiners separately.

javascript
const text = "Hi 😀!";
console.log(Array.from(text).length); // 5 — correct visual count
console.log([...text].length);         // 5 — same thing, spread syntax

// Compare:
console.log(text.length); // 6 (emoji counts as 2 code units)
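Code point iteration also matters when shortening text, because slicing by code units can cut an emoji in half. The following is a minimal sketch; truncateCodePoints is a hypothetical helper name, not something defined elsewhere in this tutorial.

```javascript
// Hypothetical helper: keep at most `max` code points of `text`.
function truncateCodePoints(text, max) {
  return Array.from(text).slice(0, max).join("");
}

const sample = "Hi \u{1F600}!"; // "Hi 😀!"

console.log(truncateCodePoints(sample, 4)); // "Hi 😀" (emoji kept whole)

// slice() works on code units, so index 4 falls in the middle of the emoji
const broken = sample.slice(0, 4);
console.log(broken.charCodeAt(3).toString(16)); // "d83d", a lone high surrogate
```

This still splits sequences made of several code points (for example a letter plus a combining accent), which is the gap the grapheme-based method addresses.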

Method 3: Grapheme clusters (Intl.Segmenter)

The most accurate option. Intl.Segmenter with granularity: 'grapheme' splits text into extended grapheme clusters following Unicode's text segmentation rules (UAX #29). The result closely matches what users perceive as one character, including combining marks and family emoji.

javascript
function graphemeCount(text) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  let count = 0;
  for (const _ of segmenter.segment(text)) count++;
  return count;
}

console.log(graphemeCount("👨‍👩‍👧‍👦")); // 1 — true visual count
console.log("👨‍👩‍👧‍👦".length);          // 11 — code units
console.log(Array.from("👨‍👩‍👧‍👦").length); // 7 — code points
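The same segmenter can be used to shorten text without breaking a cluster. The sketch below assumes Intl.Segmenter is available; graphemeSegmenter and truncateGraphemes are hypothetical names introduced for illustration. It also creates the segmenter once and reuses it, which avoids constructing a new one on every call.

```javascript
// Create the segmenter once and reuse it across calls.
// Passing undefined as the locale uses the runtime's default locale.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: keep at most `max` grapheme clusters of `text`.
function truncateGraphemes(text, max) {
  let result = "";
  let count = 0;
  for (const { segment } of graphemeSegmenter.segment(text)) {
    if (count === max) break;
    result += segment;
    count++;
  }
  return result;
}

// Family emoji written with escapes: four people joined by zero-width joiners
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(truncateGraphemes(family + " family", 1) === family); // true: emoji kept whole
console.log(truncateGraphemes("e\u0301tude", 2)); // "ét" (the accent stays attached to the e)
```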

Method 4: Bytes in UTF-8 (TextEncoder)

For storage, API limits, or network sizing, count bytes rather than characters. TextEncoder's encode() method converts a string to a Uint8Array of UTF-8 bytes, so the array's length is the byte count.

javascript
function byteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(byteLength("hello"));   // 5  (ASCII = 1 byte each)
console.log(byteLength("héllo"));   // 6  (é = 2 bytes)
console.log(byteLength("中文"));     // 6  (Chinese = 3 bytes each)
console.log(byteLength("😀"));      // 4  (emoji = 4 bytes)
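When a limit is expressed in bytes, a common follow-up is trimming text to fit it. The sketch below is one possible approach, not a definitive implementation; truncateToBytes is a hypothetical helper that cuts only on code point boundaries so it never emits half of a multi-byte character.

```javascript
const encoder = new TextEncoder();

// Hypothetical helper: longest prefix of `text` that fits in `maxBytes`
// UTF-8 bytes, cut only on code point boundaries.
function truncateToBytes(text, maxBytes) {
  let used = 0;
  let result = "";
  for (const ch of text) { // string iteration yields whole code points
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    used += size;
    result += ch;
  }
  return result;
}

console.log(truncateToBytes("h\u00E9llo", 3)); // "hé" (1 + 2 bytes)
console.log(truncateToBytes("中文", 5));        // "中" (the second character would need 6 bytes)
console.log(encoder.encode("\u{1F600}".repeat(100)).length); // 400 bytes for 100 emoji
```

A cut on a code point boundary can still separate a base letter from its combining mark; iterating over grapheme segments instead of code points would avoid that at the cost of requiring Intl.Segmenter.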

Method 5: Characters minus whitespace

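A common requirement for word processors and assignments: count characters excluding whitespace. The version below removes everything matched by \s (spaces, tabs, line breaks) and then counts code units, so it has the same emoji limitation as Method 1.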

javascript
function charactersNoSpaces(text) {
  return text.replace(/\s/g, '').length;
}

console.log(charactersNoSpaces("Hello, world!")); // 12 (excludes the space)
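If the text may contain emoji, the whitespace removal can be combined with code point counting from Method 2. This is a sketch under that assumption; codePointsNoSpaces is a hypothetical name.

```javascript
// Hypothetical variant: count non-whitespace code points instead of code units.
function codePointsNoSpaces(text) {
  return Array.from(text.replace(/\s/g, "")).length;
}

const message = "Hi \u{1F600} !"; // "Hi 😀 !"

console.log(codePointsNoSpaces(message));        // 4 (H, i, 😀, !)
console.log(message.replace(/\s/g, "").length);  // 5 (the emoji counts as 2 code units)
```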

Common Pitfalls

Don't use .length for emoji

If your app accepts emoji and enforces a character limit, use grapheme counting (Intl.Segmenter). Otherwise users see 'too long' errors for what looks like short text.

Combining marks vary by source

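'é' may be one code point (U+00E9, precomposed) or two (U+0065 + U+0301, decomposed). Normalize with str.normalize('NFC') before counting or comparing so both forms give the same result. NFC only composes sequences that have a precomposed form, so text with other combining sequences still needs grapheme counting.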
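The effect of normalization is easy to check directly. In the sketch below the two forms of é are written with escapes so the difference is visible in the source; normalizedCodePointCount is a hypothetical helper.

```javascript
const precomposed = "\u00E9"; // é as a single code point
const decomposed = "e\u0301"; // e followed by a combining acute accent

console.log(precomposed.length, decomposed.length);        // 1 2
console.log(precomposed === decomposed);                    // false
console.log(decomposed.normalize("NFC").length);            // 1
console.log(precomposed === decomposed.normalize("NFC"));   // true

// Hypothetical helper: normalize to NFC, then count code points.
function normalizedCodePointCount(text) {
  return Array.from(text.normalize("NFC")).length;
}

console.log(normalizedCodePointCount("cafe\u0301")); // 4
```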

Bytes ≠ characters

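Whether VARCHAR(255) means 255 bytes or 255 characters depends on the database and the column definition. For example, MySQL and PostgreSQL count characters, while SQL Server's varchar(n) and Oracle's default VARCHAR2(n) count bytes. Where the limit is in bytes, a 100-character emoji-heavy string can take 400+ bytes. Check which unit applies and plan column sizes accordingly.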

Intl.Segmenter is recent

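Available in Node.js 16+ and in current versions of Chrome, Edge, Safari, and Firefox (Firefox was the last to add it, in version 125). For older environments, feature-detect and fall back to a library such as grapheme-splitter, which is a standalone splitter rather than a drop-in polyfill.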
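A feature check keeps the code working where Intl.Segmenter is missing. The sketch below falls back to code point counting, which is only an approximation; countCharacters is a hypothetical helper, and a real project might call a grapheme library in the fallback branch instead.

```javascript
// Hypothetical helper: grapheme count where available, code point count otherwise.
function countCharacters(text) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    let count = 0;
    for (const _ of segmenter.segment(text)) count++;
    return count;
  }
  // Fallback: code points. Overcounts combining marks and ZWJ emoji sequences.
  return Array.from(text).length;
}

console.log(countCharacters("Hi \u{1F600}!")); // 5 on either path
```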

See a Working Character Counter

Our Character Counter is built using the patterns from this tutorial. Open the dev tools to inspect the live implementation.


FAQ

Code units, code points, and graphemes measure different things. Code units are the UTF-16 units that JavaScript stores and that .length counts. Code points are Unicode characters (some take 2 code units). Graphemes are what users see as one character (some take multiple code points). A 'visible character count' is a grapheme count.
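Putting the measurements side by side makes the distinction concrete. The sketch below assumes Intl.Segmenter and TextEncoder are available; measure is a hypothetical helper that simply combines the methods from this tutorial.

```javascript
// Hypothetical helper: report every measurement for one string.
function measure(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: text.length,
    codePoints: Array.from(text).length,
    graphemes: Array.from(segmenter.segment(text)).length,
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

console.log(measure("e\u0301"));
// { codeUnits: 2, codePoints: 2, graphemes: 1, utf8Bytes: 3 }

console.log(measure("\u{1F600}")); // 😀
// { codeUnits: 2, codePoints: 1, graphemes: 1, utf8Bytes: 4 }
```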
