🐍
Python Tutorial

How to Count Characters in Python

Python's len() works well for counting characters because Python 3 strings are sequences of Unicode code points, so there are no surrogate-pair surprises as in JavaScript. But there are still gotchas around combining marks and byte counting. For a browser-based reference implementation, see our Character Counter; for emoji-specific edge cases, try our Emoji Counter.

Method 1: len() returns code points

Python 3 strings are sequences of Unicode code points. len() returns the count directly, with no UTF-16 surprises.

python
text = "Hello, world!"
print(len(text))         # 13

print(len("😀"))          # 1, correct in Python 3
print(len("中文"))         # 2

# Combining marks still count separately:
print(len("e\u0301"))     # 2: e + combining acute accent (renders as é)
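
When len() returns something unexpected, listing the individual code points usually explains why. The following is a minimal sketch using the standard library's unicodedata.name(); the helper name describe_code_points is hypothetical, chosen only for this illustration.

```python
import unicodedata

def describe_code_points(text):
    """Return a (hex code point, Unicode name) pair for each code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# "e" followed by a combining acute accent renders as one character
# but consists of two code points, which is what len() counts.
for code, name in describe_code_points("e\u0301"):
    print(code, name)
# U+0065 LATIN SMALL LETTER E
# U+0301 COMBINING ACUTE ACCENT
```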

Method 2: Normalize before counting (combining marks)

If your text mixes precomposed and decomposed accented characters, normalize first.

python
import unicodedata

text_decomposed = "e\u0301llo"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
text_normalized = unicodedata.normalize('NFC', text_decomposed)

print(len(text_decomposed))  # 5
print(len(text_normalized))  # 4, é is now one code point
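
NFC only helps when Unicode defines a precomposed form for the sequence. As a small sketch of that limitation (nfc_length is a hypothetical helper name), a base letter with no precomposed accented variant still counts as two code points after normalization:

```python
import unicodedata

def nfc_length(text):
    """Count code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))

print(nfc_length("e\u0301"))  # 1: NFC composes this pair into U+00E9
print(nfc_length("q\u0301"))  # 2: Unicode has no precomposed q with acute accent
```

For text like the second example, normalization alone will not give a visible-character count; that requires grapheme segmentation.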

Method 3: Grapheme clusters

Python's standard library doesn't include grapheme segmentation. Use the third-party 'grapheme' or 'regex' package for accurate visible-character counts.

python
# pip install grapheme
import grapheme

text = "👨‍👩‍👧‍👦 family"
print(len(text))                 # 14 code points (the family emoji alone is 7)
print(grapheme.length(text))     # 8 visible characters

# Alternative: pip install regex
import regex
print(len(regex.findall(r'\X', text)))  # 8
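
Because both packages are third-party, code that has to run where they may not be installed could fall back to a rougher count. The sketch below is one possible arrangement, not a recommended standard: visible_length is a hypothetical helper, and the fallback (NFC code points) is only an approximation.

```python
import unicodedata

try:
    import regex  # third-party: pip install regex
except ImportError:
    regex = None

def visible_length(text):
    """Count grapheme clusters when regex is installed, else NFC code points."""
    if regex is not None:
        return len(regex.findall(r"\X", text))
    # Fallback is only an approximation: it overcounts emoji sequences
    # and any combining sequences that NFC cannot compose.
    return len(unicodedata.normalize("NFC", text))

print(visible_length("e\u0301llo"))  # 4 on either path
```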

Method 4: Byte counting

For storage, network, and API limits, encode the string and count bytes.

python
text = "Hello, 世界 😀"

# UTF-8 byte count: 8 ASCII bytes + 2 CJK characters × 3 bytes + 1 emoji × 4 bytes
print(len(text.encode('utf-8')))   # 18

# Compare with character count
print(len(text))                    # 11

# UTF-16 (2 bytes per BMP char, 4 per emoji)
print(len(text.encode('utf-16-le')))  # 24
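
When a limit is expressed in bytes, slicing the string by character count can still overshoot it, and slicing the encoded bytes can cut a character in half. One common approach, sketched here with a hypothetical helper named truncate_utf8, is to slice the bytes and let the decoder discard any incomplete trailing sequence:

```python
def truncate_utf8(text, max_bytes):
    """Cut text so its UTF-8 encoding fits within max_bytes without splitting a code point."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops the incomplete multi-byte sequence left at the cut point.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

text = "Hello, 世界 😀"
short = truncate_utf8(text, 11)
print(short)                        # Hello, 世
print(len(short.encode("utf-8")))   # 10 (the partial second CJK character was dropped)
```

This keeps every code point intact, but it can still cut through a grapheme cluster such as a ZWJ emoji sequence or a base letter and its combining mark.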

Method 5: Characters without whitespace

Quick filter for non-whitespace characters.

python
text = "Hello, world!"
print(len(text.replace(' ', '')))       # 12 (removes spaces only)
print(sum(1 for c in text if not c.isspace()))  # 12 (also skips tabs, newlines, and other Unicode whitespace)

# Count specific categories
import string
letters_only = sum(1 for c in text if c in string.ascii_letters)
print(letters_only)  # 10
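
string.ascii_letters covers only A-Z and a-z, so accented or non-Latin letters are not counted. For Unicode-aware counts, a sketch along these lines may be more suitable; count_by_category is a hypothetical helper that groups characters by the first letter of their Unicode general category (L for letters, P for punctuation, Z for separators, and so on).

```python
import unicodedata
from collections import Counter

def count_by_category(text):
    """Tally characters by the first letter of their Unicode general category."""
    return Counter(unicodedata.category(ch)[0] for ch in text)

text = "H\u00e9llo, w\u00f6rld!"   # "Héllo, wörld!" with precomposed é and ö
counts = count_by_category(text)
print(counts["L"])  # 10 letters, including é and ö
print(counts["P"])  # 2 punctuation marks
print(counts["Z"])  # 1 separator (the space)

# str.isalpha() is a shorter route when only letters matter
print(sum(1 for ch in text if ch.isalpha()))  # 10
```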

Common Pitfalls

⚠ Python 2 was different

Python 2 strings were bytes by default. If you're maintaining legacy code, len() on a Python 2 str gives bytes, not characters. Use u'string' literals or upgrade.

⚠ Emoji ZWJ sequences

The family emoji 👨‍👩‍👧‍👦 is built from 7 code points: four emoji joined by three zero-width joiners. len() returns 7; users see 1. Use grapheme segmentation for the visible count.

⚠ Normalization matters for comparison

If you compare strings (e.g., search, deduplication), normalize both with unicodedata.normalize('NFC', s) first. Otherwise visually identical strings won't match.
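
A minimal sketch of that comparison, assuming NFC is the form you standardize on (same_text is a hypothetical helper name):

```python
import unicodedata

def same_text(a, b):
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "caf\u00e9"   # é as a single code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

print(precomposed == decomposed)           # False
print(same_text(precomposed, decomposed))  # True
```

For deduplication, it is usually simpler to normalize once when the text is stored, so that later comparisons and dictionary lookups work without extra calls.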

See a Working Character Counter

Our Character Counter is built using the patterns from this tutorial. Open the dev tools to inspect the live implementation.


FAQ

In Python 3, len() returns the number of Unicode code points, so an emoji that is a single code point counts as one. A ZWJ sequence, such as the family emoji, counts as multiple code points.
