How to Count Characters in Python
Python's len() works well for counting characters because Python 3 strings are sequences of Unicode code points, so there are no surrogate-pair surprises as in JavaScript. But there are still gotchas around combining marks and byte counting. For a browser-based reference implementation, see our Character Counter; for emoji-specific edge cases, try our Emoji Counter.
Method 1: len() returns code points
Python 3 strings are sequences of Unicode code points. len() returns the number of code points directly, with no UTF-16 surprises.
text = "Hello, world!"
print(len(text)) # 13
print(len("😀")) # 1 (correct in Python 3)
print(len("中文")) # 2
# Combining marks still count separately:
print(len("e\u0301")) # 2 (e + combining acute accent)
Method 2: Normalize before counting (combining marks)
If your text mixes precomposed and decomposed accented characters, normalize first.
import unicodedata
text_decomposed = "e\u0301llo" # e followed by a combining acute accent (U+0301)
text_normalized = unicodedata.normalize('NFC', text_decomposed)
print(len(text_decomposed)) # 5
print(len(text_normalized)) # 4 (é is now one code point)
Method 3: Grapheme clusters
Python's standard library doesn't include grapheme segmentation. Use the third-party 'grapheme' or 'regex' package for accurate visible-character counts.
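Where adding a dependency is not an option, a rough standard-library approximation can get close for common cases. The sketch below is a heuristic, not an implementation of Unicode's grapheme cluster rules (UAX #29), and the function name approx_visible_length is made up for illustration. It skips combining marks and treats everything glued together by zero-width joiners as a single character.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner


def approx_visible_length(text: str) -> int:
    """Rough estimate of user-perceived characters (heuristic, not UAX #29)."""
    count = 0
    joined = False  # True when the previous non-mark code point was a ZWJ
    for ch in text:
        if ch == ZWJ:
            joined = True
            continue
        if unicodedata.category(ch).startswith("M"):
            # Combining marks and variation selectors attach to the previous character.
            continue
        if joined:
            # This code point continues a ZWJ sequence that was already counted.
            joined = False
            continue
        count += 1
    return count


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466 family"
print(approx_visible_length(family))        # 8
print(approx_visible_length("e\u0301llo"))  # 4
```

This heuristic overcounts input such as flag emoji and skin-tone modifiers, so the third-party packages remain the better choice whenever accuracy matters.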
# pip install grapheme
import grapheme
text = "👨‍👩‍👧‍👦 family"
print(len(text)) # 14 code points (the family emoji alone is 7)
print(grapheme.length(text)) # 8 visible characters
# Alternative: pip install regex
import regex
print(len(regex.findall(r'\X', text))) # 8
Method 4: Byte counting
For storage, network, and API limits, encode the string and count bytes.
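A common reason to count bytes is to enforce a limit, and naive slicing of the encoded bytes can cut a multi-byte character in half. The following is a minimal sketch of one way to trim safely; truncate_utf8 is a hypothetical helper name, and the approach assumes UTF-8 is the encoding the limit applies to.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing may leave an incomplete multi-byte sequence at the end;
    # errors="ignore" drops those trailing bytes instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


sample = "Hello, \u4e16\u754c"  # "Hello, 世界": 7 ASCII bytes + 2 x 3 bytes
print(truncate_utf8(sample, 9))   # Hello,  (the partial 世 is dropped)
print(truncate_utf8(sample, 10))  # Hello, 世
```

This keeps code points intact but can still split a grapheme cluster, for example between a base letter and its combining mark.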
text = "Hello, 世界 🌍"
# UTF-8 byte count
print(len(text.encode('utf-8'))) # 18
# Compare with character count
print(len(text)) # 11
# UTF-16 (2 bytes per BMP character, 4 per character outside the BMP, such as most emoji)
print(len(text.encode('utf-16-le'))) # 24
Method 5: Characters without whitespace
Quick filter for non-whitespace characters.
text = "Hello, world!"
print(len(text.replace(' ', ''))) # 12
print(sum(1 for c in text if not c.isspace())) # 12 (same result here; isspace() also excludes tabs and newlines)
# Count specific categories
import string
letters_only = sum(1 for c in text if c in string.ascii_letters)
print(letters_only) # 10
Common Pitfalls
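One pitfall follows directly from the letter count above: string.ascii_letters only covers A-Z and a-z, so accented and non-Latin letters are silently skipped. As a small illustration (the sample text is made up for this example), str.isalpha() counts letters across scripts:

```python
import string

text = "H\u00e9llo, \u4e16\u754c!"  # "Héllo, 世界!"

# ASCII-only filter: misses é and the two CJK characters.
ascii_count = sum(1 for c in text if c in string.ascii_letters)

# Unicode-aware filter: str.isalpha() is True for letters in any script.
unicode_count = sum(1 for c in text if c.isalpha())

print(ascii_count)    # 4
print(unicode_count)  # 7
```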
Python 2 was different
Python 2 strings were bytes by default. If you're maintaining legacy code, len() on a Python 2 str gives bytes, not characters. Use u'string' literals or upgrade.
Emoji ZWJ sequences
The family emoji 👨‍👩‍👧‍👦 is built from 7 code points: four person emoji joined by three zero-width joiners. len() returns 7; users see 1. Use grapheme segmentation for the visible count.
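To see where the 7 comes from, the sequence can be taken apart with the standard library. This short sketch spells the emoji with escape sequences so the individual code points are visible in the source:

```python
import unicodedata

# MAN + ZWJ + WOMAN + ZWJ + GIRL + ZWJ + BOY
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

names = [unicodedata.name(ch) for ch in family]
for ch, name in zip(family, names):
    print(f"U+{ord(ch):04X}  {name}")

print(len(family))             # 7
print(family.count("\u200d"))  # 3 zero-width joiners
```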
Normalization matters for comparison
If you compare strings (e.g., search, deduplication), normalize both with unicodedata.normalize('NFC', s) first. Otherwise visually identical strings won't match.
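A minimal sketch of that comparison, using the two spellings of "café" as sample data; nfc_equal is a hypothetical helper name, not a standard-library function:

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # é as a single code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

print(len(precomposed), len(decomposed))   # 4 5
print(precomposed == decomposed)           # False
print(nfc_equal(precomposed, decomposed))  # True
```

For deduplication, normalizing once when the text is stored avoids repeating the work on every comparison.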
See a Working Character Counter
Our Character Counter is built using the patterns from this tutorial. Open the dev tools to inspect the live implementation.