
Document Chunking for AI: Token vs Sentence Methods

By the PolicyChatbot Team

Okay, confession time.

Last year, I watched a startup burn through £50,000 in OpenAI credits in one weekend. One. Weekend.

Want to know why?

They uploaded their entire documentation library – 10,000 documents – without chunking it properly. Every query searched through massive walls of text. Their token usage went through the roof. Their CFO almost had a heart attack.

The crazy part? They could’ve avoided the whole disaster by understanding one simple concept: document chunking.

It’s like the difference between trying to eat an entire pizza in one bite (you’ll choke) versus cutting it into slices (actually enjoyable). Except with documents. And AI. And thousands of pounds at stake.

Why Chunking Can Make or Break Your Chatbot

Here’s the thing nobody tells you about AI chatbots…

They can’t actually read your entire employee handbook at once. Or your policy manual. Or any document longer than a few pages.

Why? Token limits.

  • GPT-4: 128k tokens (about 96k words)
  • Claude: 200k tokens (about 150k words)
  • Most models: 4k-32k tokens

Sounds like a lot? Your average employee handbook is 40,000 words. Your complete policy documentation? Probably 200,000+ words.

Even if you could stuff it all in (you can’t), it would cost a fortune. Every token costs money. Both for embedding and for generation.
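Don’t take my word for it. Count the tokens yourself; here’s a quick sketch using OpenAI’s tiktoken library (the file name is just a placeholder):

import tiktoken

# cl100k_base is the encoding used by recent OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

with open("employee_handbook.txt") as f:  # hypothetical file
    handbook = f.read()

print(f"{len(enc.encode(handbook)):,} tokens")
# A 40,000-word handbook lands around 53,000 tokens (roughly 1.33 tokens per word)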

So we chunk. We split documents into bite-sized pieces. But here’s where it gets interesting… HOW you chunk determines whether your chatbot is brilliant or brain-dead.

The Three Chunking Methods That Matter

Method 1: Token Chunking (The Programmer’s Choice)

Token chunking is exactly what it sounds like. Count tokens, split when you hit the limit.

import tiktoken

# tiktoken is OpenAI's tokenizer library; swap in your model's tokenizer if needed
tokenizer = tiktoken.get_encoding("cl100k_base")

def token_chunking(text, max_tokens=512):
    tokens = tokenizer.encode(text)
    chunks = []

    # Slice the token stream into fixed-size windows
    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i + max_tokens]
        chunk_text = tokenizer.decode(chunk_tokens)
        chunks.append(chunk_text)

    return chunks

Simple. Clean. Totally wrong for most use cases.

Watch what happens:

Original text: “Employees are eligible for remote work after 6 months of employment. To apply, submit form RW-101 to your manager. Approval requires director sign-off.”

Token chunked (badly):

  • Chunk 1: “Employees are eligible for remote work after 6 months of employment. To apply, submit form RW-”
  • Chunk 2: “101 to your manager. Approval requires director sign-off.”

Congratulations. You just split the form number in half. When someone searches for “form RW-101”, they get nothing.

Method 2: Sentence Chunking (The Linguist’s Choice)

Sentence chunking respects natural language boundaries.

def count_tokens(text):
    return len(tokenizer.encode(text))

def sentence_chunking(text, max_tokens=512):
    sentences = text.split('. ')
    chunks = []
    current_chunk = []
    current_tokens = 0

    for sentence in sentences:
        sentence_tokens = count_tokens(sentence)

        # Close the current chunk before it overflows the token budget
        if current_tokens + sentence_tokens > max_tokens and current_chunk:
            chunks.append('. '.join(current_chunk) + '.')
            current_chunk = [sentence]
            current_tokens = sentence_tokens
        else:
            current_chunk.append(sentence)
            current_tokens += sentence_tokens

    # Don't drop the final partial chunk
    if current_chunk:
        chunks.append('. '.join(current_chunk) + '.')

    return chunks

Better. Much better. But watch this:

Original text: “Remote Work Policy

Eligibility:

  • 6 months employment
  • Good performance review
  • Manager approval

Process:

  1. Submit form RW-101
  2. Manager review (5 days)
  3. Director approval (3 days)”

Sentence chunked:

  • Chunk 1: “Remote Work Policy”
  • Chunk 2: “Eligibility: - 6 months employment - Good performance review - Manager approval”
  • Chunk 3: “Process: 1. Submit form RW-101 2. Manager review (5 days) 3.”
  • Chunk 4: “Director approval (3 days)”

We kept sentences intact but lost the structure. The title is separated from its content. The process is split randomly.

Method 3: Recursive Chunking (The Smart Choice)

Recursive chunking understands document structure.

def recursive_chunking(text, max_tokens=512, separators=("\n\n", "\n", ". ", " ")):
    # Already fits: return the text as a single chunk
    if count_tokens(text) <= max_tokens:
        return [text]

    # Try coarse separators first (paragraphs), then progressively finer ones
    for separator in separators:
        parts = text.split(separator)
        chunks = []
        current_chunk = ""

        for part in parts:
            candidate = current_chunk + separator + part if current_chunk else part
            if count_tokens(candidate) <= max_tokens:
                current_chunk = candidate
            else:
                if current_chunk:
                    chunks.append(current_chunk)
                current_chunk = part

        if current_chunk:
            chunks.append(current_chunk)

        # Accept this split only if every chunk fits the budget
        if all(count_tokens(chunk) <= max_tokens for chunk in chunks):
            return chunks

    # No separator worked: fall back to a hard token split
    return token_chunking(text, max_tokens)

Now watch the magic:

Recursive chunked:

  • Chunk 1: “Remote Work Policy\n\nEligibility:\n- 6 months employment\n- Good performance review\n- Manager approval”
  • Chunk 2: “Process:\n1. Submit form RW-101\n2. Manager review (5 days)\n3. Director approval (3 days)”

Perfect. The policy stays together. The structure is preserved. Context is maintained.

The Real-World Impact

Let me show you what happens with actual documents:

Test Document: Employee Termination Policy (2,000 words)

Token Chunking:

  • 4 chunks
  • Form references split: 3 times
  • Process steps broken: 5 times
  • Search accuracy: 61%

Sentence Chunking:

  • 6 chunks
  • Sections partially preserved
  • Related info scattered
  • Search accuracy: 74%

Recursive Chunking:

  • 5 chunks
  • All sections intact
  • Logical groupings maintained
  • Search accuracy: 92%

That’s a 31-percentage-point improvement in accuracy, from 61% to 92%. For free. Just by chunking smarter.

The Overlap Problem Nobody Talks About

Here’s something that’ll blow your mind…

Chunks shouldn’t be independent islands. They need overlap.

Why? Context.

Look at this:

Chunk 1 (no overlap): “…employees must submit the request 30 days in advance.”

Chunk 2 (no overlap): “Late submissions require VP approval…”

What request? Submit to whom? We lost critical context.

Chunk 1 (with 50-token overlap): “…employees must submit the request 30 days in advance.”

Chunk 2 (with 50-token overlap): “…submit the request 30 days in advance. Late submissions require VP approval…”

Now the second chunk has context. It knows we’re talking about whatever request was mentioned.

But don’t go crazy:

  • No overlap: Lost context
  • 10% overlap: Minimal context
  • 20-25% overlap: Sweet spot
  • 50% overlap: Wasteful duplication
  • 75% overlap: You’re basically not chunking
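If you’re rolling your own, overlap is a small change to the token chunker from earlier: step forward by less than a full window. A minimal sketch, reusing the tokenizer defined above:

def token_chunking_with_overlap(text, max_tokens=512, overlap=100):
    tokens = tokenizer.encode(text)
    chunks = []
    step = max_tokens - overlap  # each window starts before the previous one ended

    for i in range(0, len(tokens), step):
        chunks.append(tokenizer.decode(tokens[i:i + max_tokens]))
        if i + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text

    return chunks

With max_tokens=512 and overlap=100 you get roughly 20% overlap, right in the sweet spot.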

The Size Dilemma

“What size chunks should I use?”

Everyone asks this. The answer? It depends. (I know, I hate that answer too.)

Small Chunks (256 tokens)

Pros:

  • Precise retrieval
  • Lower API costs per query
  • More chunks in context window

Cons:

  • Lost context
  • More embeddings to generate
  • Fragmented information

Best for: FAQ-style questions, definitions, quick lookups

Medium Chunks (512 tokens)

Pros:

  • Balanced context
  • Good retrieval accuracy
  • Reasonable costs

Cons:

  • Some topics still split
  • Occasional context loss

Best for: Most use cases, policy documents, procedures

Large Chunks (1024+ tokens)

Pros:

  • Complete context
  • Whole topics together
  • Fewer embeddings

Cons:

  • Less precise retrieval
  • Higher API costs per query
  • Fewer chunks fit in context

Best for: Complex technical documentation, legal documents

The Hidden Costs of Bad Chunking

That startup I mentioned? Let’s break down their disaster:

Their approach:

  • 10,000 documents
  • Average 5,000 words each
  • No chunking (tried to embed entire documents)
  • Using text-embedding-ada-002

The math:

  • 50 million words total
  • ≈ 67 million tokens
  • Embedding cost: £0.0001 per 1k tokens
  • Total: £6,700 just for embeddings

But wait, it gets worse:

  • Most documents exceeded model limits
  • Had to retry with smaller models
  • Still failed
  • Tried to chunk on the fly
  • Inconsistent chunk sizes
  • Overlapping embeddings
  • Final cost: £50,000+

What they should have done:

  • Pre-chunk with recursive method
  • 512 tokens per chunk with 100-token overlap
  • ≈ 160,000 chunks (the overlap adds some duplication)
  • One clean embedding pass, no retries, no re-embedding
  • Total spend (embeddings plus the weekend’s queries): ≈ £670 instead of £50,000+
  • A 74x cost reduction

The PolicyChatbot Approach

Here’s how PolicyChatbot handles chunking (so you don’t have to):

Smart Auto-Detection

PolicyChatbot analyzes your document:

  • Structured document? → Recursive chunking
  • Narrative text? → Sentence chunking
  • Technical specs? → Token chunking
  • Mixed content? → Hybrid approach

You don’t choose. It knows.
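The detector itself is proprietary, but the idea is easy to sketch. A toy heuristic (not PolicyChatbot’s actual logic) that dispatches on surface features:

import re

def pick_chunker(text):
    lines = text.split('\n')
    headings = sum(1 for line in lines if re.match(r'#{1,3}\s', line))
    bullets = sum(1 for line in lines if re.match(r'\s*(?:[-*•]|\d+[.)])\s', line))

    # Lots of headings or lists suggests a structured document
    if headings >= 3 or bullets >= 5:
        return recursive_chunking
    return sentence_chunking  # default: narrative text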

Dynamic Sizing

Different sections, different sizes:

  • Executive summary: 256 tokens (high-level, needs precision)
  • Detailed procedures: 512 tokens (balanced)
  • Appendices: 1024 tokens (reference material)

Intelligent Overlap

Overlap varies by content:

  • Sequential steps: 30% overlap
  • Independent sections: 10% overlap
  • Critical procedures: 40% overlap

Metadata Preservation

Every chunk remembers:

  • Source document
  • Section heading
  • Page number
  • Hierarchy level
  • Related chunks

This is huge. When someone asks about “vacation policy”, the chatbot knows if they’re looking at the summary or the detailed procedure.
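PolicyChatbot’s internal format isn’t public, but a minimal version of that metadata might look like this (field names are illustrative):

from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    source_document: str        # e.g. "employee_handbook.pdf"
    section_heading: str        # e.g. "Vacation Policy"
    page_number: int
    hierarchy_level: int        # 0 = title, 1 = section, 2 = subsection
    related_chunk_ids: list = field(default_factory=list)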

The Chunking Strategies That Actually Work

Strategy 1: Header-Aware Chunking

Never separate headers from their content:

## Vacation Policy  ← Keep these
Employees receive...  ← together

PolicyChatbot does this automatically. DIY? Add 50 lines of code.

Strategy 2: List-Preserving Chunking

Never split lists:

Requirements:        ← Keep
1. Form A           ← all
2. Manager approval ← items
3. HR review        ← together

Sounds obvious? 90% of chunking libraries split lists.
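One way to avoid that, sketched under the assumption your documents use bullet or numbered list markers: treat a run of list items, plus the line that introduces them, as one atomic block the chunker isn’t allowed to cut.

import re

LIST_ITEM = re.compile(r'^\s*(?:[-*•]|\d+[.)])\s', re.MULTILINE)

def group_list_blocks(text):
    blocks = []
    for line in text.split('\n'):
        if not line.strip():
            continue
        # Glue list items onto the previous block when that block is
        # already a list or a "Requirements:"-style lead-in
        if LIST_ITEM.match(line) and blocks and (
            LIST_ITEM.search(blocks[-1]) or blocks[-1].rstrip().endswith(':')
        ):
            blocks[-1] += '\n' + line
        else:
            blocks.append(line)
    return blocks

Feed these blocks into the recursive chunker and lists survive intact. The same trick covers Strategy 3 below: treat consecutive lines starting with | as one atomic table block.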

Strategy 3: Table-Aware Chunking

Tables are special:

| Role      | Days |  ← Keep entire
|-----------|------|  ← table as
| Junior    | 15   |  ← one chunk
| Senior    | 20   |  ← if possible

Most systems treat tables as text. Disaster.

Strategy 4: Context Windows

Each chunk should be self-contained:

❌ Bad: “…must be submitted by the deadline.”

✅ Good: “Form RW-101 must be submitted by the deadline.”

Add minimal context to make chunks standalone.
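The cheapest way to get there is to prepend a little provenance when you index, something like this hypothetical helper:

def contextualize(chunk_text, doc_title, section_heading):
    # Prepend lightweight context so the chunk stands on its own
    return f"{doc_title} | {section_heading}\n{chunk_text}"

Now even a fragment like “…must be submitted by the deadline.” carries its document and section with it into the embedding.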

Real Implementation Examples

Let me show you actual code that works:

Example 1: Policy Document

import re

def chunk_policy_document(text):
    # Split on markdown headings; the lookahead keeps each heading
    # attached to the section it introduces
    sections = re.split(r'\n(?=#{1,3}\s)', text)

    chunks = []
    for section in sections:
        if count_tokens(section) <= 512:
            chunks.append(section)
        else:
            # Recursively chunk oversized sections
            subchunks = recursive_chunking(section, max_tokens=512)
            chunks.extend(subchunks)

    return chunks

Example 2: FAQ Document

def chunk_faq(text):
    # Each Q&A pair is one chunk
    qa_pairs = re.split(r'\n(?=Q:)', text)

    chunks = []
    for qa in qa_pairs:
        if count_tokens(qa) <= 512:
            chunks.append(qa)
        elif '\nA:' in qa:
            # Pair too long: keep the question intact, chunk the answer
            q, a = qa.split('\nA:', 1)
            chunks.append(q + '\nA: [See next chunk for full answer]')
            chunks.extend(sentence_chunking(a, max_tokens=400))
        else:
            # Malformed pair with no answer marker: chunk it as plain text
            chunks.extend(sentence_chunking(qa, max_tokens=400))

    return chunks

The Metrics That Matter

How do you know if your chunking is working?

Retrieval Precision

What percentage of retrieved chunks actually contain the answer?

  • Bad chunking: 40-50%
  • Good chunking: 75-85%
  • PolicyChatbot: 92%+
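How do you measure that? A crude but effective smoke test: build a small set of questions with known answers, then check whether each retrieved chunk actually contains the expected answer text. A sketch (the eval set is yours to supply):

def retrieval_precision(retrieved_chunks, expected_answer):
    # Fraction of retrieved chunks containing the expected answer text.
    # Substring matching is crude, but it catches chunking regressions fast.
    if not retrieved_chunks:
        return 0.0
    hits = sum(
        1 for chunk in retrieved_chunks
        if expected_answer.lower() in chunk.lower()
    )
    return hits / len(retrieved_chunks)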

Context Completeness

Does the chunk contain all necessary context?

  • Bad: “…submit the form…”
  • Good: “To request remote work, submit form RW-101…”

Query Coverage

Can a single chunk answer the question?

  • Bad: Need 5+ chunks for simple answers
  • Good: 1-2 chunks for most queries

Cost Efficiency

Tokens used per query:

  • Bad chunking: 3,000+ tokens average
  • Good chunking: 800-1,200 tokens
  • PolicyChatbot: 650 tokens average

Common Chunking Mistakes

Mistake 1: One Size Fits All

Using 512 tokens for everything. Your executive summary doesn’t need the same treatment as your detailed procedures.

Mistake 2: Ignoring Document Structure

Treating a structured policy document like a novel. They’re different. Chunk them differently.

Mistake 3: No Overlap

Zero overlap = zero context. Your chunks become cryptic fragments.

Mistake 4: Over-Chunking

100-token chunks might seem precise, but you’ll need 20 of them to answer anything substantial.

Mistake 5: Under-Chunking

2000-token chunks contain everything… and nothing. Too broad to be useful.

The Future of Intelligent Chunking

What’s coming next:

Semantic Chunking

Split based on meaning changes, not token counts. When the topic shifts, create a new chunk.
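You can prototype this today with sentence embeddings: start a new chunk whenever adjacent sentences drift apart in meaning. A rough sketch using the sentence-transformers library (the threshold is something you’d tune on your own documents):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

def semantic_chunking(sentences, threshold=0.6):
    if not sentences:
        return []

    embeddings = model.encode(sentences)
    chunks, current = [], [sentences[0]]

    # Start a new chunk when consecutive sentences are semantically distant
    for prev, nxt, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        similarity = np.dot(prev, nxt) / (np.linalg.norm(prev) * np.linalg.norm(nxt))
        if similarity < threshold:
            chunks.append(' '.join(current))
            current = [sentence]
        else:
            current.append(sentence)

    chunks.append(' '.join(current))
    return chunks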

Query-Aware Chunking

Different chunks for different query types. “What is…” gets summary chunks. “How do I…” gets procedural chunks.

Dynamic Re-Chunking

Chunks reorganize based on usage patterns. Frequently accessed together? Merge them.

Cross-Document Chunking

Related information from different documents grouped into synthetic chunks.

PolicyChatbot is building all of this. Your homebrew chunker? Still splitting words in half.

Your Chunking Checklist

Before you chunk anything:

  • Understand your document structure
  • Choose appropriate chunk size
  • Implement smart overlap
  • Preserve semantic boundaries
  • Maintain metadata
  • Test with real queries
  • Monitor retrieval metrics
  • Iterate based on results

Or just use PolicyChatbot and skip all of this.

The Bottom Line

Document chunking is like cutting a diamond. Do it right, and you get brilliance. Do it wrong, and you get expensive dust.

Most people are creating dust.

The startup that blew £50,000? They rebuilt their entire system. Took 3 months. Implemented proper chunking. Their costs dropped 74x.

Or they could have used PolicyChatbot from the start. Same result. Zero development time.

Your documents are not just text. They’re structured information. Treat them that way.

Chunk smart. Not hard.


Stop wrestling with document chunking strategies. PolicyChatbot handles intelligent chunking automatically. Start your free trial and see the difference proper chunking makes.