Deciding What Deserves a Span

Every team discovers OpenTelemetry the same way. First, excitement—finally, visibility into distributed systems! Then comes the instrumentation party. Spans everywhere. Every function. Every validation. Every calculation gets its own span because "more data is better," right?

Three months later, you're staring at a trace with 500 spans trying to figure out why a simple API call took 3 seconds. Your observability bill has grown 10x. And your engineers have given up on traces entirely because they're impossible to read.

There's a better way.

The Problem: Span Explosion

Most teams create spans like this:

func ProcessPayment(ctx context.Context, payment Payment) error {
    ctx, span := tracer.Start(ctx, "process payment")
    defer span.End()

    validateAmount(ctx, payment.Amount)      // Another span
    validateCard(ctx, payment.CardNumber)    // Another span
    calculateFees(ctx, payment.Amount)       // Another span
    formatCurrency(ctx, payment.Total)       // Another span
    // ... 10 more spans for trivial operations
    return nil
}

At 10,000 requests per minute with 15 spans each, you're generating about 6.5 billion spans per 30-day month (150,000 spans per minute × 43,200 minutes). At $0.20 per million spans, that's roughly $1,300 monthly just for payment processing traces.
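The arithmetic behind that estimate can be sketched in a few lines of Go. The function names and the 30-day month are assumptions made for illustration; the request rate, span count, and price come from the scenario above, and real vendor pricing is rarely this flat.

```go
package main

// minutesPerMonth assumes a 30-day month (an illustrative simplification).
const minutesPerMonth = 60 * 24 * 30

// SpansPerMonth estimates monthly span volume for a steady request rate.
func SpansPerMonth(requestsPerMinute, spansPerRequest int64) int64 {
	return requestsPerMinute * spansPerRequest * minutesPerMonth
}

// MonthlyCostUSD prices a span volume at a flat per-million rate.
func MonthlyCostUSD(spans int64, usdPerMillion float64) float64 {
	return float64(spans) / 1e6 * usdPerMillion
}
```

With the figures from the text, SpansPerMonth(10000, 15) returns 6,480,000,000, and MonthlyCostUSD of that volume at 0.20 comes to about 1,296 dollars.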

But cost isn't the real problem. The real problem is that your traces become unreadable. When everything has a span, nothing stands out. Signal drowns in noise.

The Variability Principle: Your New Mental Model

Here's the principle that changed everything for us:

"Is this operation unpredictable?"

If yes, create a span. If no, don't.

This simple question cuts through all the complexity. It's not about operation importance or business value—it's about performance predictability.

Unpredictable = Create a Span

Operations with unpredictable performance need spans:

Database queries: Could take 5ms or 5 seconds depending on locks, data size, indexes
HTTP calls: Network latency, retries, timeouts are all variable
External APIs: You don't control their performance
Message queues: Depends on queue depth, consumer availability
Cache operations: Network round-trip to Redis/Memcached
File I/O: Disk performance varies, especially with network storage

These operations can surprise you. When they're slow, you need to know.

Predictable = Skip the Span

Operations with predictable performance don't need spans:

Validation logic: Checking if a string contains "@" is always microseconds
Math calculations: Simple arithmetic such as fees or totals takes consistent, negligible time
Data transformation: Mapping objects in memory is deterministic
String formatting: Always fast, never the problem
Getters/setters: Not worth measuring

These operations can't surprise you. They're never the bottleneck.
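If you are unsure which bucket an operation falls into, one way to check is to time it repeatedly and compare the tail latency with the median. The sketch below is only an illustration of that idea: the thresholds (`spreadRatio`, `latencyFloor`), the sample data, and the function names are all hypothetical, and the sample values are made up to mirror the "5ms or 5 seconds" database case and the microsecond validation case.

```go
package main

import (
	"sort"
	"time"
)

// Hypothetical thresholds; tune them for your own system.
const (
	spreadRatio  = 3.0              // p99 more than 3x the median counts as variable
	latencyFloor = time.Millisecond // ignore operations whose p99 stays below this
)

// Illustrative samples only, not measurements from a real system.
var (
	queryLatencies = []time.Duration{
		5 * time.Millisecond, 5 * time.Millisecond, 6 * time.Millisecond,
		5 * time.Millisecond, 7 * time.Millisecond, 5 * time.Millisecond,
		6 * time.Millisecond, 5 * time.Millisecond, 5 * time.Millisecond,
		5 * time.Second, // the occasional lock or missing index
	}
	validationLatencies = []time.Duration{
		2 * time.Microsecond, 2 * time.Microsecond, 3 * time.Microsecond,
		2 * time.Microsecond, 2 * time.Microsecond, 3 * time.Microsecond,
	}
)

// percentile returns the p-th percentile (1..100) using the nearest-rank method.
func percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...) // leave the input untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p/100 * n)
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// WorthSpan reports whether the samples look unpredictable: the p99 must reach
// the floor and exceed the median by more than the given ratio.
func WorthSpan(samples []time.Duration, ratio float64, floor time.Duration) bool {
	p50, p99 := percentile(samples, 50), percentile(samples, 99)
	if p99 < floor {
		return false // always fast, never the bottleneck
	}
	if p50 <= 0 {
		return true // a zero median with a slow tail is as variable as it gets
	}
	return float64(p99)/float64(p50) > ratio
}
```

With these samples, WorthSpan reports true for queryLatencies and false for validationLatencies. A check like this is a sanity test for borderline cases, not a replacement for the question itself.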

The Pattern in Practice

Let's refactor that payment processing:

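// Needs "context", "errors", and the OpenTelemetry packages
// go.opentelemetry.io/otel/attribute and go.opentelemetry.io/otel/codes.
// Assumes db.SavePayment and paymentGateway.Charge return an error.
func ProcessPayment(ctx context.Context, payment Payment) error {
    ctx, span := tracer.Start(ctx, "process payment")
    defer span.End()

    // Add context as attributes, not spans
    span.SetAttributes(
        attribute.Float64("payment.amount", payment.Amount),
        attribute.String("payment.currency", payment.Currency),
    )

    // Validation is predictable - no span needed
    if payment.Amount <= 0 || !isValidCard(payment.CardNumber) {
        err := errors.New("invalid payment")
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }

    // Database operation is unpredictable - needs a span.
    // A separate context keeps the next span a sibling, not a child of this one.
    dbCtx, dbSpan := tracer.Start(ctx, "INSERT payments")
    dbSpan.SetAttributes(
        attribute.String("db.system.name", "postgresql"),
        attribute.String("db.collection.name", "payments"),
        attribute.String("db.operation.name", "INSERT"),
    )
    if err := db.SavePayment(dbCtx, payment); err != nil {
        dbSpan.RecordError(err)
        dbSpan.SetStatus(codes.Error, err.Error())
        dbSpan.End()
        return err
    }
    dbSpan.End()

    // External API is unpredictable - needs a span
    chargeCtx, chargeSpan := tracer.Start(ctx, "charge card")
    defer chargeSpan.End()
    if err := paymentGateway.Charge(chargeCtx, payment); err != nil {
        chargeSpan.RecordError(err)
        chargeSpan.SetStatus(codes.Error, err.Error())
        return err
    }
    return nil
}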

Result: 3 spans instead of 15. Traces are readable. Engineers can actually find problems.

What to Use Instead of Spans

When you skip creating a span, you still need to capture information. That's where attributes and events come in.

Attributes: Context Without Cost

Attributes add metadata to existing spans. They're perfect for:

Request/response data (user ID, order total, currency)
Configuration values (retry count, timeout settings)
Business context (customer tier, feature flags)

span.SetAttributes(
    attribute.String("user.id", userID),
    attribute.Float64("order.total", 157.46),
    attribute.Bool("cache.hit", true),
)

Most tracing backends index attributes and make them searchable, so you can filter traces by them without creating separate spans.

Events: Milestones in Time

Events mark important moments within a span's lifecycle. They're perfect for:

Validation checkpoints
State transitions
Progress markers in loops

// Mark validation completion
span.AddEvent("validation completed")

// Track calculation results
span.AddEvent("total calculated",
    trace.WithAttributes(
        attribute.Int("line_items.count", 4),
        attribute.Float64("total", 157.46),
    ))

// Record state changes
span.AddEvent("payment saved")

// Track retry attempts
span.AddEvent("retry attempt",
    trace.WithAttributes(
        attribute.Int("attempt", 3),
        attribute.String("reason", "timeout"),
    ))

Events show you when something happened and provide rich context without the overhead of a full span. When debugging, they help you see the timeline of operations within your parent span.

The Decision Framework

Before creating any span, ask one question:

"Is this operation unpredictable?"

Yes → Create a span

No → Use attributes or events

That's it. This single question replaces complex decision trees. In the payment example it removed 12 of 15 spans, an 80% reduction.

Remember This

Your traces should tell a story, not document every CPU cycle. Each span costs money, performance, and clarity.

Create spans only for operations that could surprise you. For everything else, there are attributes and events.

The best observability isn't about having all the data—it's about having the right data.
