The Variability Principle: How to Decide What Deserves a Span


Every team discovers OpenTelemetry the same way. First, excitement—finally, visibility into distributed systems! Then comes the instrumentation party. Spans everywhere. Every function. Every validation. Every calculation gets its own span because "more data is better," right?

Three months later, you're staring at a trace with 500 spans trying to figure out why a simple API call took 3 seconds. Your observability bill has grown 10x. And your engineers have given up on traces entirely because they're impossible to read.

There's a better way.

The Problem: Span Explosion

Most teams create spans like this:

func ProcessPayment(ctx context.Context, payment Payment) error {
    ctx, span := tracer.Start(ctx, "process payment")
    defer span.End()

    validateAmount(ctx, payment.Amount)      // Another span
    validateCard(ctx, payment.CardNumber)    // Another span
    calculateFees(ctx, payment.Amount)       // Another span
    formatCurrency(ctx, payment.Total)       // Another span
    // ... 10 more spans for trivial operations
    return nil
}

At 10,000 requests per minute with 15 spans each, you're generating roughly 6.5 billion spans in a 30-day month. At $0.20 per million spans, that's about $1,300 monthly just for payment processing traces.
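The arithmetic behind those numbers is easy to rerun for your own traffic. Here is a minimal sketch, assuming a 30-day month and a flat per-million price; the function names are made up for illustration and your vendor's pricing model may differ.

```go
package main

import "fmt"

// minutesPerMonth assumes a 30-day month.
const minutesPerMonth = 60 * 24 * 30

// MonthlySpans estimates how many spans a service emits per month.
func MonthlySpans(requestsPerMinute, spansPerRequest int64) int64 {
	return requestsPerMinute * spansPerRequest * minutesPerMonth
}

// MonthlyCostUSD converts a span count into dollars at a flat per-million price.
func MonthlyCostUSD(spans int64, pricePerMillionUSD float64) float64 {
	return float64(spans) / 1_000_000 * pricePerMillionUSD
}

func main() {
	spans := MonthlySpans(10_000, 15)
	// Prints: 6480000000 spans/month, $1296.00
	fmt.Printf("%d spans/month, $%.2f\n", spans, MonthlyCostUSD(spans, 0.20))
}
```

Change the second argument of MonthlySpans to see how the bill scales with spans per request.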

But cost isn't the real problem. The real problem is that your traces become unreadable. When everything has a span, nothing stands out. Signal drowns in noise.

The Variability Principle: Your New Mental Model

Here's the principle that changed everything for us:

"Is this operation unpredictable?"

If yes, create a span. If no, don't.

This simple question cuts through all the complexity. It's not about operation importance or business value—it's about performance predictability.

Unpredictable = Create a Span

Operations with unpredictable performance need spans:

  • Database queries: Could take 5ms or 5 seconds depending on locks, data size, indexes

  • HTTP calls: Network latency, retries, timeouts are all variable

  • External APIs: You don't control their performance

  • Message queues: Depends on queue depth, consumer availability

  • Cache operations: Network round-trip to Redis/Memcached

  • File I/O: Disk performance varies, especially with network storage

These operations can surprise you. When they're slow, you need to know.

Predictable = Skip the Span

Operations with predictable performance don't need spans:

  • Validation logic: Checking if a string contains "@" is always microseconds

  • Math calculations: Small CPU-bound operations take a consistent amount of time

  • Data transformation: Mapping objects in memory is deterministic

  • String formatting: Always fast, never the problem

  • Getters/setters: Not worth measuring

These operations can't surprise you. They're never the bottleneck.

The Pattern in Practice

Let's refactor that payment processing:

// Assumes imports: context, errors, go.opentelemetry.io/otel/attribute,
// and go.opentelemetry.io/otel/codes.
func ProcessPayment(ctx context.Context, payment Payment) {
    ctx, span := tracer.Start(ctx, "process payment")
    defer span.End()

    // Add context as attributes, not spans
    span.SetAttributes(
        attribute.Float64("payment.amount", payment.Amount),
        attribute.String("payment.currency", payment.Currency),
    )

    // Validation is predictable - no span needed
    if payment.Amount <= 0 || !isValidCard(payment.CardNumber) {
        err := errors.New("invalid payment")
        span.RecordError(err)
        // RecordError only adds an event; set the status to mark the span as failed
        span.SetStatus(codes.Error, err.Error())
        return
    }

    // Database operation is unpredictable - needs a span.
    // Keep the child context separate (dbCtx) so the next span is a sibling
    // of this one rather than nested under it.
    dbCtx, dbSpan := tracer.Start(ctx, "INSERT payments")
    dbSpan.SetAttributes(
        attribute.String("db.system.name", "postgresql"),
        attribute.String("db.collection.name", "payments"),
        attribute.String("db.operation.name", "INSERT"),
    )
    db.SavePayment(dbCtx, payment)
    dbSpan.End()

    // External API is unpredictable - needs a span
    chargeCtx, chargeSpan := tracer.Start(ctx, "charge card")
    paymentGateway.Charge(chargeCtx, payment)
    chargeSpan.End()
}

Result: 3 spans instead of 15. Traces are readable. Engineers can actually find problems.

What to Use Instead of Spans

When you skip creating a span, you still need to capture information. That's where attributes and events come in.

Attributes: Context Without Cost

Attributes add metadata to existing spans. They're perfect for:

  • Request/response data (user ID, order total, currency)

  • Configuration values (retry count, timeout settings)

  • Business context (customer tier, feature flags)

span.SetAttributes(
    attribute.String("user.id", userID),
    attribute.Float64("order.total", 157.46),
    attribute.Bool("cache.hit", true),
)

Most tracing backends index attributes, so you can search and filter traces by them without creating separate spans.

Events: Milestones in Time

Events mark important moments within a span's lifecycle. They're perfect for:

  • Validation checkpoints

  • State transitions

  • Progress markers in loops

// Mark validation completion
span.AddEvent("validation completed")

// Track calculation results
span.AddEvent("total calculated",
    trace.WithAttributes(
        attribute.Int("line_items.count", 4),
        attribute.Float64("total", 157.46),
    ))

// Record state changes
span.AddEvent("payment saved")

// Track retry attempts
span.AddEvent("retry attempt",
    trace.WithAttributes(
        attribute.Int("attempt", 3),
        attribute.String("reason", "timeout"),
    ))

Events show you when something happened and provide rich context without the overhead of a full span. When debugging, they help you see the timeline of operations within your parent span.

The Decision Framework

Before creating any span, ask one question:

"Is this operation unpredictable?"

Yes → Create a span

No → Use attributes or events

That's it. This single question replaces complex decision trees. In the payment example, it cut 15 spans down to 3, an 80% reduction.
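If it helps to see the framework written down, here is a minimal sketch in plain Go. The Operation type, its fields, and the Choose function are hypothetical names invented for this example; they are not part of any OpenTelemetry API. The split between events and attributes follows the descriptions above: events mark a moment in time, attributes carry plain metadata.

```go
package main

import "fmt"

// Signal is the kind of telemetry to record for an operation.
type Signal string

const (
	SignalSpan      Signal = "span"
	SignalEvent     Signal = "event"
	SignalAttribute Signal = "attribute"
)

// Operation describes a candidate for instrumentation (hypothetical type).
type Operation struct {
	Name          string
	Unpredictable bool // latency depends on network, disk, locks, or another system
	IsMilestone   bool // a point in time worth marking, not just a value
}

// Choose applies the single question first, then picks between
// an event and an attribute for predictable work.
func Choose(op Operation) Signal {
	if op.Unpredictable {
		return SignalSpan
	}
	if op.IsMilestone {
		return SignalEvent
	}
	return SignalAttribute
}

func main() {
	ops := []Operation{
		{Name: "INSERT payments", Unpredictable: true},
		{Name: "validation completed", IsMilestone: true},
		{Name: "payment.currency"},
	}
	for _, op := range ops {
		fmt.Printf("%s -> %s\n", op.Name, Choose(op))
	}
}
```

Nobody needs to ship this function; the point is that the whole decision fits in two if statements.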

Remember This

Your traces should tell a story, not document every CPU cycle. Each span costs money, performance, and clarity.

Create spans only for operations that could surprise you. For everything else, there are attributes and events.

The best observability isn't about having all the data—it's about having the right data.
