Distributed traces
Trace concepts
Properties
Definitions:
Traces—or more precisely, “distributed traces”—are samples of causal chains of events (or transactions) between different components in a microservices ecosystem. And like events and logs, traces are discrete and irregular in occurrence.
Properties:
A trace is built by stitching together special events called “spans”; spans help you track a causal chain through a microservices ecosystem for a single transaction. To accomplish this, each service passes correlation identifiers, known as “trace context,” to the next; this trace context is used to add attributes on the span.
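For illustration, a minimal span model might carry the trace ID, its own span ID, its parent's span ID, and arbitrary key-value attributes. The field names below are hypothetical rather than taken from any particular tracer:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustrative span model (field names are hypothetical, not tied to any tracer).
public class Span {
    private final String traceId;       // shared by every span in the same causal chain
    private final String spanId;        // unique within the trace
    private final String parentSpanId;  // links this span to its caller; null for the root span
    private final long startMillis;
    private long endMillis;
    private final Map<String, String> attributes = new HashMap<>();

    public Span(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.startMillis = System.currentTimeMillis();
    }

    public void setAttribute(String key, String value) {
        attributes.put(key, value);   // trace context carried by callers ends up here
    }

    public void finish() {
        this.endMillis = System.currentTimeMillis();
    }
}
```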
Use case
Trace data is needed when you care about the relationships between services/entities. If you only had raw events for each service in isolation, you’d have no way of reconstructing a single chain between services for a particular transaction.
Additionally, applications often call multiple other applications depending on the task they’re trying to accomplish; they also often process data in parallel, so the call-chain can be inconsistent and timing can be unreliable for correlation. The only way to ensure a consistent call-chain is to pass trace context between each service to uniquely identify a single transaction through the entire chain.
Optimize the calling chain. For example, if a service calls another service repeatedly, could those requests be batched? Or could they be parallelized?
Locate the bottleneck service.
Optimize the network calls, e.g., identify whether there are cross-region calls.
Data model
TraceID
The TraceId is used to stitch together the call logs that a single request produces on each server.
Generation rule
Sample generation rule:
The TraceId is typically generated by the first server that receives the request. The generation rule is: server IP + generation time + incremental sequence + current process ID.
Example: 0ad1348f1403169275002100356696
The first 8 characters, 0ad1348f, are the hex-encoded IP of the machine that generated the TraceId; every two hex digits represent one octet of the address. Converting each pair to decimal gives 10.209.52.143, so you can also identify the first server the request went through.
The next 13 digits, 1403169275002, are the time the TraceId was generated, in milliseconds.
The next 4 digits, 1003, are an auto-incrementing sequence that cycles from 1000 to 9000; after reaching 9000 it wraps back to 1000.
The last 5 digits, 56696, are the current process ID. Its role in the TraceId is to prevent conflicts between TraceIds generated by multiple processes on the same machine.
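A sketch of a generator that follows this layout (hex-encoded IP + millisecond timestamp + wrapping sequence + process ID); helper names and formatting details are assumptions rather than any production implementation:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the TraceId layout described above: hex IP + millis timestamp + sequence + PID.
public class TraceIdGenerator {
    private static final AtomicInteger SEQUENCE = new AtomicInteger(1000);

    public static String nextTraceId() throws UnknownHostException {
        String ipHex = ipToHex(InetAddress.getLocalHost().getHostAddress()); // e.g. 10.209.52.143 -> 0ad1348f
        long timestamp = System.currentTimeMillis();                         // 13-digit millisecond timestamp
        int seq = nextSequence();                                            // cycles 1000..9000
        long pid = ProcessHandle.current().pid();                            // guards against same-host collisions
        return ipHex + timestamp + seq + pid;
    }

    private static String ipToHex(String ip) {
        StringBuilder sb = new StringBuilder();
        for (String octet : ip.split("\\.")) {
            sb.append(String.format("%02x", Integer.parseInt(octet)));       // two hex digits per octet
        }
        return sb.toString();
    }

    private static int nextSequence() {
        return SEQUENCE.getAndUpdate(s -> s >= 9000 ? 1000 : s + 1);         // wrap back to 1000 after 9000
    }
}
```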
Sampling
The sampling decision applies to the trace ID, not the span ID.
There are four possible sampling states:
Accept: Decide to include
Debug: In certain testing environments, always sample.
Defer: The decision on whether to trace cannot be made locally; for example, it is deferred to a proxy.
Deny: Decide to exclude
The most common use of sampling is probabilistic: e.g., accept 0.01% of traces and deny the rest. Debug is the least common case.
Reference: https://github.com/openzipkin/b3-propagation
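A minimal sketch of a probabilistic sampler that derives the decision from the trace ID, so every span in a trace receives the same Accept/Deny outcome; the hashing scheme is an illustrative assumption:

```java
// Minimal sketch of a probabilistic sampler: the decision is derived from the trace ID
// (not the span ID), so all spans of the same trace get the same Accept/Deny outcome.
public class ProbabilisticSampler {
    private final double sampleRate; // e.g. 0.0001 for 0.01%

    public ProbabilisticSampler(double sampleRate) {
        this.sampleRate = sampleRate;
    }

    public boolean accept(String traceId) {
        // Map the trace ID to a stable bucket in [0, 10000) and compare against the rate.
        long hash = traceId.hashCode() & 0x7fffffffL; // force non-negative
        return (hash % 10_000) < sampleRate * 10_000;
    }
}
```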
SpanID
The SpanId is used to determine the order of execution for all calls that happened within the same TraceId.
Parent spanId
This is one way of defining the span hierarchy: each span records its parent's span ID alongside its own. It is the more commonly adopted approach.
Dot spanId
This is another way of encoding the parent relationship: the child's span ID is formed by appending a sequence number to the parent's span ID with a dot (for example, 0.1 -> 0.1.1).
Cons: when a trace has many calling layers, the dot spanId carries a lot of redundant information.
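A small sketch contrasting the two schemes; the ID formats and helper names are illustrative:

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Contrast of the two schemes: with the parent-spanId scheme each span stores a random ID
// plus a pointer to its parent; with the dot scheme the ancestry is encoded in the ID itself,
// which grows with call depth (the redundancy noted above).
public class SpanIds {
    // Parent-spanId scheme: the child gets a fresh ID and records its parent's ID.
    public static String[] childWithParent(String parentSpanId) {
        String childId = UUID.randomUUID().toString().substring(0, 8);
        return new String[] { childId, parentSpanId }; // {spanId, parentSpanId}
    }

    // Dot scheme: "0" -> "0.1" -> "0.1.1"; the full ancestry is embedded in the ID.
    public static String childWithDot(String parentSpanId, AtomicInteger childCounter) {
        return parentSpanId + "." + childCounter.incrementAndGet();
    }
}
```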
Annotation
Basic descriptive information attached to the trace.
Context propagation
A context will often have information identifying the current span and trace (e.g. SpanId / TraceId), and can contain arbitrary correlations as key-value pairs.
Propagation is the means by which context is bundled and transferred across process and service boundaries.
The ability to correlate events across service boundaries is one of the principal concepts behind distributed tracing. To find these correlations, components in a distributed system need to be able to collect, store, and transfer metadata referred to as context.
Across threads
Use a ThreadLocal to pass the TraceId / SpanId; when work hops to another thread or a thread pool, the context must be explicitly captured and restored, as in the sketch below.
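A sketch of this capture-and-restore pattern; the class and method names are illustrative, not taken from any particular library:

```java
// Sketch of carrying trace context across threads. A plain ThreadLocal is lost when work
// hops to a thread pool, so the Runnable is wrapped to capture and restore the context.
public class TraceContextHolder {
    private static final ThreadLocal<String[]> CONTEXT = new ThreadLocal<>(); // {traceId, spanId}

    public static void set(String traceId, String spanId) {
        CONTEXT.set(new String[] { traceId, spanId });
    }

    public static String[] get() {
        return CONTEXT.get();
    }

    public static void clear() {
        CONTEXT.remove();
    }

    // Capture the caller's context and restore it inside the worker thread.
    public static Runnable wrap(Runnable task) {
        String[] captured = CONTEXT.get();
        return () -> {
            String[] previous = CONTEXT.get();
            CONTEXT.set(captured);
            try {
                task.run();
            } finally {
                CONTEXT.set(previous); // restore whatever the worker thread had before
            }
        };
    }
}
```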
Across RESTful service APIs
There are several protocols for context propagation that OpenTelemetry recognizes.
W3C Trace-Context HTTP Propagator
W3C Correlation-Context HTTP Propagator
B3 Zipkin HTTP Propagator
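As a concrete illustration, a W3C Trace-Context traceparent header has the form 00-&lt;32-hex trace-id&gt;-&lt;16-hex parent-span-id&gt;-&lt;2-hex flags&gt;. The sketch below injects it into an outgoing request using the JDK HTTP client; the class name is illustrative:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch of injecting a W3C Trace-Context header on an outgoing HTTP call.
// Format: traceparent: <version>-<32-hex trace-id>-<16-hex parent-span-id>-<flags>.
public class TraceparentInjector {
    public static HttpRequest withTraceparent(URI uri, String traceId, String spanId, boolean sampled) {
        String flags = sampled ? "01" : "00";                       // sampled flag from the sampling decision
        String traceparent = "00-" + traceId + "-" + spanId + "-" + flags;
        return HttpRequest.newBuilder(uri)
                .header("traceparent", traceparent)
                .GET()
                .build();
    }
}
```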
Across components such as message queues / cache / DB
Add the context variables inside the message payload
Cons: tampers with the message content
Change the message queue protocol
Cons: challenging to implement
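Where the broker already exposes per-message metadata, the context can travel alongside the payload instead of inside it. A sketch using Kafka record headers, with the header name borrowed from W3C Trace-Context:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

// Sketch: the trace context rides in Kafka record headers next to the payload,
// so the message body itself is left untouched.
public class MessagingPropagation {
    public static ProducerRecord<String, String> inject(String topic, String payload, String traceparent) {
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, payload);
        record.headers().add("traceparent", traceparent.getBytes(StandardCharsets.UTF_8));
        return record;
    }

    public static String extract(ConsumerRecord<String, String> record) {
        Header header = record.headers().lastHeader("traceparent");
        return header == null ? null : new String(header.value(), StandardCharsets.UTF_8);
    }
}
```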
OpenTracing API standards
Architecture
Data collection
Asynchronous processing with bounded buffer queue
No matter what approach the data collector adopts, the threads that send out telemetry data must be separated from business threads; sending should happen on a background thread pool.
There should be a queue between the business threads and the background threads, and this queue should be bounded to avoid out-of-memory issues, as in the sketch below.
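A sketch under these constraints; the queue size, single sender thread, and method names are illustrative assumptions:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of decoupling business threads from the reporting path: spans are offered to a
// bounded queue and dropped (never blocked on) when it is full; a background thread drains it.
public class AsyncReporter {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(10_000); // bounded: caps memory
    private final ExecutorService sender = Executors.newSingleThreadExecutor();

    public AsyncReporter() {
        sender.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                String span = buffer.take();   // the background thread blocks, business threads never do
                send(span);
            }
            return null;
        });
    }

    // Called on the business thread; offer() returns immediately even when the queue is full.
    public void report(String encodedSpan) {
        buffer.offer(encodedSpan);             // drop on overflow rather than risk OOM or blocking
    }

    private void send(String encodedSpan) {
        // Ship to the collector over the network; omitted in this sketch.
    }
}
```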
Approaches
Manual tracing
Manually add tracing logs
AOP
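A sketch of the AOP approach using Spring AOP / AspectJ around-advice; the pointcut package com.example.service is a hypothetical placeholder:

```java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

// Sketch of AOP-based tracing: wrap every public method in a (hypothetical) service package
// so that tracing code stays out of the business logic itself.
@Aspect
public class TracingAspect {

    @Around("execution(public * com.example.service..*(..))")
    public Object trace(ProceedingJoinPoint joinPoint) throws Throwable {
        long start = System.currentTimeMillis();
        try {
            return joinPoint.proceed();        // run the intercepted business method
        } finally {
            long tookMillis = System.currentTimeMillis() - start;
            // A real implementation would record a span with the current trace/span IDs
            // instead of just printing the timing.
            System.out.println(joinPoint.getSignature() + " took " + tookMillis + " ms");
        }
    }
}
```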
Bytecode Instrumentation
More details are covered in a separate reference (in Chinese).
Append to log files
The appender is responsible for outputting formatted logs to destinations such as disk files, the console, etc. Trace files can then be processed in a similar way to log files.
When multiple threads use the same appender, there is a chance of resource contention, so the append operation needs to be asynchronous. To support asynchronous operation there must be a buffer queue, following the same bounded-queue pattern shown above.
Data storage
Requirement analysis
There is no fixed data model, but the calling chain has a tree structure.
The data volume is large, so it should be compressed.
Sample size figure: Meituan reports roughly 100 TB per day.
Column-family data storage
Data model for a normal trace
Use the TraceId as the row key
Has two column families
Basic info column: basic information about the trace
Calling info column: calling information (each remote service call has four phases)
P1: Client send
P2: Server receive
P3: Server send
P4: Client receive
Using HBase as an example for an e-commerce website:
TraceId | 0001 | 0002 |
---|---|---|
Basic Info Column | Type: buy | Type: refund |
Basic Info Column | Status: finished | Status: processing |
Calling Info Column | SpanId 1 with P1 calling info | SpanId 1 with P1 calling info |
Calling Info Column | SpanId 1 with P2 calling info | SpanId 1 with P2 calling info |
Calling Info Column | SpanId 1 with P3 calling info | SpanId 1 with P3 calling info |
Calling Info Column | SpanId 1 with P4 calling info | SpanId 1 with P4 calling info |
Calling Info Column | SpanId 2 with P1 calling info | SpanId 2 with P1 calling info |
Calling Info Column | SpanId 2 with P2 calling info | empty to be filled when finished |
Calling Info Column | SpanId 2 with P3 calling info | ... ... |
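A sketch of writing one cell of the table above with the HBase client API; the table name, column-family name, and qualifier convention (spanId plus phase) are assumptions:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of the column-family layout above: TraceId as the row key, with each span phase
// stored as a separate cell inside the calling-info column family.
public class TraceStore {
    public void saveSpanPhase(Connection connection, String traceId, String spanId,
                              String phase, byte[] callingInfo) throws Exception {
        try (Table table = connection.getTable(TableName.valueOf("trace"))) {
            Put put = new Put(Bytes.toBytes(traceId));                 // row key = TraceId
            put.addColumn(Bytes.toBytes("calling_info"),               // column family
                          Bytes.toBytes(spanId + ":" + phase),         // qualifier, e.g. "2:P1"
                          callingInfo);
            table.put(put);
        }
    }
}
```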
Data model for a business trace
Motivation:
The above trace data model covers the case where all spans can be stitched together with a single trace ID. There are cases where multiple trace IDs need to be concatenated to form a business chain.
For example, in an e-commerce system, a customer could create an order, then revise the existing order, and later cancel it; each action produces its own trace, but they all belong to the same business chain.
This also needs column-family storage mapping traceID -> JSON blob, plus the reverse mapping from system transaction ID -> traceID (see the sketch after the table below).
TraceID | Order system transaction ID | Payment system transaction ID | User system transaction ID |
---|---|---|---|
0001 | 1 | 2 | 3 |
0002 | 4 | 5 | 6 |
0003 | 7 | 8 | 9 |
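A minimal in-memory sketch of the two mappings (forward and reverse); in practice both would live in the column-family store described above, but the structure is easier to see with plain maps. Names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two mappings needed: forward (traceId -> per-system transaction IDs)
// and reverse (systemName:transactionId -> traceId), so a business chain can be
// reconstructed starting from either side.
public class BusinessTraceIndex {
    private final Map<String, Map<String, String>> traceToTransactions = new HashMap<>();
    private final Map<String, String> transactionToTrace = new HashMap<>();

    public void link(String traceId, String systemName, String transactionId) {
        traceToTransactions
                .computeIfAbsent(traceId, id -> new HashMap<>())
                .put(systemName, transactionId);
        transactionToTrace.put(systemName + ":" + transactionId, traceId); // reverse index
    }

    public String findTraceId(String systemName, String transactionId) {
        return transactionToTrace.get(systemName + ":" + transactionId);
    }
}
```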
Distributed file system
Each block needs a corresponding 48-bit index entry. Based on the traceId, the index position can be determined.
The traceId format can be defined in a way that makes locating the index and block data easier. For example, the traceId ShopWeb-0a010680-375030-2 has four segments. The index file name can be defined as "ShopWeb" + "0a010680" + "375030", and the block position can be inferred from the 4th segment.
ShopWeb: Application name
0a010680: Current machine's IP address
375030: Current time, at hour granularity
2: Monotonically increasing sequence number within the current unit
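A sketch of deriving the index file name and the byte offset of a block's 48-bit index entry from such a traceId; the exact byte layout is an assumption based on the description above:

```java
// Sketch of locating index data for a traceId of the form "ShopWeb-0a010680-375030-2".
// The concatenated file name and the 6-byte index entry size follow the description above;
// both are assumptions about layout, not a specific system's format.
public class TraceLocator {
    // Index file name = application + hex IP + hour segment, e.g. "ShopWeb0a010680375030".
    public static String indexFileName(String traceId) {
        String[] parts = traceId.split("-");   // [app, ipHex, hour, sequence]
        return parts[0] + parts[1] + parts[2];
    }

    // Byte offset of this block's 48-bit (6-byte) entry inside the index file.
    public static long indexEntryOffset(String traceId) {
        String[] parts = traceId.split("-");
        long sequence = Long.parseLong(parts[3]);  // 4th segment, e.g. 2
        return sequence * 6;
    }
}
```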
Distributed tracing solutions
OpenTracing
Datadog and OpenTracing: https://www.datadoghq.com/blog/opentracing-datadog-cncf/
Solution inventory
2014 Google Dapper
Twitter Zipkin: https://zipkin.io/pages/architecture.html
Pinpoint: https://pinpoint-apm.github.io/pinpoint/
DaZhongDianPing CAT (Chinese): https://github.com/dianping/cat
Alibaba EagleEye
Jingdong Hydra
Apache SkyWalking: https://github.com/apache/skywalking
Pinpoint (APM)
OpenZipkin
Pinpoint
Compare Pinpoint and OpenZipkin
Language support:
OpenZipkin has broad language support, including C#, Go, Java, JavaScript, Ruby, Scala, and PHP.
Pinpoint only supports Java.
Integration effort:
OpenZipkin's Brave instrumentation API needs to be embedded inside the business logic.
Pinpoint uses bytecode instrumentation and requires no code modifications.
Trace granularity:
OpenZipkin: Code level
Pinpoint: Granular at bytecode level
Meituan
Meituan distributed tracing MTrace: https://zhuanlan.zhihu.com/p/23038157
Alibaba
Alibaba EagleEye
Java instrumentation API (in Chinese): https://tech.meituan.com/2019/02/28/java-dynamic-trace.html