Time series data

Components

Time series = Object + Tag + Metrics + actual data

Object

A monitored object falls into one of three categories:
- Machine level: Physical machine, virtual machine, operating system
- Instance level: Container, process
- Service level (logical object): Service, service group, cluster

Metric

Metrics are numeric measurements. A metric can be:
- A numeric status at a moment in time (like CPU % used)
- An aggregated measurement (like a count of events over a one-minute window, or a rate of events per minute)

The types of metric aggregation are diverse (for example, average, total, minimum, maximum, sum-of-squares), but all metrics generally share the following traits:
- A name
- A timestamp
- One or more numeric values
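
The components above can be sketched as a small data model. This is only an illustration: the class names `SeriesKey` and `TimeSeries`, the helper `make_key`, and the sample host and metric names are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SeriesKey:
    """Identifies one time series: monitored object + tags + metric name."""
    obj: str                            # e.g. a host, container, or service name
    tags: Tuple[Tuple[str, str], ...]   # sorted (key, value) pairs
    metric: str


@dataclass
class TimeSeries:
    key: SeriesKey
    # The actual data: (unix_timestamp_seconds, value) pairs.
    points: List[Tuple[int, float]] = field(default_factory=list)

    def append(self, ts: int, value: float) -> None:
        self.points.append((ts, value))


def make_key(obj: str, tags: Dict[str, str], metric: str) -> SeriesKey:
    # Sorting the tags makes equal tag sets produce equal keys.
    return SeriesKey(obj, tuple(sorted(tags.items())), metric)


series = TimeSeries(make_key("host-1", {"dc": "east", "role": "web"}, "cpu.used_percent"))
series.append(1_700_000_000, 42.5)
```

Storing the tags as a sorted tuple keeps the key hashable, so it can be used as a dictionary key regardless of the order in which the tags were supplied.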

Access patterns

Sequential read: data is read by time range.
Random write: each write goes to a different time series.
- Each object is typically sampled once every 5 or 10 seconds.
Writes far outnumber reads.
Data is aggregated along many dimensions.

Storage in HBase

Rowkey

Why rowkey is important

If the rowkey is designed properly, data is distributed evenly across HRegions, and different HRegions can be placed on different server nodes.

Row key design

ts = (object, tags) + metric + [(timestamp, value), (timestamp, value), …]

// entity_id is the hash of the (object, tags) combination
// metric_id is the hash of the metric name
// timebase is the Unix timestamp rounded down to the hour:
//             timestamp - (timestamp % 3600), stored in 4 bytes.
//             Each rowkey therefore covers 1 hour of data.
RowKey = entity_id + metric_id + timebase
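
A minimal sketch of this row key layout follows. Several details are assumptions made for illustration only: MD5 as the hash function, the 8-byte and 4-byte ID widths, the `|` and `,` separators, and all function names. The timebase is taken as the timestamp rounded down to the hour, so that one row covers one hour of samples.

```python
import hashlib
import struct

ROW_SPAN_SECONDS = 3600  # one row holds one hour of data


def _hash_id(text: str, size: int) -> bytes:
    # Assumption: IDs are the first `size` bytes of an MD5 digest.
    return hashlib.md5(text.encode("utf-8")).digest()[:size]


def entity_id(obj: str, tags: dict) -> bytes:
    # Sort the tags so the same (object, tags) pair always hashes identically.
    canonical = obj + "|" + ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return _hash_id(canonical, 8)


def metric_id(metric: str) -> bytes:
    return _hash_id(metric, 4)


def timebase(unix_ts: int) -> int:
    # Round down to the start of the hour that contains the timestamp.
    return unix_ts - (unix_ts % ROW_SPAN_SECONDS)


def row_key(obj: str, tags: dict, metric: str, unix_ts: int) -> bytes:
    # Big-endian 4-byte timebase: byte order then matches chronological order.
    return entity_id(obj, tags) + metric_id(metric) + struct.pack(">I", timebase(unix_ts))


def offset_in_row(unix_ts: int) -> int:
    # Position of a sample inside its one-hour row (0..3599 seconds).
    return unix_ts % ROW_SPAN_SECONDS


key = row_key("host-1", {"dc": "east"}, "cpu.used_percent", 1_700_000_000)
```

With these widths the key is 16 bytes long, and every sample of a series taken within the same hour maps to the same row; the remainder `timestamp % 3600` is then only needed to locate a sample within that row.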

Benefits

The hashed entity_id and metric_id spread data evenly across regions.
The trailing timebase keeps consecutive data of the same series next to each other.
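
Because rows of one series sort by their trailing hour-aligned time component, a time-range read becomes a single contiguous scan. The sketch below computes the scan bounds and simulates the scan over a sorted list of keys. It assumes a 12-byte series prefix (entity_id + metric_id), a big-endian 4-byte time component, and an inclusive start row with an exclusive stop row; the function name `scan_bounds` is hypothetical.

```python
import struct

ROW_SPAN_SECONDS = 3600  # each row covers one hour


def scan_bounds(series_prefix: bytes, start_ts: int, end_ts: int):
    """Return (start_row, stop_row) covering [start_ts, end_ts] for one series.

    start_row is inclusive and stop_row is exclusive.
    """
    first_base = start_ts - (start_ts % ROW_SPAN_SECONDS)
    last_base = end_ts - (end_ts % ROW_SPAN_SECONDS)
    start_row = series_prefix + struct.pack(">I", first_base)
    stop_row = series_prefix + struct.pack(">I", last_base + ROW_SPAN_SECONDS)
    return start_row, stop_row


# Simulate a table: row keys are kept in sorted byte order.
prefix = b"\x01" * 12  # placeholder for entity_id + metric_id
table = sorted(prefix + struct.pack(">I", base) for base in (7200, 0, 3600, 10800))

start_row, stop_row = scan_bounds(prefix, 3700, 7300)
selected = [k for k in table if start_row <= k < stop_row]
selected_bases = [struct.unpack(">I", k[-4:])[0] for k in selected]
```

Here the query range 3700 to 7300 touches the rows starting at 3600 and 7200, and those two rows are adjacent in the sorted key space.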

Support tag aggregation

HBase has no native secondary indexes, so finding all entity_ids that carry a given tag would require a full table scan. A separate mapping from tag to entity_ids is needed to support tag aggregation.
For example:
- Given the tag K1=V1, the mapping returns all entities containing it: entity_id1, entity_id2, entity_id3
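
One common way to provide such a lookup is an inverted index from each tag to the entities that carry it. The sketch below keeps the index in memory purely for illustration; where and how the index is actually stored is not specified here, and the `TagIndex` class and its methods are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index: (tag key, tag value) -> set of entity IDs."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, entity_id: str, tags: dict) -> None:
        for key, value in tags.items():
            self._index[(key, value)].add(entity_id)

    def lookup(self, key: str, value: str) -> set:
        # Return a copy so callers cannot mutate the index.
        return set(self._index.get((key, value), set()))

    def lookup_all(self, tags: dict) -> set:
        """Entities that carry every one of the given tags."""
        matches = [self.lookup(k, v) for k, v in tags.items()]
        return set.intersection(*matches) if matches else set()


index = TagIndex()
index.add("entity_id1", {"K1": "V1", "K2": "V2"})
index.add("entity_id2", {"K1": "V1"})
index.add("entity_id3", {"K1": "V1", "K2": "V3"})
index.add("entity_id4", {"K1": "V9"})

with_k1_v1 = index.lookup("K1", "V1")
```

Once the matching entity IDs are known, each one can be combined with the metric ID and time range to read the corresponding rows, and the results aggregated.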

Scaling

Vertical sharding

Each product has a different database

Horizontal partitioning

Each slice (table) is named Product - data - {starttime}
- starttime is the start time of the data in the table.
- Partitioning by time makes it possible to remove expired data in batch by dropping whole slices.
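
The naming scheme and batch removal can be sketched as follows. The `YYYYMMDD` date format, the weekly slice span, the 7-day retention, and the function names are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def slice_name(product: str, start: datetime) -> str:
    # Assumed format: <product>-data-<YYYYMMDD>
    return f"{product}-data-{start:%Y%m%d}"


def slice_start(name: str) -> datetime:
    # The start time is the part after the last "-".
    stamp = name.rsplit("-", 1)[1]
    return datetime.strptime(stamp, "%Y%m%d").replace(tzinfo=timezone.utc)


def expired_slices(names, now: datetime, retention: timedelta, slice_span: timedelta):
    """Slices whose entire time span is older than the retention period."""
    cutoff = now - retention
    return [n for n in names if slice_start(n) + slice_span <= cutoff]


week = timedelta(days=7)
first = datetime(2024, 1, 1, tzinfo=timezone.utc)
names = [slice_name("payments", first + i * week) for i in range(3)]

to_drop = expired_slices(
    names,
    now=datetime(2024, 1, 22, tzinfo=timezone.utc),
    retention=timedelta(days=7),
    slice_span=week,
)
```

Dropping a whole slice removes all of its data at once, which avoids deleting expired points one by one.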

Downsampling

Pre-downsampling

The longer the retention period, the lower the resolution at which data is stored, so that long-lived data takes less space.
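
A minimal sketch of pre-downsampling, assuming averaging as the aggregation function. The retention tiers in `RETENTION_TIERS` are hypothetical values chosen for illustration, as are the function names.

```python
from collections import defaultdict

# Hypothetical policy: (resolution in seconds, retention in days).
RETENTION_TIERS = [(10, 7), (60, 30), (3600, 365)]


def rollup(points, interval):
    """Average raw (timestamp, value) points into buckets of `interval` seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)  # key = start of the bucket
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


def points_per_day(resolution_seconds):
    # Number of stored points per series per day at a given resolution.
    return 86400 // resolution_seconds


raw = [(0, 1.0), (10, 3.0), (50, 5.0), (60, 7.0)]
per_minute = rollup(raw, 60)
storage_per_day = [points_per_day(res) for res, _days in RETENTION_TIERS]
```

Under this policy a series costs 8640 points per day at 10-second resolution but only 24 points per day at hourly resolution, which is what makes long retention affordable.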

Post-downsampling

At query time, dynamically downsample data based on the time range the user queries.
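
One way to do this is to pick the finest interval that keeps the number of returned points under a limit, then aggregate on the fly. The candidate intervals, the 1000-point limit, the use of averaging, and the function names are all assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical candidate intervals in seconds, finest first.
CANDIDATE_INTERVALS = [10, 60, 300, 3600, 86400]


def pick_interval(start_ts, end_ts, max_points=1000):
    """Finest interval that yields at most `max_points` buckets for the range."""
    span = end_ts - start_ts
    for interval in CANDIDATE_INTERVALS:
        if span / interval <= max_points:
            return interval
    return CANDIDATE_INTERVALS[-1]


def downsample_for_query(points, start_ts, end_ts, max_points=1000):
    """Average the points in [start_ts, end_ts) at an interval chosen from the range."""
    interval = pick_interval(start_ts, end_ts, max_points)
    buckets = defaultdict(list)
    for ts, value in points:
        if start_ts <= ts < end_ts:
            buckets[ts - ts % interval].append(value)
    return interval, [(b, sum(vs) / len(vs)) for b, vs in sorted(buckets.items())]


one_hour = pick_interval(0, 3600)
one_day = pick_interval(0, 86400)
thirty_days = pick_interval(0, 30 * 86400)
```

A one-hour query is answered at the finest interval, while a 30-day query falls back to hourly buckets so the response stays a bounded size.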

References

Baidu

Write time series DB from scratch

https://fabxc.org/tsdb/

ELK

https://www.twosigma.com/articles/building-a-high-throughput-metrics-system-using-open-source-software/

Uber M3

https://eng.uber.com/m3/

Datadog

https://www.infoq.com/presentations/datadog-metrics-db/

Aggregation

https://www.youtube.com/watch?v=UEJ6xq4frEw&ab_channel=HasgeekTV
