Mess around software system design

CtrlK

Non-shard key queries

Problem

For example, most e-commerce platform uses buyer id as sharding key. If we need to search by order id, what are the options?

1. Embed query key inside sharding key

Similar to what we have done in snowflake id generator, we could embed the query key inside sharding key.
More specificly, we could embed the buyer id inside order id. Then we could extract it from order id.

2. Intermediate table

An intermediate table could be created to store the mapping between order id, buyer id, and seller id.
Furthermore, if the buyer id is embedded inside order id, then we could remove the buyer id column inside intermediate table.

Cons

Write bottleneck

If some columns of intermediate table need to be updated frequently, then intermediate tables themselves have write bottleneck.
Possible remediation options: To have fewer columns inside intermediate table and avoid frequently updated columns.

Poor flexibility

Intermediate table is typically a big table. Updating table structure for a big table is a risky operation.

Consistency

Please refer to the Cons section under redundant sharding

3. Redundant sharding

Same set of data will be sharded by two different keys.
However, the redundant copy doesn't need to clone everything.
- For example, the order details table doesn't need to be cloned.
- Specific columns such as notes, transaction time don't need to be cloned.
When designing the redundant copy of data sharded by seller id, as long as most of queries don't need to trace back to the data sharded by buyer id, it will be great.

Cons

Consistency

How to keep the consistency between original copy and intermediate table / copy sharded by another key?

Double write

Binlog watch and monitor

4. Other middlewares such as ElasticSearch

Introduce search engine for more complex queries.

5. Combined approach

If allowing both buyer and seller to update their shard copy, then concurrent writes need to be controlled by distributed lock and double write.

PreviousCross-shard pagination queries NextCapacity planning

Last updated 1 year ago

Was this helpful?