Non-shard key queries
Problem
For example, most e-commerce platform uses buyer id as sharding key. If we need to search by order id, what are the options?
1. Embed query key inside sharding key
Similar to what we have done in snowflake id generator, we could embed the query key inside sharding key.
More specificly, we could embed the buyer id inside order id. Then we could extract it from order id.
2. Intermediate table
An intermediate table could be created to store the mapping between order id, buyer id, and seller id.
Furthermore, if the buyer id is embedded inside order id, then we could remove the buyer id column inside intermediate table.
Cons
Write bottleneck
If some columns of intermediate table need to be updated frequently, then intermediate tables themselves have write bottleneck.
Possible remediation options: To have fewer columns inside intermediate table and avoid frequently updated columns.
Poor flexibility
Intermediate table is typically a big table. Updating table structure for a big table is a risky operation.
Consistency
Please refer to the Cons section under redundant sharding
3. Redundant sharding
Same set of data will be sharded by two different keys.
However, the redundant copy doesn't need to clone everything.
For example, the order details table doesn't need to be cloned.
Specific columns such as notes, transaction time don't need to be cloned.
When designing the redundant copy of data sharded by seller id, as long as most of queries don't need to trace back to the data sharded by buyer id, it will be great.
Cons
Consistency
How to keep the consistency between original copy and intermediate table / copy sharded by another key?
Double write
Binlog watch and monitor
4. Other middlewares such as ElasticSearch
Introduce search engine for more complex queries.
5. Combined approach
If allowing both buyer and seller to update their shard copy, then concurrent writes need to be controlled by distributed lock and double write.
Last updated