Follow-up
Problem
What if disallowing same long urls mapped to different short urls (similar allow user to define customized urls)
Avoid duplication
Option1: Before insert random url into database. Check whether it exists within database, if not then insert. And build index on shortUrl for faster search. However, index building could be an expensive operation.
Option2: Use bloomfilter to check whether a url already exists within the dictionary.
Option3: Client could cache the short url after the first request.
Option4: Put longUrl => shortUrl map in the cache for several days. This could make sure that the same short url will be returned during this time.
Sharding solutions
Additional considerations: Not only short-to-long mapping needs to be stored. The long-to-short mapping needs to be stored as well.
This will create additional complexity when choosing sharding key
Naive sharding key
Use Long Url as sharding key
Short to long operation will require lots of cross-shard joins
Use Short Url as sharding key
Short to long url: Find database according to short url; Find long url in the corresponding database
Long to short url: Broadcast to N databases to see whether the link exist before. If not, get the next ID and insert into database.
Combine short and long Url as sharding key
Idea
Problem: For "31bJF4", how do we find which physical machine the code locate at?
Our query is
Select original_url from tiny_url where short_url = XXXXX;
The simple one
We query all nodes. But there will be n queries
This will be pretty expensive, even if we fan out database calls.
But the overall QPS to database will be n times QPS
Solution
Add virtual node number to prefix "31bJF4" => "131bJF4"
When querying, we can find virtual nodes from prefix (the tinyUrl length is 6). Then we can directly query node1.
Implementation
Hash(longUrl)%62 + shortkey
Given shortURL, we can get the sharding machine by the first bit of shortened url.
Given longURL, get the sharding machine according to Hash(longURL) % 62. Then take the first bit.
Geo location sharding key
Sharding according to the geographical info.
First know which websites are more popular in which region. Put all websites popular in US in US DB.
Last updated