An offline job (a daemon process) generates keys and stores them in a database.
The offline job can tolerate longer query times.
create table keys(key text primary key, status integer)
status
0: available
1: occupied
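A minimal sketch of the offline key-generation job, assuming JDBC, a placeholder connection string, 6-character base-62 keys derived from a sequence counter, and a batch size of 10K (none of these specifics are in the notes beyond the table schema):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Offline daemon sketch: generate base-62 keys and insert them as available (status = 0).
// The JDBC URL, credentials, starting sequence, and batch size are placeholder assumptions.
public class KeyGeneratorJob {
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Encode a sequence number as a 6-character base-62 key.
    static String encode(long seq) {
        char[] buf = new char[6];
        for (int i = 5; i >= 0; i--) {
            buf[i] = ALPHABET.charAt((int) (seq % 62));
            seq /= 62;
        }
        return new String(buf);
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/shortener", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "insert into keys(key, status) values (?, 0)")) {
            for (long seq = 0; seq < 10_000; seq++) {   // one batch; a real job keeps a cursor
                ps.setString(1, encode(seq));
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}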
Keys database query
Select key from keys where status = 0 limit 1
Update keys set status = 1 where key = a_key
Race condition
Multiple users can get the same key when QPS is high.
The SELECT / UPDATE pair is not executed atomically.
A relational database is not well suited for queuing.
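A sketch of this racy allocation path, assuming JDBC (the Connection is supplied by the caller); the race window sits between the two statements:

import java.sql.*;

// Naive allocation: SELECT then UPDATE as two separate statements.
// Two concurrent callers can both see the same available key before either
// flips its status, so both may hand out the same key.
public class NaiveKeyAllocator {
    public String allocate(Connection conn) throws SQLException {
        String key;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "select key from keys where status = 0 limit 1")) {
            if (!rs.next()) return null;   // key space exhausted
            key = rs.getString(1);
        }
        // <-- another instance may run the same SELECT here and read the same key
        try (PreparedStatement ps = conn.prepareStatement(
                 "update keys set status = 1 where key = ?")) {
            ps.setString(1, key);
            ps.executeUpdate();
        }
        return key;
    }
}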
Solution:
Database lock: Select key from keys where status = 0 limit 1 for update skip locked
Distributed lock: bad performance.
Redis -> LPUSH and LPOP: the Redis list data structure is effectively a queue; its operations are atomic (and therefore safe under concurrent access) and production ready.
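A minimal sketch of the Redis queue approach using the Jedis client; the host, port, and list name available_keys are assumptions. The database-lock alternative would instead wrap the SELECT ... FOR UPDATE SKIP LOCKED and the UPDATE above in a single transaction.

import redis.clients.jedis.Jedis;

// The offline job pushes generated keys; the URL shortener pops one per request.
// LPUSH and LPOP are atomic in Redis, so no two requests receive the same key.
public class RedisKeyQueue {
    private final Jedis jedis = new Jedis("localhost", 6379);

    public void push(String key) {
        jedis.lpush("available_keys", key);   // producer side (offline job)
    }

    public String pop() {
        return jedis.lpop("available_keys");  // consumer side; returns null if empty
    }
}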
After short URLs are generated, they are stored consecutively in HDFS files.
1 billion URLs per month means 24 billion records in two years; at 6 bytes per short URL that is 24 * 10^9 * 6 bytes = 144 GB in total.
The keys are stored consecutively in the file as ASCII characters.
When the preload service starts, it needs to preload 10K URLs from HDFS.
It only needs to read 60 KB of data from HDFS (10K keys * 6 bytes) and record the new offset of 60K.
Typically this read takes 20-50 ms.
Next time it loads another 60 KB of data, starting from the stored offset.
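A minimal sketch of this preload read path using the HDFS client API, assuming 6-byte ASCII keys packed back to back and a hypothetical file path /keys/part-00000:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Read the next 10K keys (60 KB) starting at the saved offset, then advance the offset.
public class HdfsKeyLoader {
    private long offset = 0;   // persisted elsewhere in a real system

    public List<String> loadBatch() throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] buf = new byte[10_000 * 6];                 // 60 KB per batch
        try (FSDataInputStream in = fs.open(new Path("/keys/part-00000"))) {
            in.readFully(offset, buf);                     // positioned read from the offset
        }
        offset += buf.length;
        List<String> keys = new ArrayList<>(10_000);
        for (int i = 0; i < buf.length; i += 6) {
            keys.add(new String(buf, i, 6, StandardCharsets.US_ASCII));  // 6 ASCII bytes per key
        }
        return keys;
    }
}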
The loaded 10K short URLs are stored as a linked list inside the URL shortener's preload service.
Upon request, the URL shortener service gets a new entry from the preload service.
This step takes a lock, so it can become a point of contention.
When fewer than 2,000 unused URLs remain in the preload service, it loads another 10K short URLs from HDFS.
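A minimal sketch of the preload service's in-memory buffer with the 2,000-key refill threshold; it reuses the hypothetical HdfsKeyLoader from the sketch above and uses a plain synchronized method as the lock:

import java.util.LinkedList;

// Hands out preloaded keys; refills from HDFS when fewer than 2,000 remain.
// The linked-list choice mirrors the design above (its costs are discussed below).
public class PreloadService {
    private final LinkedList<String> keys = new LinkedList<>();
    private final HdfsKeyLoader loader = new HdfsKeyLoader();   // sketch above

    public synchronized String nextKey() throws Exception {
        if (keys.size() < 2_000) {
            keys.addAll(loader.loadBatch());   // load another 10K keys from HDFS
        }
        return keys.pollFirst();
    }
}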
Using a linked list has the following cons:
Considerable additional GC cost (one node object per key).
Additional space cost from node pointers, while each shortUrl itself takes only 6 bytes.
Since opening the file and reading from an offset is a mutually exclusive operation, it prevents multiple short URL service instances from loading from HDFS simultaneously.