> For the complete documentation index, see [llms.txt](https://eric-zhang-seattle.gitbook.io/mess-around/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://eric-zhang-seattle.gitbook.io/mess-around/scenarios/overview/follow-up.md).

# Follow-up

* [Problem](#problem)
  * [Avoid duplication](#avoid-duplication)
  * [Sharding solutions](#sharding-solutions)
    * [Naive sharding key](#naive-sharding-key)
    * [Combine short and long Url as sharding key](#combine-short-and-long-url-as-sharding-key)
      * [Idea](#idea)
      * [Implementation](#implementation)
    * [Geo location sharding key](#geo-location-sharding-key)

## Problem

* What if disallowing same long urls mapped to different short urls (similar allow user to define customized urls)

### Avoid duplication

* Option1: Before insert random url into database. Check whether it exists within database, if not then insert. And build index on shortUrl for faster search. However, index building could be an expensive operation.
* Option2: Use bloomfilter to check whether a url already exists within the dictionary.
* Option3: Client could cache the short url after the first request.
* Option4: Put longUrl => shortUrl map in the cache for several days. This could make sure that the same short url will be returned during this time.

### Sharding solutions

* Additional considerations: Not only short-to-long mapping needs to be stored. The long-to-short mapping needs to be stored as well.
* This will create additional complexity when choosing sharding key

#### Naive sharding key

* Use Long Url as sharding key
  * Short to long operation will require lots of cross-shard joins
* Use Short Url as sharding key
  * Short to long url: Find database according to short url; Find long url in the corresponding database
  * Long to short url: Broadcast to N databases to see whether the link exist before. If not, get the next ID and insert into database.

#### Combine short and long Url as sharding key

**Idea**

* Problem: For "31bJF4", how do we find which physical machine the code locate at?
* Our query is
  * Select original\_url from tiny\_url where short\_url = XXXXX;
* The simple one
  * We query all nodes. But there will be n queries
  * This will be pretty expensive, even if we fan out database calls.
  * But the overall QPS to database will be n times QPS
* Solution
  * Add virtual node number to prefix "31bJF4" => "131bJF4"
  * When querying, we can find virtual nodes from prefix (the tinyUrl length is 6). Then we can directly query node1.

```
// Assign virtual nodes to physical nodes
// Store within database
{
    0: "db0.tiny_url.com",
    1: "db1.tiny_url.com",
    2: "db2.tiny_url.com",
    3: "db3.tiny_url.com"
}
```

**Implementation**

* Hash(longUrl)%62 + shortkey
* Given shortURL, we can get the sharding machine by the first bit of shortened url.
* Given longURL, get the sharding machine according to Hash(longURL) % 62. Then take the first bit.

#### Geo location sharding key

* Sharding according to the geographical info.
  * First know which websites are more popular in which region. Put all websites popular in US in US DB.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://eric-zhang-seattle.gitbook.io/mess-around/scenarios/overview/follow-up.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
