Design Tiny URL Service

Problem Statement:


Very Famous Interview Question:


Step-1: Why do we need URL shortening?

Example :-

Long Url: https://www.kodefork.com/articles/machine-or-deep-learning-for-biological-data-analysis/

Tiny Url using tinyurl.com : https://tinyurl.com/y8cnselj


Step-2: Requirements and Goals of the System

We should always clarify requirements at the beginning of the interview and should ask questions to find the exact scope of the system that the interviewer has in mind.


Step-3: Capacity Estimation and Constraints

Traffic estimates:
Storage estimates:

Bandwidth estimates:
Memory estimates:
High level estimates:


Step-4: System APIs

Once we’ve finalized the requirements, it’s always a good idea to define the system APIs. This would explicitly state what is expected from the system.

Create API
createURL(api_dev_key, original_url, custom_alias=None, user_name=None, expire_date=None)
Delete API
deleteURL(api_dev_key, url_key)
How do we detect and prevent abuse ?


Step-5: Database Design

Defining the DB schema in the early stages of the interview would help to understand the data flow among various components and later would guide towards the data partitioning.

Database Schema:

What kind of database should we use ?


Step-6: Basic System Design and Algorithm

a) Encoding actual URL
What are different issues with our solution ?
Workaround for the issues:
  1. We can append an increasing sequence number to each input URL to make it unique and then generate a hash of it.
    • We don’t need to store this sequence number in the databases, though.
    • Possible problems with this approach could be how big this sequence number would be, can it overflow ?
    • Appending an increasing sequence number will impact the performance of the service too.
  2. Another solution could be to append user id (which should be unique) to the input URL.
    • However, if the user has not signed in, we can ask the user to choose a unique key.
    • Even after this if we have a conflict, we have to keep generating a key until we get a unique one.


b) Generating keys offline
Can concurrency cause problems ?
What would be the key-DB size ?
Isn’t KGS the single point of failure ?
Can each app server cache some keys from key-DB ?
How would we perform a key lookup ?
Should we impose size limits on custom aliases ?


Step-7: Data Partitioning and Replication

Range Based Partitioning:

Main problem : Can lead to unbalanced servers, for instance; if we decide to put all URLs starting with letter ‘E’ into a DB partition, but later we realize that we have too many URLs that start with letter ‘E’, which we can’t fit into one DB partition.

Hash-Based Partitioning:


Step-8: Caching

How much cache should we have ?
Which cache eviction policy would best fit our needs ?
How can each cache replica be updated ?


Step-9: Load Balancing (LB)


Step-10: Purging or DB cleanup


Step-11: Telemetry


Step-12: Security and Permissions




← Previous: Tools and Techniques

Next: Design Pastebin →