Design Instagram

Problem Statement:


Step-1: Why Instagram?


Step-2: Requirements and Goals of the System


Step-3: Some Design Considerations


Step-4: Capacity Estimation and Constraints


Step-5: High Level Design


Step-6: Database Schema

Defining the DB schema early in the interview helps clarify how data flows among the various components, and later guides the data partitioning strategy.
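As a minimal sketch, assume three tables: users, photos, and follow relationships. The table and field names below are hypothetical, since the section does not list them. The point is that `Photo.user_id` and `UserFollow` are the relations that later drive sharding and timeline generation.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical tables; field names are illustrative assumptions.
@dataclass(frozen=True)
class User:
    user_id: int
    name: str
    email: str
    creation_date: datetime


@dataclass(frozen=True)
class Photo:
    photo_id: int
    user_id: int            # owner; a natural candidate key for sharding by UserID
    photo_path: str         # where the image file lives in file/object storage
    creation_date: datetime


@dataclass(frozen=True)
class UserFollow:
    follower_id: int        # the user who follows
    followee_id: int        # the user being followed
```

Keeping the image bytes out of the table (only `photo_path` is stored) is what lets metadata and files be scaled separately.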


Step-7: Component Design


Step-8: Reliability and Redundancy


Step-9: Data Sharding

a) Partitioning based on UserID
How can we generate PhotoIDs?
What are the different issues with this partitioning scheme?
  1. How would we handle hot users? Many people follow such hot users, and any photo they upload is seen by a lot of other people.
  2. Some users will have many more photos than others, resulting in a non-uniform distribution of storage.
  3. What if we cannot store all of a user’s pictures on one shard? If we distribute a user’s photos across multiple shards, will that cause higher latencies?
  4. Storing all of a user’s pictures on one shard can cause issues such as unavailability of all of the user’s data if that shard is down, or higher latency if that shard is serving a high load.
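A minimal sketch of UserID-based partitioning, assuming the shard is chosen as UserID mod the number of shards and that each shard has its own auto-increment sequence for photos. The shard count, class name, and the `(shard, local_id)` identity are illustrative assumptions, one possible answer to the PhotoID question rather than the document's.

```python
from collections import defaultdict
from itertools import count

NUM_SHARDS = 10  # hypothetical number of database shards


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Every photo of a user lands on shard (user_id mod num_shards)."""
    return user_id % num_shards


class ShardedPhotoStore:
    """Toy in-memory model of UserID-based sharding with per-shard sequences."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        # One auto-increment counter per shard, like an AUTO_INCREMENT column.
        self.sequences = [count(1) for _ in range(num_shards)]
        self.photos = defaultdict(list)  # shard -> [(local_photo_id, user_id)]

    def add_photo(self, user_id: int) -> tuple[int, int]:
        shard = shard_for_user(user_id, self.num_shards)
        local_id = next(self.sequences[shard])
        self.photos[shard].append((local_id, user_id))
        # local_id is unique only within its shard, so (shard, local_id)
        # is what identifies the photo globally.
        return shard, local_id
```

With 10 shards, `ShardedPhotoStore().add_photo(23)` returns `(3, 1)`. The sketch also makes issue 2 visible: a user who uploads 1,000 photos puts all 1,000 rows on the same shard.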


b) Partitioning based on PhotoID
How can we generate PhotoIDs?
Wouldn’t this key-generating DB be a single point of failure?
KeyGeneratingServer1:
auto-increment-increment = 2 
auto-increment-offset = 1

KeyGeneratingServer2:
auto-increment-increment = 2
auto-increment-offset = 2
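With an increment of 2, offset 1 yields odd IDs (1, 3, 5, ...) and offset 2 yields even IDs (2, 4, 6, ...), so the two servers never collide. The sketch below simulates that behaviour in Python rather than running MySQL itself; the class name and the round-robin routing are illustrative assumptions.

```python
from itertools import count, cycle


class KeyGeneratingServer:
    """Mimics auto-increment-increment / auto-increment-offset in memory."""

    def __init__(self, increment: int, offset: int) -> None:
        self._seq = count(offset, increment)

    def next_id(self) -> int:
        return next(self._seq)


server1 = KeyGeneratingServer(increment=2, offset=1)  # 1, 3, 5, ...
server2 = KeyGeneratingServer(increment=2, offset=2)  # 2, 4, 6, ...

# Round-robin across both servers. Their sequences are disjoint (odd vs. even),
# so neither needs to coordinate with the other; a real load balancer would
# also skip a server that is down.
_servers = cycle([server1, server2])


def generate_photo_id() -> int:
    return next(_servers).next_id()
```

Calling `generate_photo_id()` six times yields 1 through 6, alternating between the odd and even servers.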
How can we plan for future growth of our system?
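One common way to plan for growth, assumed here rather than stated in the text, is to create many logical partitions up front and map several of them onto each physical database server. Growing the system then means moving whole logical partitions to new servers and updating the map, without rehashing individual photos. The partition count and server names are hypothetical.

```python
NUM_LOGICAL_PARTITIONS = 16  # hypothetical; fixed up front and never changed

# Hypothetical config: which physical DB server hosts each logical partition.
# Start with two physical servers, each holding eight logical partitions.
partition_map = {p: f"db-{p % 2}" for p in range(NUM_LOGICAL_PARTITIONS)}


def logical_partition(photo_id: int) -> int:
    return photo_id % NUM_LOGICAL_PARTITIONS


def server_for_photo(photo_id: int) -> str:
    return partition_map[logical_partition(photo_id)]


def move_partition(partition: int, new_server: str) -> None:
    # Growth: move a whole logical partition to a new server and update the map.
    # Photo-to-partition assignment never changes, so no photo is rehashed.
    partition_map[partition] = new_server
```

For example, PhotoID 17 maps to logical partition 1 on `db-1`; after `move_partition(1, "db-2")`, both 17 and 33 (also partition 1) resolve to `db-2`.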


Step-10: Ranking and Timeline Generation

Pre-generating the timeline:
What are the different approaches for sending timeline data to the users?
  1. Pull Approach:

    • Clients can pull the timeline data from the server on a regular basis or manually whenever they need it.

    • Possible problems with this approach are:

      • New data might not be shown to the users until clients issue a pull request.
      • Most of the time pull requests will result in an empty response if there is no new data.
  2. Push Approach:

    • Servers can push new data to the users as soon as it is available.
    • To manage this efficiently, users have to maintain a Long Poll request with the server to receive the updates.
    • A possible problem with this approach:
      • A celebrity user can have millions of followers; in this case, the server has to push each new update to millions of users, and do so quite frequently.
  3. Hybrid Approach:

    • Move all users with a large following to a pull-based model, and push data only to users who have a few hundred or a few thousand follows.
  4. Tedious Hybrid Approach:

    • The server pushes updates to all users at no more than a certain frequency, letting users with a lot of follows pull data regularly.
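The hybrid approach can be sketched as a per-author decision at publish time. The sketch assumes the cut-off is the author's follower count, with a hypothetical limit of 1,000, and that PhotoIDs increase over time so that sorting by ID gives newest-first order. The dictionaries are in-memory stand-ins for real storage.

```python
from collections import defaultdict

PUSH_FOLLOWER_LIMIT = 1000  # hypothetical cut-off between push and pull

followers = defaultdict(set)         # author_id -> ids of users following them
following = defaultdict(set)         # user_id -> ids of authors they follow
pushed = defaultdict(list)           # user_id -> pre-generated timeline (photo ids)
photos_by_author = defaultdict(list) # author_id -> that author's photo ids


def follow(user_id: int, author_id: int) -> None:
    followers[author_id].add(user_id)
    following[user_id].add(author_id)


def uses_pull(author_id: int) -> bool:
    """Authors with a large following are excluded from push fan-out."""
    return len(followers[author_id]) > PUSH_FOLLOWER_LIMIT


def publish(author_id: int, photo_id: int) -> None:
    photos_by_author[author_id].append(photo_id)
    if not uses_pull(author_id):
        # Push: copy the new photo into each follower's pre-generated timeline.
        for user_id in followers[author_id]:
            pushed[user_id].append(photo_id)
    # Pull authors get no fan-out; followers fetch their photos at read time.


def timeline(user_id: int) -> list[int]:
    pulled = [pid for author in following[user_id] if uses_pull(author)
              for pid in photos_by_author[author]]
    # Assumes PhotoIDs grow over time, so descending ID order is newest first;
    # the set union drops duplicates if an author crossed the limit.
    return sorted(set(pushed[user_id]) | set(pulled), reverse=True)
```

A photo from an author with few followers is written into each follower's timeline immediately, while a photo from an author above `PUSH_FOLLOWER_LIMIT` is only fetched when a follower reads their timeline.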


Note:- For a detailed discussion about timeline generation, take a look at Designing Facebook’s Newsfeed.


Step-11: Timeline Creation with Sharded Data

What could be the size of our PhotoID?
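One way to size a PhotoID, assumed here since the section leaves the question open, is to put seconds since a custom epoch in the high bits and a per-second sequence number in the low bits. Assuming a 50-year lifetime, 86,400 * 365 * 50 = 1,576,800,000 seconds, which needs 31 bits (2^30 is about 1.07 billion, 2^31 about 2.15 billion). A hypothetical 9-bit sequence allows 2^9 = 512 IDs per second, for 40 bits in total.

```python
SECONDS_IN_50_YEARS = 86_400 * 365 * 50         # 1,576,800,000 (ignores leap days)
EPOCH_BITS = SECONDS_IN_50_YEARS.bit_length()   # bits needed to count those seconds
SEQUENCE_BITS = 9                               # hypothetical: 2**9 = 512 IDs/second
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


def make_photo_id(seconds_since_epoch: int, sequence: int) -> int:
    """Pack epoch seconds into the high bits and a per-second counter into the low bits."""
    if not 0 <= seconds_since_epoch < (1 << EPOCH_BITS):
        raise ValueError("timestamp does not fit in EPOCH_BITS")
    if not 0 <= sequence <= SEQUENCE_MASK:
        raise ValueError("sequence does not fit in SEQUENCE_BITS")
    return (seconds_since_epoch << SEQUENCE_BITS) | sequence


def split_photo_id(photo_id: int) -> tuple[int, int]:
    """Recover (seconds_since_epoch, sequence) from a PhotoID."""
    return photo_id >> SEQUENCE_BITS, photo_id & SEQUENCE_MASK
```

Because the timestamp occupies the high bits, ordering by PhotoID approximates ordering by creation time, which is convenient when merging the latest photos fetched from several shards.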


Note:- We will discuss more details about this technique under ‘Data Sharding’ in Designing Twitter.


Step-12: Cache and Load balancing

How can we build a more intelligent cache?
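One common eviction policy, assumed here because the text does not name one, is Least Recently Used (LRU): keep the entries that were read most recently and evict the stalest one when the cache is full. The sketch uses Python's `collections.OrderedDict`; the keys and values (for example, PhotoIDs and photo metadata) are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps the most recently used items; evicts the least recently used first."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None  # cache miss: the caller falls back to the database
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

With capacity 2, putting `p1` and `p2`, reading `p1`, then putting `p3` evicts `p2`, because `p1` was touched more recently.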



