🐝
Mess around software system design
Flowchart

Last updated 1 year ago

Component chart

  • Overall chart

Upload/Edit flow

Two requests are sent in parallel: add file metadata and upload the file to cloud storage. Both requests originate from client 1.

Upload file metadata

  1. Client app computes the file metadata to be uploaded.

    • File name, file content MD5

    • Number of blocks (assume each block is 4 MB) and the MD5 value of each block.

  2. Client app sends the metadata to web server.

  3. Web server computes globally unique block IDs for each block.

  4. Web server writes the metadata and block IDs to the metadata DB and sets the file upload status to “pending.”

  5. Web server returns the block IDs to client app.
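
The client-side metadata computation in step 1 can be sketched as follows. This is a minimal illustration, assuming the file content fits in memory; the function name `compute_file_metadata` and the payload field names are hypothetical and not part of the original design.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB per block, as assumed in the steps above


def compute_file_metadata(file_name: str, content: bytes,
                          block_size: int = BLOCK_SIZE) -> dict:
    """Build the metadata payload the client app sends to the web server.

    The field names are illustrative assumptions, not a defined API.
    """
    # Split the content into fixed-size blocks; the last block may be shorter.
    blocks = [content[i:i + block_size]
              for i in range(0, len(content), block_size)]
    return {
        "file_name": file_name,
        "file_md5": hashlib.md5(content).hexdigest(),
        "block_count": len(blocks),
        "block_md5s": [hashlib.md5(block).hexdigest() for block in blocks],
    }
```

A real client would more likely stream the file from disk block by block instead of loading it whole.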

Upload file

  1. Client app sends requests to block servers to upload blocks.

  2. Block servers connect to metadata DB to verify permissions.

  3. Block servers verify the MD5 value of each block.

  4. Block servers store blocks inside object storage.
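
The block server steps above (permission check, MD5 verification, store) might look roughly like this. It is only a sketch: the `BlockServer` class is hypothetical, and plain dictionaries and sets stand in for the metadata DB and object storage.

```python
import hashlib


class BlockServer:
    """Hypothetical block server; in-memory structures replace real backends."""

    def __init__(self, expected_md5s: dict, allowed_users: set):
        # Both arguments stand in for lookups against the metadata DB.
        self.expected_md5s = expected_md5s  # block ID -> MD5 hex digest
        self.allowed_users = allowed_users  # users allowed to upload
        self.object_storage = {}            # stand-in for object storage

    def upload_block(self, user_id: str, block_id: str, data: bytes) -> None:
        # Step 2: verify permissions.
        if user_id not in self.allowed_users:
            raise PermissionError(f"user {user_id} may not upload blocks")
        # Step 3: verify the MD5 value registered for this block.
        expected = self.expected_md5s.get(block_id)
        if expected is None:
            raise KeyError(f"unknown block ID {block_id}")
        if hashlib.md5(data).hexdigest() != expected:
            raise ValueError(f"MD5 mismatch for block {block_id}")
        # Step 4: store the block.
        self.object_storage[block_id] = data
```

Rejecting a block before it reaches storage keeps corrupted or unauthorized data out of the object store.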

Notify other clients (Optional)

  1. Web servers notify notification service that a new file is being added.

  2. The notification service notifies relevant clients (client 2) that a file is being uploaded.

Download flow

  • Download flow is triggered when either:

    • A file is added or edited elsewhere.

    • The user proactively requests to sync files.

Get notified about updates (Optional)

  1. Notification service informs the client app that a file has been changed somewhere else.

Fetch metadata

  1. Client app sends requests to web servers to download files.

  2. Web servers call metadata DB to fetch metadata of changes and block IDs.

  3. Web servers return the block IDs and the block servers to contact to the client app.

Fetch files

  1. Client app sends requests to block servers to download blocks.

  2. Block servers fetch metadata from metadata DB and verify permissions.

  3. Block servers fetch blocks from object storage.

  4. Block servers return blocks to client app.

  5. Client app verifies the MD5 of each block and builds the entire file.
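
The final client-side step, verifying MD5 values and rebuilding the file, could be sketched as below. The function name `assemble_file` and its parameters are hypothetical. The sketch assumes the metadata gives the ordered block IDs, the MD5 of each block, and the MD5 of the whole file.

```python
import hashlib


def assemble_file(block_ids: list, blocks: dict,
                  block_md5s: dict, file_md5: str) -> bytes:
    """Verify each downloaded block, then rebuild and verify the whole file.

    block_ids  -- block IDs in file order (from the metadata)
    blocks     -- block ID -> downloaded bytes
    block_md5s -- block ID -> expected MD5 hex digest
    file_md5   -- expected MD5 hex digest of the complete file
    """
    for block_id in block_ids:
        if hashlib.md5(blocks[block_id]).hexdigest() != block_md5s[block_id]:
            raise ValueError(f"MD5 mismatch for block {block_id}")
    content = b"".join(blocks[block_id] for block_id in block_ids)
    if hashlib.md5(content).hexdigest() != file_md5:
        raise ValueError("MD5 mismatch for assembled file")
    return content
```

Checking each block separately means a client only needs to re-download the block that failed, not the whole file.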

Notification flow

  • How does a client know if a file is added or edited by another client?

Notify online and offline client

  • Client online: the notification service informs the client (client A) that changes were made elsewhere, so it needs to pull the latest data.

  • Client offline: when another client changes a file, the change is saved to the cache. When the offline client comes back online, it pulls the latest changes.
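
One way to picture the online and offline cases is the sketch below. The `NotificationService` class and its method names are hypothetical. An in-memory dictionary plays the role of the cache, and a per-client list stands in for the live delivery channel.

```python
from collections import defaultdict


class NotificationService:
    """Hypothetical sketch: deliver to online clients, cache for offline ones."""

    def __init__(self):
        self.online = {}                  # client ID -> delivered notifications
        self.pending = defaultdict(list)  # the "cache" for offline clients

    def connect(self, client_id: str) -> list:
        """Mark a client online and return the changes it missed while offline."""
        missed = self.pending.pop(client_id, [])
        self.online[client_id] = []
        return missed

    def disconnect(self, client_id: str) -> None:
        self.online.pop(client_id, None)

    def file_changed(self, file_id: str, subscribers: list) -> None:
        for client_id in subscribers:
            if client_id in self.online:
                # Online: tell the client to pull the latest data.
                self.online[client_id].append(file_id)
            else:
                # Offline: keep the change until the client reconnects.
                self.pending[client_id].append(file_id)
```

In this sketch only file IDs are cached, since the client pulls the actual metadata and blocks through the download flow.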

Long polling vs websockets

  • Even though both options work well, we opt for long polling for the following two reasons:

    • Communication for notification service is not bi-directional. The server sends information about file changes to the client, but not vice versa.

    • WebSocket is suited for real-time bi-directional communication such as a chat app.
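
To make the long polling choice concrete, here is a minimal in-process sketch of the server side. The `ChangeFeed` class, its `poll` and `publish` methods, and the cursor parameter are all assumptions for illustration. A real service would hold an open HTTP request instead of a blocked thread.

```python
import threading


class ChangeFeed:
    """Hypothetical long-poll feed: poll() blocks until a change or a timeout."""

    def __init__(self):
        self._cond = threading.Condition()
        self._changes = []

    def publish(self, change: str) -> None:
        # Called when a file is added or edited; wakes all waiting pollers.
        with self._cond:
            self._changes.append(change)
            self._cond.notify_all()

    def poll(self, cursor: int, timeout: float):
        """Return (new_changes, new_cursor).

        Blocks for up to `timeout` seconds if nothing newer than `cursor`
        exists, then returns an empty list so the client can poll again.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._changes) > cursor,
                                timeout=timeout)
            new = self._changes[cursor:]
            return new, cursor + len(new)
```

The client only ever asks and waits, and the server only ever answers, which matches the one-directional traffic described above.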

Component chart
Upload flow chart
Download flow chart