Architecture tradeoff analysis
Review Rubrics
Soft skills
Requirements gathering
Make decisions and tradeoffs with justification
Describe the solution using concise language and accurate technical terms
Hard skills
Design quality: scalability, reliability, efficiency, etc. (L4/L5)
Basic facts about existing software and hardware capabilities (L4 partly, L5)
Project lifecycle awareness, e.g. how a project is developed and maintained (L5)
Non-functional requirements (NFRs)
Type | Description |
---|---|
Performance | Efficiency such as throughput and response time |
Availability | Uptime percentage in a year |
Scalability | As the number of nodes increases, service capacity increases linearly |
Extensibility | Pluggable design that makes it easy to add new functionality |
Security | Privacy and security |
Observability | Able to detect problems and get root cause quickly |
Testability | Easy to test different components |
Robustness | Fault tolerance and fast recovery, high robustness usually indicates high availability |
Portability / Compatibility | Support for different operating systems, hardware, software (browsers, etc.) and versions |
Consistency | All clients and replicas observe the same data for the same request |
Availability
Availability percentage and service downtime
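As a rough sketch of how an availability percentage translates into allowed downtime, the snippet below converts a few common targets into hours per year. The function name and the choice of a 365-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000; leap years are ignored


def downtime_per_year_seconds(availability_percent: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year_seconds(target) / 3600
    print(f"{target}% availability -> {hours:.2f} hours of downtime per year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, from about 87.6 hours per year at 99% to about 8.76 hours at 99.9%.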
Commodity hardware failure trend
If your system has 4-5 services and around 10 database servers on the critical path, and you assume an annual disk failure rate of 2%, you will encounter about two disk failures each year (at a 2% rate, two failures per year corresponds to roughly 100 disks in total).
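The arithmetic behind this kind of estimate can be sketched as follows. The disk count of 100 is a hypothetical value, chosen only because it yields two expected failures per year at a 2% annual failure rate; the independence of disk failures is also an assumption.

```python
def expected_failures_per_year(disk_count: int, annual_failure_rate: float) -> float:
    """Expected number of disk failures in one year."""
    return disk_count * annual_failure_rate


def probability_of_any_failure(disk_count: int, annual_failure_rate: float) -> float:
    """Probability that at least one disk fails in a year (independent failures assumed)."""
    return 1 - (1 - annual_failure_rate) ** disk_count


disks = 100  # hypothetical fleet size, not taken from the text
rate = 0.02  # 2% annual failure rate, as stated above

print(f"Expected failures per year: {expected_failures_per_year(disks, rate):.1f}")
print(f"Chance of at least one failure: {probability_of_any_failure(disks, rate):.0%}")
```

Even with a low per-disk failure rate, a failure somewhere in the fleet becomes very likely once the disk count grows, which is why the critical path needs redundancy.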
Decision chart
[TODO: Decision chart]
COGS
Commodity hardware
Two Intel Xeon E5-2623 v3’s (quad core) – $900 total
128GB RAM (using 8GB DIMMs) – $1,920
Two 512GB SSDs for fast storage – $450
Six 4TB hard drives for slow storage – $900
Grand total: $5,070 (the four items listed above sum to $4,170; the remaining $900 is not itemized here)
Capacity planning
1. Get a baseline: MAU and DAU
Industry benchmarks report the average stickiness of products, calculated as (DAU/MAU)*100. The median is usually reported alongside the average because medians are less likely to be skewed by outliers.
For the SaaS industry, the average stickiness is 13%, which means slightly less than 4 days of activity per user per month. The median for the SaaS industry is 9.4%, implying less than 3 days of activity per user per month.
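The stickiness formula and its conversion to active days can be sketched as below. The function names and the 30-day month are assumptions made for illustration.

```python
def stickiness_percent(dau: float, mau: float) -> float:
    """Stickiness as defined above: (DAU / MAU) * 100."""
    return dau / mau * 100


def active_days_per_month(stickiness: float, days_in_month: int = 30) -> float:
    """Average number of days per month on which a user is active."""
    return stickiness / 100 * days_in_month


for label, value in (("SaaS average", 13.0), ("SaaS median", 9.4)):
    days = active_days_per_month(value)
    print(f"{label}: {value}% stickiness -> {days:.2f} active days per month")
```

With a 30-day month, 13% works out to about 3.9 days and 9.4% to about 2.8 days, matching the figures quoted above.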
Multiply DAU/WAU by WAU/MAU to get the actual DAU/MAU ratio:
Facebook: ~72%
Ecommerce:
Amazon: 17%
Walmart: 15%
eBay: 3%
Finance:
PayPal: 12.5%
Venmo: 10%
Uber: 12.5%
Netflix: 3%
Groupon: 4.5%
2. Growth speed
For fast-growing data (e.g. order data on an ecommerce website), provision 2X the planned capacity to avoid resharding.
For slow-growing data (e.g. user identity data on an ecommerce website), provision the estimated 3-year capacity to avoid resharding.
3. Divide capacity by system capability
Single Kafka instance
Single machine write: 250K messages (50MB) per second
Single machine read: 550K messages (110MB) per second
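A minimal sketch of step 3, dividing the required throughput by the per-machine capability quoted above. The target of one million messages per second and the `headroom` parameter are hypothetical values for illustration; the per-machine figures come from the list above and imply messages of about 200 bytes.

```python
import math

KAFKA_WRITE_MSGS_PER_SEC = 250_000  # single machine write, from the figures above
KAFKA_READ_MSGS_PER_SEC = 550_000   # single machine read, from the figures above


def machines_needed(required_per_sec: float, capability_per_sec: float,
                    headroom: float = 1.0) -> int:
    """Machines required to serve a load, rounded up, with an optional headroom factor."""
    return math.ceil(required_per_sec * headroom / capability_per_sec)


target = 1_000_000  # hypothetical required messages per second
print("Writers:", machines_needed(target, KAFKA_WRITE_MSGS_PER_SEC))
print("Writers with 2X headroom:", machines_needed(target, KAFKA_WRITE_MSGS_PER_SEC, headroom=2.0))
print("Readers:", machines_needed(target, KAFKA_READ_MSGS_PER_SEC))
```

The headroom factor is one way to fold the growth guidance from step 2 into the machine count; replication is not modeled here.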
Appendix: Conversions
Power of two
Power of two | Approximate decimal value | Short name |
---|---|---|
10 | 1 thousand (10^3) | 1 KB |
20 | 1 million (10^6) | 1 MB |
30 | 1 billion (10^9) | 1 GB |
40 | 1 trillion (10^12) | 1 TB |
50 | 1 quadrillion (10^15) | 1 PB |
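The decimal values in the table are approximations of the powers of two. The sketch below, with a hypothetical helper name, shows how far each approximation is from the exact value.

```python
def relative_error_percent(exponent: int) -> float:
    """How much larger 2**exponent is than its decimal approximation, in percent."""
    binary = 2 ** exponent
    decimal = 10 ** (3 * exponent // 10)  # 2**10 ~ 10**3, 2**20 ~ 10**6, and so on
    return (binary - decimal) / decimal * 100


for exponent, name in ((10, "KB"), (20, "MB"), (30, "GB"), (40, "TB"), (50, "PB")):
    error = relative_error_percent(exponent)
    print(f"2^{exponent} = {2 ** exponent:,} (1 {name}), {error:.1f}% above the decimal value")
```

The gap grows from 2.4% at the KB level to more than 12% at the PB level, which is acceptable for back-of-the-envelope estimates but worth remembering for storage sizing.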
Time scale conversion
Total seconds in a day: 86400 ~ 10^5
2.5 million requests per month: 1 request per second
100 million requests per month: 40 requests per second
1 billion requests per month: 400 requests per second
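These rules of thumb follow from dividing the monthly volume by the number of seconds in a month. A small sketch, assuming a 30-day month:

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, assuming a 30-day month


def monthly_requests_to_qps(requests_per_month: float) -> float:
    """Average requests per second for a given monthly request volume."""
    return requests_per_month / SECONDS_PER_MONTH


for monthly in (2_500_000, 100_000_000, 1_000_000_000):
    print(f"{monthly:,} requests per month ~ {monthly_requests_to_qps(monthly):.1f} requests per second")
```

The exact results are slightly below the rounded figures listed above, which round up for convenience.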
Performance estimation
Memory
Random access: 300K times / s
Sequential access: 5M times / s
Throughput: on the order of GB per second
Reading 1MB of data from memory takes about 0.25ms
Disk IO
Operating system page size for read and write: 4KB
SATA mechanical hard disk
IOPS: 120 times / s
Sequential read throughput: 100MB / s
Random read throughput: 2MB / s
Sector size: 0.5KB
SSD: speed approaching that of memory
Access latency: 0.1-0.2ms
Sector size: 4KB
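To make the gap between sequential and random access on a mechanical disk concrete, the sketch below estimates how long it takes to read 1GB using the throughput figures above. The 1GB payload and the function name are assumptions for illustration.

```python
SEQUENTIAL_MB_PER_SEC = 100  # SATA mechanical disk, sequential read (from the figures above)
RANDOM_MB_PER_SEC = 2        # SATA mechanical disk, random read (from the figures above)


def read_time_seconds(size_mb: float, throughput_mb_per_sec: float) -> float:
    """Time needed to read size_mb at a steady throughput."""
    return size_mb / throughput_mb_per_sec


size_mb = 1024  # 1GB, a hypothetical payload
print(f"Sequential: {read_time_seconds(size_mb, SEQUENTIAL_MB_PER_SEC):.2f} s")
print(f"Random:     {read_time_seconds(size_mb, RANDOM_MB_PER_SEC):.2f} s")
```

The 50X difference is the reason storage engines favor sequential, append-only access patterns on mechanical disks.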
Network latency
Single DC network round trip: 0.5ms
Multi DC network round trip: 30-100ms
RPC timeouts within a single DC are usually set to 500ms
Interactive latency checker (a slider at the top selects the year)
Typical API latency
[TODO: Add a section for typical API latency]
Load balancing design
Example: Design load balancing mechanism for an application with 10M DAU (e.g. Github has around 10M DAU)
Traffic volume estimation
10M DAU. Suppose each user performs 10 operations a day. The average QPS is then 10M * 10 / 86400 ~ 1160 QPS
Assume peak traffic is 10 times the average: ~ 11600 QPS
Static resources and internal microservice calls multiply the request volume further. Assuming a factor of 10: ~ 116000 QPS
Capacity planning
Multiple DC: QPS * 2 = 232000
Half-year volume increase: QPS * 1.5 = 348000
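The estimation chain above can be sketched as one function. The parameter names are hypothetical; the default factors are the ones used in the example.

```python
SECONDS_PER_DAY = 86_400


def estimate_planned_qps(dau: int, ops_per_user_per_day: int = 10, peak_factor: float = 10,
                         fanout_factor: float = 10, dc_factor: float = 2,
                         growth_factor: float = 1.5) -> dict:
    """Walk through each stage of the load balancer capacity estimate."""
    average_qps = dau * ops_per_user_per_day / SECONDS_PER_DAY
    peak_qps = average_qps * peak_factor        # peak vs. average traffic
    total_qps = peak_qps * fanout_factor        # static resources and microservice calls
    multi_dc_qps = total_qps * dc_factor        # multiple data centers
    planned_qps = multi_dc_qps * growth_factor  # half-year volume increase
    return {
        "average": average_qps,
        "peak": peak_qps,
        "with_fanout": total_qps,
        "multi_dc": multi_dc_qps,
        "planned": planned_qps,
    }


for stage, qps in estimate_planned_qps(10_000_000).items():
    print(f"{stage}: {qps:,.0f} QPS")
```

Without intermediate rounding the planned figure comes out slightly below 348000, because the text rounds the average up to 1160 QPS before multiplying.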
Mechanism
No DNS layer
LVS
Stress testing tools
mysqlslap: Ships with MySQL. Cannot run long-duration stress tests.
Sysbench: Works on macOS and Linux.
JMeter: Offers only basic functionality for database stress testing.
Scale numbers with examples
Typeahead service
Google search
Google has been visited 62.19 billion times this year.
Google processes over 3.5 billion searches per day.
It means that Google processes over 40,000 search queries every second on average. Let’s also take a look at how Google’s searches per year have progressed. In 1998, Google was processing over 10,000 search queries per day. In comparison, by the end of 2006, the same amount of searches would be processed by Google in a single second.
84 percent of respondents use Google 3+ times a day or more often.
Google has 92.18 percent of the market share as of July 2019.
More than one billion questions have been asked on Google Lens.
63 percent of Google’s US organic search traffic originated from mobile devices.
Facebook was the most searched keyword on Google.
46 percent of product searches begin on Google.
90 percent of survey respondents said they were likely to click on the first set of results.
Instant messaging app
WhatsApp: 1.6 billion MAU
Facebook Messenger: 1.3 billion MAU
WeChat: 1.1 billion MAU
Snapchat: 0.3 billion MAU
Telegram: 0.2 billion MAU
Microsoft Teams
140 million DAU
240 million MAU
WhatsApp
1.6 billion WhatsApp users access the app on a monthly basis. 53 percent of WhatsApp users in the US use the app at least once a day.
More than 65 billion messages are sent via WhatsApp every day. In other words, that boils down to 2.7 billion per hour, 45 million per minute, and more than 750,000 per second.
WhatsApp was downloaded 96 million times in February 2020.
WhatsApp is available in more than 180 countries and 60 different languages.
With 340 million users, India is WhatsApp’s biggest market.
There are more than five million businesses using WhatsApp Business.
Video Streaming
Netflix
200 million subscribers as of Q4 2020. The US has 74 million subscribers.
vs Amazon Prime - 150 million subscribers
vs Hulu - 39 million subscribers
Subscribers spent 3.2 hours per day watching Netflix
Per the Netflix tech blog: "serving 100% of our video, over 125 million hours every day, to 100 million members across the globe". https://netflixtechblog.com/how-data-science-helps-power-worldwide-delivery-of-netflix-content-bac55800f9a7
For each episode of The Crown, over 1,200 files are created. https://netflixtechblog.com/content-popularity-for-open-connect-b86d56f613b
YouTube
Every second: https://everysecond.io/youtube
2.3 billion MAU
720,000 hours of video uploaded daily
500 hours of video uploaded every minute
(2012) 4 billion hours of video watched every day. 60 hours of video is uploaded every minute. 350+ million devices are YouTube enabled.
(2009) 1 billion views per day. That’s at least 11,574 views per second, 694,444 views per minute, and 41,666,667 views per hour. https://mashable.com/2009/10/09/youtube-billion-views/
8.4 minutes per person per day if everyone watches YouTube
Second most popular search after Google
Localized in 100 countries and 80 languages
70% of traffic comes from mobile
Newsfeed
Twitter
There are 330 million monthly active users and 145 million daily active users.
There are 500 million tweets sent each day. That’s roughly 6,000 tweets every second.
A total of 1.3 billion accounts have been created.
Of those, 44% made an account and left before ever sending a tweet.
Based on US accounts, 10% of users write 80% of tweets.
During the 2014 FIFA World Cup Final, 618,725 tweets were sent in a single minute.
Facebook
Photo sharing
Instagram
250 billion photos in total since 2004.
Photo uploads total 300 million per day
243,055 new photos uploaded per minute
127 photos uploaded on average per Facebook user
There are 1.074 billion Instagram MAU worldwide in 2021.
Instagram users spend an average of 53 minutes per day.
December 2012: more than 25 photos and 90 likes every second.
File system
Dropbox
Assume the application has 50 million signed-up users and 10 million DAU.
Users get 10 GB of free space.
Assume users upload 2 files per day. The average file size is 500 KB.
1:1 read to write ratio.
Total space allocated: 50 million * 10 GB = 500 PB
QPS for upload API: 10 million * 2 uploads / 24 hours / 3600 seconds = ~ 240
Peak QPS = QPS * 2 = 480
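The same back-of-the-envelope numbers can be reproduced with a short script. Variable names are hypothetical, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
SECONDS_PER_DAY = 24 * 3600
GB_PER_PB = 1_000_000  # decimal units

signed_up_users = 50_000_000
dau = 10_000_000
free_space_gb = 10
uploads_per_user_per_day = 2
peak_factor = 2

total_storage_pb = signed_up_users * free_space_gb / GB_PER_PB
upload_qps = dau * uploads_per_user_per_day / SECONDS_PER_DAY
peak_upload_qps = upload_qps * peak_factor

print(f"Total space allocated: {total_storage_pb:,.0f} PB")
print(f"Upload QPS: {upload_qps:.0f}")
print(f"Peak upload QPS: {peak_upload_qps:.0f}")
```

The exact upload rate is a little over 231 QPS, which the estimate above rounds up to 240. With a 1:1 read to write ratio, the read QPS is the same.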
Reference: Dropbox statistics
Geo location
Yelp
Yelp has more than 178 million unique visitors monthly across mobile, desktop and app platforms
Uber
103 million MAU
Uber had 5 million drivers as of Q4 2019 and averaged 18.7 million trips per day in Q1 2020
By comparison, Lyft has 2 million drivers, who serve over 21.2 million active riders per quarter
References
分布式服务架构 原理、设计与实战 (Distributed Service Architecture: Principles, Design, and Practice)