🐝
Mess around software system design
  • README
  • ArchitectureTradeOffAnalysis
    • Estimation
    • Middleware
    • Network
    • Server
    • Storage
  • Conversion cheat sheet
  • Scenarios
    • TinyURL
      • Estimation
      • Flowchart
      • Shortening mechanisms
      • Rest API
      • Performance
      • Storage
      • Follow-up
    • TaskScheduler
      • JDK delay queue
      • Timer based
      • RabbitMQ based
      • Kafka-based fixed delay time
      • Redis-based customized delay time
      • MySQL-based customized delay time
      • Timer TimingWheel
      • Industrial Scheduler
      • Workflow Engine
      • Airflow Arch
    • GoogleDrive
      • Estimation
      • Flowchart
      • Storage
      • Follow-up
    • Youtube
      • Estimation
      • Flowchart
      • Performance
      • Storage
      • Follow-up
      • Netflix
    • Uber
      • Estimation
      • Rest api
      • Flowchart
      • KNN algorithms
      • Geohash-based KNN mechanism
      • Redis implementation
      • Storage
    • Twitter
      • Estimation
      • Flowchart
      • Storage
      • Scalability
      • Follow-up
    • Instant messenger
      • Architecture overview
      • Presence
      • Unread count
      • Notifications
      • Read receipt
      • Large group chat
      • Storage-Offline 1:1 Chat
      • Storage-Offline group chat
      • Storage-Message roaming
      • NonFunc-Realtime
      • NonFunc-Reliability
      • NonFunc-Ordering
      • NonFunc-Security
      • Livecast-LinkedIn
    • Distributed Lock
      • Single machine
      • AP model based
      • CP model based
      • Chubby-TODO
    • Payment system
      • Resilience
      • Consistency
      • Flash sale
    • Key value store
      • Master-slave KV
      • Peer-to-peer KV
      • Distributed cache
  • Time series scenarios
    • Observability
      • TimeSeries data
      • Distributed traces
      • Logs
      • Metrics
      • NonFunc requirments
  • Search engine
    • Typeahead
    • Search engine
    • Distributed crawler
      • Estimation
      • Flowchart
      • Efficiency
      • Robustness
      • Performance
      • Storage
      • Standalone implementation
      • Python Scrapy framework
    • Stream search
  • Big data
    • GFS/HDFS
      • Data flow
      • High availability
      • Consistency
    • Map reduce
    • Big table/Hbase
    • Haystack
    • TopK
    • Stateful stream
    • Lambda architecture
    • storm架构
    • Beam架构
    • Comparing stream frameworks
    • Instagram-[TODO]
  • MicroSvcs
    • Service Registry
      • Flowchart
      • Data model
      • High availability
      • Comparison
      • Implementation
    • Service governance
      • Load balancing
      • Circuit breaker
      • Bulkhead
      • Downgrade
      • Timeout
      • API gateway
      • RateLimiter
        • Config
        • Algorithm comparison
        • Sliding window
        • Industrial impl
    • MicroSvcs_ConfigCenter-[TODO]
    • MicroSvcs_Security
      • Authentication
      • Authorization
      • Privacy
  • Cache
    • Typical topics
      • Expiration algorithm
      • Access patterns
      • Cache penetration
      • Big key
      • Hot key
      • Distributed lock
      • Data consistency
      • High availability
    • Cache_Redis
      • Data structure
      • ACID
      • Performance
      • Availability
      • Cluster
      • Applications
    • Cache_Memcached
  • Message queue
    • Overview
    • Kafka
      • Ordering
      • At least once
      • Message backlog
      • Consumer idempotency
      • High performance
      • Internal leader election
    • MySQL-based msg queue
    • Other msg queues
      • ActiveMQ-TODO
      • RabbitMQ-TODO
      • RocketMQ-TODO
      • Comparison between MQ
  • Traditional DB
    • Index data structure
    • Index categories
    • Lock
    • MVCC
    • Redo & Undo logs
    • Binlog
    • Schema design
    • DB optimization
    • Distributed transactions
    • High availability
    • Scalability
    • DB migration
    • Partition
    • Sharding
      • Sharding strategies
      • Sharding ID generator overview
        • Auto-increment key
        • UUID
        • Snowflake
        • Implement example
      • Cross-shard pagination queries
      • Non-shard key queries
      • Capacity planning
  • Non-Traditional DB
    • NoSQL overview
    • Rum guess
    • Data structure
    • MySQL based key value
    • KeyValueStore
    • ObjectStore
    • ElasticSearch
    • TableStore-[TODO]
    • Time series DB
    • DistributedAcidDatabase-[TODO]
  • Java basics
    • IO
    • Exception handling
  • Java concurrency
    • Overview
      • Synchronized
      • Reentrant lock
      • Concurrent collections
      • CAS
      • Others
    • Codes
      • ThreadLocal
      • ThreadPool
      • ThreadLifeCycle
      • SingletonPattern
      • Future
      • BlockingQueue
      • Counter
      • ConcurrentHashmap
      • DelayedQueue
  • Java JVM
    • Overview
    • Dynamic proxy
    • Class loading
    • Garbage collection
    • Visibility
  • Server
    • Nginx-[TODO]
  • Distributed system theories
    • Elementary school with CAP
    • Consistency
      • Eventual with Gossip
      • Strong with Raft
      • Tunable with Quorum
      • Fault tolerant with BFT-TODO
      • AutoMerge with CRDT
    • Time in distributed system
      • Logical time
      • Physical time
    • DDIA_Studying-[TODO]
  • Protocols
    • ApiDesign
      • REST
      • RPC
    • Websockets
    • Serialization
      • Thrift
      • Avro
    • HTTP
    • HTTPS
    • Netty-TODO
  • Statistical data structure
    • BloomFilter
    • HyperLoglog
    • CountMinSketch
  • DevOps
    • Container_Docker
    • Container_Kubernetes-[TODO]
  • Network components
    • CDN
    • DNS
    • Load balancer
    • Reverse proxy
    • 云中网络-TODO
  • Templates
    • interviewRecord
  • TODO
    • RecommendationSystem-[TODO]
    • SessionServer-[TODO]
    • Disk
    • Unix philosophy and Kafka
    • Bitcoin
    • Design pattern
      • StateMachine
      • Factory
    • Akka
    • GoogleDoc
      • CRDT
Powered by GitBook
On this page
  • Logical design
  • Normal forms with example
  • ER diagram design with example
  • Physical design
  • Select the DB engine
  • Select the correct data type
  • Best practices for primary key of Innodb engine
  • Schema design

Was this helpful?

  1. Traditional DB

Schema design

PreviousBinlogNextDB optimization

Last updated 3 years ago

Was this helpful?

Logical design

Normal forms with example

  • Normal forms are a way to measure the redundancy and potential maintenance (Needs to update multiple relation tables). They should not be bindly followed to minimize redundancy because they will also increase query cost.

First normal form

  • Def:

    • A relation is in first normal form if every attribute in that relation is atomic and could not be split further.

  • Examples for violations:

    • Use Json as an attribute.

    • Use concatination of multiple attribute as an attribute.

  • Apply to the example: Attribute property could be split further.

Second normal form

  • Def:

    • Prerequisite: If a relation satisfy second norm form, then it must satisfy first norm form.

    • A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table.

  • Examples for violations:

    • Redundancy in tables. Partial primary key could determine another attribute.

  • Apply to the example: When using attribute "listnumber" + "barcode" as primary key,

    • "goodsname", "specification", "unit" only relies on "barcode". Separate into a different table.

    • "supplierId", "supplierName", "stock" only relies on "list number". Separate into a different table.

Third normal form

  • Def:

    • Prerequisite: If a relation satisfy third norm form, then it must satisfy second norm form.

    • A relation is not in third normal form if a non-prime attribute is dependent on a non-prime attribute.

  • Apply to the example:

    • SupplierName depends on SupplierId

Denormalization

Use case

  • When the size of table to join is big (~million level), then the performance gain from denormalization (~0.5s) could help.

  • Typically used in dataware house scenarios where historial data is stored for analytics purpose.

Example

  • Within the above table, although quantity * importprice = importvalue and importvalue is redundant column

    • If using multiplication of importprice * quantity, it might not equal to importvalue. E.g. Importprice might change if the supplier has any promotion selling activity.

  • Vice versa, importprice = importvalue / quantity, importprice is redundant column

    • Most of financial calculations for cost rely on importprice. If it is a derivative value, it will increase a lot of computational cost overhead.

ER diagram design with example

  • Example: https://www.exploredatabase.com/2017/07/reduce-er-diagram-to-relation-table-solved-exercise.html

  • https://www.youtube.com/watch?v=OwdFzygGZqk&ab_channel=OrangeOutputs

Rules

  • Strong entity: Convert to a table

  • Weak entity: Convert to a table. Since weak entity cannot exist without a strong entity, the table needs to include a foreign key referencing the strong entity.

  • 1:M relationship: Use foreign key

  • M:N relationship: Use a separate table

Physical design

Select the DB engine

Select the correct data type

  • Determine the type category: number, string, time, binary

Integer types

  • TinyInt vs SmallInt vs MediumInt vs Int vs BigInt: Use TinyInt to replace enum.

  • Int is the most frequently used type, uses 4 bytes.

  • Guarantee that the data won't overflow first, then save space.

Float/Double and decimal types

  • Floating-point types: Float vs Double Use binary to represent decimal and could be inaccurate.

  • Fixed-Point types: Decimal/Numeric separate the int and Use decimal whenever possible because it is more precise.

    • Decimal(M,D): M represents digit, D represents decimal

Text

  • Types: Tinytext/text/mediumtext/longtext

  • Varchar vs Char: Char has fixed size length. Space will be padded in the end. 0-255 byte; Varchar has a larger upperbound 65535 byte.

    • Use char whenever possible.

    • Use varchar on cases where max size could be much bigger than average and not so often updated (Updating on varchar might break indexes)

  • Text could not be used as primary key

Date time

  • Datetime vs Timestamp vs Date vs Time:

    • In most case use Datetime (twice the size of Timestamp but unlimited time range. Timestamp only extends until 2038).

Best practices for primary key of Innodb engine

Schema design

  • https://web.csulb.edu/colleges/coe/cecs/dbdesign/dbdesign.php?page=keys.php

Comparison between different DB engines: 3.08'

Cheatsheet for all data types:

Logical design
Normal forms with example
First normal form
Second normal form
Third normal form
Denormalization
Use case
Example
ER diagram design with example
Rules
Physical design
Select the DB engine
Select the correct data type
Integer types
Float/Double and decimal types
Text
Date time
Best practices for primary key of Innodb engine
Schema design
https://coding.imooc.com/lesson/49.html#mid=403
https://tableplus.com/blog/2018/07/mysql-data-types-cheatsheet.html
https://coding.imooc.com/lesson/49.html#mid=406