Index data structure

Database index

Pros

  • Change random to sequential IO

  • Reduce the amount of data to scan

  • Sort data to avoid using temporary table

Cons

  • Update speed will be slowed because the index needs to be maintained.

  • Increase query optimizer process time

Balanced binary tree

  • Based on the idea of binary search tree, with the following improvements:

    • The height difference between left and right child is 1 at maximum

  • Cons:

    • Lots of rebalancing during inserting new nodes

    • Each nodes could only store one value while operating system load items from disk in page size (4k).

    • Tree too high which results in large number of IO operations

Skiplist

  • Will degrade to a linked list in the worst case.

B tree

Structure

Pros as index underlying data structure

  • Based on the idea of binary tree, with the following improvements:

    • Store more values in each node: For a N-degree B tree,

      • Every non-leaf node (except root) has at least N/2 children nodes.

      • Root node has at least 2 children nodes.

      • Each node has at most N children nodes.

    • All the leaf nodes stay on the same depth level.

    • B tree is built up in a bottom-up way. Everything is sent into a leaf node first node (in innoDB the leaf node size is 16KB). If the leaf node could not fit, then another leaf node will be created and a node will be promoted as parent.

Cons as index underlying data structure

  • All nodes in B tree stores actual data so the index could not fit into memory.

  • Not friendly for range query.

B+ Tree

Structure

Pros as index data structure

  • Suitable for range query: The leaf nodes are linked in a doubly linked list. These links will be used for range query.

  • Indexes small enough for fitting in memory: Non-leaf nodes could be stored inside memory.

  • Low height as a tree

Cons Characteristics

  • B+ tree has write amplification

  • The storage is not continuous

Cons example for write amplification

  • Initial B+ tree

  • B+ tree after insertion

Practices to use B/B+ tree efficiently

  • Minimise number of indexes: This is a very common strategy to increase the performance of relational databases as more index means more index updation on every INSERT & UPDATE operation. When the database grows older, delete old & probably unused indexes, hesitate to create indexes when your database crumble. Though you should be extra careful as less index means less select query performance. You might have to pay the penalty if you have less index or very heavy index.

  • Sequential Insert: If you can manage to insert a lot of data together in sequential order where the primary key of the rows are sequential, then the insert operation will be faster as already the page blocks is there in memory, so possibly all the insertions will be applied in the same page blocks and then committed to database at once in a single disk I/O. A lot depends on database here though, but I assume, modern databases are designed to apply such optimizations when required.

References

Fractal tree

  • Fractal tree is based on B Tree structure. But there are certain optimizations to make it perform fast, very fast at least as per their performance benchmark. Fractal tree supports message buffers inside all nodes except leaf nodes. Fractal tree is used in Tokudb that was later acquired by Percona. Tokudb can be used as MySQL storage engine like InnoDB is used.

Last updated