Database Design Notes

Published on December 15, 2022

This is exported from my Logseq graph, so it looks sorta weird. I'll attempt to update this as my notes get updated 🙂

The knowledge graph equivalent can be found here.

Links
- IO
  - What every programmer should know about solid state drives
    - https://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives/
- Storage Algorithms
  - Write-Optimized Dynamic Hashing for Persistent Memory
    - https://www.usenix.org/system/files/fast19-nam.pdf
    - @Write-Optimized Dynamic Hashing for Persistent Memory
    - CCEH
    - MehDB uses an implementation of this
  - Log Structured Merge Trees
    - https://www.cs.umb.edu/~poneil/lsmtree.pdf
    - https://github.com/facebook/rocksdb/wiki/How-we-keep-track-of-live-SST-files
      - The entire "implementation details" section is good
  - BW Trees
    - https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/bw-tree-icde2013-final.pdf
    - Developed by Microsoft
    - Binary Tree derivative
  - Hitchhiker Tree
    - https://github.com/datacrypt-project/hitchhiker-tree
    - https://www.youtube.com/watch?v=jdn617M3-P4
    - Binary Tree derivative
  - Skip List
    - https://brilliant.org/wiki/skip-lists/
- Hashing/Hashmaps
  - Extendible hashing
    - https://www.geeksforgeeks.org/extendible-hashing-dynamic-approach-to-dbms/
    - https://en.wikipedia.org/wiki/Extendible_hashing
  - EVMap
    - https://www.youtube.com/watch?v=eLNAMEoKAAc
- Compression
  - ZSTD
    - https://facebook.github.io/zstd/
- Replication
  - View-stamped Replication
    - VSR
    - https://pmg.csail.mit.edu/papers/vr-revisited.pdf
    - https://pmg.csail.mit.edu/papers/vr.pdf
      - Original paper
  - Raft
    - https://raft.github.io/
  - Paxos
    - https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
    - https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf
- Practical Examples
  - GFS
    - https://www.micahlerner.com/2020/03/22/understanding-googles-file-system.html
  - Pinecone
    - Vector database
    - https://www.pinecone.io/learn/rust-rewrite/
Notes
- Hard Drives and Solid State Drives
  - Align to 4Kb page size
  - Deletes don't happen when you think they happen
  - Hard drives are very bad at random access
    - Spacial locality + reduced # of total reads for a query are very important
  - SSDs are much better at random reads
- File Systems
  - The (user's) choice of file system can have a significant impact on the performance of a database
  - Copy-on-write file systems (such as ZFS) increase write-amplification and are not generally well suited to hosting databases that update files in-place, though this of course depends
- Record storage
  - For datasets that are of fixed, consistent size it's usually best to use a row-like approach
    - Indexing is easy, as the offset is size of record * row ID + any header offset
      - Indexes become <column you're indexing> + row ID (unless a covering index)
    - Deletes may or may not be difficult
      - Requires a compaction step otherwise you store the record indefinitely, which might be ok
    - Most relational databases use rows as their primary record storage mechanism with various mechanics for handling variable sized data-structures
  - Columnar databases are similar to row databases, but benefit from better storage locality when use cases involve longitudinal reads
    - Great for large analytical engines, but suffers in performance when reading all columns associated with a particular entity/row due to "random" disk read
    - Indexing is possibly harder, but should be roughly the same as row-based
    - Probably want to store each column as a separate file to facilitate easy appending of new records
  - Key-value stores are purely associative data-structures
    - Typically do not care about record byte alignment
    - Commonly implementations use Log Structured Merge Trees and hash maps
  - Fact or datalog databases... are weird
    - Apparently were more popular at some point, but have largely fallen out of style (unless you're a Clojure developer)
    - Few implementations
      - Mozilla had one, abandoned
        
        TODO get link for this
      - Datomic is most prominent modern-ish example
    - Records stored as tuples with indexed values

Database Design Notes

Links

IO

Storage Algorithms

Hashing/Hashmaps

Compression

Replication

Practical Examples

Notes

Hard Drives and Solid State Drives

File Systems

Record storage