~ngp

Database Design Notes

This is exported from my Logseq graph, so it looks sorta weird. I'll attempt to update this as my notes get updated 🙂

The knowledge graph equivalent can be found here.


  • Links

    • IO

    • Storage Algorithms

      • Write-Optimized Dynamic Hashing for Persistent Memory
        • https://www.usenix.org/system/files/fast19-nam.pdf
        • @Write-Optimized Dynamic Hashing for Persistent Memory
        • CCEH
        • MehDB uses an implementation of this
      • Log Structured Merge Trees
        • https://www.cs.umb.edu/~poneil/lsmtree.pdf
        • https://github.com/facebook/rocksdb/wiki/How-we-keep-track-of-live-SST-files
          • The entire "implementation details" section is good
      • BW Trees
        • https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/bw-tree-icde2013-final.pdf
        • Developed by Microsoft
        • Binary Tree derivative
      • Hitchhiker Tree
        • https://github.com/datacrypt-project/hitchhiker-tree
        • https://www.youtube.com/watch?v=jdn617M3-P4
        • Binary Tree derivative
      • Skip List
        • https://brilliant.org/wiki/skip-lists/
    • Hashing/Hashmaps

      • Extendible hashing
        • https://www.geeksforgeeks.org/extendible-hashing-dynamic-approach-to-dbms/
        • https://en.wikipedia.org/wiki/Extendible_hashing
      • EVMap
        • https://www.youtube.com/watch?v=eLNAMEoKAAc
    • Compression

      • ZSTD
        • https://facebook.github.io/zstd/
    • Replication

      • View-stamped Replication
        • VSR
        • https://pmg.csail.mit.edu/papers/vr-revisited.pdf
        • https://pmg.csail.mit.edu/papers/vr.pdf
          • Original paper
      • Raft
        • https://raft.github.io/
      • Paxos
        • https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
        • https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf
    • Practical Examples

      • GFS
        • https://www.micahlerner.com/2020/03/22/understanding-googles-file-system.html
      • Pinecone
        • Vector database
        • https://www.pinecone.io/learn/rust-rewrite/
  • Notes

    • Hard Drives and Solid State Drives

      • Align to 4Kb page size
      • Deletes don't happen when you think they happen
      • Hard drives are very bad at random access
        • Spacial locality + reduced # of total reads for a query are very important
      • SSDs are much better at random reads
    • File Systems

      • The (user's) choice of file system can have a significant impact on the performance of a database
      • Copy-on-write file systems (such as ZFS) increase write-amplification and are not generally well suited to hosting databases that update files in-place, though this of course depends
    • Record storage

      • For datasets that are of fixed, consistent size it's usually best to use a row-like approach
        • Indexing is easy, as the offset is size of record * row ID + any header offset
          • Indexes become <column you're indexing> + row ID (unless a covering index)
        • Deletes may or may not be difficult
          • Requires a compaction step otherwise you store the record indefinitely, which might be ok
        • Most relational databases use rows as their primary record storage mechanism with various mechanics for handling variable sized data-structures
      • Columnar databases are similar to row databases, but benefit from better storage locality when use cases involve longitudinal reads
        • Great for large analytical engines, but suffers in performance when reading all columns associated with a particular entity/row due to "random" disk read
        • Indexing is possibly harder, but should be roughly the same as row-based
        • Probably want to store each column as a separate file to facilitate easy appending of new records
      • Key-value stores are purely associative data-structures
        • Typically do not care about record byte alignment
        • Commonly implementations use Log Structured Merge Trees and hash maps
      • Fact or datalog databases... are weird
        • Apparently were more popular at some point, but have largely fallen out of style (unless you're a Clojure developer)
        • Few implementations
          • Mozilla had one, abandoned
            • TODO get link for this
          • Datomic is most prominent modern-ish example
        • Records stored as tuples with indexed values
Thoughts? Leave a comment