Sauvik Biswas

Comics enthusiast, Musician, Programmer and Traveller


YetiDB: an academic exercise

February 22, 2024 by Sauvik Biswas

Over the last couple of years, I have seen protobufs used not only to serialize and deserialize data for transport over a network, but also to implement type systems, business logic, version control, storage schemas, and whatnot. These uses often require authoring custom options (extensions) for the protobuf messages, and writing code that parses these options and applies business logic to the data contained in the protobuf.

One of the most common use cases is defining a storage schema using protobuf. Infoblox's protoc plugin for GORM comes to mind. I have seen it used by many folks, although it is a bit restrictive. One of my teammates wrote his own Python/Jinja-based parser to churn out custom code for our application. Needless to say, he has automated the hell out of our system.

There is always a middle layer that keeps us from transacting with the database directly in protobufs. However, a protobuf's core use case is serializing and deserializing data, which can, in theory, be used for storage and retrieval as well. Most middlewares essentially translate protobufs into language-specific queries. My teammate did it for GORM, which in turn generated SQL queries for Postgres, and I did it to generate Cypher queries for Neo4j.
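To make the "middle layer" idea concrete, here is a minimal sketch of the translation step, written in Python with only the standard library. It is not the code from either project mentioned above. The `Book` dataclass is a hypothetical stand-in for a generated protobuf message, and `insert_statement` is an illustrative helper name; a real plugin would read the field names from the message descriptor instead.

```python
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class Book:
    """Hypothetical stand-in for a generated protobuf message."""
    title: str
    year: int


def insert_statement(table, record):
    """Translate a record's fields into a parameterized SQL INSERT.

    Returns the SQL text and the tuple of values to bind to it.
    The table name is assumed to come from trusted schema metadata.
    """
    fields = asdict(record)
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, tuple(fields.values())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
    sql, params = insert_statement("books", Book("Tintin in Tibet", 1960))
    conn.execute(sql, params)
    print(conn.execute("SELECT title, year FROM books").fetchall())
```

The point of the sketch is that the message never reaches the database as a message: it is flattened into columns and values first, and that flattening is the layer a protobuf-centric store would try to remove.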

Three days ago, I wrote down the primary requirements for building a protobuf-centric database:

User chooses:

  1. A storage system
    1. One Record / File
    2. B-Tree based
    3. LSM tree
    4. Non-indexed sequential
  2. A host of servers

User interacts with db:

  1. Write as protobuf
  2. Read into protobuf
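As a rough illustration of the requirements above, the simplest storage option (one record per file) could be sketched as follows. This is an assumption-laden Python sketch, not the code in the repository. `FilePerRecordStore`, its method names, and the `.pb` suffix are made up for illustration, and `FakeMessage` is a stand-in that only mimics the `SerializeToString()` and `FromString()` methods that generated protobuf classes provide in Python.

```python
import os
import tempfile
from pathlib import Path


class FakeMessage:
    """Stand-in mimicking the two protobuf message methods used below."""

    def __init__(self, payload=b""):
        self.payload = payload

    def SerializeToString(self):
        return self.payload

    @classmethod
    def FromString(cls, data):
        return cls(data)


class FilePerRecordStore:
    """Hypothetical store: each record lives in its own file, named by key."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Keys are assumed to be safe file names (no path separators).
        return self.root / f"{key}.pb"

    def write(self, key, message):
        """Write as protobuf: serialize the message and save the bytes."""
        final = self._path(key)
        tmp = final.with_suffix(".tmp")
        # Write to a temporary file first, then rename it over the final path.
        tmp.write_bytes(message.SerializeToString())
        os.replace(tmp, final)

    def read(self, key, message_type):
        """Read into protobuf: parse the stored bytes into message_type."""
        return message_type.FromString(self._path(key).read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        store = FilePerRecordStore(directory)
        store.write("yeti", FakeMessage(b"\x08\x01"))
        print(store.read("yeti", FakeMessage).payload)
```

The caller only ever hands over and receives messages. Swapping in a B-Tree or LSM tree would mean replacing the body of `write` and `read` while keeping the same two calls.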

It is, first and foremost, an academic exercise. I know very little about how a database really works, and this can be an excellent gateway to understanding the fundamentals of a storage system.

There are two books that I will use as my starting point:

  1. Database Internals by Alex Petrov
  2. Designing Data-Intensive Applications by Martin Kleppmann

I call it YetiDB. I was reading Tintin in Tibet for the nth time just before I had to give the repository a name. Anyway, here it is:
https://github.com/sauvikbiswas/yeti

P.S. The answers to this question on SoftwareEngineering.StackExchange gave me a lot of confidence.

Posted in: Coding Tagged: database, yetidb
