Writing a sqlite clone from scratch in Java
Part 2 - World's Simplest SQL Compiler and Virtual Machine
As a data engineer, I use relational databases every day at my job, but they’re a black box to me. Some questions I have:
In other words, how does a database work?
To figure things out, I’m writing a database from scratch. It’s modeled off sqlite because it is designed to be small with fewer features than MySQL or PostgreSQL, so I have a better hope of understanding it. The entire database is stored in a single file!
There’s lots of documentation of sqlite internals on their website, plus I’ve got a copy of SQLite Database System: Design and Implementation.
![]() |
A query goes through a chain of components in order to retrieve or modify data. The front-end consists of the:
The input to the front-end is a SQL query. the output is sqlite virtual machine bytecode (essentially a compiled program that can operate on the database).
The back-end consists of the:
The virtual machine takes bytecode generated by the front-end as instructions. It can then perform operations on one or more tables or indexes, each of which is stored in a data structure called a B-tree. The VM is essentially a big switch statement on the type of bytecode instruction.
Each B-tree consists of many nodes. Each node is one page in length. The B-tree can retrieve a page from disk or save it back to disk by issuing commands to the pager.
The pager receives commands to read or write pages of data. It is responsible for reading/writing at appropriate offsets in the database file. It also keeps a cache of recently-accessed pages in memory, and determines when those pages need to be written back to disk.
The os interface is the layer that differs depending on which operating system sqlite was compiled for. In this tutorial, I’m not going to support multiple platforms.
A journey of a thousand miles begins with a single step, so let’s start with something a little more straightforward: the REPL.
Sqlite starts a read-execute-print loop when you start it from the command line:
~ sqlite3
SQLite version 3.16.0 2016-11-04 19:09:39
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table users (id int, username varchar(255), email varchar(255));
sqlite> .tables
users
sqlite> .exit
~
To do that, our main function will have an infinite loop that prints the prompt, gets a line of input, then processes that line of input:
public static void main(String[] args) {
new Main().run();
}
public void run() {
while (true) {
printPrompt();
String input = readInput();
Objects.requireNonNull(input);
if (input.isEmpty()) {
continue; // do nothing and loop
}
if (input.equals(".exit")) {
System.out.println("Exiting - Good bye.");
System.exit(0);
} else {
System.out.printf("Unrecognized command: '%s'%n", input);
}
}
}
printPrompt()
prints a prompt to the user. We do this before reading each line of input.
private void printPrompt() {
System.out.print("homemadeDB > ");
}
To read a line of input, we use a (buffered) InputStreamReader:
private String readInput() {
String inputLine = null;
try {
BufferedReader is = new BufferedReader(new InputStreamReader(System.in));
inputLine = is.readLine();
} catch (IOException e) {
System.out.println("IOException: " + e);
}
return inputLine;
}
Finally, we parse and execute the command.
There is only one recognized command right now : .exit
, which terminates the program. Otherwise we print an error message and continue the loop.
Also, if the input is empty, we loop again.
if (input.equals(".exit")) {
System.out.println("Exiting - Good bye.");
System.exit(0);
} else if (input.isEmpty()) {
// do nothing and loop
} else {
System.out.println("Unrecognized command.");
}
Let’s try it out!
~ javac Main.java
homemadeDB > .tables
Unrecognized command: '.tables'
db > .exit
Exiting - Good bye.
~
Alright, we’ve got a working REPL. In the next part, we’ll start developing our command language. Meanwhile, here’s the entire program from this part:
package org.cyrilou242.homemadedb;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Objects;
public class Main {
public static void main(String[] args) {
new Main().run();
}
public void run() {
while (true) {
printPrompt();
String input = readInput();
Objects.requireNonNull(input);
if (input.isEmpty()) {
continue; // do nothing and loop
}
if (input.equals(".exit")) {
System.out.println("Exiting - Good bye.");
System.exit(0);
} else {
System.out.printf("Unrecognized command: '%s'%n", input);
}
}
}
private void printPrompt() {
System.out.print("homemadeDB > ");
}
private String readInput() {
String inputLine = null;
try {
BufferedReader is = new BufferedReader(new InputStreamReader(System.in));
inputLine = is.readLine();
} catch (IOException e) {
System.out.println("IOException: " + e);
}
return inputLine;
}
}