Working With a Titanic Database in R and SQLite
When working with relational databases, it’s often helpful to practice by creating a small project from scratch. In this example, we’ll use R together with SQLite to create a Titanic passenger table, insert some synthetic data, and then query it back. This is a great way to learn the basics of database management with R.
Setting Up the Connection
The first step is to load the necessary libraries and connect to a database file. If the file doesn’t already exist, SQLite will create it for you. In our case, the file is called titanic.db.
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "titanic.db")
At this point, we have an open connection to the database. Think of it as opening a channel of communication between R and SQLite.
Creating the Table
Next, we define the structure of our table — the schema. In this example, the table is named titanic, and it includes columns like PassengerID, Survived, PClass, Name, Age, Parch, and Ticket. The most important column is PassengerID, which we declare as INTEGER PRIMARY KEY. This means every passenger must have a unique identifier, and SQLite will automatically manage this ID as new rows are inserted.
dbExecute(con, "
CREATE TABLE titanic (
PassengerID INTEGER PRIMARY KEY,
Survived INTEGER,
PClass INTEGER,
NAME TEXT,
Age REAL,
Parch INTEGER,
Ticket TEXT
)
")
By defining the table this way, we ensure that each row represents a single passenger with all of their associated attributes.
Generating Synthetic Data
Since we don’t want to type in all the Titanic passenger records by hand, we’ll generate some test data in R. Here we create 20 passengers with random attributes such as survival status, passenger class, age, and ticket numbers.
set.seed(123) # reproducibility
n <- 20
titanic_data <- data.frame(
PassengerID = 1:n,
Survived = sample(c(0,1), n, replace = TRUE),
PClass = sample(1:3, n, replace = TRUE),
NAME = paste("Passenger", 1:n),
Age = round(runif(n, 1, 70), 1),
Parch = sample(0:3, n, replace = TRUE),
Ticket = paste0("T", sprintf("%04d", sample(1000:9999, n)))
)
This gives us a small but realistic dataset that mimics the structure of the Titanic manifest.
Writing Data to the Database
To save the generated dataset into the database, we use dbWriteTable(). Here’s where you need to be careful. If you use overwrite = TRUE, R will drop the existing table and recreate it based on the dataframe. This can accidentally remove constraints like PRIMARY KEY. If you want to keep the schema exactly as you defined it, you should use append = TRUE instead.
dbWriteTable(con, "titanic", titanic_data, append = TRUE)
With this approach, the schema is preserved and new rows are simply added to the table.
Reading and Querying Data
Once the data is inside the database, you can pull it back into R either by reading the whole table or by running a query. For example, if you want to view just the first 10 passengers along with their survival status and passenger class:
dbGetQuery(con, "SELECT PassengerID, Survived, PClass FROM titanic LIMIT 10")
This gives you a clean slice of the table, much like running head() on a dataframe.
This function, get_passenger(), makes it easy to retrieve a single passenger’s record from the Titanic database by their unique PassengerID. Instead of rewriting an SQL query each time, you simply call the function with the ID you want. The use of the ? placeholder with params ensures that the query is parameterized, which is both safer (prevents SQL injection) and cleaner to read. In the example, calling get_passenger(con, 154) returns all the information stored for the passenger whose ID is 154, and saves it into a dataframe df. This dataframe can then be used for further analysis or reporting in R just like any other dataset.
# function to query a passenger by ID
get_passenger <- function(con, passenger_id) {
dbGetQuery(
con,
"SELECT * FROM titanic WHERE PassengerID = ?",
params = list(passenger_id)
)
}
df <- get_passenger(con, 154)
df
The function is also flexible such that it also have two inputs
get_passenger_cols <- function(con, passenger_id, cols = c("PassengerID", "Survived")) {
col_string <- paste(cols, collapse = ", ")
sql <- paste0("SELECT ", col_string, " FROM titanic WHERE PassengerID = ?")
dbGetQuery(con, sql, params = list(passenger_id))
}
# Example: only get PassengerID and Age
get_passenger_cols(con, 154, cols = c("PassengerID", "Age"))
Cleaning Up
Finally, it’s always good practice to close the connection once you’re done.
dbDisconnect(con)
This ensures the database file is released and not left in a locked state.
Key Takeaways
Use dbConnect() to open or create a SQLite database file.
Define your schema with CREATE TABLE, making sure to set a primary key for uniqueness.
Use dbWriteTable() with append = TRUE to add data without dropping your schema.
Query the data back with dbGetQuery() just as you would with SQL in any other database.
Always remember to disconnect when finished