Tracking Database Schema And Schema Migrations In Git Version Control

A decade ago, I attended a presentation by Tim Cunningham titled “Version Control on the Database: the Final Frontier“. At the time, I was relatively new to version control. At InVision, we had just started using git and GitHub; and, before that, I had only dabbled briefly with SVN (Subversion). I was still struggling to wrap my head around branching and merging and rebasing. So, when Tim suggested that we should also be storing our “database” in version control, I simply wasn’t ready to hear it. I mean, it sounded awesome; but, due to my lack of experience, I couldn’t connect – at a practical level – with anything he was saying. Cut to 2022 and I can’t imagine not tracking our database schema and schema migrations in version control!

Tim Cunningham seemed to be ahead of his time, at least in the ColdFusion community. I had never seen a presentation like that before; and, now that I think about it, I don’t believe that I’ve seen a presentation like that since. Apparently, tracking schema migrations isn’t a sexy thing to talk about. Though, I will say that Chris Toomey and Stephanie Viccari often refer to Active Record Migrations (Ruby on Rails) on the Bike Shed podcast – one of my favorite podcasts.

When I was sitting in on Tim’s presentation, the missing piece for me was the fact that the state of a database can be expressed in code. There are two general categories for this code:

  • DDL (Data Definition Language): This code defines the structure of your database and the tables within it. Character sets, collation, columns, default values, indexes, foreign keys, etc – this is everything about the database, less the actual data.

  • DML (Data Manipulation Language): This code provides the data for the database that has been defined (and altered) using the DDLs. These are the INSERT, UPDATE, and DELETE statements that capture the state – and the changes to that state – over time.

But, don’t let the fancy terms fool you: it’s all just SQL statements, nothing more. It’s just a series of SQL statements, executed in a predictable order, that leaves you with the most current state of your application database.
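To make that concrete, here’s a quick sketch (the table and columns here are made up for illustration – they aren’t my actual schema). The first statement is DDL, the second is DML; and, as you can see, both are just plain SQL:

-- DDL: defines the structure of a (hypothetical) tag table.
CREATE TABLE tag (
	id INT UNSIGNED NOT NULL AUTO_INCREMENT,
	name VARCHAR(50) NOT NULL,
	PRIMARY KEY (id),
	UNIQUE KEY byName (name)
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4;

-- DML: provides data for the structure defined above.
INSERT INTO tag
	( name )
VALUES
	( 'sql' ),
	( 'docker' ),
	( 'git' );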

Running these SQL statements in a predictable order can get fairly fancy-pants – at work, we use a product called Liquibase to keep all of our environments in sync. But, it doesn’t have to be fancy and automated to be valuable. In my local development environment, I hand-roll my SQL statements and put them in the docker-entrypoint-initdb.d directory of my MySQL Docker container. Then, when the MySQL Docker container is created, it simply executes all of the files in this directory in alphabetical order, applying them in turn, and leaving me with a predictable database schema.
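To give you a sense of the wiring, here’s a minimal docker-compose sketch (the service name, database name, and credentials are placeholders, not my actual configuration). The official MySQL image automatically runs any .sql files that it finds in the docker-entrypoint-initdb.d directory when it initializes a fresh database:

version: "2.4"
services:
  mysql:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: "password"
      MYSQL_DATABASE: "my_blog"
    ports:
      - "3306:3306"
    volumes:
      # Every .sql file in this directory is executed, in alphabetical
      # order, when the container spins-up against an empty data directory.
      - "./build/mysql/initdb:/docker-entrypoint-initdb.d:ro"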

For my blog’s local development environment, if I list (ls) the contents of the initdb folder, here’s what I get:

% ls -l build/mysql/initdb 
total 203776
2020-01-11-001-initial-structure.sql
2020-01-11-007-display_status.sql
2020-01-11-008-member.sql
2020-01-11-009-people.sql
2020-01-11-012-site_photo.sql
2020-01-11-013-tag.sql
2020-03-21-002-blog_entry_blog_entry_jn.sql
2020-03-21-003-blog_entry_tag_jn.sql
2020-03-22-001-blog_entry.sql
2021-11-18-001-drop-old-tables.sql
2021-11-19-001-drop-exercise-list.sql
2021-11-19-002-drop-kinky-calendar.sql
2021-11-19-003-drop-is-initialized-column.sql
2021-11-19-004-drop-old-admin-table.sql
2021-11-22-001-comment-edit-token-table.sql
2021-12-05-001-change-default-charset.sql
2021-12-05-002-comment-moderation.sql
2021-12-15-001-dropping-extra-indices.sql
2022-01-06-001-drop-comment-author-columns.sql
2022-01-22-001-recreate-comment-edit-token.sql
2022-02-28-001-mysql8-charset-defaults.sql

As you can see, all of these files follow the same format:

{ yyyy-mm-dd }-{ counter }-{ description }

Since the MySQL Docker container runs these SQL files in alphabetical order, I use the date-prefix with a counter in order to make sure that these are executed in a top-down manner. Every time I tear down my local development environment, the database is destroyed. And, every time I bring it back up again, these SQL files run against the empty database and leave me with a “known good state”.

ASIDE: You don’t actually have to destroy your database every time you shut down your Docker development environment. You can configure the Docker container to store the MySQL data outside of the container, which will persist it across down/up cycles.
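For example, assuming the docker-compose sketch from above, mapping MySQL’s data directory onto a named volume keeps the data around between down/up cycles:

services:
  mysql:
    volumes:
      - "./build/mysql/initdb:/docker-entrypoint-initdb.d:ro"
      # Persisting the data directory means the database survives a
      # "down" / "up" cycle. Note that the initdb scripts only run
      # against an EMPTY data directory; so, they will be skipped on
      # subsequent "up" calls once this volume is populated.
      - "mysql-data:/var/lib/mysql"
volumes:
  mysql-data: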

Now, you may notice that my first DDL (Data Definition Language) file is from January 2020. Of course, if you look at my blog posts, you’ll see that I started blogging in 2006. That means it took me 14 years to start properly tracking my blog’s schema changes.

Yikes! That’s a bit embarrassing.

But, the good news is, it means that it’s never too late to start tracking database schema migrations, even in a brown-field application like mine. All I did to get started was go into my MySQL Database GUI (Graphical User Interface) of choice (Navicat for MySQL), and dump the entire structure of the database to a SQL file:

[Screenshot: Navicat for MySQL showing a "Structure Only" SQL dump about to be executed as the basis of the first DDL (Data Definition Language) migration file.]

This feature in Navicat for MySQL generates a .sql file with a series of CREATE TABLE statements, one for each table that currently exists in the application. I took that .sql file, renamed it 2020-01-11-001-initial-structure.sql and then dropped it in my build directory.
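The resulting dump looks something like this (heavily abbreviated, with made-up columns – the real file has one such block for every table, along with all of the indexes and foreign keys):

-- Excerpt of a "Structure Only" dump (columns are illustrative).
DROP TABLE IF EXISTS blog_entry;

CREATE TABLE blog_entry (
	id INT UNSIGNED NOT NULL AUTO_INCREMENT,
	name VARCHAR(255) NOT NULL,
	dateCreated DATETIME NOT NULL,
	PRIMARY KEY (id)
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4;

-- ... one CREATE TABLE statement for every other table ...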

My first DDL (Data Definition Language) file! Woot woot!

Then, I selectively went to each table and dumped its data out to a separate SQL file using the same GUI feature (selecting the Structure + Data option rather than the Structure Only option). This generated SQL files with a bunch of INSERT INTO statements. I renamed those files and dropped them into my build directory.
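Each of those data files is really just a pile of INSERT INTO statements – something along these lines (the values here are illustrative, not my actual data):

-- Excerpt of a "Structure + Data" dump for the tag table.
INSERT INTO tag ( id, name ) VALUES ( 1, 'ColdFusion' );
INSERT INTO tag ( id, name ) VALUES ( 2, 'JavaScript' );
INSERT INTO tag ( id, name ) VALUES ( 3, 'SQL' );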

My first DML (Data Manipulation Language) files! Hecks to the yeah!

At that point, I had a set of .sql files that allowed me to recreate the current state of my database. Files that were checked into my Docker repository; files that would be executed from a blank slate; files that would leave me with a database structure that exactly matched my production database no matter where or when I ran them.

ASIDE: While I was committing all the table “structures” to git, I wasn’t committing all the table “data” to the schema migrations. For example, I never committed the comment table data as a DML – it would be too much data. If I want the comments to be loaded locally, I’ll run that as a special SQL file after the fact.

From that point on, it just became a matter of generating new database schema migration files locally first, and then running those same migrations in production. This way, I make sure to always keep my local development database in sync with my production database. In fact, my local development database has become my “source of truth” for the database schema. And, I have to manually apply migrations to production (I don’t have any fancy build tooling for my blog).
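In practice, a new migration is usually just a small file that follows the same naming convention. As a hypothetical example (this isn’t an actual migration from my repo), adding a column might look like this:

-- 2022-03-01-001-add-blog-entry-view-count.sql (hypothetical migration).
-- Runs locally first; then, gets applied manually to production.
ALTER TABLE blog_entry
	ADD COLUMN viewCount INT UNSIGNED NOT NULL DEFAULT 0;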

A decade ago, databases felt a lot more like a black box. So, when Tim talked about persisting them to version control, I didn’t have the mental model I needed to understand how something like that would even work. But, once I realized that the database could be created and migrated with a series of .sql files, suddenly it all clicked. It all became “just code”. And SQL files, like ColdFusion, JavaScript, HTML, and CSS files, can be committed to version control all the same.

I stated on the Working Code podcast, episode 57, that one of my big goals for 2022 is to become better at “doing containers” and deploying them to production. But, for now, hopefully this is helpful enough.
