Skip to content

obsidiansystems/beam-automigrate

Repository files navigation

beam-automigrate

Haskell Hackage Github CI BSD3 License

Automatic migrations for beam databases!

Auto-migration example animation

Table of Contents

Getting started (User guide/reference)

This example is a literate haskell source file. You can run it interactively with the following command:

$ cabal repl readme

If you're using nix, you can enter a shell with the appropriate dependencies with the following command:

$ nix-shell

From that nix-shell, you can run cabal repl readme.

To run the example code, run main inside of the cabal repl.

For a more examples, refer to the examples folder.

Deriving an AnnotatedDatabaseSettings

Deriving an AnnotatedDatabaseSettings for a Haskell database type is a matter of calling defaultAnnotatedDbSettings. For example, given:

> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE DeriveGeneric #-}
> {-# LANGUAGE DeriveAnyClass #-}
> {-# LANGUAGE OverloadedStrings #-}
> {-# LANGUAGE TypeApplications #-}
> {-# LANGUAGE TypeFamilies #-}
> import Prelude hiding ((.))
> import Control.Category ((.))
> import Data.Proxy (Proxy(..))
> import Database.Beam.Postgres
> import Database.Beam.Schema
> import Database.Beam (val_)
> import qualified Database.Beam.AutoMigrate as BA
> import Database.Beam.AutoMigrate.TestUtils
> import Database.PostgreSQL.Simple as Pg
> import GHC.Generics
> import Data.Text
> import System.Environment (getArgs)
>
> data CitiesT f = City
>   { ctCity     :: Columnar f Text
>   , ctLocation :: Columnar f Text
>   , ctCapital  :: Columnar f Bool
>   }
>   deriving (Generic, Beamable)
>
> data WeatherT f = Weather
>   { wtId             :: Columnar f Int
>   , wtCity           :: PrimaryKey CitiesT f
>   , wtTempLo         :: Columnar f Int
>   , wtTempHi         :: Columnar f Int
>   }
>   deriving (Generic, Beamable)
>
> data ForecastDB f = ForecastDB
>   { dbCities   :: f (TableEntity CitiesT)
>   , dbWeathers :: f (TableEntity WeatherT)
>   }
>   deriving (Generic, Database be)
>
> instance Table CitiesT where
>   data PrimaryKey CitiesT f = CityID (Columnar f Text)
>     deriving (Generic, Beamable)
>   primaryKey = CityID . ctCity
>
> instance Table WeatherT where
>   data PrimaryKey WeatherT f = WeatherID (Columnar f Int)
>     deriving (Generic, Beamable)
>   primaryKey = WeatherID . wtId

Then calling defaultAnnotatedDbSettings will yield:

> forecastDB :: BA.AnnotatedDatabaseSettings Postgres ForecastDB
> forecastDB = BA.defaultAnnotatedDbSettings defaultDbSettings

Where defaultDbSettings is the classic function from beam-core.

Overriding the defaults of an AnnotatedDatabaseSettings

It is likely that the end user would like to attach extra meta-information to an AnnotatedDatabaseSettings. For example, the user might want to specify which is the default value for a nullable field, or simple express things like uniqueness constraints or foreign keys (when the inference algorithm cannot proceed due to ambiguity). To do this, we can piggyback on the familiar API from beam-core. For example, we can do something like this:

> annotatedDB :: BA.AnnotatedDatabaseSettings Postgres ForecastDB
> annotatedDB = BA.defaultAnnotatedDbSettings defaultDbSettings `withDbModification` dbModification
>   { dbCities =
>       BA.annotateTableFields tableModification { ctCapital = BA.defaultsTo $ val_ False }
>         <> BA.uniqueConstraintOn [BA.U ctCity, BA.U ctLocation]
>   , dbWeathers = BA.annotateTableFields tableModification <>
>       BA.foreignKeyOnPk (dbCities defaultDbSettings) wtCity BA.Cascade BA.Restrict
>   }

This is where we deviate from beam-migrate. Beam-migrate forces you to specify "annotations" for each and every field of each and every column, making everything a bit verbose. Here we decided instead to use the other API that beam-core offers, that is typically used to modify the field names for a standard DatabaseSettings, but it turns out its types are general enough to be used for an AnnotatedDatabaseSettings, just by adding some extra functions (the ones prefixed with BA, the rest is the standard beam-core API).

This allows us to still override defaults when we need to without all the extra boilerplate. This API and these combinators simply manipulates under the hood the particular TableFieldSchema associated to each column, so that the information contained within can be "spliced back" when we generate a Schema out of an AnnotatedDatabaseSettings.

Deriving a Schema

Once we have an AnnotatedDatabaseSettings, the next step is to generate a Schema. This can be done simply by calling fromAnnotatedDbSettings, like so:

> hsSchema :: BA.Schema
> hsSchema = BA.fromAnnotatedDbSettings annotatedDB (Proxy @'[])

This will generate something like this:

Schema
    { schemaTables = fromList
        [
            ( TableName { tableName = "cities" }
            , Table
                { tableConstraints = fromList
                    [ PrimaryKey "cities_pkey"
                        ( fromList
                            [ ColumnName { columnName = "city" } ]
                        )
                    ]
                , tableColumns = fromList
                    [
                        ( ColumnName { columnName = "city" }
                        , Column
                            { columnType = SqlStdType ( DataTypeChar True Nothing Nothing )
                            , columnConstraints = fromList [ NotNull ]
                            }
                        )
                    ,
                        ( ColumnName { columnName = "location" }
                        , Column
                            { columnType = SqlStdType ( DataTypeChar True Nothing Nothing )
                            , columnConstraints = fromList [ NotNull ]
                            }
                        )
                    ]
                }
            )
        ,
            ( TableName { tableName = "weathers" }
            , Table
                { tableConstraints = fromList
                    [ PrimaryKey "weathers_pkey"
                        ( fromList
                            [ ColumnName { columnName = "id" } ]
                        )
                    , ForeignKey "weathers_city__city_fkey"
                        ( TableName { tableName = "cities" } )
                        ( fromList
                            [
                                ( ColumnName { columnName = "city__city" }
                                , ColumnName { columnName = "city" }
                                )
                            ]
                        ) NoAction NoAction
                    ]
                , tableColumns = fromList
                    [
                        ( ColumnName { columnName = "city__city" }
                        , Column
                            { columnType = SqlStdType ( DataTypeChar True Nothing Nothing )
                            , columnConstraints = fromList [ NotNull ]
                            }
                        )
                    ,
                        ( ColumnName { columnName = "id" }
                        , Column
                            { columnType = SqlStdType DataTypeInteger
                            , columnConstraints = fromList [ NotNull ]
                            }
                        )
                    ,
                        ( ColumnName { columnName = "temp_hi" }
                        , Column
                            { columnType = SqlStdType DataTypeInteger
                            , columnConstraints = fromList [ NotNull ]
                            }
                        )
                    ,
                        ( ColumnName { columnName = "temp_lo" }
                        , Column
                            { columnType = SqlStdType DataTypeInteger
                            , columnConstraints = fromList [ NotNull ]
                            }
                        )
                    ]
                }
            )
        ]
    , schemaEnumerations = fromList []
    }

Notable things to notice:

  • Foreign keys have been automatically inferred;
  • Each column is mapped to a sensible SQL datatype;
  • Non Maybe/Nullable types have a NotNull constraint added;

Generating an automatic migration

Once a Schema has been generated, in order to automatically migrate your schema, it is sufficient to write something like this:

>
> exampleShowMigration :: Connection -> IO ()
> exampleShowMigration conn = runBeamPostgres conn $
>   BA.printMigration $ BA.migrate conn hsSchema
>
> exampleAutoMigration :: Connection -> IO ()
> exampleAutoMigration conn =
>   BA.tryRunMigrationsWithEditUpdate annotatedDB conn
>
> main :: IO ()
> main = do
>   args <- getArgs
>   let (getLine', connMethod) = case args of
>         -- The "ci" argument allows the readme to be run and tested in a headless
>         -- environment (e.g., by a continuous integration server)
>         ["ci"] -> (return "y", ConnMethod_Direct $ defaultConnectInfo { connectDatabase = "readme" })
>         _ -> (getLine, ConnMethod_Gargoyle "readme-db")
>   withConnection connMethod $ \conn -> Pg.withTransaction conn$ do
>     putStrLn "----------------------------------------------------"
>     putStrLn "MIGRATION PLAN (if migration needed):"
>     putStrLn "----------------------------------------------------"
>     exampleShowMigration conn
>     putStrLn "----------------------------------------------------"
>     putStrLn "MIGRATE?"
>     putStrLn "----------------------------------------------------"
>     putStrLn "Would you like to run the migration on the database in the folder \"readme-db\" (will be created if it doesn't exist)? (y/n)"
>     response <- getLine'
>     case response of
>       "y" -> exampleAutoMigration conn
>       "Y" -> exampleAutoMigration conn
>       _ -> putStrLn "Exiting"
>

The exampleAutoMigration function will try to generate another Schema, this time from the Postgres database and the two Schemas will be "diffed together" in order to compute the list of edits necessary to migrate from the DB to the Haskell database. To begin with, we can call exampleShowMigration to "preview" the full SQL command that will be run:

Ok, 12 modules loaded.
-> import Database.Beam.Migrate.Example.ForeignKeys
-> exampleShowMigration
CREATE TABLE "cities" (city VARCHAR NOT NULL, location VARCHAR NOT NULL);

CREATE TABLE "weathers" (city__city VARCHAR NOT NULL, id INT NOT NULL, temp_hi INT NOT NULL, temp_lo INT NOT NULL);

ALTER TABLE "cities" ADD CONSTRAINT "cities_pkey" PRIMARY KEY (city);

ALTER TABLE "weathers" ADD CONSTRAINT "weathers_pkey" PRIMARY KEY (id);

ALTER TABLE "weathers" ADD CONSTRAINT "weathers_city__city_fkey" FOREIGN KEY (city__city) REFERENCES "cities"(city);

Once we are satisfied with visual inspection, we can run it via exampleAutoMigration. If all is well, we can now check how the DB has been migrated. If we try to call exampleShowMigration again, no output should be shown, because the DB is now up-to-date with the Haskell types.

Current design (10_000ft overview)

Beam itself provides a DatabaseSettings type. A value of this type is typically derived generically from the Haskell datatypes representing the database schema, but can be amended. The primary purpose of DatabaseSettings is to provide a mapping between Haskell names for tables and columns and the corresponding DB-side names.

On top of this, we provide an AnnotatedDatabaseSettings type. This is similar in spirit to the CheckedDatabaseSettings provided by the original beam-migrate, but a bit simpler. On top of DatabaseSettings, the AnnotatedDatabaseSettings contain additional information, in particular constraints that tables and columns must satisfy. Once again, a value of this type can be derived generically, but the information can be amended.

Both DatabaseSettings and AnnotatedDatabaseSettings follow the structure of the Haskell datatypes comprising the schema, and are therefore quite strongly typed.

From an AnnotatedDatabaseSettings value, we can internally derive a Schema. This is a straightforward representation of a DB schema without any type-level magic.

We can similarly generate a Schema value for the DB schema currently stored in the database. We can then diff the two Schemas to get a Diff which determines a list of Edits. Such edits can be applied in a particular (prioritised) order to migrate the DB schema to the Haskell schema.

There are two important stages in the library, the first one being when we call defaultAnnotatedDbSettings and the second one when we call fromAnnotatedDbSettings. The first function converts from a DatabaseSettings into an AnnotatedDatabaseSettings whereas the latter convert from an AnnotatedDatabaseSettings into a Schema. Both uses generic-derivation but in a different way and for different purposes.

Deriving information from a DatabaseSettings

During this phase we "zip" all the tables together and we essentially convert each DatabaseEntity into an AnnotatedDatabaseEntity. The latter is ever so slightly similar to the former but crucially it embeds extra information. One way to see this is that exactly as a DatabaseEntity carry around a TableSettings which carries meta-information on the naming of each particular column for each table, an AnnotatedDatabaseEntity carries what's called a TableSchema, which is defined as:

-- | A table schema.
type TableSchema tbl =
    tbl (TableFieldSchema tbl)

-- | A schema for a field within a given table
data TableFieldSchema (tbl :: (* -> *) -> *) ty where
    TableFieldSchema
      ::
      { tableFieldName :: Text
      , tableFieldSchema :: FieldSchema ty }
      -> TableFieldSchema tbl ty

data FieldSchema ty where
  FieldSchema :: ColumnType
              -> Set ColumnConstraint
              -> FieldSchema ty

Looking at this, the similarity with a TableSettings is quite obvious:

type TableSettings tbl = tbl (TableField tbl)

Here is where the second generic-derivation algorithm comes in, and it "maps" each TableField with a new TableFieldSchema, which is initialised with "stock" default values for the ColumnType and Set ColumnConstraint. There values are automatically inferred thanks to the HasDefaultSqlDataType and HasSchemaConstraints typeclasses defined over at Database.Beam.Migrate.Compat. This gives us a concrete anchor point for the user to further annotate the database and override each individual table & column with extra information.

This is described later on in the "Overriding the defaults of an AnnotatedDatabaseSettings" section.

Deriving information from an AnnotatedDatabaseSettings

This is the phase where we traverse the generic representation of an AnnotatedDatabaseSettings in order to infer the Schema. It is during this phase that we try to discover Foreign keys.

Foreign key discovery can fail statically. If foreign key discovery fails, one should have the possibility to override AnnotatedDatabaseSettings before running the transformation to Schema to manually provide the necessary hints.

What is implemented

  • Support for JSON, JSONB and Range types;
  • Support for automatically inferring FKs (if unambiguous);
  • Support for annotating tables and fields via the standard, familiar beam-core API (e.g. add a table/column constraint);
  • Support for running a migration to mutate the database;

Shortcomings and limitations

  • Deriving a particular instance using deriving via could hamper Schema discovery. For example let's imagine we have:
data Foo = Bar | Baz deriving HasDefaultSqlDataType via (DbEnum Foo)

data MyTable f = MyTable {
  myTableFoo :: Columnar f Foo
}

This won't correctly infer Foo is a DbEnum, at the moment, as this information is derived directly from the types of each individual columns.

  • There is no support yet for specifying FKs in case the discovery algorithm fails due to ambiguity;
  • There is no support yet for manual migrations;
  • There is no support/design for "composable databases";
  • Some parts of the library are Pg-specific.

Contributors

This library was originally written for Obsidian Systems by Alfredo Di Napoli and Andres Löh of Well-Typed. Other contributors include Dan Bornside, Sean Chalmers, Ryan Trinkle, and Ali Abrar of Obsidian Systems.