Make dbt Magic with Mage

First published on June 11, 2024


13 minute read

Cole Freeman

TLDR

In this tutorial, we integrate dbt with Mage to create a data pipeline, moving data from a source to a PostgreSQL database and performing SQL transformations through staged models. By setting up Docker and PostgreSQL, and following a step-by-step process, we effectively manage data orchestration and analytics using Mage and dbt.

Outline

  • Introduction to dbt

  • Mage quickstart

  • dbt project setup

  • Setting up dbt packages

  • Let’s create some data

  • dbt integration and staging layer

  • SQL transformations

  • Conclusion

Introduction to dbt

Mage is a magical data engineering tool, and its integration with data build tool (dbt) makes it even more magical. Using dbt Core blocks within a Mage data pipeline allows developers to easily generate SQL transformations and load data into the different schemas of a database. This demonstration requires an understanding of some YAML and other analytics engineering best practices for transforming and loading data.

For this project you will need Docker installed and a local PostgreSQL database. Your goal is to move data from your source location to the target PostgreSQL database.

Let’s get started on our basic project.

Mage Quickstart

After you install Docker and PostgreSQL, you can follow the steps below to start a Mage instance in a Docker container.

  • Run the following command to start your Mage project with dbt integration (the command below is for a Mac; see the Mage documentation for other operating systems, and a Windows sketch after this list).

```
docker run -it -p 6789:6789 -v $(pwd):/home/src mageai/mageai /app/run_app.sh mage start [project_name]
```
  • If you are using Docker Desktop, you can open Mage from the GUI by clicking the link in the Ports column; it should look like 6789:6789. If you are not using Docker Desktop, open the Mage overview page at http://localhost:6789.

  • Click the New pipeline button and then click the Standard (batch) option located at the top left of the GUI to begin the pipeline.

  • Either use the name provided by Mage or delete it and customize a name for your pipeline in the popup screen provided. You can give the pipeline a description if you want.

  • Hit create and you will be taken to the Pipeline editor GUI in Mage.
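If you are on Windows rather than a Mac, the volume-mount syntax in the docker run command from the first step differs. Below is a hedged sketch for PowerShell; check the Mage documentation for the exact command for your shell:

```
# PowerShell variant: ${PWD} replaces $(pwd); classic cmd.exe would use %cd% instead
docker run -it -p 6789:6789 -v ${PWD}:/home/src mageai/mageai /app/run_app.sh mage start [project_name]
```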

dbt project setup

Setting up a dbt project is straightforward in Mage, which prompts you to build your profiles.yml file during setup. Start by navigating to the dbt folder within your project and begin the setup from there. Follow the steps below:

  • From the src folder, navigate to your dbt folder by running the command below.

```
cd <your mage project name>/dbt
```
  • Once inside the dbt folder, run the command below to create your dbt project. After you run the command you should see the dbt folder structure populate.

```
dbt init <dbt project name>
```
  • After the folder structure populates, you will be prompted to enter some information in the terminal to build out the profiles.yml file in your dbt folder. The profiles.yml file should look like the code below:

```yaml
dbt_tutorial:
  outputs:
    dev:
      dbname: <your db name> # your postgres database name, remove < >
      host: host.docker.internal
      pass: <your db password> # your postgres database password, remove < >
      port: 5432
      schema: public
      threads: 1
      type: postgres
      user: postgres
  target: dev
```

💡 NOTE: Copy the profiles.yml file above and include it in the hierarchy of your dbt project folder by right clicking on the folder name and selecting New file. Name the file profiles.yml.
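Before moving on, you can optionally verify that dbt can reach your PostgreSQL database using dbt's built-in connection check. Run it from your dbt project folder; on recent dbt versions a profiles.yml in the current directory is picked up automatically, otherwise point dbt at it explicitly:

```
dbt debug
# or, if dbt cannot find your profile:
dbt debug --profiles-dir .
```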

Setting Up dbt Packages

While this project does not utilize any dbt packages, knowing how to install them is necessary if you want to take your new magical powers a step further and create a more detailed project. Utilize dbt packages in Mage by following the instructions below:

  • Right click on your dbt project folder (dbt_tutorial) and select New file from the dropdown menu.

  • Name the file packages.yml in the popup menu and click Create file. After this you should see a packages.yml file populate in your dbt_tutorial folder.

  • Enter the YAML code below to specify the dbt packages to install:

```yaml
packages:
    - package: dbt-labs/dbt_utils
      version: 1.1.1
    - package: calogica/dbt_date
      version: 0.10.0
```
  • Hover over the pop-out menu on the right and select the terminal.

  • From the terminal, navigate to your dbt project folder and run the command below.

```
dbt deps
```

With a wave of your wand and your magical dbt powers within Mage, you installed two of the most popular dbt packages, dbt_utils and dbt_date. Next we will customize the dbt_project.yml file and be ready to start extracting, transforming, and loading data.
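As a quick illustration of what these packages buy you (this query is not used elsewhere in the tutorial, and the column and model names are just an example anticipating the bronze model created later), dbt_utils ships macros such as generate_surrogate_key, which you could call in any model:

```sql
-- Hypothetical example: hash two columns into a surrogate key with dbt_utils
SELECT
    {{ dbt_utils.generate_surrogate_key(['patient_id', 'name']) }} AS patient_key
    , *
FROM {{ ref('bronze_medical') }}
```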

Customize the dbt_project.yml File

Let's go ahead and configure the dbt_project.yml file. There are a few simple additions you need to wave your magic wand over to move the data into your PostgreSQL database. Follow the formula below for the magic touch:

  • Click the dbt_project.yml file located in your dbt project folder; the text editor popup should appear.

  • Edit the file as shown in the sketch after this list (if you are familiar with YAML and how it interacts with dbt SQL files, you can customize this file to your liking).

  • Save the file and close the text editor, and get ready for the real magic to start.
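The original screenshot is not reproduced here, but a minimal sketch of the models section, assuming the bronze/silver/gold folder layout used later in this tutorial (views for bronze and silver, a table for gold, matching what appears in PostgreSQL at the end), might look like this:

```yaml
# Hypothetical models configuration; match the keys to your own project and folder names
models:
  dbt_tutorial:
    bronze:
      +materialized: view
    silver:
      +materialized: view
    gold:
      +materialized: table
```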

Let’s create some data

Tutorials are best when they mimic real-life examples. In the wild, Mage is used to orchestrate data and can perform all the ETL operations. If you are a company that already uses dbt, Mage has you covered with a managed dbt Core integration.

The data we will formulate for this tutorial mimics a medical risk score analysis. Our goal is to conjure up an analytics engineering pipeline that has a staging layer, a refined layer, and a serving layer.

Let's get started:

  • From the edit pipeline page select the data loader block

  • Choose Python, and then from the expanded choices choose Generic (no template).

  • Replace the boilerplate code in the Mage Data Loader with the code below. This will create a medical records table that somewhat resembles real life.

```python
import random

import pandas as pd
from faker import Faker

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@data_loader
def load_data(*args, **kwargs):
    """
    Generate a mock medical records dataset.
    Returns:
        A pandas DataFrame of fake patients and their vitals.
    """
    # Initialize Faker
    fake = Faker()
    # Number of samples to generate
    num_samples = 20000
    # Generate data
    data = {
        'patient_id': [f'PID{str(i).zfill(5)}' for i in range(1, num_samples + 1)],
        'name': [fake.name() for _ in range(num_samples)],
        'date_of_birth': [fake.date_of_birth(minimum_age=18, maximum_age=90).strftime('%Y-%m-%d') for _ in range(num_samples)],
        'diastolic_bp': [random.randint(60, 90) for _ in range(num_samples)],
        'systolic_bp': [random.randint(90, 140) for _ in range(num_samples)],
        'pulse': [random.randint(50, 100) for _ in range(num_samples)],
        'height_in_inches': [round(random.uniform(57, 84), 1) for _ in range(num_samples)],
        'weight_in_lbs': [round(random.uniform(100, 350), 1) for _ in range(num_samples)]
    }
    # Create DataFrame
    df = pd.DataFrame(data)
    return df


@test
def test_output(output, *args) -> None:
    """
    Template code for testing the output of the block.
    """
    assert output is not None, 'The output is undefined'
```
  • Run the code by hitting the blue right arrow button at the top right of the data loader block. You should see a sample output below the block.

After the block runs, Mage prints out a sample for you to evaluate. You should now have 20,000 rows and 8 columns of data consisting of people's mock personal information and their medical test results.
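If you want a stricter check than the template test, you could extend the @test block; here is a small sketch whose row and column counts simply mirror the generator above:

```python
@test
def test_output(output, *args) -> None:
    """
    Optional stricter checks on the generated dataset.
    """
    assert output is not None, 'The output is undefined'
    assert len(output) == 20000, 'Expected 20,000 rows'
    assert output.shape[1] == 8, 'Expected 8 columns'
```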

dbt Integration and Staging Layer


Mage helps manage some of the manual processes dbt Core presents the user with, such as creating the sources.yml file. Let's get started creating the dbt project folders and files by completing the following steps:

  • First, create a bronze, silver, and gold folder structure under the models folder within the dbt project folder. You can also go ahead and add the files to the folders.

  • Create folders and files by right clicking on their respective folder locations. To create a folder select New folder, to create a new file select New file.

  • In the popup window give the folder or file a name and click the create button.

  • From the pipeline editor page click the dbt Model button and then select Single Model or Snapshot (from file)

  • A popup will appear; choose the file name under the bronze folder since that is our staging layer

  • The pipeline editor page should now have your data loader block and a dbt model block

  • Next click on the edit parents button located at the top of the block and you will see a pop-out from the right side appear.

  • Either click inside your parent block in the dependency tree or write in the name of your block and click save dependencies.

  • Completing this step will create a mage_sources.yml file in your project folder. To see this file, move your mouse to your file structure's search bar and click the refresh button located to the right of the search bar.

  • The mage_sources.yml file should look similar to the YAML code below:

```yaml
sources:
- description: Dataframes Mage upstream blocks
  loader: mage
  name: mage_dbt_tutorial
  schema: public
  tables:
  - description: Dataframe for block `get_data` of the `dbt_tutorial` mage pipeline.
    identifier: mage_dbt_tutorial_get_data
    meta:
      block_uuid: get_data
      pipeline_uuid: dbt_tutorial
    name: dbt_tutorial_get_data
version: 2
```
  • Begin writing your SQL code in the first Mage dbt block (see example below). The source information comes from the mage_sources.yml file: the first argument within the parentheses is the source's name, the second is the table's name.

```sql
{{
    config(
        materialized='view'
    )
}}

SELECT
    patient_id
    , name
    , date_of_birth as dob
    , diastolic_bp
    , systolic_bp
    , pulse
    , height_in_inches as height
    , weight_in_lbs as weight
FROM
    {{ source('mage_dbt_tutorial', 'dbt_tutorial_get_data') }}
```
  • Execute the code within Mage by hitting the run button at the top of the block or pressing Command+Enter

  • You should see the logs begin running and a sample of the query return when the block completes

  • Execute the command below:

```
dbt run --select bronze_medical
```

Executing the dbt run command builds the model and exports it to your target database. Check your PostgreSQL database and you will see a view in the Public schema called bronze_medical, or whatever name you gave your first model.
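As a quick sanity check from psql or any SQL client (adjust the view name if you chose a different model name):

```sql
-- Peek at the first few rows of the staged view
SELECT * FROM public.bronze_medical LIMIT 5;
```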

SQL Transformations


You just landed your data as a staging model in your bronze folder and pushed it to your PostgreSQL database. Now let's run some SQL transformations on the project.

Our magical medical office gave us a requirement: remove people's names and dates of birth, but calculate each patient's age and include it as a column in the model. For the silver model file, complete the following:

  • Write the following SQL code to generate each patient's age while removing their name and date of birth from the data:

```sql
{{
    config(
        materialized='view'
    )
}}

WITH bronze_silver as (
    SELECT
        *
        -- dob is stored as text, so cast it to a date before computing the age
        , FLOOR(EXTRACT(YEAR FROM AGE(CURRENT_DATE, dob::date))) AS age
    FROM
        {{ ref('bronze_medical') }}
)

SELECT
    patient_id
    , age
    , diastolic_bp
    , systolic_bp
    , pulse
    , height
    , weight
FROM
    bronze_silver
```
  • Execute the dbt run command to build the silver model:

```
dbt run --select silver_medical
```

Next, the medical analysis team needs some business rules for scoring patients' heart health risk. The following requirements help patients and doctors monitor their risk of heart issues:

  • Develop a score rating for age, diastolic_bp, systolic_bp, pulse, and bmi

  • Sum the scores in a new column

  • Develop a rating for low, medium, and high risk

Make sure to include the business rules in the gold model by taking the following steps:

  • Write your SQL code as below with the rating system, then compile and run the model.

```sql
WITH patient_bmi AS (
    SELECT
        patient_id
        , age
        , diastolic_bp
        , systolic_bp
        , pulse
        , height
        , weight
        -- BMI formula for imperial units: weight (lbs) / height (in)^2 * 703
        , ROUND(CAST((weight / (height * height) * 703) AS numeric), 2) AS bmi
    FROM
        {{ ref('silver_medical') }}
),
risk_score_calculation AS (
    SELECT
        *
        -- Calculate age score (2 = good, 1 = medium, 0 = bad)
        , CASE
            WHEN age < 40 THEN 2
            WHEN age BETWEEN 40 AND 60 THEN 1
            ELSE 0
        END AS age_score
        -- Calculate diastolic_bp score (0 = bad, 2 = good)
        , CASE
            WHEN diastolic_bp < 60 OR diastolic_bp > 80 THEN 0
            ELSE 2
        END AS diastolic_bp_score
        -- Calculate systolic_bp score (0 = bad, 2 = good)
        , CASE
            WHEN systolic_bp < 90 OR systolic_bp > 120 THEN 0
            ELSE 2
        END AS systolic_bp_score
        -- Calculate pulse score (0 = bad, 2 = good)
        , CASE
            WHEN pulse < 60 OR pulse > 100 THEN 0
            ELSE 2
        END AS pulse_score
        -- Calculate bmi score (0 = bad, 1 = medium, 2 = good)
        , CASE
            WHEN bmi < 18.5 OR bmi >= 30 THEN 0
            WHEN bmi BETWEEN 25 AND 29.9 THEN 1
            ELSE 2
        END AS bmi_score
    FROM
        patient_bmi
),
total_risk_score AS (
    SELECT
        *
        , age_score + diastolic_bp_score + systolic_bp_score + pulse_score + bmi_score AS cardiac_risk_score
    FROM risk_score_calculation
)
SELECT
    *
    , CASE
        WHEN cardiac_risk_score <= 5 THEN 'high risk'
        WHEN cardiac_risk_score BETWEEN 6 AND 8 THEN 'medium risk'
        ELSE 'low risk'
    END AS cardiac_risk
FROM
    total_risk_score
```
  • Execute the dbt run command to run all models:

```
dbt run
```

  • In your PostgreSQL database you should see the bronze and silver models in the Views dropdown and the gold model in the Tables dropdown. Your data pipeline should look similar to the picture below.
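To see how the scoring works end to end, consider a 35-year-old patient with a diastolic BP of 75, a systolic BP of 118, a pulse of 72, and a BMI of 24: every measure scores 2, giving a cardiac_risk_score of 10 and a rating of 'low risk'. You can also sanity-check the distribution of ratings with a query like the one below; the table name is hypothetical, so substitute whatever you named your gold model:

```sql
-- Distribution of risk ratings in the gold table (assumes a model named gold_medical)
SELECT cardiac_risk, COUNT(*) AS patients
FROM public.gold_medical
GROUP BY cardiac_risk
ORDER BY patients DESC;
```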

Conclusion

In this tutorial, we successfully integrated Mage, a powerful data engineering tool, with dbt to create an efficient data pipeline. We set up a development environment using Docker and PostgreSQL, initialized a new Mage project, and configured dbt for SQL transformations. By generating mock medical data and applying transformations through a structured bronze, silver, and gold model approach, we effectively demonstrated the capabilities of both Mage and dbt in managing and transforming data. This integration simplifies the orchestration of complex data workflows, enhancing your ability to perform advanced analytics and data-driven decision-making.