Data Integration: Google BigQuery with Mage

First published on July 5, 2023

 

9 minute read

Guest post by Shashank MishraData Engineer @ Expedia

TLDR

This article outlines the integration between 

and

, a serverless data warehousing service. We'll discuss the integration process, its benefits, and how it aids businesses in making data-driven decisions.

Outline

  • Introduction to Mage

  • Overview of Google BigQuery

  • Step by step process to integrate Google BigQuery with Mage

  • Conclusion

Introduction to Mage

In an age where data is the new oil, efficient and reliable data management tools are essential.

is a platform committed to simplifying data integration and analytics. Designed for seamless data transformation and loading, Mage is transforming how businesses approach data management. Here are its key features: 

  • Automated Data Pipeline

    : Mage automates data extraction, transformation, and loading (ETL) processes. It can extract data from multiple sources, transform it to a desirable format, and load it into a data warehouse.

  • Data Connectors

    : Mage offers various data connectors to widely-used data sources like Shopify, Facebook Ads, Google Ads, Google Analytics, etc. This makes it easier to import data from these platforms.

  • Easy Integration

    : Mage provides easy integration with popular data warehouses including Google BigQuery, Amazon Redshift, and Snowflake.

  • Pre-built SQL Models

    : Mage comes with pre-built SQL models for popular e-commerce platforms like Shopify and WooCommerce. These models simplify the process of data analysis.

  • Incremental Loading

    : Mage supports incremental loading, which means only new or updated data is loaded into the data warehouse. This saves storage space and improves efficiency.

  • Data Transformations

    : Mage performs automatic data transformations, converting raw data into a more usable format. This process makes the data ready for analysis and reporting.

  • Scheduled Refresh

    : Data refreshes can be scheduled in Mage, ensuring that the data in the warehouse is always up-to-date.

  • Data Security

    : Mage places a high emphasis on data security, ensuring data privacy and compliance with GDPR and other data protection regulations.

(Source: 

)

Overview of Google BigQuery

is a highly scalable, serverless data warehouse offered by Google as part of its Google Cloud Platform (GCP). It is designed to streamline and simplify the processing of big data.

  • Serverless Architecture:

    BigQuery operates on a serverless model, which means users don't need to manage any servers or infrastructure. This means you can focus more on analysis and less on maintenance. It allows you to query massive datasets in seconds and get insights in real-time, without needing to worry about resource provision.

  • Real-Time Analytics:

    BigQuery is engineered for real-time analytics. It allows users to analyze real-time data streams instantly. With its ability to run SQL queries on petabytes of data, it delivers speedy results on real-time data analytics, enabling businesses to make timely decisions.

Google BigQuery, with its serverless architecture and real-time analytics, serves as a robust platform to handle, analyze, and draw insights from massive datasets with ease.

(Source: 

)

Step by step process to migrate Google BigQuery with Mage

Before we begin, we’ll need to create a service account key. Please read 

on how to create that.

Once we are finished, following these steps:

  1. Create a new pipeline or open an existing pipeline.

  2. Expand the left side of the screen to view the file browser.

  3. Scroll down and click on a file named

    io_config.yaml

  4. Enter the following keys and values under the key named 

    default

    (we can have multiple profiles, add it under whichever is relevant for us)

  5. Note: we only need to add the keys under 

    GOOGLE_SERVICE_ACC_KEY

    or the value for key 

    GOOGLE_SERVICE_ACC_KEY_FILEPATH

    (both are not simultaneously required).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
version: 0.1.1
default:
  GOOGLE_SERVICE_ACC_KEY:
    type: service_account
    project_id: project-id
    private_key_id: key-id
    private_key:
      "-----BEGIN PRIVATE KEY-----\nyour_private_key\n-----END_PRIVATE_KEY"
    client_email: your_service_account_email
    auth_uri: "https://accounts.google.com/o/oauth2/auth"
    token_uri: "https://accounts.google.com/o/oauth2/token"
    auth_provider_x509_cert_url: "https://www.googleapis.com/oauth2/v1/certs"
    client_x509_cert_url: 
"https://www.googleapis.com/robot/v1/metadata/x509/your_service_account_email"
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/your/service/account/key.json"

Using SQL block

  1. Create a new pipeline or open an existing pipeline.

  2. Add a data loader, transformer, or data exporter block.

  3. Select 

    SQL

    .

  4. Under the 

    Data provider

    dropdown, select 

    BigQuery

    .

  5. Under the 

    Profile

    dropdown, select 

    default

    (or the profile we added credentials underneath).

  6. Next to the 

    Database

    label, enter the database name we want this block to save data to.

  7. Next to the 

    Save to schema

    label, enter the schema name we want this block to save data to.

  8. Under the 

    Write policy

    dropdown, select 

    Replace

    or 

    Append

    (please see 

    for more information on write policies).

  9. Enter in this test query: 

    SELECT 1

  10. Run the block.

Using Python block

  1. Create a new pipeline or open an existing pipeline.

  2. Add a data loader, transformer, or data exporter block (the code snippet below is for a data loader).

  3. Select 

    Generic

    (no template).

  4. Enter this code snippet (note: change the 

    config_profile

    from 

    default

    if we have a different profile):

1
2
3
4
5
from mage_ai.data_preparation.repo_manager import get_repo_path
from mage_ai.io.bigquery import BigQuery
from mage_ai.io.config import ConfigFileLoader
from os import path
from pandas import DataFrame
1
2
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

1
2
3
4
5
@data_loader
def load_data_from_big_query(**kwargs) -> DataFrame:
    query = 'SELECT 1'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

1
    return BigQuery.with_config(ConfigFileLoader(config_path, config_profile)).load(query)
  1. Run the block.

(Source: 

)

Conclusion

Integrating Mage with Google BigQuery provides your team with a potent combination of automated data pipeline management and robust data warehousing. This partnership not only simplifies data extraction, transformation, and loading but also provides a seamless pathway for data analysis and insight generation. As we've demonstrated in this step-by-step guide, the integration process is straightforward, making it an accessible option for businesses of all sizes. By leveraging this integration, you can unlock the full potential of your data, streamline operations, and drive data-informed decisions.