Integrated Systems Magic with Mage

First published on June 25, 2024

 

10 minute read

Cole Freeman

TLDR

Organizations face challenges managing vast amounts of fragmented data. Centralized data systems using integration pipelines and incremental models offer a practical solution. These systems unify data, improve quality, and enhance efficiency. Incremental models process only new or updated data, reducing computation time and costs. This approach enables faster decision-making, better resource optimization, and improved analytics capabilities. While implementation can be complex, the long-term benefits make it a valuable strategy for organizations dealing with large-scale, frequently updated data.

Outline

  • What are Integrated Data Pipelines

  • Medical Database Systems Integration

  • Create an Integrated Pipeline

  • Create a Trigger

  • Run a Trigger

  • Conclusion

What are Integrated Data Pipelines

In today’s data-driven world, organizations across industries face a common challenge: managing vast amounts of constantly updating information efficiently and effectively. As data volumes grow exponentially, traditional methods of data processing and analysis are becoming increasingly inadequate. Enter the game-changing concepts of incremental models and integrated data pipelines. These powerful techniques are revolutionizing how we handle, process, and derive insights from our data.


An integrated pipeline is a streamlined data processing system that automates the flow of information from multiple sources through various stages of transformation and analysis. It seamlessly connects different data processes, from extraction and cleansing to transformation and loading, ensuring data consistency and reducing manual interventions. This unified approach enables organizations to efficiently manage, process, and analyze large volumes of data from diverse sources in a cohesive and scalable manner.

Medical Database Systems Integration

Many healthcare systems find themselves drowning in a sea of patient data, spread across multiple hospitals, clinics, and departments. This fragmentation leads to numerous problems:

  1. Incomplete Patient Histories: Doctors often work with partial medical records, leading to suboptimal treatment decisions.

  2. Inefficiency: Retrieving and combining patient data from various hospitals is time-consuming and error-prone, delaying critical care.

  3. Data Inconsistency: Different hospitals may hold conflicting patient information, undermining the integrity of medical records.

  4. Scalability Issues: As patient populations grow and more detailed health data is collected, traditional processing methods become unsustainable.

These challenges call for a unified approach to medical data management — one that can handle large volumes of patient data, adapt to frequent updates, and provide a single source of truth across the entire healthcare system.


To address these challenges, forward-thinking healthcare organizations are turning to centralized medical data systems built on the foundation of data integration pipelines and incremental models. Here are some of the results Chief Technology Officers (CTOs) and other administrators could expect from integrated health systems:

  1. Comprehensive Patient Profiles: A centralized system brings all patient data into one place, providing doctors with a complete view of a patient’s medical history.

  2. Improved Data Quality: Standardization and cleaning processes ensure consistency and accuracy across all patient records.

  3. Enhanced Efficiency: Automated pipelines reduce manual data entry and retrieval, saving time for healthcare professionals and improving patient care.

  4. Real-time Health Insights: Up-to-date patient data enables more timely and accurate diagnoses and treatment decisions.

  5. Scalability: These systems can handle growing patient populations and increasing amounts of health data without compromising performance.

This approach allows healthcare systems to provide better, more coordinated care while efficiently managing the ever-growing volume of medical data.

Create an Integrated Pipeline in Mage

Let's integrate two databases that contain mock medical systems data. The MySQL database needs to be integrated with the main storage system, a PostgreSQL database. The purpose of this tutorial is to spark ideas for Mage developers to implement similar techniques on their own production data.
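If you want to follow along without touching a real system, you can stand up throwaway copies of both databases locally. The Docker Compose sketch below is just one way to do that; the service names, database names, and credentials are placeholders invented for this walkthrough, not values Mage expects.

```yaml
# docker-compose.yml -- throwaway local stand-ins for the two databases in this tutorial.
# All names, credentials, and ports are placeholders; adjust them to your environment.
services:
  source_mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example_password
      MYSQL_DATABASE: clinic_records          # mock medical source data
    ports:
      - "3306:3306"
  warehouse_postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: warehouse_user
      POSTGRES_PASSWORD: example_password
      POSTGRES_DB: medical_warehouse          # PostgreSQL database acting as the warehouse
    ports:
      - "5432:5432"
```

With both containers running, Mage (itself typically run in Docker) can reach them through host.docker.internal, the host value used later in the source configuration.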

  • Navigate to the Pipelines page from the left navigation panel and click the new button to begin building the data pipeline.

  • After clicking New, a dropdown menu will appear. Click Data integration pipeline, then either create a new name for the pipeline or keep the default pipeline name.

  • Hit the create button to initiate the pipeline’s build.

  • From the select source dropdown menu, select the source connector for the data to be integrated into the data warehouse (MySQL was selected for this tutorial).

  • Once you make a selection, a configuration template will populate.

  • Configure the template to connect to your data source (a sample source configuration appears after this list).

  • If you are running Mage in Docker and connecting to a local database, use host.docker.internal as your host.

  • If you want to use secrets for passwords and other sensitive information, select Secrets from the right navigation pane and follow the instructions in the UI.

  • Click the Test Connection button to see if you successfully connect to your data source.

  • Once the Connected successfully notification appears, you are connected to the database and can click the View and select streams button.

  • After clicking the View and select streams button, a popup will appear. Select the table you want integrated into your data warehouse and click the Confirm Streams button.

  • Next, enter the table you want to integrate the data into in the Destination table name field.

  • Choose the replication method, Full Table or Incremental, depending on how you want your records to update when data is inserted. Incremental syncs only rows that are new or changed since the last run.

  • Choose whether to update the destination table or ignore incoming records when unique conflicts occur.

  • Change the integration features using the Features UI.

  • Remove the data type by clicking the button under the Type column, then select the destination table's type from the dropdown menu to the right of the Type column.

  • If available, you can check Unique to mark the column as unique, Bookmark the column to keep track of sync progress and incrementally sync new records, and check Key prop if you want the column to be a primary key in the destination table.

  • Click the Load sample data button to see sample data, which will pop out on the right side of the UI.

  • You can also format features further when properties are selected; specific details and instructions are present in the UI.

  • A summary of the different streams you selected and edited is available in the UI.

  • Create transformers through the template selection, or create custom transformers to manipulate data.

  • Select the destination where you will store the integrated data and configure the YAML with your connection details (a sample destination configuration also appears after this list).

  • The table name was already configured in a previous step.

  • Test the connection. If the configuration is correct, the Connected successfully notification will appear next to the button.

  • Finally, toggle on the Automatically add new fields option if you want the stream to add new attributes when the schema changes, and toggle on Disable column type check if the data types for each column do not need to match.
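To make the configuration steps above concrete, here is a sketch of what the source and destination templates might look like once filled in. The key names follow the templates Mage populates for a MySQL source and a PostgreSQL destination, but the exact fields can vary between versions, and every value here (database names, users, secret names) is a placeholder.

```yaml
# Source configuration (MySQL) -- a sketch of the populated template; all values are placeholders.
host: host.docker.internal        # Mage running in Docker, database running on the host machine
port: 3306
database: clinic_records
username: root
password: "{{ mage_secret_var('mysql_password') }}"   # assumes a Mage secret named mysql_password
---
# Destination configuration (PostgreSQL) -- the warehouse side; same caveats apply.
host: host.docker.internal
port: 5432
database: medical_warehouse
schema: public                    # schema that will hold the destination table
username: warehouse_user
password: "{{ mage_secret_var('postgres_password') }}"
```

Stream-level settings such as the replication method, unique-conflict handling, and column features are configured through the UI steps above rather than in these connection templates.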

After completing the integration pipeline, there is one final step in moving data from your source to the data warehouse. Create a trigger to run the pipeline and move the data.

Create a Trigger to Run the Pipeline

In Mage, triggers run pipelines. They can be configured from the Triggers UI within the pipeline editor page. Select the Triggers button from the navigation menu on the left and you will be taken to the trigger configuration UI.

  • Select the schedule option located under the trigger type.

  • From the frequency dropdown, select once, then click Save changes.

This is a very basic trigger in Mage. We are only trying to run the integration pipeline to show that data is moved from the source database to the PostgreSQL database acting as a data warehouse.
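If you prefer to keep schedules in code alongside the pipeline, Mage also supports defining triggers in a triggers.yaml file inside the pipeline's folder. The sketch below mirrors the one-off schedule configured above; the trigger name is made up for this example, and the exact field set can differ between Mage versions.

```yaml
# triggers.yaml -- a code-based sketch of the one-off trigger configured in the UI above.
triggers:
- name: run_medical_integration_once   # hypothetical trigger name
  schedule_type: time                  # time-based (scheduled) trigger
  schedule_interval: "@once"           # run a single time, matching the "once" frequency
  start_datetime: 2024-06-25 00:00:00
  status: active
```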

Run the Trigger

To run the trigger, hit the run@once button. You should see the source data moved into your data warehouse with two additional columns: if you selected the Incremental replication method, the columns _mage_created_at and _mage_updated_at will appear.

Conclusion

Incremental models and data integration pipelines aren’t just technical innovations – they're strategic assets driving organizational success. By enabling more efficient, scalable, and timely data processing, these tools empower organizations to unlock the full potential of their data. They pave the way for predictive analytics and data-driven innovations that transform how businesses operate in the digital age.

As data continues to grow exponentially in volume and importance, organizations that adopt these advanced data management techniques will be well-positioned to thrive in the data-driven future. They'll be able to respond swiftly to market changes, personalize experiences at scale, and drive innovation at an unprecedented pace. In an era where data is the new currency, mastering incremental models and integrated pipelines will be the key to not just surviving, but leading in the increasingly competitive and data-centric business landscape.

Are you looking for more information on integrated data pipelines? Check out .