Background
A Fortune 500 CRM (Customer Relationship Management) and cloud solutions provider faced exponential growth in integration, compute, and data warehousing costs, driven by rapidly accelerating data volumes across its data landscape. To reduce costs and improve performance, the organization needed to move its data and analytics from Snowflake to a custom-built Hive solution on the Amazon Web Services (AWS) platform.
Why Definian
While the Client has significant in-house data analytics and cloud infrastructure skills, it needed data engineering expertise to complete this initiative in the desired timeframe. The Client had previously worked with Definian on data governance and integration projects and, through that experience, knew Definian had the necessary skills, accelerators, and methods to carry out this project efficiently and effectively. Definian's three decades of experience building complex data engineering solutions reinforced that confidence.
The Project: A Joint Effort Across Four Milestones
Like many large transformative initiatives, this project carried both significant risk and significant potential value. To reduce risk and capture value along the way, the initiative was split into four distinct milestones. This modular approach enabled the Client to realize value throughout the initiative without disrupting current processes. It also enabled the Client and Definian to focus their energy on their respective strengths.
As a pioneer in cloud applications and data modeling, the Client owned the design and development of the new analytics platform. The Client leveraged Definian's data engineering solutions to minimize development time and maximize data pipeline throughput. While Definian upgraded the data pipelines that fed the Client's analytics platforms, the Client focused on data models and cloud architectures.
Milestone 1: Migrate Jitterbit Integrations to AWS Glue
The project's first milestone focused on replacing approximately 300 Jitterbit integrations that connected the Client's operational data to its primary analytics data stores in Snowflake and Redshift. To help keep this milestone on track, Definian used its integration design frameworks and reference library to reverse-engineer the poorly documented legacy Jitterbit integrations and replicate them in AWS Glue.
Milestone 2: Design and Build the Pipelines for the Future State Data and Analytics Platform
While the Client focused on designing and developing the Hive database on AWS infrastructure, Definian designed and built the future state integration framework and process. Collaborating closely with the Client, Definian enhanced the integrations from Milestone 1 so they could be easily re-pointed to the new analytics warehouse during cut-over. Additionally, Definian and the Client found opportunities to rationalize existing integrations and improve their performance. As part of the improvements, Definian increased pipeline efficiency by transitioning, or mirroring, the ETL jobs from AWS Glue to Apache Airflow.
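One common way to make integrations easy to re-point is to have each integration refer to a logical target name and to resolve that name to a physical connection in a single place. The following is a minimal sketch of that idea in plain Python, not the framework Definian built. All names in it (Connection, Integration, TargetRegistry, the integration names, and the example URIs) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Physical destination. Platform and URI values are placeholders."""
    platform: str
    uri: str


@dataclass(frozen=True)
class Integration:
    """An integration knows only the logical name of its target."""
    name: str
    source: str
    target: str


class TargetRegistry:
    """Maps logical target names to physical connections."""

    def __init__(self):
        self._connections = {}

    def bind(self, logical_name, connection):
        # Re-binding an existing name is the cut-over step.
        self._connections[logical_name] = connection

    def resolve(self, integration):
        return self._connections[integration.target]


registry = TargetRegistry()
registry.bind(
    "analytics_warehouse",
    Connection("snowflake", "snowflake://example-account/analytics"),
)

# Hypothetical integrations; none of them names a platform directly.
integrations = [
    Integration("accounts_daily", "crm.accounts", "analytics_warehouse"),
    Integration("orders_hourly", "erp.orders", "analytics_warehouse"),
]

before = [registry.resolve(i).platform for i in integrations]

# Cut-over: one binding changes and every integration follows it.
registry.bind(
    "analytics_warehouse",
    Connection("hive", "hive://example-metastore:9083/analytics"),
)

after = [registry.resolve(i).platform for i in integrations]
print(before, after)
```

With this layout, the integration definitions stay untouched during cut-over, and rolling back means restoring the previous binding.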
Milestone 3: Migrate from Snowflake to Hive
With the new analytics platform operational, it was time to migrate the data and shut down Snowflake. Definian built a Snowflake-to-Hive pipeline to execute the migration using PySpark in Apache Airflow. This approach maximized throughput and minimized development time. To reduce downtime during the cut-over, Definian and the Client collaborated on a tight cut-over plan. The execution of the plan exceeded expectations, resulting in no downtime and an on-time go-live.
Milestone 4: Consolidate Data Silos
After the new analytics platform went live, the last step was consolidating additional data silos into the new analytics platform and decommissioning them. Definian designed the pipelines and processes for this step so that the Client could execute the plan on its own when ready. When the Client was ready to migrate, Definian provided backup support as needed.
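As a rough illustration of what one table copy in such a pipeline could look like, the sketch below reads a table through the Snowflake Spark connector and writes it to a Hive table with PySpark. It is an assumption-laden sketch, not Definian's pipeline: the use of the Snowflake Spark connector, the connection values, the database and table names, the overwrite mode, and the helper names (snowflake_read_options, hive_table_name, copy_table, build_spark) are all hypothetical. PySpark is imported lazily, so the helper functions can be inspected without a Spark installation.

```python
def snowflake_read_options(table, conn):
    """Merge placeholder connection settings with the table to read."""
    return {**conn, "dbtable": table}


def hive_table_name(hive_db, table):
    """Lowercased target name (an assumed naming convention)."""
    return f"{hive_db}.{table.lower()}"


def build_spark(app_name="snowflake_to_hive"):
    """Create a Hive-enabled SparkSession. Requires pyspark."""
    from pyspark.sql import SparkSession  # lazy import

    return (
        SparkSession.builder.appName(app_name)
        .enableHiveSupport()
        .getOrCreate()
    )


def copy_table(spark, table, conn, hive_db):
    """Copy one Snowflake table into Hive and return the target name."""
    df = (
        spark.read.format("net.snowflake.spark.snowflake")
        .options(**snowflake_read_options(table, conn))
        .load()
    )
    target = hive_table_name(hive_db, table)
    # Overwrite keeps a re-run of the same table idempotent.
    df.write.mode("overwrite").saveAsTable(target)
    return target


# Placeholder connection settings; real credentials belong in a secret store.
EXAMPLE_CONN = {
    "sfURL": "example-account.snowflakecomputing.com",
    "sfUser": "example_user",
    "sfPassword": "example_password",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "EXAMPLE_WH",
}

# Hypothetical table list, shown only to illustrate the planned targets.
plan = [hive_table_name("analytics", t) for t in ["ACCOUNTS", "ORDERS"]]
print(plan)
```

In an orchestrated setup, each copy_table call could run as its own Apache Airflow task so that tables migrate in parallel and a failed table can be retried without repeating the others.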
Impact: Improved Data Pipelines, Improved Data Analytics, Lower Costs
This complex initiative gave the Client sustainable, long-term analytics capabilities. The Client now has a pathway to more intelligent AI, sharper analytics, and data-driven decisions. The new data pipelines in Apache Airflow run at lower cost and with greater efficiency than the prior Jitterbit framework.