Success story

Migrating data pipelines and database structures from Cloudera to GCP services for a global leader in the Consumer Packaged Goods industry

September 2019 - ongoing

Customer

Global FMCG (Fast-Moving Consumer Goods) / CPG (Consumer Packaged Goods) Company

Industry

Consumer Goods

Services

Data Migration

Technologies

Google Cloud Products: Dataproc, BigQuery, GKE (Google Kubernetes Engine)

Service Level Agreement

Challenge

Multiple data sources contained various semi-structured data types and suffered from data quality problems. The goal was to improve the cost efficiency of campaigns in linear TV planning and purchasing by building pipelines on Kubeflow. The new pipelines were meant to improve overall system performance, make data transformation more reliable, and optimize the Python-based advertising procedures.

Our approach

The existing data pipelines were migrated to Dataproc, GCS (Google Cloud Storage), and Cloud Composer. To improve scalability, we containerized the Python ad optimization code so that it runs as scalable tasks on Kubeflow hosted in GKE. Kubeflow pipelines and node pools let us manage job resources efficiently and account for the different hardware requirements of different scenarios, so each workload runs on resources that fit it.
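
As an illustration of how a containerized job can be matched to a node pool, the sketch below builds a plain Kubernetes pod spec as a Python dictionary. It is a minimal sketch and not the project's actual code: the profile names, pool names, image repository, and the `--scenario` argument are all hypothetical. The only platform-specific detail it relies on is the `cloud.google.com/gke-nodepool` node label that GKE attaches to nodes. In a real Kubeflow pipeline the same fields would normally be set through the Kubeflow Pipelines SDK rather than by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    """Hardware needs of one optimization scenario (hypothetical values)."""
    node_pool: str
    cpu: str
    memory: str


# Hypothetical scenario-to-hardware mapping; real pool names and sizes will differ.
PROFILES = {
    "small": ResourceProfile(node_pool="pool-standard", cpu="2", memory="8Gi"),
    "large": ResourceProfile(node_pool="pool-highmem", cpu="8", memory="64Gi"),
}


def build_pod_spec(image_repo: str, code_version: str, scenario: str) -> dict:
    """Return a Kubernetes pod spec for one optimization job.

    The image tag selects the code version; the node selector and the
    resource requests select the hardware.
    """
    profile = PROFILES[scenario]
    resources = {"cpu": profile.cpu, "memory": profile.memory}
    return {
        "restartPolicy": "Never",
        # GKE labels every node with the name of its node pool.
        "nodeSelector": {"cloud.google.com/gke-nodepool": profile.node_pool},
        "containers": [
            {
                "name": "ad-optimizer",
                "image": f"{image_repo}:{code_version}",
                "args": ["--scenario", scenario],  # hypothetical CLI flag
                "resources": {"requests": resources, "limits": resources},
            }
        ],
    }


spec = build_pod_spec("gcr.io/example-project/ad-optimizer", "1.4.2", "large")
print(spec["containers"][0]["image"])
print(spec["nodeSelector"])
```

Keeping the scenario-to-hardware mapping in one table means a new scenario only needs a new profile entry, while the container image stays the same.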

Outcome

Cloudera data pipelines were successfully migrated to GCP. The new data pipelines were reworked to be cost-effective and easy to maintain. Fast response times are ensured by using the BigQuery cache. With GKE, Kubeflow, and Docker images, jobs can run on different code versions and hardware resources. Starting optimization jobs has been simplified by using Cloud Functions.
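
The BigQuery cache only helps when a query can actually be served from cached results. The sketch below shows one way to check that before sending a query; it is an illustration under stated assumptions, not the project's code. The table name is hypothetical, and the list of non-deterministic functions is deliberately partial. The request body uses the `query`, `useLegacySql`, and `useQueryCache` fields of the BigQuery `jobs.query` REST API.

```python
import re

# Partial list of functions whose result changes between runs. BigQuery does
# not serve cached results for queries that call non-deterministic functions.
_NON_DETERMINISTIC = re.compile(
    r"\b(?:CURRENT_TIMESTAMP|CURRENT_DATETIME|CURRENT_DATE|CURRENT_TIME|SESSION_USER)\b",
    re.IGNORECASE,
)


def is_cache_friendly(sql: str) -> bool:
    """Return True if the query text calls none of the listed functions."""
    return _NON_DETERMINISTIC.search(sql) is None


def build_query_request(sql: str) -> dict:
    """Build a jobs.query request body that allows cached results."""
    return {"query": sql, "useLegacySql": False, "useQueryCache": True}


# Hypothetical table name, for illustration only.
stable_sql = (
    "SELECT channel, SUM(cost) AS cost "
    "FROM `example-project.tv.spots` GROUP BY channel"
)
volatile_sql = "SELECT * FROM `example-project.tv.spots` WHERE day = CURRENT_DATE()"

print(is_cache_friendly(stable_sql))
print(is_cache_friendly(volatile_sql))
```

A query that filters on a literal date passed in by the caller stays cacheable, whereas one that calls `CURRENT_DATE()` is recomputed on every run.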

Business Impact

The migration to GCP was a success, resulting in better performance, easier maintenance, and improved data reliability. This was made possible by relying on cloud-native services. Thanks to Kubeflow hosted on GKE, the development time for optimization jobs was significantly reduced. As a result, the final optimization jobs now run in an environment that is flexible, robust, and cost-optimized.
