Data Lakehouse System Implementation for a Video Streaming Platform

Introduction

The client is a free video streaming platform that aims to provide inspirational and award-winning content, connect viewers with hundreds of nonprofits, and let them take action instantly to improve the future of the planet. However, the client had no overview of users’ behavior or content performance. They also needed partner reports and campaign lifecycle monitoring. The client wanted to become a data-driven company, and the SmartCat team implemented a Data Lakehouse System to address these challenges.

Problem

Without visibility into users’ behavior and content performance, the client found it difficult to make data-driven decisions. They needed a solution that could process and analyze data from multiple sources and support monitoring of partner reports and the campaign lifecycle.

Solution

The SmartCat team addressed the client’s problem by implementing a Data Lakehouse System. Data from multiple sources is ingested, processed, stored, analyzed, and visualized in the system, which also serves as a foundation for advanced analytics and machine learning. Because all important data is centralized in one place, the client can track key metrics and KPIs, monitor partner reports and the campaign lifecycle, and make data-driven decisions.

SmartCat used the following technologies to implement the solution:

  • Google Cloud Platform: a cloud computing platform for building, deploying, and scaling applications.
  • Apache Spark: an open-source distributed computing system for big data processing.
  • Trino: an open-source distributed SQL query engine for big data analytics.
  • Apache Airflow: a platform to programmatically author, schedule, and monitor workflows.
  • Delta Lake: an open-source storage layer that brings reliability to data lakes.
  • Terraform: a tool for building, changing, and versioning infrastructure safely and efficiently.
  • Python: a programming language used for data analysis and data science.
  • Looker: a business intelligence and data visualization tool.
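
To make the ingest, process, and analyze flow more concrete, here is a minimal sketch in plain Python of how raw playback events might be turned into a content-performance report. It is an illustration under stated assumptions, not the client's actual pipeline: the PlayEvent fields, the sample rows, and the function names are all hypothetical, and in a system like the one described this work would more likely run as Spark jobs writing to Delta Lake tables.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayEvent:
    """One hypothetical playback event; field names are illustrative only."""
    user_id: str
    content_id: str
    seconds_watched: int


def ingest(raw_rows):
    """Ingest step: parse raw dict rows into typed events, skipping malformed ones."""
    events = []
    for row in raw_rows:
        try:
            events.append(
                PlayEvent(
                    user_id=str(row["user_id"]),
                    content_id=str(row["content_id"]),
                    seconds_watched=int(row["seconds_watched"]),
                )
            )
        except (KeyError, TypeError, ValueError):
            # Assumption: bad rows are dropped here; a real pipeline
            # would more likely route them to a quarantine location.
            continue
    return events


def content_performance(events):
    """Processing step: aggregate events into per-title metrics."""
    viewers = defaultdict(set)
    watch_time = defaultdict(int)
    for event in events:
        viewers[event.content_id].add(event.user_id)
        watch_time[event.content_id] += event.seconds_watched
    return {
        content_id: {
            "unique_viewers": len(viewers[content_id]),
            "total_seconds_watched": watch_time[content_id],
        }
        for content_id in viewers
    }


# Made-up sample data standing in for one of the "multiple sources".
raw_rows = [
    {"user_id": "u1", "content_id": "doc-1", "seconds_watched": 120},
    {"user_id": "u2", "content_id": "doc-1", "seconds_watched": 300},
    {"user_id": "u1", "content_id": "doc-2", "seconds_watched": 45},
    {"user_id": "u3", "content_id": "doc-2"},  # malformed: missing watch time
]

report = content_performance(ingest(raw_rows))
print(report)
```

The per-title dictionary produced here is the kind of aggregate a BI tool such as Looker could then visualize as a content-performance dashboard.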

Results

After the Data Lakehouse System was implemented, the client could track all important metrics and KPIs and base its decisions on data. The key metrics used to measure the success of the project included:

  • The number of data sources integrated into the system.
  • The number of users accessing the system.
  • The number of reports generated from the system.
  • The time required to generate reports.

Smart Tip

When implementing a Data Lakehouse System, it’s crucial that data producers and consumers stay synchronized and agree on the data being exchanged. This helps ensure that data is accurate and consistent across all sources.
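
One common way to make such an agreement explicit is a simple data contract that both sides check records against. The sketch below assumes a hypothetical contract for playback events (the field names and types are invented for illustration and are not taken from the client's system) and reports every way a record deviates from it.

```python
# Hypothetical contract agreed between a producer and its consumers:
# field name -> expected Python type. Illustrative only.
PLAY_EVENT_CONTRACT = {
    "user_id": str,
    "content_id": str,
    "seconds_watched": int,
}


def contract_violations(record, contract):
    """Return a list of readable violations; an empty list means the record conforms."""
    problems = []
    # Check that every agreed field is present and has the agreed type.
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Flag fields the consumers never agreed to receive.
    for field in record:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems


good = {"user_id": "u1", "content_id": "doc-1", "seconds_watched": 120}
bad = {"user_id": "u1", "seconds_watched": "120", "device": "tv"}

print(contract_violations(good, PLAY_EVENT_CONTRACT))
print(contract_violations(bad, PLAY_EVENT_CONTRACT))
```

Running a check like this at ingestion time means a producer-side change, such as a renamed field or a changed type, surfaces as an explicit violation instead of silently corrupting downstream reports.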

Smart Fact

A Data Lakehouse System is a new approach to big data architecture that combines the best features of data lakes and data warehouses. It enables organizations to process and analyze data from multiple sources, which can help them make better data-driven decisions.

About the Client

The client is a free video streaming service that offers a diverse selection of inspirational and award-winning content with a focus on social and environmental issues.
