The client, an EU-based enterprise in the construction and industrial sector, is committed to sustainability and environmental responsibility and has been keeping urban areas clean for almost a century. They were looking for a stable data infrastructure that could scale up and later take on partner companies as well. With multiple data sources across the business, they needed a single, unified storage solution. The client also wasn’t using its data to its full potential, even though end users were requesting insights that the data could provide if it were stored and organized better.
The challenge was to identify the requirements and adapt the pipeline to ingest data from different sources. The solution also had to scale easily to accommodate future data needs.
The client wanted to make an impact by reducing carbon emissions, decreasing fine dust levels, and employing alternative fuels, with the aim of providing responsible solutions for municipal street cleaning challenges. To support this, SmartCat created a Data Lakehouse solution that enables advanced analytics and machine learning, along with an API that makes it easier to access data and build reports.
The team worked closely with the client to identify requirements and adjust the pipeline so that data could be easily accessed and used for advanced analytics and machine learning. We delivered value through reports with insights and forecasts, while helping the client read and understand the data and use it efficiently.
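To make the idea of a pipeline that accepts several sources and can grow over time more concrete, here is a minimal sketch in Python. It is not the client's or SmartCat's actual implementation: the source names (`telemetry`, `fuel_log`), the field names, and the `Measurement` schema are all hypothetical, chosen only to illustrate one common pattern in which each source gets a small adapter that maps its rows onto a shared schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Measurement:
    """Hypothetical unified record shared by all sources."""
    source: str
    vehicle_id: str
    metric: str
    value: float


# Registry of adapters: one function per source, each mapping a raw row
# (a dict) onto the shared Measurement schema.
Adapter = Callable[[dict], Measurement]
ADAPTERS: Dict[str, Adapter] = {}


def register(source_name: str) -> Callable[[Adapter], Adapter]:
    """Decorator that adds an adapter to the registry under source_name."""
    def decorator(fn: Adapter) -> Adapter:
        ADAPTERS[source_name] = fn
        return fn
    return decorator


@register("telemetry")  # hypothetical source
def from_telemetry(row: dict) -> Measurement:
    return Measurement("telemetry", row["vehicle"], row["metric"], float(row["value"]))


@register("fuel_log")  # hypothetical source
def from_fuel_log(row: dict) -> Measurement:
    return Measurement("fuel_log", row["machine_id"], "fuel_litres", float(row["litres"]))


def ingest(batches: Dict[str, Iterable[dict]]) -> List[Measurement]:
    """Convert raw rows from every source into the unified schema."""
    unified: List[Measurement] = []
    for source_name, rows in batches.items():
        adapter = ADAPTERS[source_name]  # raises KeyError for an unknown source
        unified.extend(adapter(row) for row in rows)
    return unified
```

Under these assumptions, onboarding a new source or a partner company would mean registering one more adapter, while everything downstream keeps reading the same unified records.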
SmartCat built the necessary infrastructure to deploy an internal Data Lakehouse solution, AIDA, to the client’s AWS account. This provided a stable and scalable foundation that could accommodate future data needs. The end result was a successful Data Lakehouse solution that met all of the client’s requirements and enabled them to achieve their goals. Moreover, having structured data opened up different visualization options through dashboards, which extracted more value from the data.
- AWS cloud storage: used for storing unstructured, semi-structured and structured data.
- Databricks: combines data warehouse and data lake advantages into a single architecture.
- Apache Spark: an open-source distributed computing system for big data processing.
- Terraform: a tool for building, changing, and versioning infrastructure safely and efficiently.
- Python: a programming language used for data analysis and data science.
- Power BI: a tool for data visualization used for generating reports.
The client’s company operates in more than 160 countries.