Data processing from multiple data sources with an extremely complex data model in order to provide detailed historical information

Problem

Our client is building a new real-estate valuation system in which tax is calculated from the property value. The value depends on many factors beyond the sale price of the property itself: sale prices of surrounding properties, distance to the nearest school, park, or lake, and other amenities and attractions. The system must keep historical data for every property, so that an operator can check what a property's value was at any given point in time and show how the tax was calculated. This is a hard legal requirement, because the data will be needed in legal disputes and under court orders.
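The point-in-time requirement can be illustrated with a minimal sketch. It assumes an append-only list of valuations per property and a binary search by effective date; the names `Valuation`, `ValuationHistory`, `record`, and `as_of` are hypothetical and do not come from the client's system.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Valuation:
    valid_from: date  # date this valuation took effect
    value: int        # assessed value (illustrative numbers only)
    basis: str        # short note on how the value was derived


class ValuationHistory:
    """Valuation history for one property (hypothetical structure)."""

    def __init__(self) -> None:
        self._entries: List[Valuation] = []

    def record(self, valuation: Valuation) -> None:
        # Keep entries sorted by effective date so lookups can use binary search.
        self._entries.append(valuation)
        self._entries.sort(key=lambda v: v.valid_from)

    def as_of(self, day: date) -> Optional[Valuation]:
        # Return the last valuation that took effect on or before `day`.
        dates = [v.valid_from for v in self._entries]
        idx = bisect_right(dates, day)
        return self._entries[idx - 1] if idx > 0 else None


history = ValuationHistory()
history.record(Valuation(date(2010, 1, 1), 1_500_000, "initial assessment"))
history.record(Valuation(date(2015, 6, 1), 1_800_000, "neighbouring sale"))

print(history.as_of(date(2012, 3, 1)))   # the 2010 valuation
print(history.as_of(date(2009, 12, 31))) # None: no valuation existed yet
```

Storing the `basis` next to each value is what lets an operator explain, not just report, a historical figure.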

Solution

The biggest challenge was creating a full snapshot of every property in Denmark for every property sale event in the last 20 years. After a couple of iterations of reworking the processing code, we decided to keep Spark for its distributed infrastructure but move all processing logic into custom code that is deployed to and run by the workers, which uses the hardware far more efficiently.
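The idea of running custom processing code on the workers can be sketched as a plain function that consumes one partition of sale events and emits a snapshot after each event. This is only an illustration under stated assumptions: the project itself lists Scala, the names `SaleEvent` and `build_snapshots` are hypothetical, and a real snapshot would carry far more than the latest sale price. In PySpark, a function with this iterator-in, iterator-out shape could be passed to `RDD.mapPartitions`; the Spark wiring is omitted here so the sketch runs on its own.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class SaleEvent:
    property_id: str
    sold_on: date
    price: int


# A snapshot is the event date plus the latest known price of every property.
Snapshot = Tuple[date, Dict[str, int]]


def build_snapshots(events: Iterable[SaleEvent]) -> Iterator[Snapshot]:
    """Yield one snapshot of all known properties after each sale event.

    All state lives in a local dict, so the work for a partition is a single
    pass in ordinary code rather than a long chain of Spark transformations.
    """
    latest_price: Dict[str, int] = {}
    for event in sorted(events, key=lambda e: (e.sold_on, e.property_id)):
        latest_price[event.property_id] = event.price
        # Copy the dict so later events do not mutate earlier snapshots.
        yield event.sold_on, dict(latest_price)


events = [
    SaleEvent("A", date(2005, 1, 10), 100),
    SaleEvent("B", date(2007, 3, 2), 200),
    SaleEvent("A", date(2012, 8, 20), 150),
]
snapshots = list(build_snapshots(events))
for day, state in snapshots:
    print(day, state)
```

Keeping the per-partition logic in one function is also what keeps the Spark lineage short: Spark sees a single step per partition instead of many chained operations.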

Results

This approach significantly reduced code complexity and shortened the Spark lineage, which led to a major improvement in processing speed. The final solution can regenerate historical data snapshots on any change in a fraction of the original processing time.

Key Achievements:

Reduced processing time from 96 hours to under 3 hours.
Simplified codebase, minimizing Spark lineage overhead.
Enabled fast and scalable generation of historical property snapshots.
Facilitated implementation of clustering algorithms with variable parameters for property tax calculations.
Enhanced regulatory compliance by ensuring precise, traceable historical records.

Business Impact: By delivering rapid, accurate, and legally compliant historical property data, the solution enhanced the Danish Tax Authority’s capability to efficiently handle regulatory obligations and judicial challenges. This not only reduced operational risk but also increased transparency and public trust in governmental processes.

Smart Tip

In GovTech, tailoring the processing code that runs inside a distributed system like Spark can dramatically improve regulatory compliance capabilities and responsiveness to legal and operational demands.

Smart Fact

Optimizing distributed data processing reduced computation time by over 96%, significantly enhancing regulatory agility and compliance accuracy.
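The figure follows from the processing times reported in the results. A quick check, assuming 3 hours as the upper bound for "under 3 hours":

```python
hours_before = 96.0
hours_after = 3.0  # upper bound: the reported time is "under 3 hours"

# Fraction of the original run time that was eliminated.
reduction = (hours_before - hours_after) / hours_before
# How many times faster the new run is, at minimum.
speedup = hours_before / hours_after

print(f"reduction: {reduction:.3%}, speedup: {speedup:.0f}x")
```

Because the actual run time is below 3 hours, the real reduction is at least this large.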

About the Client

The client is a governmental organization responsible for accurate taxation based on property evaluations. The organization requires precise, transparent, and legally compliant historical data management for real estate assessments to address regulatory standards and legal disputes effectively.

Technology Used

Apache Cassandra
Apache Spark
SQL
Scala
Amazon EMR
Apache Solr
