We want to share why we do what we do and why we think this is the future of IT.
The number of internet users is constantly growing (3.17 billion, up from 2.94 billion last year), and what strikes us even more is the fact that 90% of all data in the world today has been created in the last two years alone.
IoT has opened the use of data to everyone, and now almost every little device is built with sensors and sends information.
The social network boom has made every application and business stakeholder eager to go viral, and the volume of shared data increases with viral marketing.
However, Big Data is more than just data itself.
It is a combination of factors that require a new way of collecting, analyzing, visualizing and sharing data.
These factors are forcing software companies to re-think the ways in which they manage and offer their data, from new insights to completely new revenue streams.
Bringing the right information at the right time to business decision makers is what counts.
There is a nice pyramid which explains the process of collecting data, analysing it, and pulling meaningful facts out of data sets in order to improve the business.
During the first stage, business owners are aware that they are collecting big data sets, but do not have a clear idea of what to do with that information. Here, the challenge is for engineers to build a good architecture which can handle the three Vs of big data: velocity, variety and volume.
When dealing with big data systems, complexity usually arises from the distributed nature of the system (many application servers, database servers, microservices, etc.).
After this phase, once systems have been collecting data for some time, data scientists usually step in to analyze those data sets and find the recurring patterns that are important to business owners. The output of this phase is a report, an idea or an algorithm which can answer the questions the owners have about their business (which products are bought most often on which day, when to run promotions, how to keep orders from distribution centers at a minimum level while still meeting customer needs).
At this point data becomes knowledge.
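To make the second stage more concrete, here is a minimal sketch of how one of the questions above (which products are bought most often on which day) could be answered. The order records, the `ORDERS` name and the `top_product_per_weekday` function are made up for illustration; a real analysis would run against the collected data sets and would likely use a proper analytics tool.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical order records: (order date, product name).
ORDERS = [
    (date(2021, 6, 7), "coffee"),   # a Monday
    (date(2021, 6, 7), "coffee"),
    (date(2021, 6, 7), "tea"),
    (date(2021, 6, 12), "juice"),   # a Saturday
    (date(2021, 6, 12), "juice"),
    (date(2021, 6, 12), "coffee"),
]


def top_product_per_weekday(orders):
    """Return {weekday name: (product, count)} for the most-bought product."""
    counts = defaultdict(Counter)
    for day, product in orders:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        counts[WEEKDAYS[day.weekday()]][product] += 1
    return {weekday: c.most_common(1)[0] for weekday, c in counts.items()}


if __name__ == "__main__":
    for weekday, (product, count) in top_product_per_weekday(ORDERS).items():
        print(f"{weekday}: {product} ({count} orders)")
```

Even a report this small is the kind of output that turns raw order logs into something a business owner can act on, for example when planning promotions.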
In the last stage, after business owners obtain their answers, predictions need to be incorporated into the existing software.
During that phase, data scientists and engineers work together to build better software which can alert, learn from patterns, improve processes, visualize and give even more insight. This phase transforms knowledge into wisdom.
From a technical perspective, the industry is changing a lot. NoSQL databases are no longer immature; there are systems with multiple data stores, where relational databases handle relational data in combination with NoSQL stores (e.g. Cassandra for time series data, Redis for sessions, MongoDB for document storage).
Microservices provide an answer to the huge complexity of monolithic systems, but incur large costs in maintenance, monitoring, deployment, etc. DevOps has become a must, and every developer must know the operational side.
Messaging systems have become popular as a means of communication between many small services.
Everything is asynchronous and non-blocking. Because there is a huge volume of data which needs processing, processing speed and the delay before results arrive have become important.
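As a rough illustration of asynchronous, message-based communication, the sketch below lets one coroutine publish events and another consume them without blocking each other. It is only an assumption-laden toy: an in-process `asyncio.Queue` stands in for a real messaging system, and the names `producer`, `consumer`, `run_pipeline` and `process` are ours, not part of any particular product.

```python
import asyncio


async def producer(queue, events):
    """Publish each event, then a None sentinel to signal completion."""
    for event in events:
        await queue.put(event)  # suspends (without blocking) if the queue is full
    await queue.put(None)


async def consumer(queue, handled):
    """Consume events until the sentinel arrives."""
    while True:
        event = await queue.get()  # suspends until a message is available
        if event is None:
            break
        handled.append(event.upper())  # placeholder for real handling logic


async def run_pipeline(events):
    # A small bound forces the producer to yield to the consumer when full.
    queue = asyncio.Queue(maxsize=2)
    handled = []
    await asyncio.gather(producer(queue, events), consumer(queue, handled))
    return handled


def process(events):
    """Synchronous entry point that runs the producer and consumer together."""
    return asyncio.run(run_pipeline(events))


if __name__ == "__main__":
    print(process(["order_created", "order_paid", "order_shipped"]))
```

In a real deployment the two sides would be separate services and the queue would be a message broker, but the shape of the interaction is the same: neither side waits idly on the other.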
The Hadoop way of batch processing is not enough anymore; near-real-time processing is in demand.
Providing both fast processing and slower but more accurate processing (as the lambda architecture proposes) is something to strive towards.
Apache Spark has become a huge player in the field of data processing because it provides both batch and stream processing.
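The lambda architecture idea mentioned above can be sketched in a few lines: a batch layer periodically recomputes a complete view from all stored events, a speed layer keeps an incremental view of events that arrived since the last batch run, and queries merge the two. The example below is a simplified, in-memory sketch under our own assumptions (page-view counting, and the hypothetical names `batch_view`, `SpeedView` and `query`); it does not use Spark or Hadoop.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: slow but complete recomputation over all stored events."""
    return Counter(event["page"] for event in master_events)


class SpeedView:
    """Speed layer: incremental counts for events not yet covered by a batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["page"]] += 1


def query(page, batch, speed):
    """Serving layer: merge the batch view with the real-time view."""
    return batch[page] + speed.counts[page]


if __name__ == "__main__":
    # Events already stored and covered by the last batch run.
    master = [{"page": "home"}, {"page": "home"}, {"page": "pricing"}]
    batch = batch_view(master)

    # An event that arrived after the batch run started.
    speed = SpeedView()
    speed.update({"page": "home"})

    print(query("home", batch, speed))
    print(query("pricing", batch, speed))
```

When the next batch run completes, the speed view for the events it covers can be discarded, which is what keeps the fast path small and the slow path authoritative.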
We have realized that the industry is changing, and we do not want to be just a part of that change; we want to actively influence it through our consulting services and our ideas.
In order to do that, we must constantly challenge ourselves, and that is in the DNA of our company.
We like exploring new technologies. We do our homework, research when we have a problem, choose the best tool for the job, and learn the good parts of technologies we are not familiar with so we can apply them in the technologies we know.
Open source is another thing in the DNA of our company.
For a long time we have searched for answers on Stack Overflow, waited for someone to resolve a Jira issue so we could use something the way we wanted, and looked at someone else’s GitHub examples to find the solution we needed.
We want to give back to the community: to provide answers, patch bugs and issues in the technologies we believe in, share examples and ideas for the solutions we have arrived at, and spark discussions based on those ideas so that we can provide even better solutions.
Written by: Smart Cat
June 9, 2021