Learnings from Developing Big Data Applications

27.12.2016 | Article

New Year 2017 is coming soon, and it already sounds like the future is here: artificial intelligence, self-driving cars, virtual reality, and everything in the cloud, cheaper and faster than ever before. Things keep developing, and every so often you notice that you have to learn to think differently, again.

During the last two years, while developing fast data analytics software, the most important learning area for me has been the real adoption of containers and cloud services like Amazon Web Services, Google Cloud, and Azure. Just a few years ago, I was thinking more about servers: how to build something for virtual and physical servers. Now I think more about resources in the cloud, data streams here and there, and services running inside containers that scale out when needed and scale back to save resources when they are not.

With test automation, smaller microservices, and functional, stateless programming, complex stream processing applications are easier to manage. Programming in functional languages has also made me relearn some computer science theory and forced a paradigm shift in how I think software should be developed.
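As a rough illustration of why stateless, functional code is easy to test and compose, here is a minimal Python sketch of a stream processing step built from pure functions. The event shape and the names `is_valid`, `to_celsius`, and `pipeline` are hypothetical and chosen only for this example; they are not taken from any particular product or library.

```python
from typing import Iterable, Iterator

# Hypothetical event shape for this sketch: {"sensor": str, "value": float}
Event = dict


def is_valid(event: Event) -> bool:
    """Pure predicate: keep only events that carry a numeric value."""
    return isinstance(event.get("value"), (int, float))


def to_celsius(event: Event) -> Event:
    """Pure transformation: returns a new event and never mutates the input."""
    return {**event, "value": (event["value"] - 32.0) * 5.0 / 9.0}


def pipeline(events: Iterable[Event]) -> Iterator[Event]:
    """Lazily filter and transform a stream without keeping any state."""
    return map(to_celsius, filter(is_valid, events))
```

Because each function depends only on its input, every step can be unit tested in isolation, and the same pipeline can in principle run in any number of parallel workers without coordination.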

But in the end, the most important thing is always to understand customer needs, model the data properly, and choose the right tools for the job. A great deal can be done with traditional relational databases, which have advanced greatly in the last few years. For example, PostgreSQL with native JSON types, parallel queries, asynchronous features, and replication is very different from the PostgreSQL you may remember from just three years ago. For advanced indexing and data exploration, just plug in Elasticsearch with Kibana. But when you have a real need for highly available, real-time big data analytics, it's hard to find a better software stack than Apache Kafka, Spark, and Cassandra.

Collecting data in real time, globally from tens of thousands of sources, and then validating and distributing the data streams sounds quite hard. But Kafka makes those problems easier to solve in a scalable and resilient way. For fast analytics, on either real-time streams or batches, Apache Spark offers many machine learning tools out of the box. And for storing huge amounts of time-series data, Apache Cassandra has, in my experience, outperformed the alternatives. For a technical geek like me, it has been a real pleasure to see things working well in production on the Quva® Flow platform.
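The validate-and-distribute step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain in-memory lists stand in for Kafka topics, no Kafka client is used, and the field names in `REQUIRED_FIELDS` as well as the topic names `validated` and `invalid` are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical schema for this sketch; real sources will differ.
REQUIRED_FIELDS = {"source_id", "timestamp", "value"}


def validate(raw: bytes):
    """Return the decoded record, or None if the message is malformed."""
    try:
        record = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(record, dict) or not REQUIRED_FIELDS.issubset(record):
        return None
    return record


def route(raw_messages):
    """Distribute messages into in-memory 'topics' (stand-ins for Kafka topics)."""
    topics = defaultdict(list)
    for raw in raw_messages:
        record = validate(raw)
        if record is None:
            topics["invalid"].append(raw)  # keep the raw bytes for inspection
        else:
            topics["validated"].append(record)
    return dict(topics)
```

In a real deployment, the lists would be replaced by a consumer reading from an ingest topic and a producer writing to downstream topics, so that malformed input is set aside without blocking the valid stream.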

Tommi Lehtinen,

Senior Software Engineer,

Quva Oy


