Home » Author Archives: Arnon Rotem Gal Oz

Author Archives: Arnon Rotem Gal Oz

Spark, Parquet and S3 – It’s complicated


(A version of this post was originally posted in AppsFlyer’s blog. Also special thanks to Morri Feldman and Michael Spector from AppsFlyer data team that did most of the work solving the problems discussed in this article) TL;DR; The combination of Spark, Parquet and S3 (& Mesos) is a powerful, flexible and cost effective analytics platform (and, incidentally, an alternative ...

Read More »

Hadoop and the OpenDataPlatform


Pivotal, IBM and Hortonworks announced today the “Open Data Platform” (ODP) – an attempt to standardize Hadoop. This move seems to be backed up by IBM, Teradata and others that appear as sponsors on the initiative site. This move has a lot of potential and a few possible downsides. ODP promises standardization – Cloudera’s Mike Olson downplays the importance of this ...

Read More »

Is there a future for Map/Reduce?


Google’s Jeffrey Dean and Sanjay Ghemawat filed the patent request and published the map/reduce paper  10 year ago (2004). According to WikiPedia Doug Cutting and Mike Cafarella created Hadoop, with its own implementation of Map/Reduce,  one year later at Yahoo – both these implementations were done for the same purpose – batch indexing of the web. Back than, the web began its “web 2.0″ transition, ...

Read More »

Services, Microservices, Nanoservices – oh my!


Apparently there’s this new distributed architecture thing called microservices out and about – so last week I went ahead and read Martin Fowler’s & James Lewis’s extensive article on the subject . and my reaction to this was basically:       I guess it is easier to use a new name (Microservices) rather than say that this is what SOA ...

Read More »



The NoSQL moniker that was coined circa 2009 marked a move from the “traditional” relational model. There were quite a few non-relational databases around prior to 2009, but in the last few years we’ve seen an explosion of new offerings (you can see,for example, the “NoSQL landscape” in a previous post I made). Generally speaking, and everything here is a wild ...

Read More »

Fallacies of massively distributed computing


In the last few years, we see the advent of highly distributed systems. Systems that have clusters with lots of servers are no longer the sole realm of the googles’ and facebooks’ of the world and we begin to see multi-node and big data systems in enterprises. e.g. I don’t think a company such as Nice (the company I work ...

Read More »
Want to take your Java Skills to the next level?
Grab our programming books for FREE!
  • Save time by leveraging our field-tested solutions to common problems.
  • The books cover a wide range of topics, from JPA and JUnit, to JMeter and Android.
  • Each book comes as a standalone guide (with source code provided), so that you use it as reference.
Last Step ...

Where should we send the free eBooks?

Good Work!
To download the books, please verify your email address by following the instructions found on the email we just sent you.