Author Archives: Arnon Rotem Gal Oz

Replacing Docker Desktop with hyperkit + minikube

macOS is a Unix but it isn’t Linux so, unfortunately, when we need to use Linux-y things like Docker we need to install a VM, just like in the Windows world. Like most people, I’ve been using Docker Desktop for many years to get my container fix. It works ...

Where is Apache Spark heading?

I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks presented the following two images comparing Spark usage on their platform in 2013 vs. 2020. While Databricks’ platform is, of course, not the whole Spark community, I would wager that they have enough users to represent the overall trend. Incidentally, ...

Big data isn’t – well, almost

Back in ancient history (2004), Google’s Jeff Dean & Sanjay Ghemawat presented their innovative idea for dealing with huge data sets: MapReduce. Jeff and Sanjay reported that a typical cluster was made up of hundreds to a few thousand machines with 2 CPUs and 2-4 GB RAM each. They reported that in the whole of Aug ...

Spark, Parquet and S3 – It’s complicated

(A version of this post was originally published on AppsFlyer’s blog. Special thanks also go to Morri Feldman and Michael Spector from AppsFlyer’s data team, who did most of the work solving the problems discussed in this article.) TL;DR: The combination of Spark, Parquet and S3 (& Mesos) is a powerful, flexible and cost-effective analytics platform (and, incidentally, an alternative ...

Hadoop and the OpenDataPlatform

Pivotal, IBM and Hortonworks announced today the “Open Data Platform” (ODP), an attempt to standardize Hadoop. The move seems to be backed by IBM, Teradata and others that appear as sponsors on the initiative’s site. It has a lot of potential and a few possible downsides. ODP promises standardization – Cloudera’s Mike Olson downplays the importance of this ...

Is there a future for Map/Reduce?

Google’s Jeffrey Dean and Sanjay Ghemawat filed the patent application and published the map/reduce paper 10 years ago (2004). According to Wikipedia, Doug Cutting and Mike Cafarella created Hadoop, with its own implementation of Map/Reduce, one year later at Yahoo. Both these implementations were done for the same purpose: batch indexing of the web. Back then, the web began its “web 2.0” transition, ...

Services, Microservices, Nanoservices – oh my!

Apparently there’s this new distributed architecture thing called microservices out and about, so last week I went ahead and read Martin Fowler & James Lewis’s extensive article on the subject, and my reaction to it was basically: I guess it is easier to use a new name (Microservices) rather than say that this is what SOA ...

ReSQL?

The NoSQL moniker, coined circa 2009, marked a move away from the “traditional” relational model. There were quite a few non-relational databases around prior to 2009, but in the last few years we’ve seen an explosion of new offerings (you can see, for example, the “NoSQL landscape” in a previous post I made). Generally speaking, and everything here is a wild ...

Fallacies of massively distributed computing

In the last few years, we have seen the advent of highly distributed systems. Systems that have clusters with lots of servers are no longer the sole realm of the Googles and Facebooks of the world, and we are beginning to see multi-node and big data systems in enterprises. For example, I don’t think a company such as Nice (the company I work ...
