Tag Archives: Big Data

BigQuery Storage API: Avro

Previously we had an introduction to the BigQuery Storage API and proceeded to read data using the Arrow format. In this tutorial we shall read data using the Avro format. What applied in the previous tutorial applies here too. We shall create a BigQuery Storage client, create a ReadSession using the Avro format, and iterate over the data on each stream. ...

BigQuery Storage API: Arrow

Previously we had an introduction to the BigQuery Storage API. As explained, the Storage API of BigQuery supports two formats. For this tutorial we will choose the Arrow format. First let’s import the dependencies. The BigQuery Storage API binary does not come with a library to parse Arrow. This way the consumer receives the binaries in an Arrow format, and ...

BigQuery Storage API: Get Started and Comparisons

BigQuery provides us with the Storage API for fast access using an RPC-based protocol. With this option you can receive the data in a binary serialized format. The alternative ways to retrieve BigQuery data are the REST API and a bulk export. Bulk data export is a good solution for exporting big result sets; however, you are limited to ...

Apache Arrow on the JVM: Streaming Writes

Previously we created some schemas on Arrow. In this post we will have a look at writing through the streaming API. Based on the previous post’s schema, we shall create a DTO for our classes.

package com.gkatzioura.arrow;

import lombok.Builder;
import lombok.Data;

@Data
@Builder
public ...

Apache Arrow on the JVM: Get Started and Schemas

Arrow is a memory format for flat and hierarchical data. It is a popular format used by various big data tools, among them BigQuery. One of the benefits that Arrow brings is that the data has the same byte representation across the supported languages. So apart from the benefits of a columnar memory format there are also the ...

Where is Apache Spark heading?

I watched (COVID19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks presented the following two images comparing Spark usage on their platform in 2013 vs. 2020: While Databricks’ platform is, of course, not the whole Spark community, I would wager that they have enough users to represent the overall trend. Incidentally, ...

Popular frameworks for big data processing in Java

The big data challenge

The concept of big data is understood differently across the domains where companies face the need to deal with increasing volumes of data. In most of these scenarios the system under consideration needs to be designed so that it can process that data without sacrificing throughput as data ...

Big data isn’t – well, almost

Back in ancient history (2004), Google’s Jeff Dean & Sanjay Ghemawat presented their innovative approach to dealing with huge data sets, a novel idea called MapReduce. Jeff and Sanjay reported that a typical cluster was made of hundreds to a few thousand machines with 2 CPUs and 2-4 GB RAM each. They presented that in the whole of Aug ...
