Tag Archives: Big Data

Can MapReduce solve planning problems?

To solve a planning or optimization problem, some solvers tend to scale out poorly: as the problem gains more variables and more constraints, they use a lot more RAM and CPU power. They can hit hardware memory limits at a few thousand variables and a few million constraint matches. One way their users typically work around such hardware limits is ...

Big data: when single node is better than clustered

There’s a lot of hype about “big data” and a general trend to try to apply Hadoop to almost every problem. However, sometimes it turns out that you can get much better results by writing an old-fashioned, but optimised, single-node version of your algorithm. The specific case I’m writing about is generating recommendations (which items a user may like) based on ...

Big Data: What about Security?

Hadoop has had a security problem ever since it first appeared. Apache Knox and Cloudera Manager have been solutions for providing authentication and authorization for basic database management functions. Also, the underlying Hadoop Filesystem now incorporates Unix-like permissions. But the issue has not been solved, so usually the pattern followed is to “plunk the S-word after the name of a ...

NoSQL is not just about BigData

There is so much debate on the SQL vs NoSQL subject, and this is probably our natural way of understanding and learning the best way of storing data. After publishing the small experiment on the MongoDB aggregation framework, I was challenged by the jOOQ team to match my results against Oracle. Matching MongoDB and Oracle is simply honoring Mongo, as ...

Big Data the ‘reactive’ way

A metatrend in the IT industry is a shift from query-based, batch-oriented systems to (soft) real-time updated systems. While this is often associated only with financial trading, there are many other examples such as “just-in-time” logistics systems, airlines doing real-time pricing of passenger seats based on demand and load, C2C auction systems like eBay, real-time traffic control ...

Drones and Big Data

Two weeks ago, I had a conversation with some colleagues in which I was postulating a future bull market for drones, as I envisioned a number of commercial applications (food service, surveillance, etc.). Coincidentally, this topic has gained major momentum since Amazon’s disclosure of a drone R&D project for goods delivery on 60 Minutes this week. Suddenly, everyone has a drone ...

Creating an on-line recommender system with Apache Mahout

Recently we’ve been implementing a recommender system for Yap.TV: you can see it in action after installing the app and going to the “Just for you” tab. We’re using Apache Mahout as the base for doing recommendations. Mahout is a “scalable machine learning library” and contains both local and distributed implementations of user- and item-based recommenders using collaborative filtering ...

Unit testing a Java Hadoop job

In my previous post I showed how to set up a complete Maven-based project to create a Hadoop job in Java. Of course, it wasn’t complete, because it was missing the unit test part. In this post I show how to add MapReduce unit tests to the project I started previously. For the unit test I make use of ...

Broken Glass: Diagnosing Production Cassandra Issues

I just passed my two-year anniversary at Health Market Science (HMS), and we’ve been working with Cassandra for almost the entirety of my career here. In that time, we have had remarkably few problems with it. Like few other technologies I’ve worked with, Cassandra “just works”. But, as with *every* technology I’ve ever worked with, you eventually have ...

ReSQL?

The NoSQL moniker, coined circa 2009, marked a move away from the “traditional” relational model. There were quite a few non-relational databases around prior to 2009, but in the last few years we’ve seen an explosion of new offerings (you can see, for example, the “NoSQL landscape” in a previous post I made). Generally speaking, and everything here is a wild ...
