
What's New Here?


Clojure at Scale: Why Python Just Wasn’t Enough for AppsFlyer

A first-hand experience and an introduction to Clojure at scale

Still considered a bit of an esoteric language, Clojure is one of the JVM languages that gets us excited. There aren't many stories around about why companies start using Clojure or how they use it to build systems at scale. We were lucky to hear an excellent example of using Clojure during a demo of Takipi's error viewer for Clojure at AppsFlyer's office, where we learned about the architecture that powers their mobile app measurement and tracking platform. In this post we're sharing with you the experience of our new friends from AppsFlyer: Adi Shacham-Shavit, who manages the R&D department, and Ron Klein, a senior backend developer. First things first, a huge thanks to Ron and Adi, who treated us to a behind-the-scenes look at Clojure at AppsFlyer! If you have any questions for them and are interested in learning more, please feel free to use the comments section below. Here's their story:

Let's get started with some numbers
- 2 billion events per day
- Traffic doubled in the last 3 months
- Hundreds of instances
- The company grew from 6 to 50 people over the past year
- 10 Clojure developers
- Technologies: Redis, Kafka, Couchbase, CouchDB, Neo4j, ElasticSearch, RabbitMQ, Consul, Docker, Mesos, MongoDB, Riemann, Hadoop, Secor, Cascalog, AWS

The Pains of Scaling Up

At AppsFlyer we actually started our code base in Python. Two years later this wasn't enough to handle the growing number of users and requests. We started to encounter issues like one of the critical Python processes taking too long to digest the incoming messages, caused mainly by string manipulations and Python's own memory management. Even partitioning the messages amongst several processes and servers could not overcome this. This eventually killed the process and caused data loss – the first 'Python victim' was the reporting service.
Taking the functional approach

As these kinds of difficulties accumulated, we had to choose between 2 options:
- Rewrite some of our services in C (great performance, but less fun to code) and wrap them with Python interop code (easy to do)
- Rewrite some of our services in a programming language more suitable for data processing

It is important to mention at this point that we took the asynchronous, event-driven approach to handling incoming messages, which allows the system to easily scale as traffic grows. We had been toying with the idea of introducing functional programming into the company for some time before the rogue reporting service started failing. It's a good fit with our way of thinking and architecture, so it was logical to make the change – especially since the reporting service failures pushed us to finally make the call. After deciding to go with it came the second hurdle: which language should we choose?

Scala vs. OCaml vs. Haskell vs. Clojure

Scala was out of the picture because it's a hybrid of object-oriented and functional programming and leans more towards OOP. OCaml was discarded because of its relatively small community and its global runtime lock, which allows only one thread to execute at a time – even on multicore machines (the same problem we had with Python's Global Interpreter Lock). Monads in Haskell made us cringe in fear, so we were left with Clojure. But that's not the only reason we chose this path; Clojure won for 2 major reasons. First, it runs on the JVM, and second, it's a functional language with easy access to mutable state if you need it, even in a heavily concurrent environment. Clojure is a dialect of the Lisp programming language created by Rich Hickey. It's a general-purpose programming language with an emphasis on functional programming. Like other Lisps, Clojure treats code as data and has a macro system.
At its center are immutable values and explicit progression-of-time constructs that are intended to facilitate the development of more robust programs, particularly multithreaded ones.

Micro-services architecture

The server side of AppsFlyer's system is designed to continuously receive messages (events), process them, store them, and sometimes invoke additional web requests to external endpoints based on them. This "stream" of events led us to some architectural decisions that helped us scale as needed. One of the main decisions was to think of the system as a collection of services, intercommunicating mainly by message queues (formerly via Redis' pub/sub and currently via Kafka). This made our services independent and loosely coupled.

The flow of events

Let's take a simplified example: the event of "Application Installed" is published to the entire system through a Kafka topic (queue) named "Installs". Our Reports service listens to that topic so that it can store this piece of data for the relevant reports. In addition, our Postbacks service listens to that very same topic and decides, based on its own rules, whether or not to invoke a web request, and to which endpoint. Since the entire system is based on micro-services that consume messages from (and publish messages to) a common pipeline, it's easy to rewrite them in any programming language, assuming it has a decent client library for the common pipeline. Kafka is used as the main backbone, with RabbitMQ for the real-time channel.

Concurrency in Clojure

Clojure provides its own approach to concurrency, and it might take some time to adjust to it. However, once the mindset is there, it's much easier to achieve tasks in Clojure than when taking the "conventional" approach. In most cases, writing code that deals with concurrency in Clojure doesn't include lock statements at all. This is a huge advantage: coding is more focused on the logic itself, rather than the plumbing around locks.
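To make the contrast concrete, here is a minimal sketch (our own illustration, not AppsFlyer code) of what that lock plumbing typically looks like in plain Java: a shared mutable map where every reader and writer must remember to synchronize.

```java
import java.util.HashMap;
import java.util.Map;

// The "conventional" JVM approach: shared mutable state guarded by an
// explicit lock. HashMap is not thread-safe, so every access must
// synchronize on the same monitor or the map can be corrupted.
public class LockedState {
    private final Map<String, Integer> state = new HashMap<>();
    private final Object lock = new Object();

    public void put(String key, int value) {
        synchronized (lock) {
            state.put(key, value);
        }
    }

    public Integer get(String key) {
        synchronized (lock) {
            return state.get(key);
        }
    }
}
```

Every new piece of shared state needs the same ceremony, which is exactly the plumbing that Clojure's approach removes.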
Clojure also has mechanisms that guard data from being corrupted. This, of course, comes with a trade-off: there's a small probability that the shared resource read by thread A does not yet contain all the changes made earlier by thread B. Generally speaking, Clojure provides a nice mechanism of immutable data structures, ensuring data integrity while somewhat sacrificing consistency. Clojure has access to almost everything the JVM can provide, so you can still use traditional locks. However, if the system you build is based on statistics and you can tolerate minor data loss, such as the analytics system we have at AppsFlyer, then Clojure is more than enough.

A real life example

Say we have a service that holds its state in a key-value data structure, a map. The map is initially defined at the module level as empty (this example is simplified for clarity, so the code is not written to be fully reusable):

(def my-map {}) ;; Don't panic, you'll get used to the braces...

The statement above creates an empty map, accessible by the name my-map. The first thing that strikes most newcomers to Clojure, after the braces syntax, is the freedom in naming variables. Clojure allows some interesting characters in variable names, such as "-", "?", "!", etc. Think about the simplicity behind a function named contains? used to check whether a collection contains an item. The basic code to add a key "k" with a value "v" to a given map is:

(assoc some-map "k" "v")

This code does not update the original map. Clojure keeps its data structures as immutable as possible. Instead, the statement above returns a new copy of the original map, with the new key and the new value. Behind the scenes, Clojure doesn't fully duplicate the entire map. Instead, it keeps revisions with pointers to previous revisions, along with the differences. Smart, eh?! Back to my-map.
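(A side note for Java readers: the pattern the next snippets build – an immutable map replaced atomically behind a single reference – can be approximated in plain Java with java.util.concurrent.atomic.AtomicReference. The sketch below is our own rough analogue for illustration, not how Clojure's atom is actually implemented.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Rough Java analogue of Clojure's atom: readers always see a complete,
// consistent snapshot; writers swap in a fresh copy via compare-and-set.
public class AtomLikeMap {
    private final AtomicReference<Map<String, Object>> ref =
            new AtomicReference<>(new HashMap<>());

    // roughly (swap! my-map assoc k v) -- retried automatically on contention
    public void assoc(String key, Object value) {
        ref.updateAndGet(old -> {
            // Full copy here; Clojure's persistent maps share structure
            // between revisions instead of copying everything.
            Map<String, Object> next = new HashMap<>(old);
            next.put(key, value);
            return next;
        });
    }

    // roughly (get @my-map k)
    public Object get(String key) {
        return ref.get().get(key);
    }
}
```

The structural sharing described above is what makes the "copy" cheap in Clojure, where this naive Java version pays for a full copy on every write.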
We'll have to modify our statement so that it's ready for concurrency:

(def my-map (atom {}))

That little atom is almost all we need to go the concurrent way. So now, when a running thread "updates" my-map (read: creates a new revision of it) so that it also contains the key "my-key" with the value 42, the code looks like this:

(swap! my-map assoc "my-key" 42)

This statement changes my-map so that it now holds a new version of itself. So far, we have a thread "updating" my-map. Continuing with the previous example, reading a map in Clojure looks like this:

(get some-map "k")

The statement above returns the value "v". When working with Clojure's atom, the following code can be executed when a thread reads a value from my-map:

(get @my-map "my-key")

The only difference is that little "@" before my-map. What it says is something like: "Hey Clojure, give me the latest revision you have for my-map." As stated above, the latest, most updated revision might not contain all the changes that have been made to our map so far, but the returned value is always safe in terms of data integrity (i.e. not corrupted).

Conclusion

Clojure has its own mindset – immutable values, Lispy syntax, etc. Its major advantage is its approach to concurrency, focusing on an application's logic and reducing the overhead of locking mechanisms. This post covers just a tiny bit of Clojure's approach to concurrency. We experienced a significant performance boost when we moved AppsFlyer to Clojure. In addition, using functional programming allows us to keep a really small code base, with only a few hundred lines of code per service. Working in Clojure has dramatically sped up development time and allows us to create a new service in days.

Reference: Clojure at Scale: Why Python Just Wasn't Enough for AppsFlyer from our JCG partner Alex Zhitnitsky at the Takipi blog.

Do you really want to speed up your Maven compile/packaging? Then takari lifecycle plugin is the answer.

Like many of you out there, I am working with a multi-module Maven project. It is not a huge one compared to many systems out there: it has 15 modules, with 3 different ear deployments, lots of parametrization with property files, and around 100K lines of Java code. During peak development times the code is heavily refactored, due to its legacy origins, so there is a need for continuous compiling/packaging and deployment, for every developer. Despite the steep learning curve, all these years I have embraced Maven and its philosophy. I am not saying that it is perfect, but I truly believe that it is a good tool that is still relevant, especially as your project and team grow. (This post is not about Maven evangelism though.) So, one of the problems we had on our team is that, despite switching the right flags, breaking and packaging our code into modules, using profiles and all the 'tools' Maven provides, our build and packaging time was slowly starting to increase, hitting the 1 minute threshold after a complete clean. Our main compiler was Sun/Oracle javac, and the time was measured by packaging from the command line, not through the IDE, where you can see different times depending on the 'Maven integration' and the internal compiler invoked by each tool. [My reference machine is my good old MacBook Pro 2009, Core 2 Duo 2.5, with a Vertex 3 SSD (TRIM enabled).] Recently, while browsing Jason van Zyl's (the father of Maven) Twitter account, I discovered the takari lifecycle plugin. Jason and his team are creating tools and plugins for the Maven ecosystem that I hope will bring the much-anticipated evolution of the Maven ecosystem that the community has been seeking for many years now. To cut a long story short, the takari lifecycle plugin is an alternative Maven lifecycle implementation that consolidates 5 different plugins into one.
Once you activate it, it will take over and invoke its own implementation of the following 5:
- resources plugin
- compiler plugin
- jar plugin
- install plugin
- deploy plugin

You can read about it here. The great thing, at least in my case, was the compiler plugin, which internally implements an incremental compilation strategy based on a mechanism that can detect changes on source files and resources! In order to understand the difference when using the takari compiler plugin in your Maven build compared with the classic compiler plugin and javac (which most probably many of you use), I am going to share a table from this blog post (explaining incremental compilation). It is quite obvious that if you choose to invoke JDT instead of javac, the results are going to be even better. Currently we stick with javac, but the above comparison made me change the default compiler in my IntelliJ IDE; especially when I do refactoring and changes all around, JDT was anyway far better at incremental compilation compared to javac.

How to add takari to my build? Is it safe?

Well, in my case (and I guess for many of you out there), I just followed the proposed way here. I activated the plugin in my parent pom and then changed the packaging type of all my jar modules to 'takari-jar'. That is all; the change is so small that you can easily revert it. The day I pushed the takari lifecycle change to our git repo, after half an hour I started hearing 'wows' and 'yeses' from my team members. Repeated packaging on changes is very, very cheap, and changes on resource files and properties ensure that we will get a fresh package when needed. Our repackaging times dropped by 50%-60%. If you happen to have the same issues with your Maven build, I truly encourage you to try takari for a day – it will save you and your team some serious time.
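For reference, activating the plugin in the parent pom looks roughly like the snippet below. This is a sketch based on the takari documentation, so double-check the plugin coordinates and pick the current version from Maven Central:

```xml
<!-- parent pom.xml: let the takari lifecycle take over the build -->
<build>
  <plugins>
    <plugin>
      <groupId>io.takari.maven.plugins</groupId>
      <artifactId>takari-lifecycle-plugin</artifactId>
      <version><!-- current version from Maven Central --></version>
      <extensions>true</extensions>
    </plugin>
  </plugins>
</build>

<!-- each jar module: switch the packaging type -->
<packaging>takari-jar</packaging>
```

Reverting is just a matter of removing the plugin and switching the packaging back to jar.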
I also want to note that takari is free, and despite the fact that it is evolved and updated by the takari team for an unnamed 'big' client, the team gives it away and shares it with the community. So thank you very much for this! The plugin can be found on Maven Central. The takari team is also doing a weekly Google Hangout; information can be found here. I apologize that I have not yet managed to attend one – maybe soon enough. So go Maven! Go takari!

Reference: Do you really want to speed up your Maven compile/packaging? Then takari lifecycle plugin is the answer. from our JCG partner Paris Apostolopoulos at the Papo's log blog.

Agile Testing Days 2014–The #AgileTD Post

Second time around is even better. And much more fun. You know people who know you. You meet new people who want to share and talk and ask. Oh, and there's beer, more beer. Thank you to all the people I met and re-met. You are awesome. You make ATD what it is. It is now summary time. Let's recap, shall we? Keynotes this year were very good. David Evans gave an excellent talk about the "Pillars of Testing", which continued to explain how agile and testing fit together. Antony Marcano's "Don't put me in a box" was an excellent show of how Antony made the role of the tester change by leading these changes (including glimpses of eXtreme Programming). Alan Richardson continued along these lines, bashing the term QA forever. And it was full of Transformers, Dr. Seuss, Pinky and the Brain, and The Princess Bride, so what's not to love? Those were my favorite keynotes, and the latter two aligned with my quest this time around (not the one for beer, the other quest): what is testing, really, in an agile context? This is where two workshops helped. The first one, by Huib Schoots, was about test strategy. How does a tester approach an application? How is planning done, and what needs to be taken into account? Coming from a development and developer-testing background used to lead me to ad hoc testing. In system testing, a different point of view is needed, and I learned a lot from this workshop. That's him, by the way (that's the normal situation):

Alan Richardson, Tony Bruce and Steve Green's "Black Ops Testing" was also a step in the same direction – this was an actual testing dojo, where we paired on testing a web application. We could concentrate on whatever skill we wanted to improve, in a safe environment. My partner, Marine, focused on note-taking. I tried to follow in her steps, trying to learn how a tester's mind works. I learned quite a few things, as well as tricks for creating data files for checking where text boxes cut off input.
This may sound stupid, but the idea that you'll need to validate text boxes again and again, and will want to speed this up, is a true mark of a professional tester. It's a tool in her belt, just like developers carry (or may I say, should carry) TDD. The final workshop I went to was Bryan Beecham and Mike Bowler's "Lego TDD and Refactoring". This was more developer-oriented, although no code was written, and no Lego was hurt in the making of the workshop. If you ever get a chance to experience this, jump on it. These are Mike and Bryan:

There were quite a few genius ideas there, including explaining technical debt with Lego and, as you can see in the picture, extracting methods. Refactoring in Lego – who would have thunk? Before, there are RED blocks everywhere:

After extracting the RED method:

One more thing that resonated with me was that the workshop was full. My "TDD Patterns" talk also pretty much filled the room. While testing is still in the title, I can see the merging of tester and developer skills continuing, as the distinction slowly disappears. What else? George Dinwiddie's excellent talk about the finer points of tests. He got me this TDD merit badge (plus, he has a cool wizard hat):

Marcus Ahnve talked about continuous deployment and dev-ops, and wore an awesome T-shirt:

Lean Coffee, which I helped facilitate with Janet Gregory and Lisa Crispin in a crowded room. Shachar Schiff's and Bob Marshall's keynotes were also thought-provoking, and Joe Justice's almost sci-fi show (I called it the "I didn't know Scrum could do that" keynote). Lisa and Janet dressed up as Star Trek characters from a parallel purple universe (you had to be there):

An over-joyous #Table2 at the speakers' dinner (Oana Juncu can tell you what actually happened there). Tom Roden and Ben Williams' version of "A Christmas Carol" (a cucumber was involved). There was a car that was built in less than a day. A Brazilian party (which in hindsight was quite a clue about who was going to win the World Test Cup).
Even I wore my costume:

Matt Heusser accepting the MIATPP award. The Agile Game Night was full of learning and puzzle solving (I helped run that too, with Pete George, and made people feel comfortable with balloons – and with picking them up):

And everyone singing along to an ABBA song with Vagif Abilov as the curtain came down. Above all that – there are the people. Smart, nice, eager to learn and consume beer. Great intellectual fun. How much fun? On the short, final night, I went out to Berlin with a couple of friends, and after eating a midnight kebab, said my final rounds of goodbye to the people at the bar at 1am. In true ATD spirit, when I came down to take my taxi to the airport, some of them were still there at 4:30am. That's the ATD spirit. No wonder it keeps breaking records every year – in number of attendees, quality, and events (come on, building a car in a day? When did you last see that?). Which brings me to the final point: the organizers. Being currently part of a team organizing a conference (Agile Practitioners 2015 – go register, it's going to be just as awesome) has given me a new appreciation of how professional Uwe, Madeleine and Jose and their incredible team are. Last year's was excellent, but it shows when you start looking at the details – handling car parts that arrived late, working with sponsors, or just hanging out with the attendees and speakers to make sure everything is fine. Everything just worked. Madeleine told me that the trick to her job is to "make it appear as if everything works". And they have done a marvelous job again. I appreciate it even more now. Good job! Did I mention the people already? You can find lots of them on the #AgileTD thread. All of you: YOU ROCK! Bye bye, Agile Testing Days, and thanks for all the beer!

Reference: Agile Testing Days 2014 – The #AgileTD Post from our JCG partner Gil Zilberfeld at the Geek Out of Water blog.

Spark: Parse CSV file and group by column value

I've found myself working with large CSV files quite frequently, and realising that my existing toolset didn't let me explore them quickly, I thought I'd spend a bit of time looking at Spark to see if it could help. I'm working with a crime data set released by the City of Chicago: it's 1GB in size and contains details of 4 million crimes:

$ ls -alh ~/Downloads/Crimes_-_2001_to_present.csv
-rw-r--r--@ 1 markneedham staff 1.0G 16 Nov 12:14 /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv

$ wc -l ~/Downloads/Crimes_-_2001_to_present.csv
4193441 /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv

We can get a rough idea of the contents of the file by looking at the first row along with the header:

$ head -n 2 ~/Downloads/Crimes_-_2001_to_present.csv
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
9464711,HX114160,01/14/2014 05:00:00 AM,028XX E 80TH ST,0560,ASSAULT,SIMPLE,APARTMENT,false,true,0422,004,7,46,08A,1196652,1852516,2014,01/20/2014 12:40:05 AM,41.75017626412204,-87.55494559131228,"(41.75017626412204, -87.55494559131228)"

I wanted to do a count of the 'Primary Type' column to see how many of each crime we have.
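As a point of comparison, the same single-machine group-by-count can be sketched in plain Java 8 streams. This is our own naive illustration: the bare comma split breaks on quoted fields containing commas, which this data set does have, so a real run needs a proper CSV parser such as opencsv.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CrimeCounts {
    // Count occurrences of one column's values. Naive comma split:
    // fine for a sketch, wrong for quoted CSV fields with embedded commas.
    static Map<String, Long> countByColumn(Stream<String> lines, int column) {
        return lines.skip(1)                         // drop the header row
                .map(line -> line.split(",")[column])
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            countByColumn(lines, 5).entrySet().stream()   // column 5 = 'Primary Type'
                    .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                    .forEach(e -> System.out.println(e.getValue() + " " + e.getKey()));
        }
    }
}
```

It gives the shape of the computation; the rest of the post is about doing the same thing properly, and faster, with Spark.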
Using just Unix command line tools, this is how we'd do that:

$ time tail +2 ~/Downloads/Crimes_-_2001_to_present.csv | cut -d, -f6 | sort | uniq -c | sort -rn
859197 THEFT
757530 BATTERY
489528 NARCOTICS
488209 CRIMINAL DAMAGE
257310 BURGLARY
253964 OTHER OFFENSE
247386 ASSAULT
197404 MOTOR VEHICLE THEFT
157706 ROBBERY
137538 DECEPTIVE PRACTICE
124974 CRIMINAL TRESPASS
47245 PROSTITUTION
40361 WEAPONS VIOLATION
31585 PUBLIC PEACE VIOLATION
26524 OFFENSE INVOLVING CHILDREN
14788 CRIM SEXUAL ASSAULT
14283 SEX OFFENSE
10632 GAMBLING
8847 LIQUOR LAW VIOLATION
6443 ARSON
5178 INTERFERE WITH PUBLIC OFFICER
4846 HOMICIDE
3585 KIDNAPPING
3147 INTERFERENCE WITH PUBLIC OFFICER
2471 INTIMIDATION
1985 STALKING
355 OFFENSES INVOLVING CHILDREN
219 OBSCENITY
86 PUBLIC INDECENCY
80 OTHER NARCOTIC VIOLATION
12 RITUALISM
12 NON-CRIMINAL
6 OTHER OFFENSE
2 NON-CRIMINAL (SUBJECT SPECIFIED)
2 NON - CRIMINAL

real 2m37.495s
user 3m0.337s
sys 0m1.471s

This isn't too bad, but it seems like the type of calculation that Spark is made for, so I had a look at how I could go about doing that. To start with, I created an SBT project with the following build file:

name := "playground"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"

libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"

ideaExcludeFolders += ".idea"

ideaExcludeFolders += ".idea_modules"

I downloaded Spark and, after unpacking it, launched the Spark shell:

$ pwd
/Users/markneedham/projects/spark-play/spark-1.1.0/spark-1.1.0-bin-hadoop1

$ ./bin/spark-shell
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
...
Spark context available as sc.
scala>

I first import some classes I'm going to need:

scala> import au.com.bytecode.opencsv.CSVParser
import au.com.bytecode.opencsv.CSVParser

scala> import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD

Now, following the quick start example, we'll create a Resilient Distributed Dataset (RDD) from our crime CSV file:

scala> val crimeFile = "/Users/markneedham/Downloads/Crimes_-_2001_to_present.csv"
crimeFile: String = /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv

scala> val crimeData = sc.textFile(crimeFile).cache()
14/11/16 22:31:16 INFO MemoryStore: ensureFreeSpace(32768) called with curMem=0, maxMem=278302556
14/11/16 22:31:16 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 265.4 MB)
crimeData: org.apache.spark.rdd.RDD[String] = /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv MappedRDD[1] at textFile at <console>:17

Our next step is to process each line of the file using our CSV parser. A simple way to do this would be to create a new CSVParser for each line:

scala> crimeData.map(line => {
         val parser = new CSVParser(',')
         parser.parseLine(line).mkString(",")
       }).take(5).foreach(println)
14/11/16 22:35:49 INFO SparkContext: Starting job: take at <console>:23
...
4/11/16 22:35:49 INFO SparkContext: Job finished: take at <console>:23, took 0.013904 s
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
9464711,HX114160,01/14/2014 05:00:00 AM,028XX E 80TH ST,0560,ASSAULT,SIMPLE,APARTMENT,false,true,0422,004,7,46,08A,1196652,1852516,2014,01/20/2014 12:40:05 AM,41.75017626412204,-87.55494559131228,(41.75017626412204, -87.55494559131228)
9460704,HX113741,01/14/2014 04:55:00 AM,091XX S JEFFERY AVE,031A,ROBBERY,ARMED: HANDGUN,SIDEWALK,false,false,0413,004,8,48,03,1191060,1844959,2014,01/18/2014 12:39:56 AM,41.729576153145636,-87.57568059471686,(41.729576153145636, -87.57568059471686)
9460339,HX113740,01/14/2014 04:44:00 AM,040XX W MAYPOLE AVE,1310,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,false,true,1114,011,28,26,14,1149075,1901099,2014,01/16/2014 12:40:00 AM,41.884543798701515,-87.72803579358926,(41.884543798701515, -87.72803579358926)
9461467,HX114463,01/14/2014 04:43:00 AM,059XX S CICERO AVE,0820,THEFT,$500 AND UNDER,PARKING LOT/GARAGE(NON.RESID.),false,false,0813,008,13,64,06,1145661,1865031,2014,01/16/2014 12:40:00 AM,41.785633535413176,-87.74148516669783,(41.785633535413176, -87.74148516669783)

That works, but it's a bit wasteful to create a new CSVParser each time, so instead let's create just one for each partition that Spark splits our file into:

scala> crimeData.mapPartitions(lines => {
         val parser = new CSVParser(',')
         lines.map(line => {
           parser.parseLine(line).mkString(",")
         })
       }).take(5).foreach(println)
14/11/16 22:38:44 INFO SparkContext: Starting job: take at <console>:25
...
14/11/16 22:38:44 INFO SparkContext: Job finished: take at <console>:25, took 0.015216 s
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
9464711,HX114160,01/14/2014 05:00:00 AM,028XX E 80TH ST,0560,ASSAULT,SIMPLE,APARTMENT,false,true,0422,004,7,46,08A,1196652,1852516,2014,01/20/2014 12:40:05 AM,41.75017626412204,-87.55494559131228,(41.75017626412204, -87.55494559131228)
9460704,HX113741,01/14/2014 04:55:00 AM,091XX S JEFFERY AVE,031A,ROBBERY,ARMED: HANDGUN,SIDEWALK,false,false,0413,004,8,48,03,1191060,1844959,2014,01/18/2014 12:39:56 AM,41.729576153145636,-87.57568059471686,(41.729576153145636, -87.57568059471686)
9460339,HX113740,01/14/2014 04:44:00 AM,040XX W MAYPOLE AVE,1310,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,false,true,1114,011,28,26,14,1149075,1901099,2014,01/16/2014 12:40:00 AM,41.884543798701515,-87.72803579358926,(41.884543798701515, -87.72803579358926)
9461467,HX114463,01/14/2014 04:43:00 AM,059XX S CICERO AVE,0820,THEFT,$500 AND UNDER,PARKING LOT/GARAGE(NON.RESID.),false,false,0813,008,13,64,06,1145661,1865031,2014,01/16/2014 12:40:00 AM,41.785633535413176,-87.74148516669783,(41.785633535413176, -87.74148516669783)

You'll notice that we've still got the header being printed, which isn't ideal – let's get rid of it! I expected there to be a 'drop' function which would allow me to do that, but in fact there isn't.
Instead, we can make use of our knowledge that the first partition will contain the first line and strip it out that way:

scala> def dropHeader(data: RDD[String]): RDD[String] = {
         data.mapPartitionsWithIndex((idx, lines) => {
           if (idx == 0) {
             lines.drop(1)
           }
           lines
         })
       }
dropHeader: (data: org.apache.spark.rdd.RDD[String])org.apache.spark.rdd.RDD[String]

Now let's grab the first 5 lines again and print them out:

scala> val withoutHeader: RDD[String] = dropHeader(crimeData)
withoutHeader: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at mapPartitionsWithIndex at <console>:14

scala> withoutHeader.mapPartitions(lines => {
         val parser = new CSVParser(',')
         lines.map(line => {
           parser.parseLine(line).mkString(",")
         })
       }).take(5).foreach(println)
14/11/16 22:43:27 INFO SparkContext: Starting job: take at <console>:29
...
14/11/16 22:43:27 INFO SparkContext: Job finished: take at <console>:29, took 0.018557 s
9464711,HX114160,01/14/2014 05:00:00 AM,028XX E 80TH ST,0560,ASSAULT,SIMPLE,APARTMENT,false,true,0422,004,7,46,08A,1196652,1852516,2014,01/20/2014 12:40:05 AM,41.75017626412204,-87.55494559131228,(41.75017626412204, -87.55494559131228)
9460704,HX113741,01/14/2014 04:55:00 AM,091XX S JEFFERY AVE,031A,ROBBERY,ARMED: HANDGUN,SIDEWALK,false,false,0413,004,8,48,03,1191060,1844959,2014,01/18/2014 12:39:56 AM,41.729576153145636,-87.57568059471686,(41.729576153145636, -87.57568059471686)
9460339,HX113740,01/14/2014 04:44:00 AM,040XX W MAYPOLE AVE,1310,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,false,true,1114,011,28,26,14,1149075,1901099,2014,01/16/2014 12:40:00 AM,41.884543798701515,-87.72803579358926,(41.884543798701515, -87.72803579358926)
9461467,HX114463,01/14/2014 04:43:00 AM,059XX S CICERO AVE,0820,THEFT,$500 AND UNDER,PARKING LOT/GARAGE(NON.RESID.),false,false,0813,008,13,64,06,1145661,1865031,2014,01/16/2014 12:40:00 AM,41.785633535413176,-87.74148516669783,(41.785633535413176, -87.74148516669783)
9460355,HX113738,01/14/2014 04:21:00 AM,070XX S PEORIA ST,0820,THEFT,$500 AND UNDER,STREET,true,false,0733,007,17,68,06,1171480,1858195,2014,01/16/2014 12:40:00 AM,41.766348042591375,-87.64702037047671,(41.766348042591375, -87.64702037047671)

We're finally in good shape to extract the values from the 'Primary Type' column and count how many times each of them appears in our data set:

scala> withoutHeader.mapPartitions(lines => {
         val parser = new CSVParser(',')
         lines.map(line => {
           val columns = parser.parseLine(line)
           Array(columns(5)).mkString(",")
         })
       }).countByValue().toList.sortBy(-_._2).foreach(println)
14/11/16 22:45:20 INFO SparkContext: Starting job: countByValue at <console>:30
14/11/16 22:45:20 INFO DAGScheduler: Got job 7 (countByValue at <console>:30) with 32 output partitions (allowLocal=false)
...
14/11/16 22:45:30 INFO SparkContext: Job finished: countByValue at <console>:30, took 9.796565 s
(THEFT,859197)
(BATTERY,757530)
(NARCOTICS,489528)
(CRIMINAL DAMAGE,488209)
(BURGLARY,257310)
(OTHER OFFENSE,253964)
(ASSAULT,247386)
(MOTOR VEHICLE THEFT,197404)
(ROBBERY,157706)
(DECEPTIVE PRACTICE,137538)
(CRIMINAL TRESPASS,124974)
(PROSTITUTION,47245)
(WEAPONS VIOLATION,40361)
(PUBLIC PEACE VIOLATION,31585)
(OFFENSE INVOLVING CHILDREN,26524)
(CRIM SEXUAL ASSAULT,14788)
(SEX OFFENSE,14283)
(GAMBLING,10632)
(LIQUOR LAW VIOLATION,8847)
(ARSON,6443)
(INTERFERE WITH PUBLIC OFFICER,5178)
(HOMICIDE,4846)
(KIDNAPPING,3585)
(INTERFERENCE WITH PUBLIC OFFICER,3147)
(INTIMIDATION,2471)
(STALKING,1985)
(OFFENSES INVOLVING CHILDREN,355)
(OBSCENITY,219)
(PUBLIC INDECENCY,86)
(OTHER NARCOTIC VIOLATION,80)
(NON-CRIMINAL,12)
(RITUALISM,12)
(OTHER OFFENSE ,6)
(NON-CRIMINAL (SUBJECT SPECIFIED),2)
(NON - CRIMINAL,2)

We get the same results as with the Unix commands, except it took less than 10 seconds to calculate, which is pretty cool!

Reference: Spark: Parse CSV file and group by column value from our JCG partner Mark Needham at the Mark Needham Blog.

Java Reflection Tutorial – The ULTIMATE Guide

This tutorial is about reflection: the ability of a computer program to examine and modify its own structure and behavior (specifically the values, metadata, properties and functions) at runtime. We are going to explain what reflection is in general and how it can be used in Java. Real use cases covering different uses of reflection are listed in the next chapters. Several code snippets will be shown; at the end of this article you can find a compressed file that contains all these examples (and some more). All code has been written using Eclipse Luna 4.4 and Java update 8.25; no third party libraries are needed.

Table Of Contents

1. Reflection
2. Introduction to reflection in Java
3. Use cases
4. Reflection components and mechanisms
5. Classes
6. Interfaces
7. Enums
8. Primitive types
9. Fields
10. Methods
11. Constructors
12. Getters and Setters
13. Static elements
14. Arrays
15. Collections
16. Annotations
17. Generics
18. Class Loaders
19. Dynamic Proxies
20. Java 8 Reflection features
21. Summary
22. Download
23. Resources

1. Reflection

The concept of reflection in software means the ability to inspect, analyze and modify other code at runtime. For example, imagine an application that takes as input some files containing source code (we do not care what source code yet).
The goal of this application is to count the number of methods contained in each passed class. This can be solved using reflection by analyzing the code and counting the elements which are actually methods, ignoring other kinds of elements like attributes, interfaces, etc., and grouping them by class. Strictly speaking, this example is not really reflection, because the code does not have to be analyzed at runtime and the task can be done at any other stage; but it can also be done at runtime, and then we would actually be talking about reflection. Another example would be an application that analyzes the content of given classes and executes the methods that carry a specific annotation, with arguments provided at runtime: in the JUnit framework we have, for example, the annotation @Test. This is actually what JUnit does, and it does it using reflection. 2. Introduction to reflection in Java In Java, it is possible to inspect fields, classes, methods, annotations, interfaces, etc. at runtime. You do not need to know how classes or methods are called, nor which parameters are needed; all of that can be retrieved at runtime using reflection. It is also possible to instantiate new classes, to create new instances and to execute their methods, all of it using reflection. Reflection has been present in Java since its very first versions via the reflection API. The class Class contains all the reflection-related methods that can be applied to classes and objects, like the ones that allow a programmer to retrieve the class name or the public methods of a class. Other important classes are Method, Field and Type, containing specific reflection methods that we are going to see in this tutorial. Although reflection is very useful in many scenarios, it should not be used for everything. If some operation can be executed without using reflection, then we should not use it. 
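The method-counting example described above can be sketched with a few lines of reflection. This is a minimal sketch, not the tutorial's own code: the class names passed in are placeholders, and it counts methods of already-loaded classes rather than parsing source files.

```java
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class MethodCounter {

    // counts methods declared directly in the given class (inherited ones excluded)
    static int countDeclaredMethods(Class<?> clazz) {
        return clazz.getDeclaredMethods().length;
    }

    // counts only the public subset, using Modifier to inspect each method
    static long countPublicDeclaredMethods(Class<?> clazz) {
        return Arrays.stream(clazz.getDeclaredMethods())
                .filter(m -> Modifier.isPublic(m.getModifiers()))
                .count();
    }

    public static void main(String[] args) {
        // the exact numbers depend on the JDK version, so we just print them
        System.out.println("String declares " + countDeclaredMethods(String.class) + " methods");
        System.out.println("of which public: " + countPublicDeclaredMethods(String.class));
    }
}
```

This is the same idea frameworks build on: enumerate members with `getDeclaredMethods()` and filter them by modifiers, name, or annotations.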
Here are some reasons: Performance is affected by the use of reflection since compile-time optimizations cannot be applied: reflection is resolved at runtime, not at compile time. Security vulnerabilities have to be taken into consideration, since the use of reflection may not be possible when running in secure contexts like applets. Another important disadvantage worth mentioning here is code maintenance. If your code uses reflection heavily, it is going to be more difficult to maintain: the classes and methods are not directly exposed in the code and may vary dynamically, so it can get difficult to change the number of parameters that a method expects if the code that calls this method is invoked via reflection. Tools that automatically refactor or analyze the code may have trouble when a lot of reflection is present. 3. Use cases Despite all these limitations, reflection is a very powerful tool in Java that should be taken into consideration in several scenarios. In general, reflection can be used to observe and modify the behavior of a program at runtime. Here is a list with the most common use cases: IDEs make heavy use of reflection in order to provide auto-completion features, dynamic typing, hierarchy structures, etc. For example, IDEs like Eclipse or PhpStorm provide a mechanism to dynamically retrieve the arguments expected by a given method, or a list of public methods starting with “get” for a given instance. All of this is done using reflection. Debuggers use reflection to dynamically inspect the code that is being executed. Test tools like JUnit or Mockito use reflection in order to invoke desired methods matching a specific syntax or to mock specific classes, interfaces and methods. Dependency injection frameworks use reflection to inject beans and properties at runtime and to initialize the whole context of an application. 
Code analysis tools like PMD or FindBugs use reflection in order to check the code against the list of code violations that are currently configured. External tools that use the code dynamically may make use of reflection as well. In this tutorial we are going to see several examples of the use of reflection in Java. We will see how to get all methods for a given instance without knowing what class this instance belongs to, and we are going to invoke different methods depending on their syntax. We are not just going to show what other tutorials do; we will go one step further by indicating how to proceed when using reflection with generics, annotations, arrays, collections and other kinds of objects. Finally, we will explain the main new features related to this topic coming with Java 8. 4. Reflection components and mechanisms In order to start coding and using reflection in Java, we first have to explain a couple of concepts that may be relevant. An interface in Java is a contract with the applications that may use it. Interfaces contain a list of methods that are exposed and that have to be implemented by the classes implementing these interfaces. Interfaces cannot be instantiated. Since Java 8 they can contain default method implementations, although this is not the common use. A class is the implementation of a series of methods and the container of a series of properties. It can be instantiated. An object is an instance of a given class. A method is some code performing some actions. Methods have return types as outputs and input parameters. A field is a property of a class. Enums are elements containing a set of predefined constants. A private element is an element that is only visible inside a class and cannot be accessed from outside. It can be a method, a field… Static elements are elements that belong to the class and not to a specific instance. 
Static elements can be fields used across all instances of a given class, methods that can be invoked without the need to instantiate the class, etc. This is very interesting when using reflection, since invoking a static method is different from invoking a non-static one, where you need an instance of the class to execute it. An annotation is metadata describing the code itself. A collection is a group of elements; it can be a List, a Map, a Queue, etc. An array is an object containing a fixed number of values. Its length is fixed and is specified on creation. A dynamic proxy is a class implementing a list of interfaces specified at runtime. Dynamic proxies use the class java.lang.reflect.Proxy. We will see this in more detail in the next chapters. A class loader is an object in charge of loading classes given the name of a class. In Java, every class provides a method to retrieve its class loader: Class.getClassLoader(). Generics were introduced in Java 5. They offer compile-time safety by indicating what type or subtypes a collection is going to use. For example, using generics you can prevent, at compile time, an application using a list of strings from trying to add a Double to that list. The different nature of these components matters when using reflection with them: invoking a private method is not the same as invoking a public one; getting the name of an annotation differs from getting that of an interface, etc. We will see examples of all of these in the next chapters. 5. Classes Everything in Java is about classes, and reflection is no exception. Classes are the starting point when we talk about reflection. The class java.lang.Class contains several methods that allow programmers to retrieve information about classes and objects (and other elements) at runtime. In order to retrieve the class information from a single instance we can write (in this case, for a String instance called stringer): Class<? extends String> stringGetClass = stringer.getClass(); Or directly from the class name without instantiation: Class<String> stringclass = String.class; or using the java.lang.Class.forName(String) method: Class.forName( "java.lang.String" ) From a class object we can retrieve all kinds of information like declared methods, constructors, visible fields, annotations, types… All of this is explained in the following chapters of this tutorial. It is also possible to check properties of a given class, for example whether a class is primitive, or whether an object is an instance of it: stringGetClass.isInstance( "dani" ); stringGetClass.isPrimitive(); It is also possible to create new instances of a given class using the method java.lang.Class.newInstance(), passing the right arguments: String newInstanceStringClass = stringclass.newInstance(); String otherInstance = (String)Class.forName( "java.lang.String" ).newInstance(); The java.lang.Class.newInstance() method can be used only when the class contains a public default constructor or a constructor without arguments; if this is not the case, this method cannot be used. In those cases the solution is to retrieve a proper constructor at runtime and create an instance using this constructor with the arguments that it is expecting. We will see this in the chapter related to constructors. 6. Interfaces Interfaces are elements that cannot be instantiated and that contain the exposed methods that should be implemented by their subclasses. With regard to reflection there is nothing special about interfaces. Interfaces can be accessed like a class using their qualified name. All methods available for classes are available for interfaces as well. Here is an example of how to access interface class information at runtime: // can be accessed like a class System.out.println( "interface name: " + InterfaceExample.class.getName() ); Assuming that the InterfaceExample element is an interface. 
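Although an interface itself cannot be instantiated, reflection does offer a way to obtain an implementation of an interface at runtime: dynamic proxies via java.lang.reflect.Proxy, mentioned earlier in the component overview. Here is a minimal sketch; the Greeter interface and its greet() method are hypothetical stand-ins, not taken from the tutorial's code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyExample {

    // hypothetical interface standing in for an application interface
    interface Greeter {
        String greet(String name);
    }

    static Greeter newGreeter() {
        // every call on the proxy lands in this handler; dispatch by method name
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("greet")) {
                return "hello " + args[0];
            }
            throw new UnsupportedOperationException(method.getName());
        };
        // create an object implementing Greeter at runtime, no class written by hand
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        System.out.println(newGreeter().greet("reflection")); // prints "hello reflection"
    }
}
```

This is the mechanism behind many mocking and AOP libraries, and it is covered in depth in the Dynamic Proxies chapter of this guide.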
One obvious difference between classes and interfaces is that interfaces cannot be instantiated using reflection via the newInstance() method: // cannot be instantiated: java.lang.InstantiationException InterfaceExample.class.newInstance(); The snippet above will throw an InstantiationException at runtime. At compile time no error appears. 7. Enums Enums are special Java types that allow a variable to hold one of a set of constants. These constants are predefined in the enum declaration: enum ExampleEnum { ONE, TWO, THREE, FOUR }; Java contains several enum-specific methods: java.lang.Class.isEnum(): Returns true if the element is of an enum type, false otherwise. java.lang.Class.getEnumConstants(): Gets all constants of the given element (which is an enum). In case the element is not an enum, an exception is thrown. java.lang.reflect.Field.isEnumConstant(): Returns true in case the field used is an enum constant, false otherwise. Only applicable to fields. We are going to see an example of how to use the main enum methods related to reflection. First of all we create an instance of the enum: ExampleEnum value = ExampleEnum.FOUR; We can check if the element is an enum using the method isEnum(): System.out.println( "isEnum " + value.getClass().isEnum() ); In order to retrieve all the enum constants we can do something like the following using the method getEnumConstants(): ExampleEnum[] enumConstants = value.getClass().getEnumConstants(); for( ExampleEnum exampleEnum : enumConstants ) { System.out.println( "enum constant " + exampleEnum ); } Finally, we can check how to use the field-related method isEnumConstant(). 
First we retrieve all declared fields of the given class (we will see all field-related reflection utilities in more detail in the next chapters) and after that we can check if each field is an enum constant or not: Field[] flds = value.getClass().getDeclaredFields(); for( Field f : flds ) { // check for each field if it is an enum constant or not System.out.println( f.getName() + " " + f.isEnumConstant() ); } The output of all these pieces of code will be something like the following: isEnum true enum constant ONE enum constant TWO enum constant THREE enum constant FOUR ONE true TWO true THREE true FOUR true ENUM$VALUES false The line ENUM$VALUES false refers to the internal enum values field. For more information about enums and how to handle them, please visit http://docs.oracle.com/javase/tutorial/java/javaOO/enum.html. 8. Primitive types In Java, there are a couple of types that are handled differently because of their nature and behavior: when we are talking about reflection, primitive types like int, float, double, etc. can be accessed and used almost like any other classes. Here are a couple of examples of how to use reflection when we are working with primitive types: • It is possible to retrieve a class object from a primitive type just as for any non-primitive type: Class<Integer> intClass = int.class; • But it is not possible to create new instances of primitive types using reflection: Integer intInstance = intClass.newInstance(); In this case an exception of the type java.lang.InstantiationException is going to be thrown. • It is possible to check if a given class belongs to a primitive type or not using the method java.lang.Class.isPrimitive(): System.out.println( "is primitive: " + intClass.isPrimitive() ); 9. Fields Class fields can be handled at runtime using reflection. Classes offer several methods to access their fields at runtime. 
The most important ones are: • java.lang.Class.getDeclaredFields(): It returns an array with all declared fields of the class, including the private ones. • java.lang.Class.getFields(): It returns an array with all publicly accessible fields of the class. • java.lang.Class.getField(String): It returns the field with the name passed as parameter. It throws an exception if the field does not exist or is not accessible. • java.lang.Class.getDeclaredField(String): It returns the declared field with the given name; if the field does not exist it throws an exception. These methods return an array of elements (or a single one) of the type java.lang.reflect.Field. This class contains several interesting methods that can be used at runtime and that allow a programmer to read the properties and the values of the specific field. Here is a snippet that uses this functionality: String stringer = "this is a String called stringer"; Class<? extends String> stringGetClass = stringer.getClass(); Class<String> stringclass = String.class; Field[] fields = stringclass.getDeclaredFields(); for( Field field : fields ) { System.out.println( "*************************" ); System.out.println( "Name: " + field.getName() ); System.out.println( "Type: " + field.getType() ); // values if( field.isAccessible() ) { System.out.println( "Get: " + field.get( stringer ) ); // depending on the type we can access the fields using these methods // System.out.println( "Get boolean: " + field.getBoolean( stringer ) ); // System.out.println( "Get short: " + field.getShort( stringer ) ); // ... 
} System.out.println( "Modifiers:" + field.getModifiers() ); System.out.println( "isAccesible: " + field.isAccessible() ); } // stringclass.getField( "hashCode" ); // exception Field fieldHashCode = stringclass.getDeclaredField( "hash" ); // all fields can be accessed this way // fieldHashCode.get( stringer ); // this produces a java.lang.IllegalAccessException // we change the visibility fieldHashCode.setAccessible( true ); // and we can access it Object value = fieldHashCode.get( stringer ); int valueInt = fieldHashCode.getInt( stringer ); System.out.println( value ); System.out.println( valueInt ); In the snippet shown above you can see that Field contains several methods to get the value of a given field, like get(), or type-specific ones like getInt(). We can also see in the code how we can change the visibility of a given field by using the method setAccessible(). This is not always possible; under specific conditions and environments it may be prevented. However, this allows us to make a private field accessible and read its value and properties via reflection, which is very useful in testing frameworks like Mockito or PowerMock. The output of the program would be: ************************* Name: value Type: class [C Modifiers:18 isAccesible: false ************************* Name: hash Type: int Modifiers:2 isAccesible: false ************************* Name: serialVersionUID Type: long Modifiers:26 isAccesible: false ************************* Name: serialPersistentFields Type: class [Ljava.io.ObjectStreamField; Modifiers:26 isAccesible: false ************************* Name: CASE_INSENSITIVE_ORDER Type: interface java.util.Comparator Modifiers:25 isAccesible: false 0 0 10. 
Methods In order to retrieve all visible methods for a given class we can do the following: Class<String> stringclass = String.class; Method[] methods = stringclass.getMethods(); Using the method java.lang.Class.getMethods(), all visible or accessible methods for a given class are retrieved. We can also retrieve a specific method using its name and the types of the arguments it expects to receive; as an example: Method methodIndexOf = stringclass.getMethod( "indexOf", String.class ); For a given method (an instance of the type java.lang.reflect.Method), we can access all its properties. The following snippet shows a couple of them like name, default value, return type, modifiers, parameters, parameter types or the exceptions thrown; we can also check if a method is accessible or not: // All methods for the String class for( Method method : methods ) { System.out.println( "****************************************************" ); System.out.println( "name: " + method.getName() ); System.out.println( "defaultValue: " + method.getDefaultValue() ); System.out.println( "generic return type: " + method.getGenericReturnType() ); System.out.println( "return type: " + method.getReturnType() ); System.out.println( "modifiers: " + method.getModifiers() ); // Parameters Parameter[] parameters = method.getParameters(); System.out.println( parameters.length + " parameters:" ); // also method.getParameterCount() is possible for( Parameter parameter : parameters ) { System.out.println( "parameter name: " + parameter.getName() ); System.out.println( "parameter type: " + parameter.getType() ); } Class<?>[] parameterTypes = method.getParameterTypes(); System.out.println( parameterTypes.length + " parameters:" ); for( Class<?> parameterType : parameterTypes ) { System.out.println( "parameter type name: " + parameterType.getName() ); } // Exceptions Class<?>[] exceptionTypes = method.getExceptionTypes(); System.out.println( exceptionTypes.length + " exception types: " ); for( Class<?> 
exceptionType : exceptionTypes ) { System.out.println( "exception name " + exceptionType.getName() ); } System.out.println( "is accesible: " + method.isAccessible() ); System.out.println( "is varArgs: " + method.isVarArgs() ); } It is also possible to invoke given methods on specific objects, passing the arguments that we want; we should make sure that the number and types of the arguments are correct: Object indexOf = methodIndexOf.invoke( stringer, "called" ); This last feature is very interesting when we want to execute specific methods at runtime under special circumstances. It is also very useful in the creation of invocation handlers for dynamic proxies; we will see this point at the end of the tutorial. 11. Constructors Constructors can be used via reflection as well. Like other class methods they can be retrieved at runtime, and several properties can be analyzed and checked, like the accessibility, the number of parameters, their types, etc. In order to retrieve all visible constructors of a class, we can do something like: // get all visible constructors Constructor<?>[] constructors = stringGetClass.getConstructors(); In the snippet above we are retrieving all visible constructors. If we want to get all the constructors, including the private ones, we can do something like: // all constructors Constructor<?>[] declaredConstructors = stringclass.getDeclaredConstructors(); General information about constructors such as parameters, types, names, visibility, associated annotations, etc. can be retrieved in the following way: for( Constructor<?> constructor : constructors ) { int numberParams = constructor.getParameterCount() ; System.out.println( "constructor " + constructor.getName() ); System.out.println( "number of arguments " + numberParams); // public, private, etc. 
int modifiersConstructor = constructor.getModifiers(); System.out.println( "modifiers " + modifiersConstructor ); // array of parameters, more info in the methods section Parameter[] parameters = constructor.getParameters(); // annotations array, more info in the annotations section Annotation[] annotations = constructor.getAnnotations(); } Constructors can also be used to create new instances. This may be very useful in order to access private or otherwise not visible constructors. It should be done only under very special circumstances, and depending on the system where the application is running it may not work because of security restrictions, as explained at the beginning of this tutorial. In order to create a new instance of a class using a specific constructor we can do something like the following: // can be used to create new instances (no params in this case) String danibuizaString = (String)constructor.newInstance( ); It has to be taken into consideration that the number of parameters and their types should match those of the constructor instance. Also, the accessibility of the constructor has to be set to true in order to invoke it (if it was not accessible). This can be done in the same way as we did for class methods and fields. 12. Getters and Setters Getters and setters are no different from any other method inside a class. The main difference is that they are a standard way to access private fields. Here is a description of both: • Getters are used to retrieve the value of a private field inside a class. Their names start with “get” and end with the name of the property in camel case. They do not receive any parameter and their return type is the same as that of the property they return. They are public. • Setters are used to modify the value of a private field inside a class. Their names start with “set” and end with the name of the property in camel case. 
They receive one parameter of the same type as the property they modify and they do not return any value (void). They are public. For example, for the private property private int count; we can have the getter and setter methods: public int getCount(){ return this.count; } public void setCount(int count){ this.count = count; } Following these standards we can use reflection to access (read and modify) at runtime all the private properties of a class exposed via getters and setters. This mechanism is used by several well-known libraries like the Spring Framework or Hibernate, which expect classes to expose their properties using these kinds of methods. Here is an example of how to use getters and setters via reflection: Car car = new Car( "vw touran", "2010", "12000" ); Method[] methods = car.getClass().getDeclaredMethods(); // all getters, original values for( Method method : methods ) { if( method.getName().startsWith( "get" ) ) { System.out.println( method.invoke( car ) ); } } // setting values for( Method method : methods ) { if( method.getName().startsWith( "set" ) ) { method.invoke( car, "destroyed" ); } } // get new values for( Method method : methods ) { if( method.getName().startsWith( "get" ) ) { System.out.println( method.invoke( car ) ); } } Where the class Car looks like the following: public class Car { private String name; private Object price; private Object year; public Car( String name, String year, String price ) { this.name = name; this.price = price; this.year = year; } public String getName() { return name; } public void setName( String name ) { this.name = name; } public Object getPrice() { return price; } public void setPrice( Object price ) { this.price = price; } public Object getYear() { return year; } public void setYear( Object year ) { this.year = year; } } The output will be something like: vw touran 2010 12000 destroyed destroyed destroyed 13. Static elements Static classes, methods and fields behave completely differently from instance ones. 
The main reason is that they do not need to be instantiated or created before they are invoked; they can be used without previous instantiation. This fact changes everything: static methods can be invoked without instantiating their container classes, static fields are shared across all instances of a class (so concurrent access to them needs care), and static elements are very useful for creating singletons and factories… Summarizing, static elements are a very important mechanism in Java. In this chapter we are going to show the main differences between static and instance elements in relation to reflection: how to access static elements at runtime and how to invoke them. For example, take the following static nested class: public class StaticReflection { static class StaticExample { int counter; } ... In order to retrieve static nested classes we have the following options: // 1 access the static class directly System.out.println( "directly " + StaticExample.class.getName() ); // 2 using forName with the dot-separated name throws a ClassNotFoundException Class<?> forname = Class.forName("com.danibuiza.javacodegeeks.reflection.StaticReflection.StaticExample" ); // 3 using $ works, but is not that nice Class<?> forname2 = Class.forName("com.danibuiza.javacodegeeks.reflection.StaticReflection$StaticExample" ); // 4 another way: iterating through all classes declared inside this class Class<?>[] classes = StaticReflection.class.getDeclaredClasses(); for( Class<?> class1 : classes ) { System.out.println( "iterating through declared classes " + class1.getName() ); } The main difference is that the class is contained inside another class; this has nothing to do with reflection but with the nature of nested classes. 
In order to get static methods from a class, there is no difference from accessing instance ones (this applies to fields as well): // access static methods in the same way as instance ones Method mathMethod = Math.class.getDeclaredMethod( "round", double.class ); In order to invoke static methods or access static fields we do not need to create or specify an instance of the class, since the method (or the field) belongs to the class itself, not to a single instance: // methods: the object instance passed can be null since the method is static Object result = mathMethod.invoke( null, new Double( 12.4 ) ); // static field access, instance can be null Field counterField = Counter.class.getDeclaredField( "counter" ); System.out.println( counterField.get( null ) ); 14. Arrays The class java.lang.reflect.Array offers several functionalities for handling arrays; it includes various static reflective methods: • java.lang.reflect.Array.newInstance(Class, int): Creates a new instance of an array of the type passed as first parameter, with the length given in the second argument. It is similar to the method with the same name in the java.lang.Class class, but this one has parameters that allow the programmer to set the type of the array and its length. • java.lang.reflect.Array.set(Object, int, Object): Sets the element at the passed index of the given array to the object passed as argument. • java.lang.reflect.Array.getLength(Object): Returns the length of the array as an int. • java.lang.reflect.Array.get(Object, int): Retrieves the element of the array at the passed index. Returns an Object. • java.lang.reflect.Array.getInt(Object, int): Similar method for the primitive type int. Returns an int. There are methods available for all primitive types. 
Here is an example of how we can use all these methods: // using the Array class it is possible to create new arrays via reflection, passing the type and the length String[] strArrayOne = (String[])Array.newInstance( String.class, 10 ); // it contains utility methods for setting values Array.set( strArrayOne, 0, "member0" ); Array.set( strArrayOne, 1, "member1" ); Array.set( strArrayOne, 9, "member9" ); // and for getting values as well System.out.println( "strArrayOne[0] : " + Array.get( strArrayOne, 0 ) ); System.out.println( "strArrayOne[1] : " + Array.get( strArrayOne, 1 ) ); System.out.println( "strArrayOne[3] (not initialized) : " + Array.get( strArrayOne, 3 ) ); System.out.println( "strArrayOne[9] : " + Array.get( strArrayOne, 9 ) ); // also a method to retrieve the length of the array System.out.println( "length strArrayOne: " + Array.getLength( strArrayOne ) ); // primitive types work as well int[] intArrayOne = (int[])Array.newInstance( int.class, 10 ); Array.set( intArrayOne, 0, 1 ); Array.set( intArrayOne, 1, 2 ); Array.set( intArrayOne, 9, 10 ); // and specific getters and setters for primitive types for( int i = 0; i < Array.getLength( intArrayOne ); ++i ) { System.out.println( "intArrayOne[" + i + "] : " + Array.getInt( intArrayOne, i ) ); } The output of the program above would be: ...

How To Setup BPM and Rules Tooling For JBoss Developer Studio 8

The release of the latest JBoss Developer Studio (JBDS) brings with it questions about how to get started with the various JBoss Integration and BPM product tool sets that are not installed out of the box. In this series of articles we will outline how to install each set of tools and explain which products they support. This should help you make an informed decision about what tooling you might want to install before embarking on your next JBoss integration project. There are four different software packs that offer tooling for various JBoss integration products: JBoss Business Process and Rules Development; JBoss Data Virtualization Development; JBoss Integration and SOA Development; JBoss SOA 5.x Development. This article will outline how to get started with the JBoss BPM and Rules Development tooling and JBDS 8. JBDS 8 can be obtained through the Customer Portal or via the early access downloads on jboss.org. After installing JBDS, start it up and you will see a welcoming JBoss Central tab with, at the bottom, a tab labeled Software/Update where you can look at the available tool sets. You will notice that no JBoss Integration stacks are offered to install upon first inspection. This is because, at the time of this writing, the integration stacks are early access. Eventually they will be shown by default, once testing finishes and they are released, but for now you can obtain them by checking the Early Access box in the bottom right corner. This will reveal the integration stack tooling offerings, and we will select JBoss BPM & Rules Development. Click on the Install/Update button to start the installation and restart at the end to complete the process. 
If you are interested in what is being installed, you can examine it by digging into the menu as follows: Help -> Install new software…, pull down the menu offered by Work with: and select JBoss Developer Studio 8.x – Early Access; everything under JBoss Business Process and Rules Development will be installed. Stay tuned for more articles in this series that will detail the installation of the remaining JBoss Integration Stack tools.Reference: How To Setup BPM and Rules Tooling For JBoss Developer Studio 8 from our JCG partner Eric Schabell at Eric Schabell’s blog....

How to compress responses in Java REST API with GZip and Jersey

There may be cases when your REST API provides responses that are very long, and we all know how important transfer speed and bandwidth still are on mobile devices/networks. I think this is the first performance optimization point one needs to address when developing REST APIs that support mobile apps. Guess what? Because responses are text, we can compress them. And with today’s power of smartphones and tablets, decompressing them on the client side should not be a big deal… So in this post I will present how you can SELECTIVELY compress your REST API responses, if you’ve built them in Java with Jersey, which is the JAX-RS Reference Implementation (and more)…     1. Jersey filters and interceptors Well, thanks to Jersey’s powerful Filters and Interceptors features, the implementation is fairly easy. Whereas filters are primarily intended to manipulate request and response parameters like HTTP headers, URIs and/or HTTP methods, interceptors are intended to manipulate entities, via manipulating entity input/output streams. You’ve seen the power of filters in my posts: How to add CORS support on the server side in Java with Jersey, where I’ve shown how to CORS-enable a REST API, and How to log in Spring with SLF4J and Logback, where I’ve shown how to log requests and responses from the REST API; but for compressing we will be using a GZip WriterInterceptor. A writer interceptor is used for cases where an entity is written to the “wire”, which on the server side, as in this case, means when writing out a response entity. 1.1. 
GZip Writer Interceptor So let’s have a look at our GZip Writer Interceptor: GZip Writer Interceptor package org.codingpedia.demo.rest.interceptors;import java.io.IOException; import java.io.OutputStream; import java.util.zip.GZIPOutputStream;import javax.ws.rs.WebApplicationException; import javax.ws.rs.core.MultivaluedMap; import javax.ws.rs.ext.WriterInterceptor; import javax.ws.rs.ext.WriterInterceptorContext;@Provider @Compress public class GZIPWriterInterceptor implements WriterInterceptor { @Override public void aroundWriteTo(WriterInterceptorContext context) throws IOException, WebApplicationException { MultivaluedMap<String,Object> headers = context.getHeaders(); headers.add("Content-Encoding", "gzip"); final OutputStream outputStream = context.getOutputStream(); context.setOutputStream(new GZIPOutputStream(outputStream)); context.proceed(); } } Note:it implements the WriterInterceptor,  which is an interface for message body writer interceptors that wrap around calls to javax.ws.rs.ext.MessageBodyWriter.writeTo providers implementing WriterInterceptor contract must be either programmatically registered in a JAX-RS runtime or must be annotated with @Provider annotation to be automatically discovered by the JAX-RS runtime during a provider scanning phase. @Compress  is the name binding annotation, which we will discuss more detailed in the coming paragraph “The interceptor gets a output stream from the WriterInterceptorContext and sets a new one which is a GZIP wrapper of the original output stream. After all interceptors are executed the output stream lastly set to the WriterInterceptorContext will be used for serialization of the entity. In the example above the entity bytes will be written to the GZIPOutputStream which will compress the stream data and write them to the original output stream. The original stream is always the stream which writes the data to the “wire”. 
When the interceptor is used on the server, the original output stream is the stream that writes the data to the underlying server container stream that sends the response to the client.” [2]

“The overridden method aroundWriteTo() gets WriterInterceptorContext as a parameter. This context contains getters and setters for header parameters, request properties, entity, entity stream and other properties.” [2] When you compress your response, you should set the “Content-Encoding” header to “gzip”.

1.2. Compress annotation

Filters and interceptors can be name-bound. Name binding is a concept that allows telling a JAX-RS runtime that a specific filter or interceptor will be executed only for a specific resource method. When a filter or an interceptor is limited only to a specific resource method we say that it is name-bound. Filters and interceptors that do not have such a limitation are called global. In our case we’ve built the @Compress annotation:

package org.codingpedia.demo.rest.interceptors;

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

import javax.ws.rs.NameBinding;

// @Compress annotation is the name binding annotation
@NameBinding
@Retention(RetentionPolicy.RUNTIME)
public @interface Compress {}

and used it to mark methods on resources which should be gzipped (e.g. when GET-ing all the podcasts with the PodcastsResource):

@Compress annotation usage on a resource method:

@Component
@Path("/podcasts")
public class PodcastsResource {

    @Autowired
    private PodcastService podcastService;

    ...........................

    /*
     * *********************************** READ ***********************************
     */
    /**
     * Returns all resources (podcasts) from the database
     *
     * @return
     * @throws IOException
     * @throws JsonMappingException
     * @throws JsonGenerationException
     * @throws AppException
     */
    @GET
    @Compress
    @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
    public List<Podcast> getPodcasts(
            @QueryParam("orderByInsertionDate") String orderByInsertionDate,
            @QueryParam("numberDaysToLookBack") Integer numberDaysToLookBack)
            throws IOException, AppException {
        List<Podcast> podcasts = podcastService.getPodcasts(
                orderByInsertionDate, numberDaysToLookBack);
        return podcasts;
    }

    ...........................
}

2. Testing

2.1. SOAPui

Well, if you are testing with SOAPui, you can issue the following request against the PodcastsResource.

Request:

GET http://localhost:8888/demo-rest-jersey-spring/podcasts/?orderByInsertionDate=DESC HTTP/1.1
Accept-Encoding: gzip,deflate
Accept: application/json, application/xml
Host: localhost:8888
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1.1 (java 1.5)

Response (GZipped JSON, automatically unzipped by SOAPui):

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: gzip
Content-Length: 409
Server: Jetty(9.0.7.v20131107)

[
  {
    "id": 2,
    "title": "Quarks & Co - zum Mitnehmen",
    "linkOnPodcastpedia": "http://www.podcastpedia.org/quarks",
    "feed": "http://podcast.wdr.de/quarks.xml",
    "description": "Quarks & Co: Das Wissenschaftsmagazin",
    "insertionDate": "2014-10-29T10:46:13.00+0100"
  },
  {
    "id": 1,
    "title": "- The Naked Scientists Podcast - Stripping Down Science",
    "linkOnPodcastpedia": "http://www.podcastpedia.org/podcasts/792/-The-Naked-Scientists-Podcast-Stripping-Down-Science",
    "feed": "feed_placeholder",
    "description": "The Naked Scientists flagship science show brings you a lighthearted look at the latest scientific breakthroughs, interviews with the world top scientists, answers to your science questions and science experiments to try at home.",
    "insertionDate": "2014-10-29T10:46:02.00+0100"
  }
]

SOAPui recognizes the Content-Encoding: gzip header we’ve added in the GZIPWriterInterceptor, automatically uncompresses the response and displays it readable to the human eye. Well, that’s it. You’ve learned how Jersey makes it straightforward to compress REST API responses.

Tip: If you really want to learn how to design and implement a REST API in Java, read the following Tutorial – REST API design and implementation in Java with Jersey and Spring

Reference: How to compress responses in Java REST API with GZip and Jersey from our JCG partner Adrian Matei at the Codingpedia.org blog....
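As an addendum to the testing section above: the stream wrapping the interceptor performs can be sanity-checked with nothing but the JDK's java.util.zip classes. The following is a self-contained sketch, not code from the original post, and the sample JSON string is illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a payload the same way the interceptor does:
    // wrap the destination stream in a GZIPOutputStream.
    static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(wire)) {
            gzip.write(plain);
        }
        return wire.toByteArray();
    }

    // What a client (or SOAPui) does transparently on receipt.
    static byte[] decompress(byte[] compressed) throws IOException {
        GZIPInputStream gzip =
                new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = gzip.read(buffer)) != -1) {
            plain.write(buffer, 0, read);
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"id\":2,\"title\":\"Quarks & Co - zum Mitnehmen\"}]";
        byte[] compressed = compress(json.getBytes(StandardCharsets.UTF_8));
        String restored = new String(decompress(compressed), StandardCharsets.UTF_8);
        System.out.println(json.equals(restored)); // prints true
    }
}
```

For text payloads of any length the compressed form is typically much smaller, which is the whole point of enabling the interceptor for large responses.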

It’s All About Tests – Part 1

This post is the first of a series of three:

- Mindset of testing
- Techniques
- Tools and Tips

The Mindset

Testing code is something that needs to be learned. It takes time to absorb how to do it well. It’s a craft that one should always practice and improve. Back in the old days, developers did not test, they checked their code. Here’s a nice tweet about it:

Checking: code does what the coder intends it to do. Testing: code does what the customer needs it to do. #agile #tdd #bdd — Neil Killick (@neil_killick) November 7, 2014

Today we have many tools and techniques to work with. XUnit frameworks, mock frameworks, UI automation, TDD, XP… But I believe that testing starts with the mind. State of mind.

Why Testing

Should I really answer that? Tests are your code harness and security for quality. Tests tell the story of your code. They prove that something works. They give immediate feedback if something went wrong. Working with tests correctly makes you more efficient and effective. You debug less and probably have fewer bugs, therefore you have more time to do actual work. Your design will be better (more about it later) and maintainable. You feel confident changing your code (refactoring). More about it later. It reduces stress, as you are more confident with your code.

What to Test

I say everything. Perhaps you will skip the lowest parts of your system. The parts that read/write to the file system or the DB or communicate with some external service. But even these parts can be tested. And they should. In following posts I will describe some techniques for doing that. Test even the smallest thing. For example, if you have a DTO and you decide that a certain field will be initialized with some value, then write a test that only instantiates this class and then verifies (asserts) the expected value (and yes, I know, some parts really cannot be tested, but they should remain minimal).

SRP

Single Responsibility Principle.
This is how I like to refer to the point that a test needs to check one thing. If it’s a unit test, then it should test one behavior of your method / class. Different behavior should be tested in a different test. If it’s a higher level of test (integration, functional, UI), then the same principle applies. Test one flow of the system. Test a click. Test adding elements to the DB correctly, but not deleting in the same test.

Isolation

An isolated test helps us understand exactly what went wrong. Developing isolated tests helps us concentrate on one problem at a time. One aspect of isolation is related to the SRP. When you test something, isolate the tested code from other parts (dependencies). That way you test only that part of the code. If the test fails, you know where the problem was. If you have many dependencies in the test, it is much harder to understand what the actual cause of failure was. But isolation means other things as well. It means that no test will interfere with another. It means that the running order of the tests doesn’t matter. For a unit test, it means that you don’t need a DB running (or an internet connection, for that matter). It means that you can run your tests concurrently without one interfering with the other (Maven allows exactly this). If you can’t do it (example: DB issues), then your tests are not isolated.

Test Smells

When a test is too hard to understand / maintain, don’t get mad at it! Say:

thank you very much, my dear test, for helping me improve the code

If it is too complicated to set up the environment for the test, then probably the unit being tested has too many dependencies. If after running a method under test you need to verify many aspects (verify, assert, etc.), the method probably does too much. The test can be your best friend for code improvement. Usually really complicated test code means less structured production code. I usually see a correlation between complicated tests and code that doesn’t follow the SRP, or any other SOLID principles.
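A minimal sketch of an isolated, single-responsibility test in plain Java. All names here (PodcastRepository, TitleService) are hypothetical, and a hand-written fake stands in for a mocking framework: the service under test depends on an interface, the test hands it an in-memory fake, so no DB or network is involved and tests cannot interfere with one another.

```java
import java.util.ArrayList;
import java.util.List;

public class IsolationSketch {

    // the dependency we want to isolate away (normally backed by a DB)
    interface PodcastRepository {
        List<String> findTitles();
    }

    // the unit under test
    static class TitleService {
        private final PodcastRepository repo;
        TitleService(PodcastRepository repo) { this.repo = repo; }

        // one behavior: upper-cases every stored title
        List<String> upperCaseTitles() {
            List<String> out = new ArrayList<>();
            for (String t : repo.findTitles()) {
                out.add(t.toUpperCase());
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // the fake replaces the real repository; the test checks one thing only
        PodcastRepository fake = () -> List.of("quarks", "naked scientists");
        List<String> titles = new TitleService(fake).upperCaseTitles();
        System.out.println(titles); // [QUARKS, NAKED SCIENTISTS]
    }
}
```

Because the fake is just a lambda, each test can build its own, so the running order of tests never matters.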
Testable Code

This is one of my favorites. Whenever I do code review I ask the other person: “How are you going to test it?”, “How do you know it works?” Whenever I code, I ask myself the same question: “How can I test this piece of code?” In my experience, always thinking about how to create testable code yields much better design. The code “magically” has more patterns, less duplication, better OOD and behaves SOLIDly. Forcing yourself to constantly test your code makes you think. It helps divide a big, complicated problem into many (or a few) smaller, more trivial ones. If your code is testable and tested, you have more confidence in it. Confidence in its behavior and confidence to change it. Refactor it.

Refactoring

This item could be part of the why. It could also be part of the techniques. But I decided to give it special attention. Refactoring is part of the TDD cycle (but not only). When you have tests, you can be confident doing refactoring. I think that you need to “think about refactoring” while developing, similar to “think how to produce testable code”. When thinking refactoring, testing comes along. Refactoring is also a state of mind. Ask yourself: “Is the code I produced clean enough? Can I improve it?” (BTW, know when to stop…)

This was the first post of a series of posts about testing. The following post will be about some techniques and approaches for testing.

Reference: It’s All About Tests – Part 1 from our JCG partner Eyal Golan at the Learning and Improving as a Craftsman Developer blog....

Apache Lucene 5.0.0 is coming!

At long last, after a strong series of 4.x feature releases, most recently 4.10.2, we are finally working towards another major Apache Lucene release! There are no promises for the exact timing (it’s done when it’s done!), but we already have a volunteer release manager (thank you Anshum!). A major release in Lucene means all deprecated APIs (as of 4.10.x) are dropped, support for 3.x indices is removed while the numerous 4.x index formats are still supported for index backwards compatibility, and the 4.10.x branch becomes our bug-fix-only release series (no new features, no API changes). 5.0.0 already contains a number of exciting changes, which I describe below, and they are still rolling in with ongoing active development.

Stronger index safety

Many of the 5.0.0 changes are focused on providing stronger protection against index corruption. All file access now uses Java’s NIO.2 APIs, giving us better error handling (e.g., Files.delete throws a meaningful exception) along with atomic rename for safer commits, reducing the risk of hideous “your entire index is gone” bugs like this doozie.

Lucene’s replication module, along with distributed servers on top of Lucene such as Elasticsearch or Solr, must copy index files from one place to another. They do this for backup purposes (e.g., snapshot and restore), for migrating or recovering a shard from one node to another, or when adding a new replica. Such replicators try to be incremental, so that if the same file name is present, with the same length and checksum, it will not be copied again. Unfortunately, these layers sometimes have subtle bugs (they are complex!). Thanks to checksums (added in 4.8.0), Lucene already detects if the replicator caused any bit-flips while copying, and this revealed a long-standing nasty bug in the compression library Elasticsearch uses.
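As a plain-JDK illustration (not Lucene source) of the two NIO.2 behaviors mentioned above, atomic rename and delete failures that carry a reason, here is a minimal sketch; the file name mimics Lucene's segments_N convention but is otherwise arbitrary:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class Nio2Sketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nio2-demo");
        Path tmp = dir.resolve("segments_1.tmp");
        Path commit = dir.resolve("segments_1");

        // write the pending commit, then publish it atomically: readers see
        // either no commit file at all or a complete one, never a torn write
        Files.write(tmp, "commit data".getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, commit, StandardCopyOption.ATOMIC_MOVE);
        System.out.println(Files.exists(commit)); // prints true

        // unlike File.delete()'s bare boolean, Files.delete explains failure
        Files.delete(commit);
        try {
            Files.delete(commit); // already gone
        } catch (NoSuchFileException e) {
            System.out.println("delete failed with a reason: " + e.getFile());
        }
    }
}
```

The old java.io.File.delete() simply returns false on failure with no hint why, which is exactly the kind of silent error path that makes "your entire index is gone" bugs hard to diagnose.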
With 5.0.0 we take this even further and now detect whether whole files were copied to the wrong file name, by assigning a unique id to every segment and commit (segments_N file). Each index file now records the segment id in its header, and then these ids are cross-checked when the index is opened. The new Lucene50Codec also includes further index corruption detection. Even CorruptIndexException itself is improved! It will now always refer to the file or resource where the corruption was detected, as this is now a required argument to its constructors. When corruption is detected higher up (e.g., a bad field number in the field infos file), the resulting CorruptIndexException will now state whether there was also a checksum mismatch in the file, helping to narrow the possible source of the corruption. Finally, during merge, IndexWriter now always checks the incoming segments for corruption before merging. This can mean, on upgrading to 5.0.0, that merging may uncover long-standing latent corruption in an older 4.x index.

Reduced heap usage

5.0.0 also includes several changes to reduce heap usage during indexing and searching. If your index has 1B docs, then caching a single FixedBitSet-based filter in 4.10.2 costs a non-trivial 125 MB of heap! But with 5.0.0, Lucene now supports random-writable and advance-able sparse bitsets (RoaringDocIdSet and SparseFixedBitSet), so the heap required is in proportion to how many bits are set, not how many total documents exist in the index. These bitsets also greatly simplify how MultiTermQuery is rewritten (no more CONSTANT_SCORE_AUTO_REWRITE_METHOD), and they provide faster advance implementations than FixedBitSet’s linear scan. Finally, they provide a more accurate cost() implementation, allowing Lucene to make better choices about how to drive the intersection at query time.
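The 125 MB figure is simple arithmetic: a dense FixedBitSet spends one bit per document whether or not the bit is set. A small sketch (not Lucene code) spells it out:

```java
public class FilterHeapMath {

    // a dense bitset allocates one bit per document in the index,
    // so its heap cost is maxDoc / 8 bytes regardless of how few match
    static long denseFilterBytes(long maxDoc) {
        return maxDoc / 8;
    }

    public static void main(String[] args) {
        long oneBillionDocs = 1_000_000_000L;
        long bytes = denseFilterBytes(oneBillionDocs);
        System.out.println(bytes / 1_000_000 + " MB"); // prints 125 MB
        // a sparse set's heap instead scales with the number of set bits,
        // so a filter matching few documents costs only a few KB
    }
}
```

This is why a single cached filter over a billion-document index is so expensive in 4.10.2, and why sparse sets whose cost tracks the number of matching documents are such a win.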
Heap usage during IndexWriter merging is also much lower with the new Lucene50Codec, since doc values and norms for the segments being merged are no longer fully loaded into heap for all fields; now they are loaded for the one field currently being merged, and then dropped. The default norms format now uses sparse encoding when appropriate, so indices that enable norms for many sparse fields will see a large reduction in required heap at search time.

An explain API for heap usage

If you still find Lucene using more heap than you expected, 5.0.0 has a new API to print a tree structure showing a recursive breakdown of which parts are using how much heap. This is analogous to Lucene’s explain API, used to understand why a document has a certain relevance score, but applied to heap usage instead. It produces output like this:

_cz(5.0.0):C8330469: 28MB
  postings [...]: 5.2MB
  ...
  field 'latitude' [...]: 678.5KB
    term index [FST(nodes=6679, ...)]: 678.3KB

This is a much faster way to see what is using up your heap than trying to stare at a Java heap dump.

Further changes

There is a long tail of additional 5.0.0 changes; here are some of them:

- Old experimental postings formats (Sep/Fixed/VariableIntPostingsFormat) have been removed. PulsingPostingsFormat has also been removed, since the default postings format already pulses unique terms.
- FieldCache is gone (moved to a dedicated UninvertingReader in the misc module). This means when you intend to sort on a field, you should index that field using doc values, which is much faster and less heap consuming than FieldCache.
- Tokenizers and Analyzers no longer require Reader on init.
- NormsFormat now gets its own dedicated NormsConsumer/Producer.
- Simplifications to FieldInfo (Lucene’s “low schema”): no more normType (it is always a DocValuesType.NUMERIC), no more isIndexed (just check IndexOptions).
- Compound file handling is simpler, and is now under codec control.
- SortedSetSortField, used to sort on a multi-valued field, is promoted from sandbox to Lucene’s core.
- PostingsFormat now uses a “pull” API when writing postings, just like doc values. This is powerful because you can do things in your postings format that require making more than one pass through the postings, such as iterating over all postings for each term to decide which compression format it should use.
- Version is no longer required on init to classes like IndexWriterConfig and analysis components.

The changes I’ve described here are just a snapshot of what we have lined up today for a 5.0.0 release. 5.0.0 is still under active development (patches welcome!) so this list will change by the time the actual release is done.

Reference: Apache Lucene 5.0.0 is coming! from our JCG partner Michael Mc Candless at the Changing Bits blog....

Java Code Geeks sponsor Philadelphia Java Users’ Group November 2014 Meeting

Here at Java Code Geeks, we love interacting with fellow geeks and we always strive to provide as much value as possible to our great audience. In our continuous effort to support the community and further evangelize Java all over the world, we accepted a recent proposal to sponsor the Philadelphia Java Users’ Group November 2014 Meeting. Dave Fecak, a JCG partner of ours and founder and president of the Philadelphia JUG, reached out to us and kindly asked whether we would be willing to sponsor their upcoming event. We gladly accepted the invitation and provided the financial support for it. Dave took care of the execution of the event. The event was a great success and the speaker, Frederic Jambukeswaran, exceeded all expectations with his great presentation on Java Application Deployment (slides here). Unfortunately, we were not able to participate in the meeting (we are in a different continent), but we were there mentally and we were extremely happy (and a bit touched) to see, firsthand, people enjoying our assistance and help, even if that was purely monetary. We would like to thank Dave and the whole User Group. Keep coding guys! ...
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.