
6 Reasons Not to Switch to Java 8 Just Yet

Java 8 is awesome. Period. But… after we had the chance to have fun and play around with it, the time has come to take it with the necessary grain of salt. All good things come at a price, and in this post I will share the main pain points of Java 8. Make sure you're aware of these before upgrading and letting go of 7.

1. Parallel Streams can actually slow you down

Java 8 brings the promise of parallelism as one of its most anticipated new features. The .parallelStream() method implements this on collections and streams. It breaks them into subproblems which then run on separate threads for processing; these can go to different cores and then get combined when they're done. This all happens under the hood using the fork/join framework. Ok, sounds cool, it must speed up operations on large data sets in multi-core environments, right?

No, it can actually make your code run slower if not used right. Some 15% slower on this benchmark we ran, but it could be even worse. Let's say we're already running multiple threads and we're using .parallelStream() in some of them, adding more and more threads to the pool. This could easily turn into more than our cores can handle, and slow everything down due to increased context switching.

The slower benchmark, grouping a collection into different groups (prime / non-prime):

Map<Boolean, List<Integer>> groupByPrimary = numbers.parallelStream()
    .collect(Collectors.groupingBy(s -> Utility.isPrime(s)));

More slowdowns can occur for other reasons as well. Consider this: let's say we have multiple tasks to complete and one of them takes much longer than the others for some reason. Breaking it down with .parallelStream() could actually delay the quicker tasks from finishing, and the process as a whole. Check out this post by Lukas Krecan for more examples and code samples.

Diagnosis: Parallelism, with all its benefits, also brings in additional types of problems to consider. When already acting in a multi-threaded environment, keep this in mind and get yourself familiar with what's going on behind the scenes.

2. The flip side of Lambda Expressions

Lambdas. Oh, lambdas. We can do pretty much everything we already could without you, but you add so much grace and get rid of so much boilerplate code that it's easy to fall in love. Let's say I rise up in the morning and want to iterate over a list of world cup teams and map their name lengths (fun fact: they sum up to 254):

List<Integer> lengths = new ArrayList<>();
for (String country : countries) {
    lengths.add(check(country));
}

Now let's get functional with a nice lambda:

Stream<Integer> lengths = countries.stream().map(country -> check(country));

Baam! That's super. Although… while mostly seen as a positive thing, adding new elements like lambdas to Java pushes it further away from its original specification. The bytecode is fully OO, and with lambdas in the game the distance between the actual code and the runtime grows larger. Read more about the dark side of lambda expressions in this post by Tal Weiss.

The bottom line is that what you're writing and what you're debugging are two different things. Stack traces grow larger and larger and make it harder to debug your code.
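To make the trace comparison below concrete, here is a rough reconstruction (mine, not the author's actual source) of the kind of code that produces the two traces; it assumes check(...) throws when handed an empty string:

// Hypothetical reconstruction, for illustration only
import java.util.Arrays;
import java.util.List;

public class LambdaTraceDemo {

    static int check(String country) {
        if (country.isEmpty()) {
            // roughly "line 19" in the traces below
            throw new IllegalArgumentException("empty country name");
        }
        return country.length();
    }

    public static void main(String[] args) {
        List<String> countries = Arrays.asList("Brazil", "", "Germany");

        // A plain for-loop over the same list fails with only two frames (check + main).
        // The stream + lambda version below fails with the same exception, but the trace
        // now travels through the whole stream machinery, as shown in the long trace.
        long total = countries.stream().map(country -> check(country)).count();
        System.out.println(total);
    }
}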
Something simple like adding an empty string to a list turns this short stack trace:

at LmbdaMain.check(LmbdaMain.java:19)
at LmbdaMain.main(LmbdaMain.java:34)

Into this:

at LmbdaMain.check(LmbdaMain.java:19)
at LmbdaMain.lambda$0(LmbdaMain.java:37)
at LmbdaMain$$Lambda$1/821270929.apply(Unknown Source)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.LongPipeline.reduce(LongPipeline.java:438)
at java.util.stream.LongPipeline.sum(LongPipeline.java:396)
at java.util.stream.ReferencePipeline.count(ReferencePipeline.java:526)
at LmbdaMain.main(LmbdaMain.java:39)

Another issue that lambdas raise has to do with overloading: since lambda arguments have to be cast into something when using them to call a method, and they can be cast to multiple types, this may cause ambiguous calls in some cases. Lukas Eder explains this with code samples right here.

Diagnosis: Just stay aware of this. The traces might be a pain from time to time, but they will not keep us away from those precious lambdas.

3. Default Methods are distracting

Default methods enable a default implementation of a function in the interface itself. This is definitely one of the coolest new features Java 8 brings to the table, but it somewhat interferes with the way we used to do things. So why was it introduced anyway? And what shouldn't you do with it?

The main motivation behind Default Methods was that if at some point we need to add a method to an existing interface, we can do so without rewriting every implementation, keeping it compatible with older versions. For example, take this piece of code from Oracle's Java Tutorials where they add the ability to specify a timezone:

public interface TimeClient {
    // ...
    static public ZoneId getZoneId(String zoneString) {
        try {
            return ZoneId.of(zoneString);
        } catch (DateTimeException e) {
            System.err.println("Invalid time zone: " + zoneString + "; using default time zone instead.");
            return ZoneId.systemDefault();
        }
    }

    default public ZonedDateTime getZonedDateTime(String zoneString) {
        return ZonedDateTime.of(getLocalDateTime(), getZoneId(zoneString));
    }
}

And that's it, problem solved. Or is it? Default Methods blur the separation of interface and implementation a bit. As if type hierarchies didn't tend to tangle up on their own, in the wrong hands there's now a new creature we need to tame. Read more about it in Oleg Shelajev's post on RebelLabs.

Diagnosis: When you hold a hammer everything looks like a nail; stick to their original use case, namely the evolution of an existing interface when a refactor to introduce a new abstract class doesn't make sense.

Moving on to some things that are either missing, still with us or not exactly there yet:

4. Wherefore art thou, Jigsaw?

Project Jigsaw's goal is to make Java modular and break the JRE into interoperable components. The motivation behind this comes first from a desire for a better, faster and stronger embedded Java. I'm trying to avoid mentioning the "Internet of Things", but there, I said it. Reduced JAR sizes, performance improvements and increased security are some more of the promises this ambitious project holds.
So where is it? Jigsaw entered Phase 2 just recently, having passed the exploratory phase, and is now switching gears to a production-quality design and implementation, says Mark Reinhold, Oracle's Chief Java Architect. The project was first planned to be completed in Java 8 but was deferred to Java 9, where it is expected to be one of the flagship new features.

Diagnosis: If this is the main thing that you're waiting for, Java 9 is due in 2016. In the meantime, take a closer look and maybe even get involved in the Jigsaw-dev mailing list.

5. Issues that are still around

Checked Exceptions
No one likes boilerplate code; that's one of the reasons lambdas got so popular. Speaking of boilerplate exceptions: regardless of whether or not you logically need to catch a checked exception or have anything to do with it, you still need to catch it. Even if it's something that would never happen, like this exception that will never fire:

try {
    httpConn.setRequestMethod("GET");
} catch (ProtocolException pe) { /* Why don't you call me anymore? */ }

Primitives
They are still here, and it's a pain to use them right. They are the one thing that keeps Java from being a pure object-oriented language, and critics argue that removing them would come with no significant performance hit. None of the new JVM languages has them, just saying.

Operator Overloading
James Gosling, the father of Java, once said in an interview, "I left out operator overloading as a fairly personal choice because I had seen too many people abuse it in C++". Kind of makes sense, but opinions are heavily split around this. Other JVM languages do offer this feature, but on the other hand, it can result in code that looks like this:

javascriptEntryPoints <<= (sourceDirectory in Compile)(base => ((base / "assets" ** "*.js") --- (base / "assets" ** "_*")).get )

An actual line of code from the Scala Play Framework. Ahm, I'm a bit dizzy now.

Diagnosis: Are these real problems anyway? We all have our quirks and these are some of Java's. A surprise might happen in a future version and this will change, but backwards compatibility, among other things, is keeping them right here with us.

6. Functional Programming – not quite there yet

Functional programming has been possible with Java before, although it was pretty awkward. Java 8 improves on this with lambdas, among other things. It's most welcome, but not as huge a shift as was earlier portrayed. Definitely more elegant than in Java 7, but some bending over backwards is still needed to be truly functional.

One of the fiercest reviews on this matter comes from Pierre-Yves Saumont, who in a series of posts takes a close look at the differences between functional programming paradigms and the way they are implemented in Java.

So, Java or Scala? The adoption of more modern functional paradigms in Java is a sign of approval for Scala, which has been playing with lambdas for a while now. Lambdas do make a lot of noise, but there are a lot more features, like traits, lazy evaluation and immutables to name a few, that make quite a difference.

Diagnosis: Don't be distracted by the lambdas; functional programming is still a hassle in Java 8.

Reference: 6 Reasons Not to Switch to Java 8 Just Yet from our JCG partner Alex Zhitnitsky at the Takipi blog....

RxJava + Java8 + Java EE 7 + Arquillian = Bliss

Microservices are an architectural style in which each service is implemented as an independent system. They can use their own persistence system (although it is not mandatory), deployment, language, … Because a system is composed of more than one service, each service will communicate with other services, typically using a lightweight protocol like HTTP and following a Restful Web approach. You can read more about microservices here: http://martinfowler.com/articles/microservices.html

Let's see a really simple example. Suppose we have a book shop where users can navigate through a catalog. When they find a book they want to know more about, they click on its ISBN, and a new screen opens with detailed information about the book and comments about it written by readers. This system may be composed of two services:
- One service to get book details. These could be retrieved from any legacy system like an RDBMS.
- One service to get all comments written on a book; in this case that information could be stored in a document-based database.

The problem here is that for each request a user makes we need to open two connections, one for each service. Of course we need a way to do those jobs in parallel to improve the performance. And here lies one problem: how can we deal with these asynchronous requests?

The first idea is to use the Future class. For two services that may be good, but if you require four or five services the code becomes more and more complex; for example, you may need to get data from one service and use it in other services, or adapt the result of one service to be the input of another one. So there is a cost in thread management and synchronization. It would be awesome to have some way to deal with this problem in a clean and easy way, and this is exactly what RxJava does. RxJava is a Java VM implementation of Reactive Extensions: a library for composing asynchronous and event-based programs by using observable sequences. With RxJava, instead of pulling data from a structure, data is pushed to it, and the structure reacts with an event that is listened to by a subscriber, which acts accordingly. You can find more information at https://github.com/Netflix/RxJava.

So in this case what we are going to implement is the example described here using RxJava, Java EE 7, Java 8 and Arquillian for testing. This post assumes you know how to write Rest services using the Java EE specification. So let's start with the two services:

@Singleton
@Path("bookinfo")
public class BookInfoService {

    @GET
    @Path("{isbn}")
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    public JsonObject findBookByISBN(@PathParam("isbn") String isbn) {
        return Json.createObjectBuilder()
            .add("author", "George R.R. Martin")
            .add("isbn", "1111")
            .add("title", "A Game Of Thrones").build();
    }
}

@Singleton
@Path("comments")
public class CommentsService {

    @GET
    @Path("{isbn}")
    @Produces(MediaType.APPLICATION_JSON)
    public JsonArray bookComments(@PathParam("isbn") String isbn) {
        return Json.createArrayBuilder().add("Good Book").add("Awesome").build();
    }
}

@ApplicationPath("rest")
public class ApplicationResource extends Application { }

And finally it is time to create a third, facade service which receives the call from the client, sends a request to both services in parallel and finally zips both responses. zip is the process of combining sets of items emitted together via a specified function; the combined result is then sent back to the client (not to be confused with compression!).
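To get a feel for what zip does before wiring it into the real service, here is a tiny standalone sketch (my own illustration using the RxJava 1.x API, with made-up values, not code from the original post):

import rx.Observable;

public class ZipSketch {
    public static void main(String[] args) {
        Observable<String> title = Observable.just("A Game Of Thrones");
        Observable<Integer> rating = Observable.just(5);

        // zip waits until both sources have emitted, then combines the two items with the given function
        Observable.zip(title, rating, (t, r) -> t + " -> " + r)
                  .subscribe(System.out::println); // prints: A Game Of Thrones -> 5
    }
}

Exactly this waiting-then-combining behavior is what the facade service below relies on.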
@Singleton
@Path("book")
public class BookService {

    private static final String BOOKSERVICE = "http://localhost:8080/bookservice";
    private static final String COMMENTSERVICE = "http://localhost:8080/bookcomments";

    @Resource(name = "DefaultManagedExecutorService")
    ManagedExecutorService executor;

    Client bookServiceClient;
    WebTarget bookServiceTarget;

    Client commentServiceClient;
    WebTarget commentServiceTarget;

    @PostConstruct
    void initializeRestClients() {
        bookServiceClient = ClientBuilder.newClient();
        bookServiceTarget = bookServiceClient.target(BOOKSERVICE + "/rest/bookinfo");

        commentServiceClient = ClientBuilder.newClient();
        commentServiceTarget = commentServiceClient.target(COMMENTSERVICE + "/rest/comments");
    }

    @GET
    @Path("{isbn}")
    @Produces(MediaType.APPLICATION_JSON)
    public void bookAndComment(@Suspended final AsyncResponse asyncResponse, @PathParam("isbn") String isbn) {
        //RxJava code shown below
    }
}

Basically we create a new service. In this case the URLs of both services we are going to connect to are hardcoded. This is done for academic purposes; in production-like code you would inject them from a producer class, from a properties file, or from whatever system you use for this purpose. Then we create a javax.ws.rs.client.WebTarget for consuming each Restful web service.

After that we need to implement the bookAndComment method using the RxJava API. The main class used in RxJava is rx.Observable. This class is an observable, as its name suggests, and it is responsible for firing events that push objects. By default events are synchronous, and it is the developer's responsibility to make them asynchronous. So we need one asynchronous observable instance for each service:

public Observable<JsonObject> getBookInfo(final String isbn) {
    return Observable.create((Observable.OnSubscribe<JsonObject>) subscriber -> {
        Runnable r = () -> {
            subscriber.onNext(bookServiceTarget.path(isbn).request().get(JsonObject.class));
            subscriber.onCompleted();
        };
        executor.execute(r);
    });
}

Basically we create an Observable that will execute the specified function when a Subscriber subscribes to it. The function is created using a lambda expression to avoid creating nested inner classes. In this case we are returning a JsonObject as the result of calling the bookinfo service. The result is passed to the onNext method so subscribers can receive it. Because we want to execute this logic asynchronously, the code is wrapped inside a Runnable block. It is also required to call the onCompleted method when all logic is done. Notice that to make the observable asynchronous, apart from creating a Runnable, we are using an Executor to run the logic in a separate thread. One of the great additions in Java EE 7 is a managed way to create threads inside a container. In this case we are using the ManagedExecutorService provided by the container to spawn a task asynchronously in a different thread from the current one.

public Observable<JsonArray> getComments(final String isbn) {
    return Observable.create((Observable.OnSubscribe<JsonArray>) subscriber -> {
        Runnable r = () -> {
            subscriber.onNext(commentServiceTarget.path(isbn).request().get(JsonArray.class));
            subscriber.onCompleted();
        };
        executor.execute(r);
    });
}

Similar to the previous one, but instead of getting the book info we are getting an array of comments. Then we need to create an observable in charge of zipping both responses when both of them are available. This is done by using the zip method of the Observable class, which receives two Observables and applies a function to combine their results.
In this case it is a lambda expression that creates a new JSON object appending both responses.

@GET
@Path("{isbn}")
@Produces(MediaType.APPLICATION_JSON)
public void bookAndComment(@Suspended final AsyncResponse asyncResponse, @PathParam("isbn") String isbn) {
    //Calling previously defined functions
    Observable<JsonObject> bookInfo = getBookInfo(isbn);
    Observable<JsonArray> comments = getComments(isbn);

    Observable.zip(bookInfo, comments,
        (JsonObject book, JsonArray bookcomments) ->
            Json.createObjectBuilder().add("book", book).add("comments", bookcomments).build()
    )
    .subscribe(new Subscriber<JsonObject>() {
        @Override
        public void onCompleted() {
        }

        @Override
        public void onError(Throwable e) {
            asyncResponse.resume(e);
        }

        @Override
        public void onNext(JsonObject jsonObject) {
            asyncResponse.resume(jsonObject);
        }
    });
}

Let's take a look at the previous service. We are using one of the new additions in Java EE 7, JAX-RS 2.0 asynchronous REST endpoints, by using the @Suspended annotation. Basically what we are doing is freeing server resources and generating the response when it is available, using the resume method.

And finally, a test. We are using Wildfly 8.1 as the Java EE 7 server, and Arquillian. Because each service may be deployed on a different server, we are going to deploy each service in a different war, but inside the same server. So in this case we are going to deploy three war files, which is totally easy to do in Arquillian.

@RunWith(Arquillian.class)
public class BookTest {

    @Deployment(testable = false, name = "bookservice")
    public static WebArchive createDeploymentBookInfoService() {
        return ShrinkWrap.create(WebArchive.class, "bookservice.war").addClasses(BookInfoService.class, ApplicationResource.class);
    }

    @Deployment(testable = false, name = "bookcomments")
    public static WebArchive createDeploymentCommentsService() {
        return ShrinkWrap.create(WebArchive.class, "bookcomments.war").addClasses(CommentsService.class, ApplicationResource.class);
    }

    @Deployment(testable = false, name = "book")
    public static WebArchive createDeploymentBookService() {
        WebArchive webArchive = ShrinkWrap.create(WebArchive.class, "book.war").addClasses(BookService.class, ApplicationResource.class)
            .addAsLibraries(Maven.resolver().loadPomFromFile("pom.xml").resolve("com.netflix.rxjava:rxjava-core").withTransitivity().as(JavaArchive.class));
        return webArchive;
    }

    @ArquillianResource
    URL base;

    @Test
    @OperateOnDeployment("book")
    public void should_return_book() throws MalformedURLException {
        Client client = ClientBuilder.newClient();
        JsonObject book = client.target(URI.create(new URL(base, "rest/").toExternalForm())).path("book/1111").request().get(JsonObject.class);
        //assertions
    }
}

In this case the client requests all information about a book. On the server side, the zip method waits until the book and the comments are retrieved in parallel, then combines both responses into a single object which is sent back to the client.

This is a very simple example of RxJava. In fact, in this case we have only seen how to use the zip method, but RxJava provides many more methods that are just as useful, like take(), map(), merge(), … (https://github.com/Netflix/RxJava/wiki/Alphabetical-List-of-Observable-Operators)

Moreover, in this example we have only connected to two services and retrieved information in parallel, and you may wonder why not use the Future class. It is totally fine to use Futures and callbacks in this example, but in real life your logic probably won't be as easy as zipping two services.
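For comparison, a plain-Future version of the same fan-out could look roughly like the following sketch (my own illustration, reusing the executor and WebTarget fields defined above; not code from the original post):

// Hypothetical alternative to the RxJava version, using the container-managed executor directly
public JsonObject bookAndCommentWithFutures(String isbn) throws Exception {
    Callable<JsonObject> bookCall = () -> bookServiceTarget.path(isbn).request().get(JsonObject.class);
    Callable<JsonArray> commentsCall = () -> commentServiceTarget.path(isbn).request().get(JsonArray.class);

    // submit both remote calls so they run in parallel
    Future<JsonObject> book = executor.submit(bookCall);
    Future<JsonArray> comments = executor.submit(commentsCall);

    // block until both are done, then combine them into a single response
    return Json.createObjectBuilder()
               .add("book", book.get())
               .add("comments", comments.get())
               .build();
}

With two calls this is perfectly manageable, which is exactly the point made next: the pain only shows up once the fan-out and the dependencies between calls start to grow.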
Maybe you will have more services, maybe you will need to get information from one service and then open a new connection for each result. As you can see, you may start with two Future instances but finish with a bunch of Future.get() calls, timeouts, … It is in these situations that RxJava really simplifies the development of the application. Furthermore, we have seen how to use some of the new additions of Java EE 7, like how to develop an asynchronous Restful service with JAX-RS.

In this post we have learnt how to deal with the interconnection between services and how to make them scalable and consume fewer resources. But we have not talked about what happens when one of these services fails. What happens to the callers? Do we have a way to manage it? Is there a way to avoid spending resources when one of the services is not available? We will touch on this in the next post, talking about fault tolerance.

We keep learning, Alex.

Bon dia, bon dia! Bon dia al dematí! Fem fora la mandra i saltem corrents del llit. (Bon Dia! – Dàmaris Gelabert; roughly: "Good morning, good morning! Good morning in the early morning! Let's shake off laziness and jump right out of bed.")

Reference: RxJava + Java8 + Java EE 7 + Arquillian = Bliss from our JCG partner Alex Soto at the One Jar To Rule Them All blog....

Configuring Chef part 1

Below are the first steps in getting started with Chef. The three main components of Chef are:
- Workstation: the developer's machine, which will be used to author cookbooks and recipes and upload them to the Chef server using the command-line utility called knife.
- Chef server: the main server on which all the cookbooks, roles and policies are uploaded.
- Node: the instance which will be provisioned by applying the cookbooks uploaded on the Chef server.

So, let's get started.

Set up the workstation
Install Chef on your workstation. To do that, follow the instructions here: http://www.getchef.com/chef/install/

Use hosted Chef as the chef-server
Register on the Chef site at http://www.getchef.com. You can use hosted Chef; it gives you the option to manage up to 5 nodes for free. Create your user and an organisation. In order to authenticate your workstation with the chef-server we need these 3 things:
-[validator].PEM
-knife.rb
-[username].PEM
So, you need to download these 3 items to your workstation. (You can try the reset keys option or download the starter kit.)

Set up chef-repo in the workstation
Open your workstation and go to the folder which you want to be your base folder for writing cookbooks. Download the chef-repo from the Opscode git repo or use the starter kit provided on the Chef site. Put the 3 files above in the .chef folder inside the chef-repo folder in your workstation (create .chef if it is not already present). Now your workstation is set up and authenticated with the chef-server, and your chef-repo is configured. So let's begin configuring a node on which the cookbooks will be applied.

Setting up the node
The node could be an EC2 instance, an instance provided by any other cloud provider, or a VM. The first step is to bootstrap it.

Bootstrap any instance
knife bootstrap [ip-address] --sudo -x [user-name] -P [password] -N "[node name]"
Or for an AWS instance:
knife bootstrap [AWS external IP] --sudo -x ec2-user -i [AWS key] -N "awsnode"
These are the things that happen during bootstrapping:
1.) Installs chef-client and Ohai on the node
2.) Establishes authentication for ssh keys
3.) Sends the 3 keys to the chef-client
Once the node is bootstrapped, it's time to author some cookbooks to apply on the node.

Download a cookbook
We will download an already existing cookbook for the Apache webserver, using the following knife command (remember that all knife commands should be executed from the base chef-repo directory).
knife cookbook site download apache
This will download the tar.gz zipped folder into your chef-repo; we need to unzip it and copy it to the cookbooks folder. (After unzipping it, remove the zipped file; use tar -xvf [file], then the mv command.)
mv apache ../chef-repo/cookbooks
Inside the apache folder we can find the "recipes" folder, and inside that there is a file called "default.rb". This "default.rb" ruby file contains the default recipe required to configure the Apache server. Let's have a look at an excerpt from it.
.... package "httpd" do action :install end ....
So this cookbook defines the default action on application of this recipe to be "install"; this will install the Apache webserver on the node. We will cover more details about these in the next blog; for now let's just upload this cookbook.

Upload a cookbook to the chef-server
knife cookbook upload apache
Now the cookbook is uploaded to the chef-server. Once the chef-server has the cookbook, we can apply it to any of the nodes which are configured with the chef-server.
First let's find out what nodes we have.

To see all my nodes
knife node list

Apply the run-list to the node
In order to apply the cookbook to a given node, we need to add it to the run-list of the node:
knife node run_list add node-name "recipe[apache]"
Now we have successfully uploaded a cookbook and added it to the run-list of a node with the alias "node-name". The next time chef-client runs on the node, it will fetch the details of its run-list from the chef-server, download any required cookbooks from the chef-server and run them. For now, let's ssh into the node and run chef-client manually to see the results.

Run chef-client on the node
sudo chef-client
If the chef-client run is successful, we can hit the IP address of the instance and see the default Apache page up and running. If you are using AWS, don't forget to open port 80. This was just a basic introduction to Chef; in the next blog we will see the killer feature of Chef, which is search, and go into the details of the node object, roles and environments.

Reference: Configuring Chef part 1 from our JCG partner Anirudh Bhatnagar at the anirudh bhatnagar blog....

Why You Need a Strategic Data Service

It's no longer even a question that data is a strategic advantage. Every business is a data business now, and it's no longer sufficient to store and archive data; you need to be able to act on it: protect, nurture, develop, buy and sell it. Billion-dollar businesses are built around it. But many businesses are running into the reality that their legacy platforms are not built to treat data as such a valuable asset. We continually see companies that are boxed out of opportunities because of software design decisions made years ago without the foresight to anticipate this trend.

If you refer back to classic software design principles and best practices you'll see blueprints for building data layer abstractions and compartmentalizing data functionality from the rest of the system. Yet to this day I see developers questioning why these abstractions are needed, wondering what the payoff is. But the day of reckoning is either here or approaching fast for most companies, and if you don't have a properly constructed data architecture it won't be capable of supporting the business as it responds to this transition. Based on what I've seen, here are my observations on why data services are necessary for just about every business today.

Multiple Data Stores are Key

One of the main reasons why any software abstraction exists is to allow you to easily swap one component out for another. You may outgrow a database or realize new business requirements that are outside the capabilities of your current solution, and have to switch. Writing your software to an interface whose underlying implementation can be swapped out allows you to do this. This is called decoupling, and it's just good software design.

But we're entering a world of data store specialization now. Different data stores have unique reasons for being, and they're good at different things. Some exist for the sole purpose of storing very specific types of data and doing specific things with it. Eventually you'll probably want or need to use those unique capabilities as a competitive advantage or even a key value proposition. We're seeing a diversification of data sources in the marketplace, particularly in the open source world. Data stores have specialties now, and you will probably want to use more than one of them at some point, if not now.

A perfect example use case for this is a message in a social network. This piece of data has a number of potential uses, and not all of them are easily achievable using a single data store. But that's ok, because you're decoupled (right?). Now you can record the message in your social graph database so that you can cluster users by interest and predict relationships. You can search for the message later, after you've written it to your distributed search data store, which is perfect for that. And you can do analytics, trending, and dashboards on top of your relational database, which holds the output of your machine learning models. Aside from just features, from a technical perspective you'll often have to trade off between the consistency, availability, and partition tolerance of the CAP theorem. So far, no one data store has been able to have its cake and eat it too, but with a Data Service, you CAN.

Service with a Smile…

A properly built data abstraction layer will probably end up being a stateful service (as opposed to stateless services, which don't really do anything on their own).
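To make the abstraction concrete, the public face of such a layer can be as small as the following sketch (illustrative names only, not taken from the article); the implementation behind it decides which stores get written to and what enrichment happens:

import java.util.List;

// Illustrative sketch of a Data Service facade; Message is an assumed domain type
public interface MessageDataService {

    // Implementations may fan this single write out to a graph store, a search index
    // and a relational store, and may verify or enrich the data after the fact.
    void store(Message message);

    // Reads are served by whichever backing store is best suited to the query.
    List<Message> searchByKeyword(String keyword);

    List<Message> findByUser(String userId);
}

Callers only ever see this interface; everything behind it can change.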
These services stand alone in your architecture: components that are capable of talking with other components and that have their own behavior. This comes in VERY handy when dealing with data. For example, some data stores will require you to verify write persistence after the fact if you care about availability. If your service stands on its own, it can do this work at the appropriate time, transparently to whatever or whoever is using it. Or, you might want to mine the data as it comes in by having the Data Service pipe the data to machine learning models to categorize it or do sentiment analysis. Maybe you want to look up customer demographics in Census data based on their location and predict income level using that information.

For implementing this type of data-related behavior, I'm a huge fan of using an actor system in a Data Service. Your Data Service can host an actor system (or whatever executes your workflow logic) to handle your entire data workflow: ensuring availability, mining, transmitting, whatever you need to do with it. You will eventually want to take the data you receive and enrich it (if you don't already today): geolocate it, classify it, compute on it, raise alerts, and so on. For example, you may want to take transactional data as it comes into the system and roll it up at different intervals so that you can run machine learning models on it to predict future trends. This is the perfect place to do it.

The brains of your Data Service don't have to be an actor model; there are plenty of other options out there for carrying out data work. Hadoop is a classic example, but newcomers like Spark and Storm will accomplish many of the same things. Most of these frameworks have hooks available to extend them, which is super important if they're going to serve you well into the future. (Again, though, the fact that it's compartmentalized into a Data Service will let you use more than one of these if you need to.) The key is that the data processing and workflow should be controlled and orchestrated by the Data Service itself: the users of the Data Service shouldn't need to worry about what happens to the data, they should just have the ability to get the data in and read it back out in some form.

If you like it then you shoulda put an API on it

Want to be a pure-play data company? These are the companies that only provide a public API and don't have to support a complex user interface. Having a properly built Data Service allows you to do this very easily. Many application frameworks will let you turn an interface into a standards-compliant REST API with almost zero work. Just stand up the service in a Web server and let the framework look at it and turn it into an API. Even if your company isn't selling the API outright, your customers will surely love it, if not demand it.

It's always disappointing to see companies building an API as a separate project when they could have had it almost for nothing. It's an indication of a code base that wasn't properly built in the first place: technical debt that has to be addressed before the business can move forward.

The End?

Certainly these are not the only reasons to locate your Data Service centrally in your architectural blueprint. (But seriously, you need more reasons?) I'd love to hear your comments and thoughts on this in the Hacker News thread.

Reference: Why You Need a Strategic Data Service from our JCG partner Jason Kolb at the Jason Kolb blog....

Seriously. The Devil Made me do It!

Just as eternal as the cosmic struggle between good and evil is the challenge between our two natures. Religion aside, we have two natures, the part of us that:
- thinks things through; makes good or ethical decisions, a.k.a. our angelic nature
- reacts immediately; makes quick but often wrong decisions, a.k.a. our devil nature

Guess the powers that be left a bug in our brains, so that it emphasizes fast decisions over good or ethical ones. Quite often we make sub-optimal or ethically ambiguous decisions under pressure. You decide…

Situation: Your manager comes to you and says that something urgent needs to be fixed right away. Turns out the steaming pile of @#$%$ that you inherited from Bob is malfunctioning again. Of course Bob created the mess and then conveniently left the company; in fact, the code is so bad that the work-arounds have work-arounds.
- Bite the bullet, start re-factoring the program when things go wrong. It will take more time up front, but over time the program will become stable.
- Find another fast workaround and defer the problem to the future. Find a good reason why the junior member of the team should inherit this problem.

Situation: You've got a challenging section of code to write and not much time to write it.
- Get away from the computer and think things through. Get input from your peers; maybe they have seen this problem before. Then plan the pathways out and write the code once, cleanly. Taking time to plan seems counter-intuitive, but it will save time. (See Not Planning is for Losers.)
- Naw, just sit at the keyboard and bang it out already. How difficult can it be?

Situation: The project is late and you know that your piece is behind schedule. However, you also know that several other pieces are late as well.
- Admit that you are late and that the project can't finish by the deadline. Give the project manager and senior managers a chance to make a course correction.
- Say that you are on schedule but you are not sure that other people (be vague here) will have their pieces ready on time, and it could cause you to become late. This situation is also known as Schedule Chicken…

Situation: You have been asked to estimate how long a critical project will take. You have only been given a short time to come up with the estimate.
- Tell the project manager that getting a proper estimate takes longer than a few hours. Without proper estimates the project is likely to be severely underestimated, and this will come back to bite you and the project manager in the @$$. (See Who needs Formal Measurement?)
- Tell the project manager exactly the date that senior management wants the project to be finished by. You know this is what they want to hear; why deal with the problem now? This will become the project manager's problem when the project is late.

The statistics show that we don't listen to our better (angelic?) natures very often. So when push comes to shove and you have to make a sub-optimal or less than ethical decision, just remember: the devil made you do it!

Run into other common situations? Email me.

Reference: Seriously. The Devil Made me do It! from our JCG partner Dalip Mahal at the Accelerated Development blog....

Custom Cassandra Data Types

In the blog post Connecting to Cassandra from Java, I mentioned that one advantage for Java developers of Cassandra being implemented in Java is the ability to create custom Cassandra data types. In this post, I outline how to do this in greater detail. Cassandra has numerous built-in data types, but there are situations in which one may want to add a custom type. Cassandra custom data types are implemented in Java by extending the org.apache.cassandra.db.marshal.AbstractType class. The class that extends this must ultimately implement three methods with the following signatures:     public ByteBuffer fromString(final String) throws MarshalException public TypeSerializer getSerializer() public int compare(Object, Object) This post’s example implementation of AbstractType is shown in the next code listing. UnitedStatesState.java – Extends AbstractType package dustin.examples.cassandra.cqltypes;import org.apache.cassandra.db.marshal.AbstractType; import org.apache.cassandra.serializers.MarshalException; import org.apache.cassandra.serializers.TypeSerializer;import java.nio.ByteBuffer;/** * Representation of a state in the United States that * can be persisted to Cassandra database. */ public class UnitedStatesState extends AbstractType { public static final UnitedStatesState instance = new UnitedStatesState();@Override public ByteBuffer fromString(final String stateName) throws MarshalException { return getStateAbbreviationAsByteBuffer(stateName); }@Override public TypeSerializer getSerializer() { return UnitedStatesStateSerializer.instance; }@Override public int compare(Object o1, Object o2) { if (o1 == null && o2 == null) { return 0; } else if (o1 == null) { return 1; } else if (o2 == null) { return -1; } else { return o1.toString().compareTo(o2.toString()); } }/** * Provide standard two-letter abbreviation for United States * state whose state name is provided. * * @param stateName Name of state whose abbreviation is desired. * @return State's abbreviation as a ByteBuffer; will return "UK" * if provided state name is unexpected value. */ private ByteBuffer getStateAbbreviationAsByteBuffer(final String stateName) { final String upperCaseStateName = stateName != null ? stateName.toUpperCase().replace(" ", "_") : "UNKNOWN"; String abbreviation; try { abbreviation = upperCaseStateName.length() == 2 ? State.fromAbbreviation(upperCaseStateName).getStateAbbreviation() : State.valueOf(upperCaseStateName).getStateAbbreviation(); } catch (Exception exception) { abbreviation = State.UNKNOWN.getStateAbbreviation(); } return ByteBuffer.wrap(abbreviation.getBytes()); } } The above class listing references the State enum, which is shown next. State.java package dustin.examples.cassandra.cqltypes;/** * Representation of state in the United States. 
*/ public enum State { ALABAMA("Alabama", "AL"), ALASKA("Alaska", "AK"), ARIZONA("Arizona", "AZ"), ARKANSAS("Arkansas", "AR"), CALIFORNIA("California", "CA"), COLORADO("Colorado", "CO"), CONNECTICUT("Connecticut", "CT"), DELAWARE("Delaware", "DE"), DISTRICT_OF_COLUMBIA("District of Columbia", "DC"), FLORIDA("Florida", "FL"), GEORGIA("Georgia", "GA"), HAWAII("Hawaii", "HI"), IDAHO("Idaho", "ID"), ILLINOIS("Illinois", "IL"), INDIANA("Indiana", "IN"), IOWA("Iowa", "IA"), KANSAS("Kansas", "KS"), LOUISIANA("Louisiana", "LA"), MAINE("Maine", "ME"), MARYLAND("Maryland", "MD"), MASSACHUSETTS("Massachusetts", "MA"), MICHIGAN("Michigan", "MI"), MINNESOTA("Minnesota", "MN"), MISSISSIPPI("Mississippi", "MS"), MISSOURI("Missouri", "MO"), MONTANA("Montana", "MT"), NEBRASKA("Nebraska", "NE"), NEVADA("Nevada", "NV"), NEW_HAMPSHIRE("New Hampshire", "NH"), NEW_JERSEY("New Jersey", "NJ"), NEW_MEXICO("New Mexico", "NM"), NORTH_CAROLINA("North Carolina", "NC"), NORTH_DAKOTA("North Dakota", "ND"), NEW_YORK("New York", "NY"), OHIO("Ohio", "OH"), OKLAHOMA("Oklahoma", "OK"), OREGON("Oregon", "OR"), PENNSYLVANIA("Pennsylvania", "PA"), RHODE_ISLAND("Rhode Island", "RI"), SOUTH_CAROLINA("South Carolina", "SC"), SOUTH_DAKOTA("South Dakota", "SD"), TENNESSEE("Tennessee", "TN"), TEXAS("Texas", "TX"), UTAH("Utah", "UT"), VERMONT("Vermont", "VT"), VIRGINIA("Virginia", "VA"), WASHINGTON("Washington", "WA"), WEST_VIRGINIA("West Virginia", "WV"), WISCONSIN("Wisconsin", "WI"), WYOMING("Wyoming", "WY"), UNKNOWN("Unknown", "UK");private String stateName;private String stateAbbreviation;State(final String newStateName, final String newStateAbbreviation) { this.stateName = newStateName; this.stateAbbreviation = newStateAbbreviation; }public String getStateName() { return this.stateName; }public String getStateAbbreviation() { return this.stateAbbreviation; }public static State fromAbbreviation(final String candidateAbbreviation) { State match = UNKNOWN; if (candidateAbbreviation != null && candidateAbbreviation.length() == 2) { final String upperAbbreviation = candidateAbbreviation.toUpperCase(); for (final State state : State.values()) { if (state.stateAbbreviation.equals(upperAbbreviation)) { match = state; } } } return match; } } We can also provide an implementation of the TypeSerializer interface returned by the getSerializer() method shown above. That class implementing TypeSerializer is typically most easily written by extending one of the numerous existing implementations of TypeSerializer that Cassandra provides in the org.apache.cassandra.serializers package. In my example, my custom Serializer extends AbstractTextSerializer and the only method I need to add has the signature public void validate(final ByteBuffer bytes) throws MarshalException. Both of my custom classes need to provide a reference to an instance of themselves via static access. Here is the class that implements TypeSerializer via extension of AbstractTypeSerializer: UnitedStatesStateSerializer.java – Implements TypeSerializer package dustin.examples.cassandra.cqltypes;import org.apache.cassandra.serializers.AbstractTextSerializer; import org.apache.cassandra.serializers.MarshalException;import java.nio.ByteBuffer; import java.nio.charset.StandardCharsets;/** * Serializer for UnitedStatesState. 
*/ public class UnitedStatesStateSerializer extends AbstractTextSerializer { public static final UnitedStatesStateSerializer instance = new UnitedStatesStateSerializer();private UnitedStatesStateSerializer() { super(StandardCharsets.UTF_8); }/** * Validates provided ByteBuffer contents to ensure they can * be modeled in the UnitedStatesState Cassandra/CQL data type. * This allows for a full state name to be specified or for its * two-digit abbreviation to be specified and either is considered * valid. * * @param bytes ByteBuffer whose contents are to be validated. * @throws MarshalException Thrown if provided data is invalid. */ @Override public void validate(final ByteBuffer bytes) throws MarshalException { try { final String stringFormat = new String(bytes.array()).toUpperCase(); final State state = stringFormat.length() == 2 ? State.fromAbbreviation(stringFormat) : State.valueOf(stringFormat); } catch (Exception exception) { throw new MarshalException("Invalid model cannot be marshaled as UnitedStatesState."); } } } With the classes for creating a custom CQL data type written, they need to be compiled into .class files and archived in a JAR file. This process (compiling with javac -cp "C:\Program Files\DataStax Community\apache-cassandra\lib\*" -sourcepath src -d classes src\dustin\examples\cassandra\cqltypes\*.java and archiving the generated .class files into a JAR named CustomCqlTypes.jar with jar cvf CustomCqlTypes.jar *) is shown in the following screen snapshot.The JAR with the class definitions of the custom CQL type classes needs to be placed in the Cassandra installation’s lib directory as demonstrated in the next screen snapshot.With the JAR containing the custom CQL data type classes implementations in the Cassandra installation’s lib directory, Cassandra should be restarted so that it will be able to “see” these custom data type definitions. The next code listing shows a Cassandra Query Language (CQL) statement for creating a table using the new custom type dustin.examples.cassandra.cqltypes.UnitedStatesState. createAddress.cql CREATE TABLE us_address ( id uuid, street1 text, street2 text, city text, state 'dustin.examples.cassandra.cqltypes.UnitedStatesState', zipcode text, PRIMARY KEY(id) ); The next screen snapshot demonstrates the results of running the createAddress.cql code above by describing the created table in cqlsh.The above screen snapshot demonstrates that the custom type dustin.examples.cassandra.cqltypes.UnitedStatesState is the type for the state column of the us_address table. A new row can be added to the US_ADDRESS table with a normal INSERT. For example, the following screen snapshot demonstrates inserting an address with the command INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'New York', '10118');:Note that while the INSERT statement inserted “New York” for the state, it is stored as “NY”.If I run an INSERT statement in cqlsh using an abbreviation to start with (INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'NY', '10118');), it still works as shown in the output shown below.In my example, an invalid state does not prevent an INSERT from occurring, but instead persists the state as “UK” (for unknown) [see the implementation of this in UnitedStatesState.getStateAbbreviationAsByteBuffer(String)]. 
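A quick way to see that mapping in isolation is to exercise the State enum from the listing above directly; this little sketch (mine, for illustration, assuming the enum is on the classpath in the same package) prints NY, NY and UK:

// Illustrative only: exercising the State enum shown earlier
public class StateMappingDemo {
    public static void main(String[] args) {
        // full name, upper-cased with spaces replaced by underscores, as UnitedStatesState does internally
        System.out.println(State.valueOf("NEW_YORK").getStateAbbreviation());    // NY
        // two-letter input is looked up case-insensitively
        System.out.println(State.fromAbbreviation("ny").getStateAbbreviation()); // NY
        // anything unrecognized falls back to UNKNOWN
        System.out.println(State.fromAbbreviation("ZZ").getStateAbbreviation()); // UK
    }
}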
One of the first advantages that comes to mind when justifying why one might want to implement a custom CQL data type in Java is the ability to employ behavior similar to that provided by check constraints in relational databases. For example, in this post, my sample ensured that any state value entered for a new row was either one of the fifty states of the United States, the District of Columbia, or "UK" for unknown. No other values can be inserted into that column. Another advantage of the custom data type is the ability to massage the data into a preferred form. In this example, I changed every state name to an uppercase two-letter abbreviation. In other cases, I might want to always store in uppercase, always store in lowercase, or map finite sets of strings to numeric values. The custom CQL data type allows for customized validation and representation of values in the Cassandra database.

Conclusion
This post has been an introductory look at implementing custom CQL data types in Cassandra. As I play with this concept more and try different things out, I hope to write another blog post on some more subtle observations that I make. As this post shows, it is fairly easy to write and use a custom CQL data type, especially for Java developers.

Reference: Custom Cassandra Data Types from our JCG partner Dustin Marx at the Inspired by Actual Events blog....

Auditing infrastructure for your app using Spring AOP, Custom annotations and Reflection

This post will demonstrate how to write simple auditing using Spring AOP and annotations. The auditing mechanism will be clean, efficient and easy to maintain (and Kewwl!). I will demonstrate my example on a user management system (I assume you have general knowledge of reflection and AOP). We start with a simple DB table to hold our auditing data:

`id`, `username`, `user_type`, `action`, `target_user`, `date`, `user_ip`

We need to populate 4 main fields (Username, UserType, Action, TargetUser):
*Username – the user who performs the action
*TargetUser – the target user the action is performed on

Now let's create a new annotation to mark the methods we want audited. We're going to be very "creative" and call it @Auditable:

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface Auditable {
    AuditingActionType actionType();
}

An example of an annotated @Auditable method:

@Override
@Transactional
@Auditable(actionType = AuditingActionType.INTERNAL_USER_REGISTRATION)
public void createInternalUser(UserDTO userDTO) {
    userCreationService.createInternalUserOnDB(userDTO);
}

Our future aspect (AOP) will collect some auditing data from the method parameters using DTOs. In our case the target username and the actionType will be collected as our auditing info. For that I created another annotation, @AuditingTargetUsername:

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.TYPE})
public @interface AuditingTargetUsername {
    String value() default "";
}

So inside UserDTO we have:

public abstract class UserDTO implements Serializable {

    @NotNull
    @AuditingTargetUsername
    private String userName;

    ...
}

We annotated the userName field with @AuditingTargetUsername. That information will be collected later on.

Now let's create our aspect. Here all the auditing logic is collected and performed (intercepting @Auditable methods, extracting information from the annotations, using a repository to save the final auditing record):

@Aspect
public class AuditingAspect {

    ....

    @After("@annotation(auditable)")
    @Transactional
    public void logAuditActivity(JoinPoint jp, Auditable auditable) {
        String targetAuditingUser = extractTargetAuditingUser(jp.getArgs());
        String actionType = auditable.actionType().getDescription();
        String auditingUsername = SecurityContextHolder.getContext().getAuthentication().getName();
        String role = userService.getCurrentUser(false).getPermissionsList().toString();
        String auditingUsernameIp = request.getRemoteAddr();

        logger.info("Auditing information. auditingUsername=" + auditingUsername + ", actionType=" + actionType
                + ", role=" + role + ", targetAuditingUser=" + targetAuditingUser
                + " auditingUsernameIp=" + auditingUsernameIp);

        auditingRepository.save(new AuditingEntity(auditingUsername, role, actionType, targetAuditingUser,
                auditingUsernameIp, new Timestamp(new java.util.Date().getTime())));
    }
}

I'll explain the main code areas:
Pointcut – all @Auditable annotations.
Advice – type @After (we want to audit after the method is invoked).
The actionType value is retrieved via the annotated method's declaration: @Auditable(actionType = AuditingActionType.INTERNAL_USER_REGISTRATION).
auditingUsername is the current user who performs the action (in our case the logged-in user). I retrieved that via the SecurityContext (Spring Security).

Now we will extract the @AuditingTargetUsername-annotated field via reflection at runtime:

targetAuditingUser = extractTargetAuditingUser(jp.getArgs());
...
public String extractTargetAuditingUserFromAnnotation(Object obj) {
...
    result = getTargetAuditingUserViaAnnotation(obj);
    ...
}

That's the logic to extract the annotated fields via reflection:

private String getTargetAuditingUserViaAnnotation(Object obj) {
    Class<?> cl = obj.getClass();
    String result = null;
    try {
        for (Field f : cl.getDeclaredFields()) {
            for (Annotation a : f.getAnnotations()) {
                if (a.annotationType() == AuditingTargetUsername.class) {
                    f.setAccessible(true);
                    Field annotatedFieldName = cl.getDeclaredField(f.getName());
                    annotatedFieldName.setAccessible(true);
                    String annotatedFieldVal = (String) annotatedFieldName.get(obj);
                    logger.debug("Found auditing annotation. type=" + a.annotationType() + " value=" + annotatedFieldVal.toString());
                    result = annotatedFieldVal;
                }
            }
        }
    } catch (Exception e) {
        logger.error("Error extracting auditing annotations from obj" + obj.getClass());
    }
    return result;
}

The result can be seen as a new row in the auditing table. That's it. We've got a clean auditing infrastructure; all you need to do is annotate your method with @Auditable and annotate the desired information to be audited inside your DTOs/entities.

Idan.

Reference: Auditing infrastructure for your app using Spring AOP, Custom annotations and Reflection from our JCG partner Idan Fridman at the IdanFridman.com blog....

PostgreSQL’s Table-Valued Functions

Table-valued functions are an awesome thing. Many databases support them in one way or another and so does PostgreSQL. In PostgreSQL, (almost) everything is a table. For instance, we can write:                 CREATE OR REPLACE FUNCTION f_1 (v1 INTEGER, v2 OUT INTEGER) AS $$ BEGIN v2 := v1; END $$ LANGUAGE plpgsql; … and believe it or not, this is a table! We can write: select * from f_1(1); And the above will return: +----+ | v2 | +----+ | 1 | +----+ It’s kind of intuitive if you think about it. We’re just pushing out a single record with a single column. If we wanted two columns, we could’ve written: CREATE OR REPLACE FUNCTION f_2 (v1 INTEGER, v2 OUT INTEGER, v3 OUT INTEGER) AS $$ BEGIN v2 := v1; v3 := v1 + 1; END $$ LANGUAGE plpgsql; … and then: select * from f_2(1); And the above will return: +----+----+ | v2 | v3 | +----+----+ | 1 | 2 | +----+----+ That’s useful, but those are just single records. What if we wanted to produce a whole table? It’s easy, just change your functions to actually return TABLE types, instead of using OUT parameters: CREATE OR REPLACE FUNCTION f_3 (v1 INTEGER) RETURNS TABLE(v2 INTEGER, v3 INTEGER) AS $$ BEGIN RETURN QUERY SELECT * FROM ( VALUES(v1, v1 + 1), (v1 * 2, (v1 + 1) * 2) ) t(a, b); END $$ LANGUAGE plpgsql; When selecting from the above very useful function, we’ll get a table like so: select * from f_3(1); And the above will return: +----+----+ | v2 | v3 | +----+----+ | 1 | 2 | | 2 | 4 | +----+----+ And we can LATERAL join that function to other tables if we want: select * from book, lateral f_3(book.id) … which might yield, for example: +----+--------------+----+----+ | id | title | v2 | v3 | +----+--------------+----+----+ | 1 | 1984 | 1 | 2 | | 1 | 1984 | 2 | 4 | | 2 | Animal Farm | 2 | 4 | | 2 | Animal Farm | 4 | 6 | +----+--------------+----+----+ In fact, it appears that the keyword LATERAL is optional in this case, at least for PostgreSQL. Table-valued functions are very powerful! Discovering table-valued functions From jOOQ’s schema reverse-engineering perspective, things might get a bit tricky as can be seen in this Stack Overflow question. PostgreSQL deals with OUT parameters in a very similar way as with TABLE return types. This can be seen in the following query against the INFORMATION_SCHEMA: SELECT r.routine_name, r.data_type, p.parameter_name, p.data_type FROM information_schema.routines r JOIN information_schema.parameters p USING (specific_catalog, specific_schema, specific_name); … and the output: routine_name | data_type | parameter_name | data_type -------------+-----------+----------------+---------- f_1 | integer | v1 | integer f_1 | integer | v2 | integer f_2 | record | v1 | integer f_2 | record | v2 | integer f_2 | record | v3 | integer f_3 | record | v1 | integer f_3 | record | v2 | integer f_3 | record | v3 | integer As you can see, the output is really indistinguishable from that perspective. 
Luckily, we can also join the pg_catalog.pg_proc table, which contains the relevant flag to indicate if a function returns a set or not: SELECT r.routine_name, r.data_type, p.parameter_name, p.data_type, pg_p.proretset FROM information_schema.routines r JOIN information_schema.parameters p USING (specific_catalog, specific_schema, specific_name) JOIN pg_namespace pg_n ON r.specific_schema = pg_n.nspname JOIN pg_proc pg_p ON pg_p.pronamespace = pg_n.oid AND pg_p.proname = r.routine_name ORDER BY routine_name, parameter_name; Now, we’re getting: routine_name | data_type | parameter_name | data_type | proretset -------------+-----------+----------------+-----------+---------- f_1 | integer | v1 | integer | f f_1 | integer | v2 | integer | f f_2 | record | v1 | integer | f f_2 | record | v2 | integer | f f_2 | record | v3 | integer | f f_3 | record | v1 | integer | t f_3 | record | v2 | integer | t f_3 | record | v3 | integer | t We can see that f_3 is the only function actually returning a set of record, unlike f_1 and f_2, which only return a single record. Now, remove all those parameters that are not OUT parameters, and you have your table type: SELECT r.routine_name, p.parameter_name, p.data_type, row_number() OVER ( PARTITION BY r.specific_name ORDER BY p.ordinal_position ) AS ordinal_position FROM information_schema.routines r JOIN information_schema.parameters p USING (specific_catalog, specific_schema, specific_name) JOIN pg_namespace pg_n ON r.specific_schema = pg_n.nspname JOIN pg_proc pg_p ON pg_p.pronamespace = pg_n.oid AND pg_p.proname = r.routine_name WHERE pg_p.proretset AND p.parameter_mode = 'OUT' ORDER BY routine_name, parameter_name; Which will give us: routine_name | parameter_name | data_type | position | -------------+----------------+-----------+----------+ f_3 | v2 | integer | 1 | f_3 | v3 | integer | 2 | How to run such queries in jOOQ? Once the above code is generated, you can easily call the table-valued function in any jOOQ query. Consider again the BOOK example (in SQL): select * from book, lateral f_3(book.id) … and with jOOQ: DSL.using(configuration) .select() .from(BOOK, lateral(F_3.call(BOOK.ID))) .fetch(); The returned records then contain values for: record.getValue(F_3.V2); record.getValue(F_3.V3); All that typesafety is only available in the upcoming jOOQ 3.5, for free! (SQL Server, Oracle, and HSQLDB table-valued functions are already supported!)Reference: PostgreSQL’s Table-Valued Functions from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog....

Top 5 Java Performance Tuning Books – Best of Lot, Must read

Why should a Java developer read a book on performance tuning? When I first faced this question a long time back, I thought I would do it later, but I didn't get back to it for a long time. I realized my mistake of lacking knowledge of performance measurement, tuning and bottleneck analysis only when I faced serious performance and scalability issues in our mission-critical, server-side financial application written in Java. It's true that you learn the most when you really need it, but those moments are not the best time to learn fundamentals; they are the time to apply them and correct your misunderstandings. This is why I am sharing these Java performance books with all Java programmers and suggesting they take some time and go through at least one book in full. By the way, these books are in addition to my 5 must-read books for Java programmers. Remember, knowledge of performance tuning is one important mark of a senior Java developer, and it can separate you from the crowd. Ever since Java was introduced, almost 20 years back, it has faced criticism for being slow and lacking performance. Today, I don't think Java is anywhere behind native languages in terms of performance. Given Java's ability to compile hot code to native instructions using the JIT (just-in-time) compiler, it is almost on par with native applications written in C and C++, but a lot can still be gained by following best practices, avoiding common performance pitfalls and using the latest tools and techniques. In this article, I am going to introduce five + one good books on Java performance, which will not only teach you what to measure and how to measure it, but also explain the fundamentals and concepts behind the issues. You will not only learn about the system and JVM on which your Java application runs, but also how to write faster code using the Java APIs. So what are we waiting for? Let's begin our journey to the land of great books on Java performance tuning. Java Performance: The Definitive Guide by Scott Oaks In one word, this is currently THE best book on Java performance tuning. There are multiple reasons for that; one of them is that it is the most up-to-date book, covering up to Java 7 update 40. In order to learn performance tuning, you should know the tools, the process, the options and, most importantly, how to avoid common performance pitfalls. This book scores well on these points: it has a chapter introducing all the tools a Java performance engineer should be aware of, including the ones added in Java 7u40, e.g. Flight Recorder and Java Mission Control. It also has good chapters explaining various garbage collection algorithms, e.g. the Concurrent Mark Sweep (CMS) and G1 collectors. You will learn how each of them works under different conditions, how to monitor them and how to tune them. It also includes a full chapter on heap analysis and optimization. This will teach you common tasks like how to take heap dumps and histograms in Java (a short programmatic sketch follows below), and then introduces many ways to decrease your heap memory footprint. It also has a chapter on JDBC and JPA performance. A key point it teaches is that choosing the proper JDBC / JPA methods may far outweigh the gains from tuning SQL queries. Similarly, it has a complete chapter explaining multi-threading issues, pitfalls and their impact on performance. It includes advanced topics like ForkJoinPool and Java 8 Streams. It also touches on the cost of synchronization and false sharing, and on tuning JVM threads, e.g. thread stack size, configuring biased locking, thread priorities and thread spinning.
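For the heap dump tip mentioned above, here is a minimal sketch of taking a heap dump programmatically through the HotSpot diagnostic MXBean (the class name and output path are illustrative; the same dump can also be produced with command-line tools such as jmap or jcmd):

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpExample {
    public static void main(String[] args) throws Exception {
        // Obtain the HotSpot diagnostic MXBean from the platform MBean server
        HotSpotDiagnosticMXBean diagnostics = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // Dump only live objects to the given file (the path is illustrative)
        diagnostics.dumpHeap("/tmp/app-heap.hprof", true);
    }
}

The resulting .hprof file can then be opened in any heap-analysis tool of your choice.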
By the way, the best is yet to come: what I like most in this book is Chapter 12. This chapter presents some classic core Java tuning tips and their impact. It covers buffered I/O, class loading, random number generation, JNI, exceptions, String performance, logging, the Java Collections API, Java 8 lambdas vs anonymous classes and, finally, Java 8 streams and multiple-filter performance. This is actually the first chapter I read, and I fell in love with its content and style. If you would like to quickly gauge the book for yourself, I suggest starting with this chapter. It also touches on Java EE performance, explaining possible issues with XML and JSON parsing and object serialization. Java Performance by Binu John and Charlie Hunt This was my favourite Java performance book for a long time, until recently, when I read Java Performance: The Definitive Guide by Scott Oaks. This book is very similar to the one that replaced it at the top. It has chapters explaining how to take performance measurements and the tools necessary to measure CPU, memory and IO, as well as chapters explaining how garbage collection works and the different GC algorithms, e.g. the Serial vs Parallel garbage collectors, the Concurrent Mark Sweep collector, etc. Binu John and Charlie Hunt have done an excellent job showing how to construct experiments that identify opportunities for optimization, how to interpret the results, and how to take effective action. To give you some sense of the credibility behind this book, Charlie Hunt is the JVM performance lead engineer at Oracle. He is responsible for improving the performance of the HotSpot JVM and the Java SE class libraries. He has also been involved in improving the performance of Oracle GlassFish and Oracle WebLogic Server. Binu John is a senior performance engineer at Ning, Inc., where he focuses on improving the performance and scalability of the Ning platform to support millions of page views per month. Before that, he spent more than a decade working on Java-related performance issues at Sun Microsystems, where he served on Sun's Enterprise Java Performance team. If you haven't read any book on Java performance tuning and want to build a good foundation for dealing with performance problems, this is the book to buy. It's worth every single penny. System Performance: Enterprise and the Cloud by Brendan Gregg Systems performance analysis is an important skill for all computer users, whether you're trying to understand why your laptop is slow or optimizing the performance of a large-scale production environment. It is the study of both operating system (kernel) and application performance, and this book will tell you all you need to know about Linux performance monitoring and tuning. Programmers start by optimizing algorithms, data structures, the JVM and garbage collectors, but they eventually reach a point where system performance comes into play. You may want to know why disk operations were so quick on your development box but became a major issue on the production box, how much CPU caching affects your application, and how you can leverage the L1, L2 and L3 caches and the physical memory available in your machine. In my opinion, this is the book for every programmer, not just Java programmers. Knowing how your system works, how paging, swapping and virtual memory work, how the CPU gets data from disk, and how different kinds of disks can have a profound impact on IO-bound Java applications is very important for any developer genuinely interested in performance tuning.
I have often said, learn the JVM, but I can now say you must also know your system well. Knowing operating system basics, system calls, memory, CPU, network and disk IO, and caches will certainly go a long way and help you write high-performance applications in any programming language, including Java. Java Performance Tuning by Jack Shirazi This is one of the premier books on Java performance tuning and on writing code that executes faster in Java. When Jack first wrote this book, there was hardly anything else available. Even the second edition of this book is almost 11 years old, released around 2003. Why would I recommend something written 11 years back? Because it contains advice, practices and techniques that are timeless and worth knowing even today. You cannot follow this book in its original spirit, though, and you should always keep in mind that many of the things mentioned in it have already been addressed in subsequent Java releases. The best advice is in chapter 1 itself; this was the book that told me the performance of a Java application will be limited by three main factors: CPU, memory and IO (both disk and network), and surprisingly many developers who know how to use a profiler don't know this basic fact. Its classic advice, "Don't Tune What You Don't Need to Tune", is the best performance advice I have ever read. You can take a look at the performance checklist given in this book to get an understanding of what matters and what does not. This book also introduces many profiling tools, but more recent books like Java Performance: The Definitive Guide by Scott Oaks and Java Performance by Binu John and Charlie Hunt have a more up-to-date list. You should remember that this book only covers up to JDK 1.4.0, but you should still check chapters 4 – 12, which cover various performance tips and techniques you can apply to Java code. Jack also has a site, http://www.javaperformancetuning.com/, which is a great resource for Java developers learning performance tuning. Java Performance and Scalability: A Quantitative Approach by Henry H. Liu Before introducing the book, let me introduce the author: he holds a PhD, works at VMware and specializes in writing high-performance and scalable enterprise Java applications, but I think he is better known for his work on Software Performance and Scalability: A Quantitative Approach. You can see this book as a more specialized version of his earlier best seller. As the name suggests, this book is focused on the performance and scalability of Java applications. It is good for Java developers, architects and even managers. It is divided into two main parts: the first part deals with the basics of Java performance and scalability, and the second part presents practices to improve performance and scalability. The basics part contains four chapters, each of which separately explores the scalability of software programs, computer hardware, and the Java Virtual Machine. The second part contains chapters exploring how going from 32-bit to 64-bit affects the performance and scalability of a Java application. Chapter 6 is probably the most important chapter; it explains how to tune Java for the best possible performance and scalability. It introduces methodologies, practices, tools and ways of tuning a Java application with scalability in mind. Chapter 7 is another important chapter, which explains how design, algorithms and implementations affect the performance and scalability of any Java application. It also covers how to perform bottleneck analysis.
The good thing is that he explains all this with sample programs, so you can follow the guidelines while reading. Overall it is a very good and unique book for Java performance engineers, and if you love reading multiple books to gain insights, this is the one you can read along with Java Performance: The Definitive Guide and System Performance: Enterprise and the Cloud. The Well-Grounded Java Developer This is the bonus book for my readers. I won't say this book focuses only on Java performance tuning, but I would say it is a book every modern Java developer should have on his shelf. Ben Evans and Martijn Verburg need no introduction. They are well-known Java experts and the founders of jClarity, which promises to solve performance problems in cloud environments. They have many years of experience in Java, which is reflected in their book The Well-Grounded Java Developer: Vital techniques of Java 7 and polyglot programming. I first came across this book in 2012 and, after reading the sample chapters, I was convinced to buy it. This is a must-have book for the modern-day Java developer. It explains new changes in Java, including those in JDK 7, e.g. try-with-resources, NIO.2 and the concurrency changes; but most importantly it explains why it is so expensive to add new features to the JVM. Adding new library extensions such as fork/join, or syntactic sugar like switch-on-string, is relatively easy, but adding a JVM instruction like invokedynamic is very costly. Probably the best thing about this book is that it doesn't stop at Java, but goes one step further to introduce modern JVM languages, e.g. Scala, Clojure and Groovy. It touches on functional programming with the new JVM languages and on modern approaches to testing, building and continuous integration of Java applications. That's all on this list of good Java performance tuning books. I have recommended Effective Java many times as one must-have book for Java developers, but at the same time I have also found that you should have a book dedicated to Java performance tuning. After some years of work and experience in Java, you are bound to face performance challenges, and at that time you should at least know the fundamentals, tools and process of finding bottlenecks and improving the performance of a Java application. So, if you haven't read any Java performance book, this is the time to read one. Reference: Top 5 Java Performance Tuning Books – Best of Lot, Must read from our JCG partner Javin Paul at the Javarevisited blog....

Hibernate Identity, Sequence and Table (Sequence) generator

Introduction In my previous post I talked about different database identifier strategies. This post will compare the most common surrogate primary key strategies: IDENTITY, SEQUENCE and TABLE (SEQUENCE). IDENTITY The IDENTITY type (included in the SQL:2003 standard) is supported by: SQL Server, MySQL (AUTO_INCREMENT), DB2 and HSQLDB. The IDENTITY generator allows an integer/bigint column to be auto-incremented on demand. The increment process happens outside of the currently running transaction, so a roll-back may end up discarding already assigned values (value gaps may happen). The increment process is very efficient since it uses a database-internal lightweight locking mechanism, as opposed to the more heavyweight transactional coarse-grained locks. The only drawback is that we can't know the newly assigned value prior to executing the INSERT statement. This restriction hinders the "transactional write-behind" flushing strategy adopted by Hibernate. For this reason Hibernate disables JDBC batch support for entities using the IDENTITY generator. For the following examples we'll enable Session Factory JDBC batching: properties.put("hibernate.order_inserts", "true"); properties.put("hibernate.order_updates", "true"); properties.put("hibernate.jdbc.batch_size", "2"); Let's define an Entity using the IDENTITY generation strategy: @Entity(name = "identityIdentifier") public static class IdentityIdentifier { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Long id; } Persisting 5 entities: doInTransaction(new TransactionCallable<Void>() { @Override public Void execute(Session session) { for (int i = 0; i < 5; i++) { session.persist(new IdentityIdentifier()); } session.flush(); return null; } }); will execute one query after the other (there is no JDBC batching involved): Query:{[insert into identityIdentifier (id) values (default)][]} Query:{[insert into identityIdentifier (id) values (default)][]} Query:{[insert into identityIdentifier (id) values (default)][]} Query:{[insert into identityIdentifier (id) values (default)][]} Query:{[insert into identityIdentifier (id) values (default)][]} Aside from disabling JDBC batching, the IDENTITY generator strategy doesn't work with the Table-per-concrete-class inheritance model, because there could be multiple subclass entities having the same identifier, and a base-class query would end up retrieving entities with the same identifier (even if they belong to different types). SEQUENCE The SEQUENCE generator (defined in the SQL:2003 standard) is supported by: Oracle, SQL Server, PostgreSQL, DB2 and HSQLDB. A SEQUENCE is a database object that generates incremental integers on each successive request.
SEQUENCES are much more flexible than IDENTITY columns because: a SEQUENCE is table-free, and the same sequence can be assigned to multiple columns or tables; a SEQUENCE may preallocate values to improve performance; a SEQUENCE may define an incremental step, allowing us to benefit from a "pooled" HiLo algorithm; a SEQUENCE doesn't restrict Hibernate JDBC batching; and a SEQUENCE doesn't restrict Hibernate inheritance models. Let's define an Entity using the SEQUENCE generation strategy: @Entity(name = "sequenceIdentifier") public static class SequenceIdentifier { @Id @GenericGenerator(name = "sequence", strategy = "sequence", parameters = { @org.hibernate.annotations.Parameter(name = "sequenceName", value = "sequence"), @org.hibernate.annotations.Parameter(name = "allocationSize", value = "1"), }) @GeneratedValue(generator = "sequence", strategy=GenerationType.SEQUENCE) private Long id; } I used the "sequence" generator because I didn't want Hibernate to choose a SequenceHiLoGenerator or a SequenceStyleGenerator on my behalf. Adding 5 entities: doInTransaction(new TransactionCallable<Void>() { @Override public Void execute(Session session) { for (int i = 0; i < 5; i++) { session.persist(new SequenceIdentifier()); } session.flush(); return null; } }); generates the following queries: Query:{[call next value for hibernate_sequence][]} Query:{[call next value for hibernate_sequence][]} Query:{[call next value for hibernate_sequence][]} Query:{[call next value for hibernate_sequence][]} Query:{[call next value for hibernate_sequence][]} Query:{[insert into sequenceIdentifier (id) values (?)][1]} {[insert into sequenceIdentifier (id) values (?)][2]} Query:{[insert into sequenceIdentifier (id) values (?)][3]} {[insert into sequenceIdentifier (id) values (?)][4]} Query:{[insert into sequenceIdentifier (id) values (?)][5]} This time the inserts are batched, but we now have 5 sequence calls prior to inserting the entities. This can be optimized by using a HiLo algorithm (a minimal pooled-optimizer sketch follows at the end of this article). TABLE (SEQUENCE) There is another database-independent alternative for generating sequences. One or multiple tables can be used to hold the identifier sequence counter, but it means trading write performance for database portability. While IDENTITY and SEQUENCES are transaction-less, using a database table mandates ACID semantics for synchronizing multiple concurrent id generation requests. This is made possible by using row-level locking, which comes at a higher cost than the IDENTITY or SEQUENCE generators. The sequence must be calculated in a separate database transaction, and this requires the IsolationDelegate mechanism, which has support for both local (JDBC) and global (JTA) transactions. For local transactions, it must open a new JDBC connection, therefore putting more pressure on the current connection pooling mechanism. For global transactions, it requires suspending the currently running transaction. After the sequence value is generated, the actual transaction has to be resumed.
This process has its own cost, so the overall application performance might be affected. Let's define an Entity using the TABLE generation strategy: @Entity(name = "tableIdentifier") public static class TableSequenceIdentifier { @Id @GenericGenerator(name = "table", strategy = "enhanced-table", parameters = { @org.hibernate.annotations.Parameter(name = "table_name", value = "sequence_table") }) @GeneratedValue(generator = "table", strategy=GenerationType.TABLE) private Long id; } I used the newer "enhanced-table" generator, because the legacy "table" generator has been deprecated. Adding 5 entities: doInTransaction(new TransactionCallable<Void>() { @Override public Void execute(Session session) { for (int i = 0; i < 5; i++) { session.persist(new TableSequenceIdentifier()); } session.flush(); return null; } }); generates the following queries: Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[insert into sequence_table (sequence_name, next_val) values (?,?)][default,1]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][2,1,default]} Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][3,2,default]} Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][4,3,default]} Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][5,4,default]} Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][6,5,default]} Query:{[insert into tableIdentifier (id) values (?)][1]} {[insert into tableIdentifier (id) values (?)][2]} Query:{[insert into tableIdentifier (id) values (?)][3]} {[insert into tableIdentifier (id) values (?)][4]} Query:{[insert into tableIdentifier (id) values (?)][5]} The table generator allows JDBC batching, but it resorts to SELECT FOR UPDATE queries. The row-level locking is definitely less efficient than using a native IDENTITY or SEQUENCE. So, based on your application requirements, you have multiple options to choose from. There isn't one single winning strategy, each one having both advantages and disadvantages. Code available on GitHub. Reference: Hibernate Identity, Sequence and Table (Sequence) generator from our JCG partner Vlad Mihalcea at the Vlad Mihalcea's Blog blog....
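Coming back to the HiLo optimization mentioned in the SEQUENCE section, here is a minimal configuration sketch assuming Hibernate's enhanced-sequence (SequenceStyleGenerator) strategy with a pooled optimizer; the entity, generator and sequence names are illustrative and not taken from the article:

@Entity(name = "pooledSequenceIdentifier")
public static class PooledSequenceIdentifier {
    @Id
    @GenericGenerator(name = "pooledSequence", strategy = "enhanced-sequence",
        parameters = {
            // Backing database sequence (illustrative name)
            @org.hibernate.annotations.Parameter(name = "sequence_name", value = "pooled_sequence"),
            // Preallocate identifiers in blocks of 5
            @org.hibernate.annotations.Parameter(name = "increment_size", value = "5"),
            // The pooled optimizer provides the HiLo-style preallocation
            @org.hibernate.annotations.Parameter(name = "optimizer", value = "pooled")
        })
    @GeneratedValue(generator = "pooledSequence", strategy = GenerationType.SEQUENCE)
    private Long id;
}

With an increment_size of 5, a batch of five inserts should need roughly one database sequence call instead of five, at the cost of possible identifier gaps when the application restarts.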