Bloom Filter Implementation in Java on GitHub

A Bloom Filter is a type of set data structure. For those unaware, a set data structure has one main method, contains: it is only used to determine whether a specific element is included in a group of elements or not. Most data structures (like a hash map, linked list, or array) can provide this function fairly easily: you simply search the data structure for the specific element. However, these data structures pose a problem when the number of elements in the set exceeds the amount of memory available, since they store all of the elements in memory. This is where the Bloom Filter becomes interesting, as it doesn't actually store the elements of the set. Instead of placing each element into the data structure, the Bloom Filter only stores an array of bytes. For each element added to the Bloom Filter, k bits are set in that array; which bits are set is typically determined by a hashing function. To check if an element is in the set, you simply check whether the bits that would be set for this item are actually set. If they are all one (instead of zero), then the item is considered to be in the set. If any of the bits is zero, then the item is definitely not in the set. As with every data structure, there is a drawback to the Bloom Filter: using the method above, it can report that an element is in the set when it actually isn't. False positives are possible, and their probability depends on several factors, such as:

- The size of the byte array
- The number of bits (k) set per element
- The number of items in the set

By tweaking the above values, you can easily get the false positive probability down to respectable levels while still saving a large amount of space. After I discovered the Bloom Filter, I went looking for an implementation in Java. Sadly, a standard implementation doesn't exist! So I wrote a quick and simple version of the Bloom Filter for Java. You can find the source code on GitHub.
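The add/contains mechanics described above can be sketched in a few lines of Java. This is a minimal illustration, not the implementation from the GitHub project: it derives the k bit positions from two hash values (a common trick) rather than from chained MD5 hashes, and the class name is invented here.

```java
import java.util.BitSet;

/** Minimal Bloom filter sketch: m bits, k bits set per element. */
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int m;  // number of bits in the filter
    private final int k;  // bits set per element

    public SimpleBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Derive the i-th bit position from two hashes of the element.
    private int position(Object o, int i) {
        int h1 = o.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16); // second, derived hash
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(Object o) {
        for (int i = 0; i < k; i++) {
            bits.set(position(o, i));
        }
    }

    /** May return false positives, never false negatives. */
    public boolean mightContain(Object o) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(o, i))) {
                return false; // a zero bit proves the element was never added
            }
        }
        return true;
    }
}
```

Adding an element sets k bits; a membership test only fails when some bit is still zero, which is why the filter can answer "maybe present" but never misses an element that was actually added.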
My implementation:

- Uses an MD5 hash. To add an object, the filter takes the value of its hashCode() method and computes the MD5 hash of it. For subsequent values of k, the filter uses the previously computed MD5 hash (converted to an int) to generate the next MD5 hash.
- Is backed by a simple byte array.
- Implements the Set<Object> interface, although some of the methods in the interface will not work properly.

Note that the project also uses the SizeOf library to get the number of bytes used in memory. I also ran a few quick experiments comparing the filter to a standard ArrayList in Java, along with a few performance checks:

- Time required to add an element to the set using different k values
- Size of the set versus the ArrayList at different element counts

As is to be expected, the larger the number of elements required to be in the set, the more useful the Bloom Filter becomes. It does get a bit tricky to determine how large the Bloom Filter should be and what the optimal k value is for a given set, especially if the set is continually growing. For the tests, I simply added Objects (which have a size of 16 bytes) to each data structure, and then used the SizeOf library to get the true amount of space used. From the first graph, it is easy to see that the Bloom Filter is much more space-efficient once the collection grows beyond 100 objects. That trend continues at 1500 objects, where the Bloom Filter requires 22808 bytes less than the ArrayList to store the same number of elements. The second graph shows the time in seconds (on an early 2012 iMac) to add an element with different numbers of bits set (k). As k increases, the time grows fairly slowly up to 10 bits. However, anything past 10 becomes very costly, with 100 bits requiring a full second to complete. Feel free to check out the source code for the tests and the Bloom Filter implementation itself on GitHub.
Reference: Bloom Filter Implementation in Java on GitHub from our JCG partner Isaac Taylor at the Programming Mobile blog. ...

Debugging SQL query in MySQL

Recently I started writing SQL queries to analyze and debug production code. I was surprised to see that some queries take much longer than others to produce the same output. I did some research and found some interesting things about how to debug a SQL query. I have a very simple table whose definition is as follows; in the test environment it was populated with more than 1000K rows.

+----------------+--------------+------+-----+----------------+
| Field          | Type         | Null | Key | Extra          |
+----------------+--------------+------+-----+----------------+
| id             | bigint(20)   | NO   | PRI | auto_increment |
| dateCreated    | datetime     | NO   |     |                |
| dateModified   | datetime     | NO   |     |                |
| phoneNumber    | varchar(255) | YES  | MUL |                |
| version        | bigint(20)   | NO   |     |                |
| oldPhoneNumber | varchar(255) | YES  |     |                |
+----------------+--------------+------+-----+----------------+

I executed a very simple query to find the tuple containing 5107357058 as phoneNumber. It took almost 4 seconds to fetch the result:

select * from Device where phoneNumber = 5107357058;   -- takes 4 sec

This simple query should have taken a few milliseconds. I noticed that the phoneNumber datatype is varchar, but in the query the value is provided as a number. When I modified the query to match the datatype, it took a few milliseconds:

select * from Device where phoneNumber = '5107357058'; -- takes almost no time

After googling and reading posts on Stack Overflow, I found the EXPLAIN SQL clause, which helps in debugging a query. The EXPLAIN statement provides information about the execution plan for a SELECT statement. When I used it to get information about the two queries, I got the following results.
mysql> EXPLAIN select * from Device where phoneNumber = 5107357058;
+----+-------------+--------+------+------------------------------------+------+---------+------+---------+-------------+
| id | select_type | table  | type | possible_keys                      | key  | key_len | ref  | rows    | Extra       |
+----+-------------+--------+------+------------------------------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | Device | ALL  | phoneNumber,idx_Device_phoneNumber | NULL | NULL    | NULL | 6482116 | Using where |
+----+-------------+--------+------+------------------------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN select * from Device where phoneNumber = '5107357058';
+----+-------------+--------+------+------------------------------------+-------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys                      | key         | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------------------------+-------------+---------+-------+------+-------------+
|  1 | SIMPLE      | Device | ref  | phoneNumber,idx_Device_phoneNumber | phoneNumber | 258     | const |    2 | Using where |
+----+-------------+--------+------+------------------------------------+-------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

EXPLAIN gives you several query attributes. While analysing a query you should pay attention to the following:

- possible_keys: the indexes applicable to the query.
- key: the key actually used to find the records. A NULL value shows that no key was used, so MySQL searches linearly, which eventually takes a long time.
- rows: the number of rows examined; a query that examines fewer rows is more efficient. One should always try to improve the query and avoid overly generic query clauses. The difference in query performance is most evident when executing against a large number of records.
- type: the join type.
ref means that all rows with matching index values are read from the table; ALL means a full table scan. The two outputs of EXPLAIN clearly show the subtle difference. The latter query uses a string, which is the right datatype; as a result phoneNumber is used as the key and only two rows are checked. The former uses an integer, which is a different datatype, so MySQL must convert the stored string value of each record to a number and compare it with the given value. This results in NULL as the key and 6482116 as the rows output. You can also see that the latter query's type value is ref while the former's is ALL, which clearly indicates that the former is a bad query.   Reference: Debugging SQL query from our JCG partner Rakesh Cusat at the Code4Reference blog. ...
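To see why the type mismatch defeats the index, here is a pure-Java analogy (a toy of my own, not MySQL internals): think of the index as a HashMap keyed by the stored varchar values. Probing it with the exact string is a single lookup, while probing with a number forces you to convert every stored key before comparing, i.e. a full scan.

```java
import java.util.HashMap;
import java.util.Map;

public class IndexAnalogy {
    // "Index" over the varchar column: stored string -> row id.
    static Map<String, Long> index = new HashMap<>();

    static Long lookupByString(String phone) {
        return index.get(phone); // O(1): probe with the column's own type
    }

    static Long lookupByNumber(long phone) {
        // Type mismatch: every stored key must be converted before comparing,
        // the moral equivalent of MySQL's full table scan.
        for (Map.Entry<String, Long> e : index.entrySet()) {
            if (Long.parseLong(e.getKey()) == phone) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        index.put("5107357058", 1L);
        index.put("4155550100", 2L);
        System.out.println(lookupByString("5107357058")); // 1
        System.out.println(lookupByNumber(5107357058L));  // 1, but via a scan
    }
}
```

In JDBC terms, the same hedge applies: bind the parameter with the column's type, e.g. ps.setString(1, "5107357058") rather than ps.setLong(1, 5107357058L), so the driver never forces the per-row conversion.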

Your first message – discovering Akka

Akka is a platform (framework?) inspired by Erlang, promising easier development of scalable, multi-threaded and safe applications. While in most popular languages concurrency is based on memory shared between several threads, guarded by various synchronization methods, Akka offers a concurrency model based on actors. An actor is a lightweight object which you interact with solely by sending messages to it. Each actor can process at most one message at a time and can, obviously, send messages to other actors. Within one Java virtual machine, millions of actors can exist at the same time, building a hierarchical parent (supervisor) – children structure, where the parent monitors the behaviour of its children. If that's not enough, we can easily split our actors between several nodes in a cluster – without modifying a single line of code. Each actor can have internal state (a set of fields/variables), but communication can only occur through message passing, never through shared data structures (counters, queues). The combination of the features above leads to much safer, more stable and scalable code – for the price of a radical paradigm shift in the concurrent programming model. So many buzzwords and promises; let's go forward with an example. And it's not going to be a "Hello, world" example: we are going to build a small but complete solution. In the next few articles we will implement integration with the random.org API. This web service allows us to fetch truly random numbers (as opposed to pseudo-random generators) based on atmospheric noise (whatever that means). The API isn't really that complicated; please visit the following URL and refresh it a couple of times: https://www.random.org/integers/?num=20&min=1000&max=10000&col=1&base=10&format=plain So where is the difficulty?
Reading the guidelines for automated clients, we learn that:

- The client application should call the URL above from at most one thread – it is forbidden to fetch random numbers concurrently over several HTTP connections.
- We should load random numbers in batches, not one by one in every request. The request above takes num=20 numbers in one call.
- We are warned about latency: a response may arrive even after one minute.
- The client should periodically check the random number quota (the service is free only up to a given number of random bits per day).

All these requirements make integration with random.org non-trivial. In this series I have just begun, we will gradually improve our application, learning new Akka features step by step. We will soon realize that the quite steep learning curve pays off quickly once we understand the basic concepts of the platform. So, let's code! Today we will try to handle the first two requirements, that is, no more than one connection at any given point in time and loading numbers in batches. For this step we don't really need Akka; simple synchronization and a buffer are just about enough:

val buffer = new Queue[Int]

def nextRandom(): Int = {
  this.synchronized {
    if (buffer.isEmpty) {
      buffer ++= fetchRandomNumbers(50)
    }
    buffer.dequeue()
  }
}

def fetchRandomNumbers(count: Int) = {
  val url = new URL("https://www.random.org/integers/?num=" + count + "&min=0&max=65535&col=1&base=10&format=plain&rnd=new")
  val connection = url.openConnection()
  val stream = Source.fromInputStream(connection.getInputStream)
  val randomNumbers = stream.getLines().map(_.toInt).toList
  stream.close()
  randomNumbers
}

This code works; this.synchronized in Scala is the equivalent of the synchronized keyword in Java. The way nextRandom() works should be obvious: if the buffer is empty, fill it with 50 random numbers fetched from the server, then take and return the first value from the buffer. This code has several disadvantages, starting with the synchronized block in the first place.
Rather costly synchronization on each and every call seems like overkill. And we aren't even in a cluster yet, where we would have to maintain one active connection per whole cluster, not only within one JVM! We shall begin by implementing one actor. An actor is basically a class extending the Actor trait and implementing the receive method. This method is responsible for receiving and handling one message. Let's reiterate what we already said: each and every actor can handle at most one message at a time, thus the receive method is never called concurrently. If the actor is already handling some message, the remaining messages are kept in a queue dedicated to each actor. Thanks to this rigorous rule, we can avoid any synchronization inside an actor, which is always thread-safe.

case object RandomRequest

class RandomOrgBuffer extends Actor {

  val buffer = new Queue[Int]

  def receive = {
    case RandomRequest =>
      if (buffer.isEmpty) {
        buffer ++= fetchRandomNumbers(50)
      }
      println(buffer.dequeue())
  }
}

The fetchRandomNumbers() method remains the same. Single-threaded access to random.org was achieved for free, since an actor can only handle one message at a time. Speaking of messages, in this case RandomRequest is our message – an empty object not conveying any information except its type. In Akka, messages are almost always implemented using case classes or other immutable types. Thus, if we would like to support fetching an arbitrary number of random numbers, we would have to include that as part of the message:

case class RandomRequest(howMany: Int)

class RandomOrgBuffer extends Actor with ActorLogging {

  val buffer = new Queue[Int]

  def receive = {
    case RandomRequest(howMany) =>
      if (buffer.isEmpty) {
        buffer ++= fetchRandomNumbers(50)
      }
      for (_ <- 1 to (howMany min 50)) {
        println(buffer.dequeue())
      }
  }
}

Now we should try to send a message to our brand new actor. Obviously we cannot just call the receive method, passing the message as an argument.
First we have to start the Akka platform and ask for an actor reference. This reference is later used to send a message via the (slightly counter-intuitive at first) ! method, dating back to Erlang days:

object Bootstrap extends App {
  val system = ActorSystem("RandomOrgSystem")
  val randomOrgBuffer = system.actorOf(Props[RandomOrgBuffer], "buffer")

  randomOrgBuffer ! RandomRequest(10)  // sending a message

  system.shutdown()
}

After running the program we should see 10 random numbers on the console. Experiment a little bit with this simple application (full source code is available on GitHub, request-response tag). In particular, notice that sending a message is non-blocking and the message itself is handled in a different thread (a big analogy to JMS). Try sending a message of a different type and fix the receive method so that it can handle more than one type. Our application is not very useful so far. We would like to access our random numbers somehow, rather than printing them (asynchronously!) to standard output. As you can probably guess, communication with an actor can only be established via asynchronous message passing: an actor cannot "return" a result, nor should it place one in any global, shared memory. Thus the actor will send the results back via a reply message sent directly to us (to sender). But that will be part of the next article. This was a translation of my article "Poznajemy Akka: pierwszy komunikat" originally published on scala.net.pl.   Reference: Your first message – discovering Akka from our JCG partner Tomasz Nurkiewicz at the Java and neighbourhood blog. ...

Building Both Security and Quality In

One of the important things in a Security Development Lifecycle (SDL) is to feed information about vulnerabilities back to developers. This post relates that practice to the Agile practice of No Bugs.

The Security Incident Response

Even though we work hard to ship our software without security vulnerabilities, we never succeed 100%. When an incident is reported (hopefully responsibly), we execute our security response plan. We must be careful to fix the issue without introducing new problems. Next, we should also look for issues similar to the one reported. It's not unlikely that there are similar issues in other parts of the application; we should find and fix those as part of the same security update. Finally, we should do a root cause analysis to determine why this weakness slipped through the cracks in the first place. Armed with that knowledge, we can adapt our process to make sure that similar issues will not occur in the future.

From Security To Quality

The process outlined above works well for making our software ever more secure. But security weaknesses are essentially just bugs. Security issues may have more severe consequences than regular bugs, but most regular bugs are also expensive to fix once the software is deployed. So it actually makes sense to treat all bugs, security or otherwise, the same way. As the saying goes, an ounce of prevention is worth a pound of cure. Just as we need to build security in, we also need to build quality in general in.

Building Quality In Using Agile Methods

This has been known in the Agile and Lean communities for a long time. For instance, James Shore wrote about it in his excellent book The Art Of Agile Development, and Elisabeth Hendrickson thinks that there should be so few bugs that they don't need triaging. Some people object to the Zero Defects mentality, claiming that it's unrealistic. There is, however, clear evidence of much lower defect rates for Agile development teams.
Many Lean implementations also report successes in their quest for Zero Defects. So there is at least anecdotal evidence that a very significant reduction of defects is possible. This will require change, of course. Testers need to change and so do developers. And then everybody on the team needs to speak the same language and work together as a single team instead of in silos. If we do this well, we’ll become bug exterminators that delight our customers with software that actually works.   Reference: Building Both Security and Quality In from our JCG partner Remon Sinnema at the Secure Software Development blog. ...

Investigating Deadlocks – Part 2

One of the most important requirements when investigating deadlocks is actually having a deadlock to investigate. In my last blog I wrote some code called DeadlockDemo that used a bunch of threads to transfer random amounts between a list of bank accounts before grinding to a halt in a deadlock. This blog runs that code to demonstrate a few ways of obtaining a thread dump. A thread dump is simply a report showing the status of all your application's threads at a given point in time. The good thing about it is that it contains various bits of information that will allow you to figure out why you have a deadlock and hopefully fix it, but more on that later.

kill SIGQUIT

If your Java application is running on a UNIX machine, the first, and possibly easiest, way to grab hold of a thread dump is to use the UNIX kill command via a terminal. To do this, first get hold of your application's process identifier, or PID, using the ps and grep commands. For example, if you type:

ps -e | grep java

…then you'll produce a list that looks something like this:

74941 ttys000    0:00.01 grep java
70201 ttys004    1:00.89 /usr/bin/java threads.deadlock.DeadlockDemo

The PID for DeadlockDemo is, in this case, 70201 and is taken from the output above. Note that different flavours of UNIX or different ps command line args can produce slightly different results, so check your man pages. Having got hold of your PID, use it to issue a kill SIGQUIT command:

kill -3 70201

The kill command is the UNIX command that disposes of unwanted processes. Although the -3 above is the SIGQUIT argument (the signal also sent by pressing ctrl-\ in a terminal), if Java receives this signal it will not quit; it will display a thread dump on its associated terminal. You can then grab hold of this and copy it into a text file for further analysis.

jstack

If you're working in Windows, the UNIX command line isn't available. To counter this problem Java comes with a utility that performs the equivalent of kill.
This is called jstack, and it is available on both UNIX and Windows. It is used in the same way as the kill command demonstrated above:

jstack <PID>

Getting hold of a PID in Windows is a matter of opening the Windows Task Manager. Task Manager doesn't display PIDs by default, so you need to update its setup via the View menu option, checking the PID (Process Identifier) option in the Select Columns dialogue box. Next, it's just a matter of examining the process list and finding the appropriate instance of java.exe. Read java.exe's PID and use it as the jstack argument as shown below:

jstack 3492

Once the command has completed, you can grab hold of the output and copy it into a text file for further analysis.

jVisualVM

jVisualVM is the 'Rolls Royce' way of obtaining a thread dump. It's provided by Oracle as a tool that allows you to get hold of lots of different information about a Java VM, including heap dumps, CPU usage, memory profiling and much more. jVisualVM's actual program name is jvisualvm, or jvisualvm.exe on Windows. To obtain a thread dump, find your application in the left-hand applications panel, then right click and select "Thread Dump". The thread dump is then displayed in jVisualVM's right-hand pane. Note that I have seen jVisualVM hang on several occasions when connecting to a local VM; when this happens, ensure that its proxy settings are set to No Proxy. Having obtained a thread dump, my next blog will use it to investigate what's going wrong with the example DeadlockDemo code. For more information see the other blogs in this series.   Reference: Investigating Deadlocks – Part 2: Obtaining the Thread Dump from our JCG partner Roger Hughes at the Captain Debug's Blog blog. ...
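Besides these external tools, the JDK's own management API can detect monitor deadlocks programmatically, which is handy in automated tests. Below is a self-contained sketch (the class and method names are mine, not from the DeadlockDemo code): two daemon threads acquire two locks in opposite order, and ThreadMXBean reports them as deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDetectorDemo {

    /** Creates two daemon threads that deadlock on lockA/lockB,
     *  then returns how many deadlocked threads the JVM reports. */
    public static int createAndDetectDeadlock() throws InterruptedException {
        final Object lockA = new Object();
        final Object lockB = new Object();
        // Both threads must hold their first lock before either tries the second,
        // which guarantees the deadlock.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        start(() -> lockInOrder(lockA, lockB, bothHoldFirstLock));
        start(() -> lockInOrder(lockB, lockA, bothHoldFirstLock));

        ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
        // Poll until the JVM notices the deadlock (usually milliseconds).
        for (int i = 0; i < 100; i++) {
            long[] ids = mxBean.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void lockInOrder(Object outer, Object inner, CountDownLatch latch) {
        synchronized (outer) {
            latch.countDown();
            try { latch.await(); } catch (InterruptedException ignored) { }
            synchronized (inner) { } // never reached: the other thread holds it
        }
    }

    private static void start(Runnable r) {
        Thread t = new Thread(r);
        t.setDaemon(true); // let the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked threads: " + createAndDetectDeadlock());
    }
}
```

findDeadlockedThreads() covers deadlocks over both object monitors and java.util.concurrent ownable synchronizers; the older findMonitorDeadlockedThreads() covers monitors only.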

Spring MVC Form Validation (With Annotations)

This post provides a simple example of HTML form validation. It is based on the Spring MVC With Annotations example. The code is available on GitHub in the Spring-MVC-Form-Validation directory.

Data

For this example we will use a bean and JSR 303 validation annotations:

public class MyUser {

    @NotNull
    @Size(min = 1, max = 20)
    private String name;

    @Min(0)
    @Max(120)
    private int age;

    public MyUser(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public MyUser() {
        name = "";
        age = 0;
    }

    // Setters & Getters
}

Pages

Our form will contain input elements, but also the possibility to display error messages:

<%@page contentType="text/html" pageEncoding="UTF-8"%>
<%@ taglib prefix="form" uri="http://www.springframework.org/tags/form" %>
<!doctype html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>My User Form!</title>
</head>
<body>
<form:form method="post" action="myForm" commandName="myUser">
<table>
<tr><td>Name: <font color="red"><form:errors path="name" /></font></td></tr>
<tr><td><form:input path="name" /></td></tr>
<tr><td>Age: <font color="red"><form:errors path="age" /></font></td></tr>
<tr><td><form:input path="age" /></td></tr>
<tr><td><input type="submit" value="Submit" /></td></tr>
</table>
</form:form>
</body>
</html>

Our success page is:

<%@page contentType="text/html" pageEncoding="UTF-8"%>
<%@taglib prefix="form" uri="http://www.springframework.org/tags/form"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!doctype html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Form Processed Successfully!</title>
</head>
<body>
Form processed for <c:out value="${myUser.name}" /> !
<br />
<a href="<c:url value='/'/>">Home</a>
</body>
</html>

Our home page:

<%@page contentType="text/html" pageEncoding="UTF-8"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Welcome !!!</title>
</head>
<body>
<h1> Spring Form Validation !!! </h1>
<a href="<c:url value='/myForm'/>">Go to the form!</a>
</body>
</html>

Controller

Notice that we need to use @ModelAttribute to make sure an instance of MyUser is always available in the model. In validateForm(), we need @ModelAttribute to bind the content of the form to the MyUser object.

@Controller
public class MyController {

    @RequestMapping(value = "/")
    public String home() {
        return "index";
    }

    @ModelAttribute("myUser")
    public MyUser getLoginForm() {
        return new MyUser();
    }

    @RequestMapping(value = "/myForm", method = RequestMethod.GET)
    public String showForm(Map model) {
        return "myForm";
    }

    @RequestMapping(value = "/myForm", method = RequestMethod.POST)
    public String validateForm(
            @ModelAttribute("myUser") @Valid MyUser myUser,
            BindingResult result, Map model) {

        if (result.hasErrors()) {
            return "myForm";
        }

        model.put("myUser", myUser);

        return "success";
    }
}

Maven Dependencies

We need the following dependencies. The Hibernate Validator dependency is necessary to process JSR 303 annotations:

<dependency>
    <groupId>javax.validation</groupId>
    <artifactId>validation-api</artifactId>
    <version>1.0.0.GA</version>
    <type>jar</type>
</dependency>

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-validator</artifactId>
    <version>4.3.0.Final</version>
</dependency>

Running The Example

Once compiled, the example can be run with mvn tomcat:run. Then browse http://localhost:8383/spring-mvc-form-validation/. If the end user enters invalid values, error messages will be displayed.  Reference: Spring MVC Form Validation (With Annotations) from our JCG partner Jerome Versrynge at the Technical Notes blog. ...

Lets Crunch big data

As developers our focus is on simpler, effective solutions, and thus one of the most valued principles is "keep it simple, stupid". But with Hadoop MapReduce it is a bit hard to stick to this. If we evaluate data in multiple MapReduce jobs, we end up with code that is not related to the business but more to the infrastructure. Most non-trivial business data processing involves quite a few MapReduce tasks, which means longer turnaround times and solutions that are harder to test. Google presented a solution to these issues in their FlumeJava paper, and the same paper has been adapted in implementing Apache Crunch. In a nutshell, Crunch is a Java library which simplifies development of MapReduce pipelines. It provides a bunch of lazily evaluated collections which can be used to perform various operations in the form of MapReduce jobs. Here is what Brock Noland said in one of his posts introducing Crunch: "Using Crunch, a Java programmer with limited knowledge of Hadoop and MapReduce can utilize the Hadoop cluster. The program is written in pure Java and does not require the use of MapReduce specific constructs such as writing a Mapper, Reducer, or using Writable objects to wrap Java primitives." Crunch supports reading data from various sources like sequence files, Avro, text, HBase, and JDBC with a simple read API:

<T> PCollection<T> read(Source<T> source)

You can import data in various formats like JSON, Avro, Thrift etc. and perform efficient join, aggregation, sort, cartesian and filter operations. Additionally, any custom operation over these collections is quite easy to cook up: all you have to do is implement the simple and to-the-point DoFn interface, and you can unit test your implementations of DoFn without any MapReduce constructs. I am not putting up a full example here; one can be found on the Apache Crunch site. Alternatively, you can generate a project from the available crunch-archetype.
This will also generate a simple WordCount example. The archetype can be selected using:

mvn archetype:generate -Dfilter=crunch-archetype

The project has quite a few examples of its different aspects and is also available in Scala. So now let's CRUNCH some data!!!   Reference: Lets Crunch big data from our JCG partner Rahul Sharma at the The road so far… blog. ...
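To give a flavour of the programming model without pulling in Hadoop, here is a toy, pure-Java imitation of the DoFn idea (the interface below is my own simplification, not the real org.apache.crunch.DoFn): a transformation is a small class that takes one input element and emits zero or more outputs, so it can be unit-tested in complete isolation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ToyCrunch {

    /** Simplified stand-in for Crunch's DoFn: one input, many outputs. */
    interface ToyDoFn<S, T> {
        void process(S input, Consumer<T> emitter);
    }

    /** Apply a ToyDoFn to every element of a collection (eagerly, in this toy). */
    static <S, T> List<T> parallelDo(List<S> input, ToyDoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S element : input) {
            fn.process(element, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        // A word-splitting "DoFn", testable without any MapReduce machinery.
        ToyDoFn<String, String> splitWords = (line, emitter) -> {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    emitter.accept(word);
                }
            }
        };
        List<String> words = parallelDo(
                Arrays.asList("lets crunch", "big data"), splitWords);
        System.out.println(words); // [lets, crunch, big, data]
    }
}
```

In real Crunch, the collections are lazily evaluated and the pipeline planner fuses chains of DoFns into as few MapReduce jobs as possible; the toy above only mirrors the testability aspect.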

A Framework for Enterprise Software

For decades, the companies in this industry have produced sophisticated and complex products. They are difficult to assemble, require the stitching together of a variety of component parts, and it's often a long time from the beginning of a project to when the end user sees value or success. Unfortunately, many times the user is never satisfied. The product breaks or does not work quite as described. As these quality problems emerge, the companies in this industry turn to 3rd parties, asking them to help their clients have a better experience. The companies incorporate IP from those 3rd parties and, many times, ask the 3rd parties to service the products. While this often improves the customer experience, it does not solve the problem of the negative externalities (and their implications for the users) of the product. In many cases, the negative externalities drive up costs for the users. Then along comes the threat of a new approach and technology. It simplifies many of the previous issues. The product looks better on the surface and clients tend to be happier, sooner. It also solves the problem of the negative externalities. It's a homerun all around. Does everyone move to the new approach/technology en masse? Well, you tell me: do you own a battery-powered car? ————————————- The story above is merely an illustration that history repeats itself and there is a lot to be learned from understanding and spotting patterns. I suppose most people who read this will think of enterprise software as they read that story. And when I get to the part about the new approach/technology, they start thinking of SaaS and Cloud. However, the answer to the question is the same, whether we are talking autos or enterprise software: the world does not move en masse in any direction, even though the benefits are apparent. I continue to see rhetoric that postulates that the future of enterprise software is simply cloud and SaaS.
While it's hard to argue with this at a conceptual level (given its lack of specificity), I think it trivializes a very complex topic. Not everything will be cloud/SaaS, although those will certainly be two possible delivery models. To really form a view of how enterprise software evolves over the next 10-20 years, I've constructed some over-arching hypotheses, which hopefully provide a framework for thinking about new business opportunities in enterprise software. Hypothesis 1: The current model of 'pushing' your product through a salesforce does not scale and is not optimal for clients/users. Usability will dominate, and I extend usability to include topics like time-to-value, ease of use, and self-service. Hypothesis 2: The model of paying Systems Integrators to make your products work together (or work in the first place) will enter a secular decline. There will continue to be a strong consulting market for application development, high-end strategy/segmentation, and complex project management. However, clients will no longer tolerate having to pay money just to make things work. Hypothesis 3: Enterprises cannot acquire skills fast enough to exploit new technology. So, on one hand, usability needs to address this. On the other hand, continuing education will need to offer a new method for driving skills development quickly. Continuing education is much more than 'product training'. In fact, while 'product training' is the majority of what is paid for today, I believe it will be the minority going forward. Hypothesis 4: There will be different models for software delivery: Cloud, SaaS, on-premise, outsourced, etc. Therefore, just because a company offers something in a certain model does not mean that it will be successful. Clients will buy the best-fit model, based on their business goals and related concerns (security, sustainability, etc.). Hypothesis 5: Clients will optimize for easy (implementation and ongoing support) and return (on investment and capital).
Products that deliver on both are a no-brainer. Products that only hit one of them will be scrutinized. Products that deliver neither will cease to exist. As I meet with new companies and even assess products that we are building, this is my current framework for thinking through how to identify the potential winners and losers.   Reference: A Framework for Enterprise Software from our JCG partner Rob Thomas at the Rob's Blog blog.

Is there a better approach to Marker?

Since the launch of Java 1.5 there has been a wide range of discussion and debate on whether to use the marker interface in Java. This is because Java 1.5 introduced the annotations feature, which serves much of the same purpose as the marker interface. This article covers an interesting section of this debate.

Definition: A marker interface in Java is an empty interface without any methods, fields or constants. It is sometimes also referred to as a tag interface.

So why is a marker interface used? Valid question! It doesn't serve the usual purpose of an interface, which is to define a contract with the classes that implement it. An interface declares methods without their implementations; it tells the implementing classes what needs to be done, but leaves to them the decision of how to do it. In the case of a marker interface, however, there are no members. A marker interface is a way to declare metadata about a class: it tells the JVM that objects of classes implementing the marker interface need to be treated in a special manner. Some of the out-of-the-box marker interfaces defined in the Java API are:

java.io.Serializable
java.lang.Cloneable
java.util.RandomAccess
java.util.EventListener

We can also create our own marker interfaces in the same way we create other interfaces. Let's go into more depth with the Cloneable interface. When an object needs to be cloned in Java, we use Object's clone() method. Note, however, that this method is not part of the Cloneable interface: when your class implements Cloneable, clone() is not implemented for you, unlike with standard interfaces. It is available only when you explicitly override it or call the object's clone method. Therefore it is not possible to clone an object merely by virtue of the fact that it implements this interface. Even if the clone method is invoked reflectively, there is no guarantee that it will succeed.
    public Object clone() {
        Object clone = null;
        try {
            clone = super.clone();
        } catch (CloneNotSupportedException e) {
            e.printStackTrace();
        }
        return clone;
    }

One key point here is that when you try to clone an object using the clone() method, you will get a CloneNotSupportedException unless you implement the Cloneable interface. The JVM is very smart, isn't it?

Points to note: As mentioned earlier, apart from using the built-in marker interfaces, we can also create application-specific marker interfaces; they are a good way of tagging and logically classifying your code. This is mainly useful when creating a framework or developing an API.

Interesting point: Runnable is not a marker interface. Though run() is a special instruction to the JVM to start a thread, Runnable is not a marker interface because it has a public void run() method inside it.

Problem with the marker interface: A major problem with marker interfaces is that an interface defines a contract for implementing classes, and that contract is inherited by all subclasses. This means that you cannot un-implement a marker. If you create a subclass that you do not want to serialize (perhaps because it depends on transient state), you must resort to explicitly throwing NotSerializableException.

Now let's come back to the point: is it a better approach to use annotations than the marker interface? To answer this, let's look into Java annotations in more detail.

Definition: Java annotations are a special form of syntactic metadata (data about data) introduced in Java 1.5. Like classes and interfaces, annotations can be applied to several kinds of Java elements. Unlike Javadoc tags, annotations are more feature-rich, which helps with processing at runtime. Annotations are used in package and class declarations, method declarations, field declarations and variable declarations. They reduce coding effort and let developers concentrate on the business logic, easing development and increasing automation.
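Going back to marker interfaces for a moment, an application-specific one can be sketched like this; the names Auditable, Invoice and Draft are hypothetical, purely for illustration:

```java
// Hypothetical marker interface: no methods, only a tag.
interface Auditable {
}

// A class that opts in to the special treatment.
class Invoice implements Auditable {
}

// A class that does not.
class Draft {
}

public class MarkerDemo {
    // Client code detects the mark with instanceof and acts on it,
    // e.g. routing the object to an audit log.
    static boolean requiresAudit(Object o) {
        return o instanceof Auditable;
    }

    public static void main(String[] args) {
        System.out.println(requiresAudit(new Invoice())); // true
        System.out.println(requiresAudit(new Draft()));   // false
    }
}
```

The tag travels with the type, which is exactly why it is also inherited by every subclass of Invoice, whether you want that or not.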
Annotations are demarcated from the standard Java elements by the '@' symbol. Whenever the compiler comes across these annotations on a Java element, it extracts the information from the annotation and acts on it accordingly.

Uses of annotations:

Passing information to the compiler – used to detect errors or suppress warnings, e.g. @SuppressWarnings, @Deprecated.
Compile-time and deployment-time processing – several tools can process annotation information to generate code, XML files etc. Frameworks like Spring and Hibernate make heavy use of annotations.
Runtime processing – these annotations are processed only during runtime.

In a similar way to marker interfaces, we also have marker annotations. A marker annotation does not have any methods or elements; the behaviour is the same as with marker interfaces. For example, @Override is a built-in Java marker annotation type that can be applied to a method to indicate to the compiler that the method overrides a method in a superclass. It does not contain any other program element. If you use this annotation on a method that does not override a superclass method, the compiler issues a compilation error to alert you to this fact. This annotation type guards the programmer against the mistake of accidentally overloading a method in the superclass rather than overriding it.

It seems an annotation is a better choice than the marker interface, as the same effect can be achieved:

It can mark variables, methods, and/or classes, not just types.
It can mark any class specifically, or via inheritance. A marker interface marks all subclasses of the marked class; for example, to keep part of a Serializable class out of serialization we have to specifically mark those fields as transient.
That annotations are not subclassable might be either an advantage or a disadvantage, so this point is debatable.
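A minimal custom marker annotation can be sketched as follows; the names Audited and PaymentService are hypothetical, and @Retention(RUNTIME) is what keeps the mark visible to reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation: no elements, just a tag on the class.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Audited {
}

@Audited
class PaymentService {
}

public class MarkerAnnotationDemo {
    public static void main(String[] args) {
        // The mark is detected via reflection rather than instanceof.
        System.out.println(PaymentService.class.isAnnotationPresent(Audited.class)); // true
    }
}
```

Unlike a marker interface, the annotation does not become part of the type system, which is what makes the inheritance behaviour discussed next different.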
Annotations are not inherited by default – isAnnotationPresent() tells you if the annotation is present on that particular class, but not if it's present on a superclass or superinterface. So if you, as the implementer of whatever special functionality the annotation is intended to confer, want the annotation to behave as if it were inherited, you have to check isAnnotationPresent() not just on the class itself, but on every superclass and every superinterface. You can also add data to the mark: an annotation that is not blank carries a value, so you are marking with more than just a type.

With each of them having certain advantages and disadvantages, I personally feel that the decision on whether to use a marker interface or a marker annotation should be left to the developers: they have to weigh the situation on the ground, judge the advantages and disadvantages of both, and decide what suits the requirement best.   Reference: Is there a better approach to Marker? from our JCG partner Mainak Goswami at the Idiotechie blog.

Java features applicability

The Java language and standard library are powerful, but with great power comes great responsibility. After seeing a lot of user code misusing or abusing rare Java features on one hand, and completely forgetting about the most basic features on the other, I decided to compose this summary. This is not a list of requirements and areas every Java developer should explore, know and use. It's quite the opposite! I group Java features in three categories: day to day, occasionally and never (frameworks and libraries only). The rule is simple: if you find yourself using a given feature more often than suggested, you are probably over-engineering or trying to build something too general and too reusable. If you don't use a given feature often enough (according to my subjective list), you're probably missing some really interesting and important opportunities.

Note that I only focus on Java, the JVM and the JDK. I do not suggest which frameworks to use, or how often. I also assume a typical, server-side, business-facing application.

Day to day

The following features of the Java language are supposed to be used every day. If you have never seen some of them or find yourself using them very rarely, take a closer look; they are really helpful:

classes, interfaces, packages – seriously. Put your code in classes. You remember from university that a class is encapsulated data plus methods acting upon that data? A class with only state is barely a structure. A class with only methods is just a namespace enclosing functions. Also use interfaces whenever needed, but think twice before creating an interface with only one implementation. Maybe you don't need a middleman? Nevertheless, put everything in packages, following the well-established naming convention.

static methods – don't be afraid of them, but use them only for stateless utility methods. Don't encode any business logic inside a static method, ever.
ExecutorService – thread pools: creating and effectively using thread pools, and understanding how queueing and Future<T> work, is a must. Don't reimplement thread pools; think about them every time someone says producer-consumer.

Atomic-* family – don't use synchronized merely to read or update some counter or reference atomically. The Atomic-* family of classes uses efficient compare-and-swap low-level instructions. Make sure you understand the guarantees these classes provide.

design patterns – not technically part of the Java language, but essential. You should know, understand, and use them willingly but sparingly. Just like with interfaces, don't go overboard. GoF or even EIP patterns should occur in the code base, but let patterns emerge during your thought process, rather than letting your thought process be driven by patterns.

built-in collections, including concurrent – you absolutely must know and use the built-in collections, understanding the differences between List, Map and Set. Using thread-safe collections should not be an issue for you. Understand their performance characteristics and have a basic overview of the implementations behind them. This is really basic. Also know and use the various BlockingQueue implementations. Concurrency is hard; don't make it even harder by reimplementing some of this stuff yourself.

built-in annotations – annotations are here to stay; learn to use @Override (and @Deprecated to some degree) consistently, every day.

exceptions – use unchecked exceptions to signal abnormal, exceptional failure that requires action to be taken. Learn to live with checked exceptions. Learn to read stack traces.

try-with-resources – familiarize yourself with this fabulous language construct. Implement AutoCloseable if your class requires any cleanup.

Blocking IO – using the Reader/Writer and InputStream/OutputStream classes is something you should be really familiar with.
Understand the difference between them, and use buffering and other decorators without fear.

This ends the list of everyday tools you should use. If you've never heard of some of them or used them only occasionally, study them more carefully as they might become your lifesavers.

Occasionally

The following are language features you should not be afraid to use, but which should not be abused either. If you find yourself exploiting them every day, if these are the kind of features you see several times before lunch, there may be something wrong with your design. I am looking from a back-end, enterprise Java developer's perspective. These types of features are useful, but not too often:

inheritance and abstract classes – really, it turns out I don't use inheritance that often and I don't really miss it. Polymorphism driven by interfaces is by far more flexible, especially with the painful lack of traits in Java. Also, prefer composition over inheritance. Too many levels of inheritance lead to very unmaintainable code.

regular expressions – 'Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.' The world without regular expressions would be much more boring and cumbersome. They are wonderful for parsing regular languages (but not HTML), but it's way too easy to overuse them. If you find yourself crafting, testing, fixing and cursing in front of regular expressions all day, you are probably using the wrong tool for the job. My all-time favourite:

    public static boolean isNegative(int x) {
        return Integer.toString(x).matches("-[0-9]+");
    }

Semaphore, CountDownLatch, CyclicBarrier and others – they are all extremely useful, better by an order of magnitude than the infamous wait()/notify() pair. But even they won't prevent concurrency bugs when abused. Consider thread-safe collections or some frameworks when you see these synchronization mechanisms too often.
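These synchronizers are easiest to appreciate in a small sketch. Here a CountDownLatch lets the main thread wait for several workers with no wait()/notify() plumbing at all; the class and method names are mine, purely illustrative:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchDemo {
    // Start n worker threads and block until all of them have finished.
    static int runWorkers(int n) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(n);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            new Thread(() -> {
                completed.incrementAndGet(); // simulate some work
                done.countDown();            // signal this worker finished
            }).start();
        }
        done.await(); // blocks until the count reaches zero
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(3)); // 3
    }
}
```

countDown()/await() also establishes a happens-before relationship, so the main thread is guaranteed to see the workers' writes; getting that right with raw wait()/notify() takes far more care.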
generic types in user code – using built-in collections and other classes that have generic types should not only be a day-to-day practice, it should be obvious to you. What I mean here is developing code yourself that takes or returns generic types, something like:

    public <T extends Contract, F> T validate(Validator<T> validator, F object)

It is sometimes necessary to use generics in your own code, but don't go too meta. Of course static typing and type safety should be your priority, but maybe you can avoid too many generic, complex types?

scripting languages in the JVM – do you know the JDK has a built-in JavaScript interpreter? And that you can plug in virtually any other language, like Groovy or JRuby? Sometimes it's simpler to embed a small script inside your application that can be changed even by the customer. It's not common, but in very fast-changing markets redeploying might not be an option. Just remember that if the total number of lines of scripted code exceeds 1% of the total amount of your code, you should start worrying about maintenance.

Java NIO – it is hard to get right and even harder to actually benefit from. But in rare cases you actually have to use NIO to squeeze out as much performance and scalability as you can. However, prefer libraries that can do it for you. Also, in normal circumstances blocking IO is typically enough.

synchronized keyword – you should not use it too often for a simple reason: the more often it's used, the more often it's executed, impacting performance. Consider thread-safe collections and atomic primitive wrappers instead. Also make sure you always understand which object is used as a mutex.

I consider the features above valuable and important, but not necessarily on a day-to-day basis. If you see any of them every single day it might be a sign of an over-engineered design or… an inexperienced developer. Simplicity comes with experience.
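As a sketch of the "atomic primitive wrappers instead of synchronized" advice above, an AtomicLong keeps a shared counter consistent across a thread pool without a single synchronized block; the class and method names here are illustrative, not from the article:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounterDemo {
    // Increment a shared counter from many pool threads; incrementAndGet
    // relies on compare-and-swap, so no explicit locking is needed.
    static long count(int tasks) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            pool.execute(counter::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(count(4000)); // 4000
    }
}
```

The same loop with an unsynchronized long would lose updates; with synchronized it would serialize every increment. The atomic wrapper avoids both problems.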
However, you might also have very unusual requirements, which applies to the third group as well.

Never (think: framework and library developers only)

You should know and understand the principles behind the features below in order to understand frameworks and libraries, and you must understand them to use those frameworks effectively. I see way too many questions on StackOverflow that could have been avoided if the person in question had simply read the code of the library in use. But understanding doesn't mean use. You should almost never use these directly; they are mostly advanced, dirty and complicated, and even one occurrence of such a feature can lead to major headaches:

sockets – seriously, sockets. You must understand how the TCP/IP stack works, be very conscious with regard to threading, careful when interpreting the data, vigilant with streams. Stay away from using raw sockets; there are hundreds of libraries wrapping them and providing higher-level abstractions: HTTP, FTP, NTP, SMB, e-mail… (e.g. see Apache Commons Net). You'll be amazed how hard it is to write a decent HTTP client or server. And if you need to write a server for some proprietary protocol, definitely consider Netty.

reflection – there is no place for introspecting classes and methods in business code. Frameworks can't live without reflection; I can't live with it. Reflection makes your code slower, less safe and ugly. Typically AOP is just enough. I would even say that passing instances of Class<T> around is a code smell.

dynamic proxies and byte code manipulation – the Proxy class is great but, just like reflection, should be used only by the frameworks and libraries that support you. They are a basic building block of lightweight AOP. If your business application (not a framework or library; even Mockito uses these techniques!) requires byte code generation or manipulation (e.g. ASM or CGLIB), you're in deep sh**t; I will pray for you.

class loaders – everything that has anything to do with class loaders.
You must understand them, the hierarchy, bytecode, etc. But if you write your own class loaders, it's a road to hell. Not that it's so complicated, but it's probably unnecessary. Leave it to application servers.

Object.clone() – honestly, I don't remember ever using that method in my entire (Java developer's) life. I just… didn't… And I can't find any rationale for using it. I either write an explicit copy constructor or, better, use immutable objects. Do you have any legitimate use cases for it? It seems so 1990s…

native methods – there are a few in the JDK, even for tasks as small as computing the sine function. But Java is no longer the slowest kid in the class; it's actually quite the opposite. Also, I can't imagine what kind of logic you need that can't be achieved using the standard library or 3rd-party libraries. Finally, native methods are quite hard to get right, and you can expect low-level, nasty errors, especially around memory management.

custom collections – implementing a brand new collection that follows all the contracts defined in the original JavaDoc is surprisingly hard. Frameworks like Hibernate use special persistent collections. Very rarely do you need a collection so specific to your requirements that none of the built-in ones is good enough.

ThreadLocal – libraries and frameworks use thread locals quite often, but you should never exploit them yourself, for two unrelated reasons. First of all, a ThreadLocal is often a hidden semi-global parameter you want to sneak in. This makes your code harder to reason about and test. Secondly, ThreadLocals can easily introduce memory leaks when not cleaned up properly.

WeakReference and SoftReference – these classes are quite low-level and are great when implementing caches that play well with garbage collection. Luckily there are plenty of open-source caching libraries, so you don't have to write one yourself. Understand what these classes do, but don't use them.
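The ThreadLocal leak warning above boils down to one habit: call remove() when the work is done, especially on pooled threads that outlive the task. A minimal sketch, with all names hypothetical:

```java
public class ThreadLocalDemo {
    // Hypothetical per-thread context, e.g. a request id.
    private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    static String handle(String requestId) {
        REQUEST_ID.set(requestId);
        try {
            return process();
        } finally {
            // Without this, a pooled thread keeps the value alive after
            // the request ends - the classic ThreadLocal memory leak.
            REQUEST_ID.remove();
        }
    }

    static String process() {
        // Deep in the call stack, the id is available without being
        // passed as a parameter - convenient, but a hidden dependency.
        return "handled " + REQUEST_ID.get();
    }

    public static void main(String[] args) {
        System.out.println(handle("req-42")); // handled req-42
        System.out.println(REQUEST_ID.get()); // null - cleaned up
    }
}
```

The hidden-parameter problem is visible here too: nothing in process()'s signature reveals that it depends on REQUEST_ID, which is exactly why the author calls it hard to reason about and test.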
com.sun.* and sun.* packages, especially sun.misc.Unsafe – stay away from these packages; just… don't go there. There is no reason to explore these proprietary, undocumented classes that are not guaranteed to preserve backward compatibility. Just pretend they're not there. And why would you use Unsafe?

Of course the list above is completely subjective and most likely not definitive. I encourage you to comment and suggest, if you feel some items are in the wrong place or maybe something is missing entirely. I would like to build a summary that can be given as a reference during code review or when a project is evaluated.   Reference: Java features applicability from our JCG partner Tomasz Nurkiewicz at the Java and neighbourhood blog.
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.