
What we forget about the Scientific Method

I get fed up hearing other Agile evangelists champion The Scientific Method. I don't disagree with them – I use it myself – but I think they are sometimes guilty of overlooking how the scientific method is actually practised by scientists and researchers. Too often the scientific approach is made to sound simple; it isn't.

First, let's define the scientific method. Perhaps rather than call it the "scientific method" it is better called "experimentation." What the Agile evangelists of my experience are often advocating is running an experiment – perhaps several experiments in parallel but more likely in sequence. The steps are something like this:

- Propose a hypothesis, e.g. undertaking monthly software releases instead of bi-monthly will result in a lower percentage of release problems
- Examine the current position: e.g. find the current figures for release frequency and problems, record these
- Decide how long you want to run the experiment for, e.g. 6 months
- Introduce the change and reserve any judgement until the end of the experiment period
- Examine the results: recalculate the figures and compare these with the original figures
- Draw a conclusion based on observation and data

I agree with all of this, I think it's great, but…

Let's leave aside problems of measurement, problems of formulating the hypothesis, problems of making changes and propagation problems (i.e. the time it takes for changes to work through the system). These are not minor problems and they do make me wonder about applying the scientific method in the messy world of software and business, but let's leave them to one side for the moment.

Let's also leave aside the so-called Hawthorne Effect – the tendency for people to change and improve their behaviour because they know they are in an experiment. Although the original Hawthorne experiments were shown to be flawed some time ago, the effect might still be real. And the flaws found in the Hawthorne experiments should remind us that there may be other factors at work which we have not considered.

Even with all these caveats I'm still prepared to accept that an experimental approach to work has value. Sometimes the only way to know whether A or B is the best answer is to actually do A and do B and compare the results. But this is where my objections start…

There are two important elements missing from the way Agile evangelists talk about the scientific method. When real scientists – and I include social scientists here – do an experiment, there is more to the scientific method than the experiment, and so there should be in work too.

#1: Literature review – standing on the shoulders of others

Before any experiment is planned, scientists start by reviewing what has gone before. They go to the library – sometimes books, but journals are probably more up to date and often benefit from stricter peer review. They read what others have found, they read what others have done before, the experiments and theories devised to explain the results.

True, your business, your environment, your team are all unique, and what other teams find might not apply to you. And true, you might be able to find flaws in their research and their experiments. But that does not mean you should automatically discount what has gone before. If other teams have found consistent results with an approach then it is possible yours will too. The more examples of something working, the more likely it will work for you. Why run an experiment if others have already found the result?
Now I'm happy to agree that the state of research on software development is pitiful. Many of those who should be helping the industry here – "Computer Science" and "Software Engineering" departments in universities – don't produce what the industry needs. (Ian Sommerville's recent critique on this subject is well worth reading: "The (ir)relevance of academic software engineering research".) But there is research out there. Some from university departments and some from industry. Plus there is a lot of research that is relevant but sits outside the computing and software departments. For example, I have dug up a lot of relevant research in business literature, and specifically on time estimation in psychology journals (see my Notes on Estimation and Retrospective Estimation and More notes on Estimation Research).

Being used to dealing in binary, software people might demand a simple "Yes this works" or "No it doesn't", and those suffering from physics envy may demand rigorous experimental research, but little research of this type exists in software engineering. Much of software engineering is closer to psychology: you can't conduct the experiments that would give these answers. You have to use statistics and other techniques and look at probabilities. (Notice I've separated computer science from software engineering here. Much of computer science theory – e.g. sort algorithm efficiency, P and NP problems, etc. – can stand up with physics theory but does not address many of the problems practicing software engineers face.)

#2: Clean up the lab

I'm sure most of my readers did some science at school. Think back to those experiments, particularly the chemistry experiments. Part of the preparation was to check the equipment, clean any that might be contaminated with the remains of the last experiment, ensure the workspace was clear and so on.

I'm one of those people who doesn't (usually) start cooking until they have tidied the kitchen. I need space to chop vegetables, I need to be able to see what I'm doing, and I don't want messy plates getting in the way. There is a term for this: mise en place, a French expression which, according to Wikipedia, "means 'putting in place', as in set up. It is used in professional kitchens to refer to organizing and arranging the ingredients (e.g., cuts of meat, relishes, sauces, par-cooked items, spices, freshly chopped vegetables, and other components) that a cook will require for the menu items that are expected to be prepared during a shift." (Many thanks to Ed Sykes for telling me a great term.)

And when you are done with the chemistry experiment, or cooking, you need to tidy up. Experiments need to include set-up and clean-up time. If you leave the lab a mess after every experiment you will make it more difficult for yourself and others next time.

I see the same thing when I visit software companies. There is no point in doing experiments if the work environment is a mess – both physically and metaphorically. And if people leave a mess around when they have finished their work then things will only get harder over time. There are many experiments you simply can't run until you have done the correct preparation.

An awful lot of the initial advice I give to companies is simply about cleaning up the work environment and getting them into a state where they can do experiments. Much of that is informed by reference to past literature and experiments.
For example:

- Putting effective source code control and build systems in place
- Operating in two-week iterations: planning out two weeks of work, reviewing what was done and repeating
- Putting up a team board and using it as a shared to-do list
- Creating basic measurement tools, whether they be burn-down charts, cumulative flow diagrams or even more basic measurements

You get the idea? Simply tidying up the work environment and putting a basic process in place – one based on past experience, one which better matches the way work actually happens – can alone bring a lot of benefit to organizations. Some organizations don't need to get into experiments, they just need to tidy up.

And, perhaps unfortunately, that is where it stops for some teams. Simply doing the basics better, simply tidying up, removes a lot of the problems they had. It might be a shame that these teams don't go further and try more, but that might be good enough for them.

Imagine a restaurant that is just breaking even: the food is poor, customers sometimes refuse to pay, the service is shoddy so tips are small, staff don't stay long which makes the whole problem worse, and a vicious circle develops. In an effort to cut costs managers keep staffing low, so food arrives late and cold. Finally one of the customers is poisoned and the local health inspector comes in. The restaurant has to do something. They were staggering on with the old ways until now, but a crisis means something must be done. They clean the kitchen, they buy some new equipment, they let the chef buy the ingredients he wants rather than the cheapest, they rewrite the menu to simplify their offering. They don't have to do much and suddenly the customers are happier: the food is warmer and better, the staff are happier, a virtuous circle replaces a vicious circle.

How far the restaurant owners want to push this is up to them. If they want a Michelin star they will go a long way, but if this is the local greasy spoon cafe what is the point? It is their decision. They don't need experiments, they only need the opening of the scientific method, the bit that is too often overlooked. Some might call it "Brilliant Basics" but you don't need to be brilliant, just "Good Basics." (Remember my In Search of Mediocracy post?)

I think the scientific method is sometimes, rightly or wrongly, used as a backdoor in an attempt to introduce change – to lower resistance and get individuals and teams to try something new: "Let's try X for a month and then decide if it works." That can be a legitimate approach. But dressing it up in the language of science feels dishonest.

Let's have less talk about "The Scientific Method" and more talk about "Tidying up the Kitchen" – or is it better in French? Mise en place… Come to think of it, doesn't the Lean community have a Japanese word for this? Pika pika.

Reference: What we forget about the Scientific Method from our JCG partner Allan Kelly at the Agile, Lean, Patterns blog.

An alternative approach of writing JUnit tests (the Jasmine way)

Recently I wrote a lot of Jasmine tests for a small personal project. It took me some time until I finally got the feeling of getting the tests right. After this, I always had a hard time when switching back to JUnit tests. For some reason JUnit tests no longer felt that good and I wondered if it would be possible to write JUnit tests in a way similar to Jasmine.

Jasmine is a popular Behavior Driven Development testing framework for JavaScript that is inspired by RSpec (a Ruby BDD testing framework). A simple Jasmine test looks like this:

describe('AudioPlayer tests', function() {
  var player;

  beforeEach(function() {
    player = new AudioPlayer();
  });

  it('should not play any track after initialization', function() {
    expect(player.isPlaying()).toBeFalsy();
  });

  ...
});

The describe() function call in the first line creates a new test suite using the description AudioPlayer tests. Inside a test suite we can use it() to create tests (called specs in Jasmine). Here, we check if the isPlaying() method of AudioPlayer returns false after creating a new AudioPlayer instance. The same test written in JUnit would look like this:

public class AudioPlayerTest {
  private AudioPlayer audioPlayer;

  @Before
  public void before() {
    audioPlayer = new AudioPlayer();
  }

  @Test
  public void notPlayingAfterInitialization() {
    assertFalse(audioPlayer.isPlaying());
  }

  ...
}

Personally I find the Jasmine test much more readable compared to the JUnit version. In Jasmine the only noise that does not contribute anything to the test is the braces and the function keyword. Everything else contains some useful information. When reading the JUnit test we can ignore keywords like void, access modifiers (private, public, ...), annotations and irrelevant method names (like the name of the method annotated with @Before). In addition to that, test descriptions encoded in camel-case method names are not that great to read.

Besides increased readability I really like Jasmine's ability to nest test suites. Let's look at an example that is a bit longer:

describe('AudioPlayer tests', function() {
  var player;

  beforeEach(function() {
    player = new AudioPlayer();
  });

  describe('when a track is played', function() {
    var track;

    beforeEach(function() {
      track = new Track('foo/bar.mp3');
      player.play(track);
    });

    it('is playing a track', function() {
      expect(player.isPlaying()).toBeTruthy();
    });

    it('returns the track that is currently played', function() {
      expect(player.getCurrentTrack()).toEqual(track);
    });
  });

  ...
});

Here we create a sub test suite that is responsible for testing the behavior when a Track is played by AudioPlayer. The inner beforeEach() call is used to set up a common precondition for all tests inside the sub test suite.

In contrast, sharing common preconditions for multiple (but not all) tests in JUnit can become cumbersome sometimes. Of course duplicating the setup code in tests is bad, so we create extra methods for this. To share data between setup and test methods (like the track variable in the example above) we then have to use member variables (with a much larger scope). Additionally we should make sure to group tests with similar preconditions together to avoid the need of reading the whole test class to find all relevant tests for a certain situation. Or we can split things up into multiple smaller classes.
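For illustration, splitting the nested example above into separate JUnit classes might end up looking roughly like the following sketch. The class and method names are invented here (assuming the usual org.junit imports and the AudioPlayer/Track classes from the example), and the duplicated AudioPlayer setup in each @Before method is exactly the drawback being described:

// Hypothetical file AudioPlayerInitialStateTest.java
public class AudioPlayerInitialStateTest {

  private AudioPlayer player;

  @Before
  public void before() {
    player = new AudioPlayer();
  }

  @Test
  public void notPlayingAfterInitialization() {
    assertFalse(player.isPlaying());
  }
}

// Hypothetical file AudioPlayerWhenTrackPlayedTest.java
public class AudioPlayerWhenTrackPlayedTest {

  private AudioPlayer player;
  private Track track;

  @Before
  public void before() {
    player = new AudioPlayer();          // same setup duplicated in every class
    track = new Track("foo/bar.mp3");
    player.play(track);
  }

  @Test
  public void isPlayingATrack() {
    assertTrue(player.isPlaying());
  }

  @Test
  public void returnsTheTrackThatIsCurrentlyPlayed() {
    assertEquals(track, player.getCurrentTrack());
  }
}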
But then we might have to share setup code between these classes… If we look at Jasmine tests we see that the structure is defined by calling global functions (like describe(), it(), …) and passing descriptive strings and anonymous functions.

With Java 8 we got lambdas, so we can do the same, right? Yes, we can write something like this in Java 8:

public class AudioPlayerTest {
  private AudioPlayer player;

  public AudioPlayerTest() {
    describe("AudioPlayer tests", () -> {
      beforeEach(() -> {
        player = new AudioPlayer();
      });

      it("should not play any track after initialization", () -> {
        expect(player.isPlaying()).toBeFalsy();
      });
    });
  }
}

If we assume for a moment that describe(), beforeEach(), it() and expect() are statically imported methods that take appropriate parameters, this would at least compile. But how should we run this kind of test? Out of interest I tried to integrate this with JUnit and it turned out that this is actually very easy (I will write about this in the future). The result so far is a small library called Oleaster. A test written with Oleaster looks like this:

import static com.mscharhag.oleaster.runner.StaticRunnerSupport.*;
...

@RunWith(OleasterRunner.class)
public class AudioPlayerTest {
  private AudioPlayer player;

  {
    describe("AudioPlayer tests", () -> {
      beforeEach(() -> {
        player = new AudioPlayer();
      });

      it("should not play any track after initialization", () -> {
        assertFalse(player.isPlaying());
      });
    });
  }
}

Only a few things changed compared to the previous example. Here, the test class is annotated with the JUnit @RunWith annotation. This tells JUnit to use Oleaster when running this test class. The static import of StaticRunnerSupport.* gives direct access to static Oleaster methods like describe() or it(). Also note that the constructor was replaced by an instance initializer and the Jasmine-like matcher is replaced by a standard JUnit assertion.

There is actually one thing that is not so great compared to original Jasmine tests. It is the fact that in Java a variable needs to be effectively final to use it inside a lambda expression. This means that the following piece of code does not compile:

describe("AudioPlayer tests", () -> {
  AudioPlayer player;
  beforeEach(() -> {
    player = new AudioPlayer();
  });
  ...
});

The assignment to player inside the beforeEach() lambda expression will not compile (because player is not effectively final). In Java we have to use instance fields in situations like this (as shown in the example above).

In case you worry about reporting: Oleaster is only responsible for collecting test cases and running them. The whole reporting is still done by JUnit. So Oleaster should cause no problems with tools and libraries that make use of JUnit reports; for example, a failed Oleaster test is reported in IntelliJ IDEA just like any other failed JUnit test.

If you wonder how Oleaster tests look in practice you can have a look at the tests for Oleaster (which are written in Oleaster itself). You can find the GitHub test directory here. Feel free to add any kind of feedback by commenting on this post or by creating a GitHub issue.

Reference: An alternative approach of writing JUnit tests (the Jasmine way) from our JCG partner Michael Scharhag at the mscharhag, Programming and Stuff blog.

How to get JSON response from JSF?

Many JavaScript widgets expect data and options in JSON format. Nowadays, it is really easy to choose a cool widget and wrap it in a composite component. But the first question is how to send an AJAX request and receive a response in a proper JSON format. This question is often raised by JSF users. All that you need is an XHTML facelet like this one:

<f:view encoding="UTF-8" contentType="text/html"
        xmlns="http://www.w3.org/1999/xhtml"
        xmlns:h="http://xmlns.jcp.org/jsf/html"
        xmlns:f="http://xmlns.jcp.org/jsf/core">
    <h:outputText value="#{stationView.getClosestStations(param.longitude, param.latitude)}" escape="false"/>
</f:view>

Please note the contentType="text/html" (application/json will not work here) and escape="false" in the h:outputText. The method getClosestStations() in the bean StationView produces a JSON output for a list of special Java objects. I advise using the Gson library in order to serialize any Java object to JSON. A short example:

String[] strings = {"abc", "def", "ghi"};
Gson gson = new Gson();
gson.toJson(strings);
// ==> prints ["abc", "def", "ghi"]

The XHTML file above is located under the web context, say, under the path /rest/stations.xhtml. The Ajax call in your JavaScript code should look like this one:

$.ajax({
    url: requestContextPath + '/rest/stations.xhtml',
    type: "GET",
    data: {
        "longitude": x,
        "latitude": y
    },
    dataType: "json",
    success: function (data) {
        $.each(data, function (i, station) {
            ...
        });
    },
    error: function () {
        ...
    }
});

Please refer to the jQuery documentation for more information regarding $.ajax. Note: if you omit dataType: "json", you have to parse the JSON string manually.

success: function (data) {
    $.each($.parseJSON(data), function (i, station) {
        ...
    });
}

The response is a pure JSON string (no HTML tags) like this one:

[{"latitude":46.947045,"longitude":7.443922,"distanz":110,"name":"Bern, Bundesplatz"},{....},...]

Need more examples of JSON responses in JSF? In one of my next posts I will probably explain how to implement a cool autocomplete component without writing too much code.

Reference: How to get JSON response from JSF? from our JCG partner Oleg Varaksin at the Thoughts on software development blog.
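For illustration, a managed bean like the StationView used above could be sketched roughly as follows. The Station field names are taken from the sample response shown above, but the bean internals and the hard-coded lookup are invented here; only the Gson serialization reflects the advice in the post:

import com.google.gson.Gson;
import java.util.Arrays;
import java.util.List;
import javax.faces.bean.ManagedBean;
import javax.faces.bean.RequestScoped;

@ManagedBean(name = "stationView")
@RequestScoped
public class StationView {

    // Minimal value object matching the field names of the sample JSON response.
    public static class Station {
        double latitude;
        double longitude;
        int distanz;
        String name;

        Station(double latitude, double longitude, int distanz, String name) {
            this.latitude = latitude;
            this.longitude = longitude;
            this.distanz = distanz;
            this.name = name;
        }
    }

    private final Gson gson = new Gson();

    public String getClosestStations(String longitude, String latitude) {
        // A real implementation would use the parsed coordinates to query a repository;
        // here we just return a hard-coded station for illustration.
        double lon = Double.parseDouble(longitude);
        double lat = Double.parseDouble(latitude);
        List<Station> stations = Arrays.asList(
                new Station(46.947045, 7.443922, 110, "Bern, Bundesplatz"));
        return gson.toJson(stations); // e.g. [{"latitude":46.947045,...}]
    }
}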

Java Debuggers and Timeouts

How to use your debugger in the presence of timeouts in your code.

My kingdom for a debugger!

So you've been coding away merrily on a project and everything is going well until a bug appears. You reach into your developer's toolbox and pull out a debugger. It's great – you can set breakpoints, you can interrupt when there's an exception and you can inspect expressions at runtime. Whatever challenge awaits, you can be sure that a debugger will help!

Unfortunately life isn't that easy. A lot of code needs to have some kind of timeout – an event that happens after a period of time. The problem with this is that timeouts tend to ruin the debugging experience. You're sitting there looking at your breakpoint, thinking "Now why is x 2 instead of 1?" Poof! The timeout kicks in and you are no longer able to continue. Even worse, the JVM itself quits! So you go through the process of increasing your timeout, debugging and fixing your problem. Afterwards you either return the timeout to its original setting and have to go through the same tedious process again, or you accidentally commit the fix into your source tree, thus breaking a test or maybe even production. To me this seems less than ideal.

"For somehow this is timeout's disease, to trust no friends"

There are many reasons that people introduce timeouts. I've listed a few below, a couple of good and a couple of bad, and I'm sure you can think of a few more yourself.

- Checking that an asynchronous event has been responded to within a certain period of time.
- Avoiding starvation of a time-based resource, such as a thread pool.
- You've got a race condition that needs a quick fix.
- You are waiting for an event to happen and decide to hard code an assumption about how long it'll take. (Can be most frequently spotted in tests.)

Now obviously if your timeout has been introduced as a hack then it's a good time to clean and boy-scout the code. If you need to rely on an event happening in tests then you should treat those tests as clients of your API and be able to know when the event has occurred. This might involve injecting a mock which gets called when an event happens or subscribing to a stream of events. If you've got a race condition – fix it! I know it's painful and hard but do you really want a ticking timebomb in your codebase ready to generate a support call at 3am?

Managing your timeouts

Having said that we should remove the bad uses of timeouts, it's pretty clear that there are perfectly legitimate uses of timeouts. They are especially common in event-driven and asynchronous code. It would still be good to be able to debug with them around.

Good practice, regardless of other factors, is to standardise your timeouts into configuration properties which can be set at runtime. This lets you easily alter them when running in a local IDE vs production. It can also help with managing the different performance properties that you encounter from differing hardware setups.

Having externalised your timeouts into configuration, you can then detect whether your code is running inside a debugger and set timeouts to significantly longer periods if this is the case. The trick to doing this is to recognise that a debugger involves running a Java agent, which modifies the command-line arguments of the program that it runs under. You can check whether these command-line arguments contain the right agent matcher. The following code snippet shows how to do this and has been tested to work under both Eclipse and IntelliJ IDEA.
RuntimeMXBean runtimeMXBean = ManagementFactory.getRuntimeMXBean();
String jvmArguments = runtimeMXBean.getInputArguments().toString();
boolean hasDebuggerAttached = jvmArguments.contains("-agentlib:jdwp");

I can see why some people would view this as a hack as well: you're actively discovering something about your environment by looking at your own command-line arguments and then adapting around it. From my perspective, I've found this to be a useful technique. It does make it easier to debug in the presence of timeouts.

Reference: Java Debuggers and Timeouts from our JCG partner Richard Warburton at the Insightful Logic blog.
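Putting the two ideas together – externalised timeout configuration plus the debugger check above – a minimal sketch could look like this. The property name and the scaling factor are arbitrary illustrations, not from the original article:

import java.lang.management.ManagementFactory;

public final class Timeouts {

    private static final boolean DEBUGGER_ATTACHED = ManagementFactory.getRuntimeMXBean()
            .getInputArguments().toString().contains("-agentlib:jdwp");

    private Timeouts() {
    }

    // Reads a timeout from a system property and stretches it generously when a debugger is attached.
    public static long millis(String propertyName, long defaultMillis) {
        long configured = Long.getLong(propertyName, defaultMillis);
        return DEBUGGER_ATTACHED ? configured * 1000 : configured;
    }
}

// Usage, e.g. when awaiting an asynchronous result:
// future.get(Timeouts.millis("app.response.timeout", 500), TimeUnit.MILLISECONDS);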

JavaFX Tip 8: Beauty Is Skin Deep

If you are developing a UI framework for JavaFX, then please make it a habit to always split your custom controls into a control class and a skin class. Coming from Swing myself, this was not obvious to me right away. Swing also uses an MVC concept and delegates the actual component rendering to a UI delegate, but people extending Swing mostly subclassed one of its controls and added extensions / modifications to the subclass. Only very few frameworks actually worked with the UI delegates (e.g. MacWidgets).

I have the luxury of being able to compare the implementation of the same product / control once done in Swing and once done in JavaFX, and I noticed that the JavaFX implementation is so much cleaner, largely because of the splitting into controls and skins (next in row: CSS styling and property binding). In Swing I was exposing a lot of things to the framework user that I personally considered "implementation detail" but that became public API nevertheless. The JavaFX architecture makes it much more obvious where the framework developer draws the line between public and internal API.

The Control

The control class stores the state of the control and provides methods to interact with it. State information can be: the data visualized by the control (e.g. the items in TableView), visual attributes (show this, hide that), factories (e.g. cell factories). Interaction can be: scroll to an item, show a given time, do this, do that. The control class is the contract between your framework code and the application using the framework. It should be well designed, clean, stable, and final.

The Skin

This is the place to go nuts, the Wild West. The skin creates the visual representation of your control by composing already existing controls or by extending the very basic classes, such as Node or Region. Skins are often placed in separate packages with package names that imply that the API contained inside of them is not considered for public use. If somebody does use them, then it is at their own risk, because the framework developer (you) might decide to change them from release to release.

Reference: JavaFX Tip 8: Beauty Is Skin Deep from our JCG partner Dirk Lemmermann at the Pixel Perfect blog.
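As a minimal sketch of this split (the control name and its single property are invented for illustration), the control exposes only state and API, points JavaFX at its skin via createDefaultSkin(), and the skin builds the actual scene graph:

import javafx.beans.property.SimpleStringProperty;
import javafx.beans.property.StringProperty;
import javafx.scene.control.Control;
import javafx.scene.control.Label;
import javafx.scene.control.Skin;
import javafx.scene.control.SkinBase;

// The control: state and public API only, no visuals.
public class StatusBadge extends Control {

    private final StringProperty text = new SimpleStringProperty(this, "text", "");

    public StringProperty textProperty() { return text; }
    public String getText() { return text.get(); }
    public void setText(String value) { text.set(value); }

    @Override
    protected Skin<?> createDefaultSkin() {
        return new StatusBadgeSkin(this);
    }
}

// The skin: builds the visual representation and is free to change between releases.
class StatusBadgeSkin extends SkinBase<StatusBadge> {

    StatusBadgeSkin(StatusBadge control) {
        super(control);
        Label label = new Label();
        label.textProperty().bind(control.textProperty());
        getChildren().add(label);
    }
}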

From framework to platform

When I started my career as a Java developer close to 10 years ago, the industry was going through a revolutionary change. Spring framework, which was released in 2003, was quickly gaining ground and became a serious challenger to the bulky J2EE platform. Having gone through the transition time, I quickly found myself in favour of Spring framework instead of the J2EE platform, even though the earlier versions of Spring were very tedious for declaring beans.

What happened next was the revamping of the J2EE standard, which was later renamed to JEE. Still, what dominated this era was the use of open-source frameworks over the platform proposed by Sun. This practice gives developers full control over the technologies they use but inflates the deployment size. Slowly, as cloud applications became the norm for modern applications, I observed the trend of moving infrastructure services from framework to platform again. However, this time, it is not motivated by Cloud applications.

Framework vs Platform

I had never heard of or had to use any framework in school. However, after joining the industry, it is tough to build scalable and configurable software without the help of any framework.

From my understanding, any application consists of code that implements business logic and other code that provides helpers, utilities or infrastructure setup. The code that is not related to business logic, being used repetitively in many projects, can be generalised and extracted for reuse. The output of this extraction process is a framework. To make it shorter, a framework is any code that is not related to business logic but helps to address common concerns in applications and is fit to be reused. Following this definition, MVC, Dependency Injection, Caching, JDBC Template and ORM are all considered frameworks.

A platform is similar to a framework as it also helps to address common concerns in applications, but in contrast to a framework, the service is provided outside the application. Therefore, a common service endpoint can serve multiple applications at the same time. The services provided by a JEE application server or by Amazon Web Services are examples of platforms.

Comparing the two approaches, a platform is more scalable and easier to use than a framework, but it also offers less control. Because of these advantages, platforms seem to be the better approach to use when we build Cloud applications.

When should we use platform over framework?

Moving toward platforms does not guarantee that developers will get rid of frameworks. Rather, platforms only complement frameworks in building applications. However, on some special occasions we have a choice between using a platform or a framework to achieve the final goal. From my personal opinion, a platform is greater than a framework when the following conditions are matched:

- The framework is tedious to use and maintain
- The service has some common information to be shared among instances
- It can utilize additional hardware to improve performance

In the office, we still use Spring framework, Play framework or RoR in our applications and this will not change any time soon. However, to move to the Cloud era, we migrated some of our existing products from internal hosting to Amazon EC2 servers. In order to make the best use of Amazon infrastructure and improve software quality, we have done some major refactoring of our current software architecture.
Here are some platforms that we are integrating our product with:

Amazon Simple Storage Service (Amazon S3) & Amazon CloudFront

We found that Amazon CloudFront is pretty useful for boosting the average response time of our applications. Previously, we hosted most of the applications in our internal server farms, which are located in the UK and US. This led to a noticeable increase in response time for customers in other continents. Fortunately, Amazon has much greater infrastructure, with server farms built all around the world. That helps to guarantee a constant delivery time, no matter the customer location.

Currently, due to the manual effort needed to set up new instances for applications, we feel that the best use of Amazon CloudFront is with static content, which we host separately from the application in Amazon S3. This practice gives us a double benefit in performance: more consistent delivery time offered by the CDN plus a separate connection count in the browser for the static content.

Amazon ElastiCache

Caching has never been easy in a cluster environment. The word "cluster" means that your object will not be stored in and retrieved from system memory. Rather, it is sent and retrieved over the network. This task was quite tricky in the past because developers needed to sync the records from one node to another. Unfortunately, not all caching frameworks support this feature automatically. Our best framework for distributed caching was Terracotta. Now, we have turned to Amazon ElastiCache because it is cheap, reliable and saves us the huge effort of setting up and maintaining a distributed cache. It is worth highlighting that distributed caching is never meant to replace local caching. The difference in performance suggests that we should only use distributed caching over local caching when users need to access real-time temporary data.

Event Logging for Data Analytics

In the past, we used Google Analytics for analysing user behaviour but later decided to build an internal data warehouse. One of the motivations was the ability to track events from both browsers and servers. The event tracking system uses MongoDB as the database, as it allows us to quickly store huge amounts of events. To simplify the creation and retrieval of events, we chose JSON as the format for events.

We cannot simply send these events directly to the event tracking server because browsers prevent cross-domain requests. For this reason, Google Analytics sends its events to the server in the form of a GET request for a static resource. As we have full control over how the application is built, we chose to let the events be sent back to the application server first and routed to the event tracking server later. This approach is much more convenient and powerful.

Knowledge Portal

In the past, applications accessed data from a database or an internal file repository. However, to be able to scale better, we gathered all knowledge to build a knowledge portal. We also built a query language to retrieve knowledge from this portal. This approach adds one additional layer to the knowledge retrieval process, but fortunately for us, our system does not need to serve real-time data, so we can utilize caching to improve performance.

Conclusion

Above is some of our experience of transforming our software architecture when moving to the Cloud. Please share with us your experience and opinions.

Reference: From framework to platform from our JCG partner Nguyen Anh Tuan at the Developers Corner blog.

How to use bloom filter to build a large in memory cache in Java

Background

A cache is an important concept for solving day-to-day software problems. Your application may perform CPU-intensive operations which you do not want to perform again and again; instead you derive the result once and cache it in memory. Sometimes the bottleneck is IO: you do not want to hit the database repeatedly, so you cache the results and update the cache only if the underlying data changes.

Similarly, there are other use cases where we need to perform a quick lookup to decide what to do with an incoming request. For example, consider a use case where you have to identify whether a URL points to a malware site or not. There could be many URLs like that; to answer in an instant, if we cache all the malware URLs in memory, that would require a lot of space to hold them. Or another use case could be to identify if a user-typed string has any reference to a place in the USA. Like "museum in Washington" – in this string, Washington is the name of a place in the USA. Should we keep all the places in the USA in memory and then look them up? How big would the cache be? Is it effective to do it without any database support?

This is where we need to move away from the basic map data structure and look for answers in a more advanced data structure like a bloomfilter. You can consider a bloomfilter like any other Java collection where you can put items in it and ask it whether an item is already present or not (like a HashSet). If the bloomfilter says that it does not contain the item, then definitely that item is not present. But if it says that it has seen the item, then that may be wrong. If we are careful enough, we can design a bloomfilter such that the probability of being wrong is controlled.

Explanation

A bloomfilter is designed as an array (A) of m bits. Initially all these bits are set to 0.

To add an item: it needs to be fed through k hash functions. Each hash function will generate a number which can be treated as a position in the bit array (hash modulo array length gives us the index into the array), and we set the value at that position to 1. For example, the first hash function (hash1) on item I produces a bit position x; similarly the second and third hash functions produce positions y and z. So we set:

A[x] = A[y] = A[z] = 1

To find an item: the same process is repeated; the item is hashed through the k different hash functions. Each hash function produces an integer which is treated as a position in the array. We inspect those x, y, z positions of the bit array and see whether they are set to 1 or not. If not, for sure no one ever tried to add this item to the bloomfilter; but if all the bits are set, it could be a false positive.

Things to tune

From the above explanation, it becomes clear that to design a good bloomfilter we need to keep track of the following things:

- Good hash functions that can generate a wide range of hash values as quickly as possible.
- The value of m (size of the bit array) is very important. If the size is too small, all the bits will be set to 1 quickly and false positives will grow largely.
- The number of hash functions (k) is also important so that the values get distributed evenly.

If we can estimate how many items we are planning to keep in the bloom filter, we can calculate the optimal values of k and m. Skipping the mathematical details, the formulas for k and m are enough for us to write a good bloomfilter.
The formula to determine m (number of bits for the bloom filter) is as below:

m = -n * ln(p) / (ln 2)^2, where p = desired false positive probability and n = number of items in the filter

The formula to determine k (number of hash functions) is as below:

k = (m / n) * ln 2, where k = number of hash functions, m = number of bits and n = number of items in the filter

Hashing

Hashing is an area which affects the performance of a bloomfilter. We need to choose a hash function that is effective yet not time consuming. The paper "Less Hashing, Same Performance: Building a Better Bloom Filter" discusses how we can use two hash functions to generate k hash functions. First we need to calculate two hash functions h1(x) and h2(x). Next, we can use these two hash functions to simulate k hash functions of the form gi(x) = h1(x) + i * h2(x), where i ranges from 1 to k.

The Google Guava library uses this trick in its bloomfilter implementation; the hashing logic is outlined here:

long hash64 = …; // calculate a 64 bit hash function
// split it in two halves of 32 bit hash values
int hash1 = (int) hash64;
int hash2 = (int) (hash64 >>> 32);
// Generate k different hash functions with a simple loop
for (int i = 1; i <= numHashFunctions; i++) {
    int nextHash = hash1 + i * hash2;
}

Applications

It is clear from the mathematical formulas that to apply a bloomfilter to solve a problem, we need to understand the domain very well. For example, we can apply a bloomfilter to hold all the city names in the USA. This number is deterministic and we have prior knowledge, so we can determine n (the total number of elements to be added to the bloomfilter). Fix p (probability of false positives) according to business requirements. In that case, we have a perfect cache which is memory efficient and whose lookup time is very low.

Implementations

The Google Guava library has an implementation of a bloomfilter. Check the create() method of this class, which asks for the expected insertions and the false positive rate.

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Create the Bloomfilter
int expectedInsertions = ….;
double fpp = 0.03; // desired false positive probability
BloomFilter<CharSequence> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charset.forName("UTF-8")), expectedInsertions, fpp);

Resources:

- http://en.wikipedia.org/wiki/Bloom_filter
- http://billmill.org/bloomfilter-tutorial/
- http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf
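To tie the sizing formulas and the double-hashing trick from the article together, here is a small self-contained sketch of a simplistic bloomfilter. It is illustrative only – the second hash is derived crudely from String.hashCode(), so for real use prefer the Guava implementation shown above:

import java.util.BitSet;

public class SimpleBloomFilter {

    private final BitSet bits;
    private final int numBits;          // m
    private final int numHashFunctions; // k

    public SimpleBloomFilter(int expectedInsertions, double falsePositiveProbability) {
        // m = -n * ln(p) / (ln 2)^2
        this.numBits = Math.max(1, (int) Math.ceil(
                -expectedInsertions * Math.log(falsePositiveProbability) / (Math.log(2) * Math.log(2))));
        // k = (m / n) * ln 2
        this.numHashFunctions = Math.max(1,
                (int) Math.round((double) numBits / expectedInsertions * Math.log(2)));
        this.bits = new BitSet(numBits);
    }

    public void add(String item) {
        int hash1 = item.hashCode();
        int hash2 = Integer.rotateLeft(hash1, 16) ^ 0x9E3779B9; // crude second hash, illustration only
        for (int i = 1; i <= numHashFunctions; i++) {
            bits.set(Math.floorMod(hash1 + i * hash2, numBits)); // g_i(x) = h1(x) + i * h2(x)
        }
    }

    public boolean mightContain(String item) {
        int hash1 = item.hashCode();
        int hash2 = Integer.rotateLeft(hash1, 16) ^ 0x9E3779B9;
        for (int i = 1; i <= numHashFunctions; i++) {
            if (!bits.get(Math.floorMod(hash1 + i * hash2, numBits))) {
                return false; // definitely not present
            }
        }
        return true; // possibly present (may be a false positive)
    }
}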

Spring, REST, Ajax and CORS

Assuming you're working on a project whose client side is based on JavaScript and makes ajax requests to a server through REST web services, you may encounter some trouble, especially if the two sides are on separate domains. Indeed, for security reasons, ajax requests from one domain A to a different domain B are not authorized. Fortunately, the W3C introduced what is known as CORS (Cross Origin Resource Sharing), which offers the possibility for a server to have better control of cross-domain requests. To do that, the server must add HTTP headers to the response, indicating to the client side which origins are allowed. Moreover, if you use custom headers, your browser will not be able to read them for security reasons, so you must specify which headers to expose. So, if in your JavaScript code you can't retrieve your custom HTTP header value, you should read what comes next.

List of headers:

Access-Control-Allow-Origin

Access-Control-Allow-Origin: <origin> | *

The origin parameter specifies a URI that may access the resource. The browser must enforce this. For requests without credentials, the server may specify "*" as a wildcard, thereby allowing any origin to access the resource.

Access-Control-Expose-Headers

Access-Control-Expose-Headers: X-My-Header

This header lets a server whitelist headers that browsers are allowed to access. It is very useful when you add custom headers, because by adding them to the Access-Control-Expose-Headers header you can be sure that your browser will be able to read them.

Access-Control-Max-Age

Access-Control-Max-Age: <delta-seconds>

This header indicates how long the results of a preflight request can be cached.

Access-Control-Allow-Methods

Access-Control-Allow-Methods: <method>[, <method>]*

Specifies the method or methods allowed when accessing the resource. This is used in response to a preflight request.

Access-Control-Allow-Headers

Access-Control-Allow-Headers: <field-name>[, <field-name>]*

Used in response to a preflight request to indicate which HTTP headers can be used when making the actual request.
Now let's see how to add these headers with Spring. First we need to create a class implementing the Filter interface:

package hello;

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CORSFilter implements Filter {

    public void init(FilterConfig filterConfig) throws ServletException {
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;
        HttpServletRequest request = (HttpServletRequest) req;

        response.setHeader("Access-Control-Allow-Origin", "*");
        response.setHeader("Access-Control-Allow-Methods", "POST, GET, OPTIONS, DELETE");
        response.setHeader("Access-Control-Allow-Headers", "x-requested-with");
        response.setHeader("Access-Control-Expose-Headers", "x-requested-with");
        chain.doFilter(req, res);
    }

    public void destroy() {
    }
}

Now, we just have to add our filter to the servlet context:

@Configuration
public class ServletConfigurer implements ServletContextInitializer {

    @Override
    public void onStartup(javax.servlet.ServletContext servletContext) throws ServletException {
        servletContext.addFilter("corsFilter", new CORSFilter());
    }
}

And that's all folks, you're now able to make cross-domain requests and use custom HTTP headers!

HBase: Generating search click events statistics for customer behavior

In this post we will explore HBase to store customer search click event data and use it to derive customer behavior information based on search query strings and facet filter clicks. We will cover using MiniHBaseCluster, HBase schema design, and integration with Flume using HBaseSink to store JSON data.

In continuation of the previous posts on:

- Customer product search clicks analytics using big data
- Flume: Gathering customer product search clicks data using Apache Flume
- Hive: Query customer top search query and product views count using Apache Hive
- ElasticSearch-Hadoop: Indexing product views count and customer top search query from Hadoop to ElasticSearch
- Oozie: Scheduling Coordinator/Bundle jobs for Hive partitioning and ElasticSearch indexing
- Spark: Real time analytics for big data for top search queries and top product views

we have explored storing search click event data in Hadoop and querying it using different technologies. Here we will use HBase to achieve the same:

- HBase mini cluster setup
- HBase template using Spring Data
- HBase schema design
- Flume integration using HBaseSink
- HBaseJsonSerializer to serialize JSON data
- Query top 10 search query strings in the last hour
- Query top 10 search facet filters in the last hour
- Get recent search query strings for a customer in the last 30 days

HBase

HBase "is the Hadoop database, a distributed, scalable, big data store."

HBaseMiniCluster/MiniZooKeeperCluster

To set up and start the mini cluster, check HBaseServiceImpl.java:

...
miniZooKeeperCluster = new MiniZooKeeperCluster();
miniZooKeeperCluster.setDefaultClientPort(10235);
miniZooKeeperCluster.startup(new File("taget/zookeper/dfscluster_" + UUID.randomUUID().toString()).getAbsoluteFile());
...
Configuration config = HBaseConfiguration.create();
config.set("hbase.tmp.dir", new File("target/hbasetom").getAbsolutePath());
config.set("hbase.master.port", "44335");
config.set("hbase.master.info.port", "44345");
config.set("hbase.regionserver.port", "44435");
config.set("hbase.regionserver.info.port", "44445");
config.set("hbase.master.distributed.log.replay", "false");
config.set("hbase.cluster.distributed", "false");
config.set("hbase.master.distributed.log.splitting", "false");
config.set("hbase.zookeeper.property.clientPort", "10235");
config.set("zookeeper.znode.parent", "/hbase");

miniHBaseCluster = new MiniHBaseCluster(config, 1);
miniHBaseCluster.startMaster();
...

MiniZooKeeperCluster is started on client port 10235; all client connections will be on this port. Make sure to configure the hbase server ports so they do not collide with another local hbase server. Here we are only starting one hbase region server in the test case.
HBase Template using Spring Data

We will be using the Spring HBase template to connect to the HBase cluster:

<hdp:hbase-configuration id="hbaseConfiguration" configuration-ref="hadoopConfiguration" stop-proxy="false" delete-connection="false" zk-quorum="localhost" zk-port="10235">
</hdp:hbase-configuration>
<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HBaseTemplate" p:configuration-ref="hbaseConfiguration" />

HBase Table Schema Design

We have search click event JSON data in the following format:

{"eventid":"24-1399386809805-629e9b5f-ff4a-4168-8664-6c8df8214aa7","hostedmachinename":"192.168.182.1330","pageurl":"http://blahblah:/5","customerid":24,"sessionid":"648a011d-570e-48ef-bccc-84129c9fa400","querystring":null,"sortorder":"desc","pagenumber":3,"totalhits":28,"hitsshown":7,"createdtimestampinmillis":1399386809805,"clickeddocid":"41","favourite":null,"eventidsuffix":"629e9b5f-ff4a-4168-8664-6c8df8214aa7","filters":[{"code":"searchfacettype_color_level_2","value":"Blue"},{"code":"searchfacettype_age_level_2","value":"12-18 years"}]}

One way to handle the data is to store it directly under one column family and one JSON column. It won't be easy and flexible to scan the JSON data that way. Another option is to store it under one column family but in different columns. But storing the filters data in a single column will be hard to scan. The hybrid approach below is to divide it under multiple column families and dynamically generate columns for the filters data. The converted schema is:

{
"client:eventid" => "24-1399386809805-629e9b5f-ff4a-4168-8664-6c8df8214aa7",
"client:eventidsuffix" => "629e9b5f-ff4a-4168-8664-6c8df8214aa7",
"client:hostedmachinename" => "192.168.182.1330",
"client:pageurl" => "http://blahblah:/5",
"client:createdtimestampinmillis" => 1399386809805,
"client:customerid" => 24,
"client:sessionid" => "648a011d-570e-48ef-bccc-84129c9fa400",
"search:querystring" => null,
"search:sortorder" => desc,
"search:pagenumber" => 3,
"search:totalhits" => 28,
"search:hitsshown" => 7,
"search:clickeddocid" => "41",
"search:favourite" => null,
"filters:searchfacettype_color_level_2" => "Blue",
"filters:searchfacettype_age_level_2" => "12-18 years"
}

The following three column families are created:

- client: to store client- and customer-specific information for the event.
- search: search information related to the query string and pagination is stored here.
- filters: to support additional facets in the future and allow more flexible scanning of data, the column names are dynamically created based on the facet name/code and the column value is the facet filter value.

To create the hbase table:

...
TableName name = TableName.valueOf("searchclicks");
HTableDescriptor desc = new HTableDescriptor(name);
desc.addFamily(new HColumnDescriptor(HBaseJsonEventSerializer.COLUMFAMILY_CLIENT_BYTES));
desc.addFamily(new HColumnDescriptor(HBaseJsonEventSerializer.COLUMFAMILY_SEARCH_BYTES));
desc.addFamily(new HColumnDescriptor(HBaseJsonEventSerializer.COLUMFAMILY_FILTERS_BYTES));
try {
    HBaseAdmin hBaseAdmin = new HBaseAdmin(miniHBaseCluster.getConf());
    hBaseAdmin.createTable(desc);
    hBaseAdmin.close();
} catch (IOException e) {
    throw new RuntimeException(e);
}
...

The relevant column families have been added on table creation to support the new data structure. In general, it is recommended to keep the number of column families as small as possible; keep in mind how you structure your data based on the usage.
Based on the above examples, the scan scenarios we have kept are:

- Scan the client family in case you want to retrieve customer or client information based on total traffic information on the website.
- Scan the search information to see what free-text searches the end customers are looking for which are not met by the navigational search. See on which page the relevant product was clicked and whether you need boosting to push the product higher.
- Scan the filters family to see how the navigational search is working for you. Is it giving end customers the products they are looking for? See which facet filters are clicked more and whether you need to push them up a bit in the ordering to make them easily available to the customer.
- Scanning between families should be avoided; use the row key design to retrieve specific customer info.

Row key design info

In our case the row key design is based on customerId-timestamp-randomuuid. As the row key is the same for all the column families, we can use a Prefix Filter to filter rows relevant to a specific customer only.

final String eventId = customerId + "-" + searchQueryInstruction.getCreatedTimeStampInMillis() + "-" + searchQueryInstruction.getEventIdSuffix();
...
byte[] rowKey = searchQueryInstruction.getEventId().getBytes(CHARSET_DEFAULT);
...
# 24-1399386809805-629e9b5f-ff4a-4168-8664-6c8df8214aa7

Each column family here will have the same row key, and you can use a prefix filter to scan rows only for a particular customer.

Flume Integration

HBaseSink is used to store the search event data directly to HBase. Check the details in FlumeHBaseSinkServiceImpl.java:

...
channel = new MemoryChannel();
Map<String, String> channelParamters = new HashMap<>();
channelParamters.put("capacity", "100000");
channelParamters.put("transactionCapacity", "1000");
Context channelContext = new Context(channelParamters);
Configurables.configure(channel, channelContext);
channel.setName("HBaseSinkChannel-" + UUID.randomUUID());

sink = new HBaseSink();
sink.setName("HBaseSink-" + UUID.randomUUID());
Map<String, String> paramters = new HashMap<>();
paramters.put(HBaseSinkConfigurationConstants.CONFIG_TABLE, "searchclicks");
paramters.put(HBaseSinkConfigurationConstants.CONFIG_COLUMN_FAMILY, new String(HBaseJsonEventSerializer.COLUMFAMILY_CLIENT_BYTES));
paramters.put(HBaseSinkConfigurationConstants.CONFIG_BATCHSIZE, "1000");
paramters.put(HBaseSinkConfigurationConstants.CONFIG_SERIALIZER, HBaseJsonEventSerializer.class.getName());

Context sinkContext = new Context(paramters);
sink.configure(sinkContext);
sink.setChannel(channel);

sink.start();
channel.start();
...

The client column family is used only for validation by HBaseSink.

HBaseJsonEventSerializer

A custom serializer is created to store the JSON data:

public class HBaseJsonEventSerializer implements HBaseEventSerializer {
    public static final byte[] COLUMFAMILY_CLIENT_BYTES = "client".getBytes();
    public static final byte[] COLUMFAMILY_SEARCH_BYTES = "search".getBytes();
    public static final byte[] COLUMFAMILY_FILTERS_BYTES = "filters".getBytes();
    ...
    byte[] rowKey = searchQueryInstruction.getEventId().getBytes(CHARSET_DEFAULT);
    Put put = new Put(rowKey);

    // Client info
    put.add(COLUMFAMILY_CLIENT_BYTES, "eventid".getBytes(), searchQueryInstruction.getEventId().getBytes());
    ...
    if (searchQueryInstruction.getFacetFilters() != null) {
        for (SearchQueryInstruction.FacetFilter filter : searchQueryInstruction.getFacetFilters()) {
            put.add(COLUMFAMILY_FILTERS_BYTES, filter.getCode().getBytes(), filter.getValue().getBytes());
        }
    }
    ...
Check further details in HBaseJsonEventSerializer.java. The event body is converted from JSON to a Java bean and the data is then processed and serialized into the relevant column families.

Query Raw Cell Data

To query the raw cell data:

...
Scan scan = new Scan();
scan.addFamily(HBaseJsonEventSerializer.COLUMFAMILY_CLIENT_BYTES);
scan.addFamily(HBaseJsonEventSerializer.COLUMFAMILY_SEARCH_BYTES);
scan.addFamily(HBaseJsonEventSerializer.COLUMFAMILY_FILTERS_BYTES);
List<String> rows = hbaseTemplate.find("searchclicks", scan, new RowMapper<String>() {
    @Override
    public String mapRow(Result result, int rowNum) throws Exception {
        return Arrays.toString(result.rawCells());
    }
});
for (String row : rows) {
    LOG.debug("searchclicks table content, Table returned row: {}", row);
}

Check HBaseServiceImpl.java for details. The data is stored in hbase in the following format:

searchclicks table content, Table returned row: [84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/client:createdtimestampinmillis/1404832918166/Put/vlen=13/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/client:customerid/1404832918166/Put/vlen=2/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/client:eventid/1404832918166/Put/vlen=53/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/client:hostedmachinename/1404832918166/Put/vlen=16/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/client:pageurl/1404832918166/Put/vlen=19/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/client:sessionid/1404832918166/Put/vlen=36/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/filters:searchfacettype_product_type_level_2/1404832918166/Put/vlen=7/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/search:hitsshown/1404832918166/Put/vlen=2/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/search:pagenumber/1404832918166/Put/vlen=1/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/search:querystring/1404832918166/Put/vlen=13/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/search:sortorder/1404832918166/Put/vlen=3/mvcc=0, 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923/search:totalhits/1404832918166/Put/vlen=2/mvcc=0]

Query Top 10 search query strings in the last hour

To query only the search string, we only need the search column family. To scan within a time range, we can use the client column family's createdtimestampinmillis column, but it will be an expensive scan.

...
Scan scan = new Scan();
scan.addColumn(HBaseJsonEventSerializer.COLUMFAMILY_CLIENT_BYTES, Bytes.toBytes("createdtimestampinmillis"));
scan.addColumn(HBaseJsonEventSerializer.COLUMFAMILY_SEARCH_BYTES, Bytes.toBytes("querystring"));
List<String> rows = hbaseTemplate.find("searchclicks", scan, new RowMapper<String>() {
    @Override
    public String mapRow(Result result, int rowNum) throws Exception {
        String createdtimestampinmillis = new String(result.getValue(HBaseJsonEventSerializer.COLUMFAMILY_CLIENT_BYTES, Bytes.toBytes("createdtimestampinmillis")));
        byte[] value = result.getValue(HBaseJsonEventSerializer.COLUMFAMILY_SEARCH_BYTES, Bytes.toBytes("querystring"));
        String querystring = null;
        if (value != null) {
            querystring = new String(value);
        }
        if (new DateTime(Long.valueOf(createdtimestampinmillis)).plusHours(1).compareTo(new DateTime()) == 1 && querystring != null) {
            return querystring;
        }
        return null;
    }
});
...
// sort the keys, based on counts collection of the query strings.
List<String> sortedKeys = Ordering.natural().onResultOf(Functions.forMap(counts)).immutableSortedCopy(counts.keySet());
...

Query Top 10 search facet filters in the last hour

Based on the dynamic column creation, you can scan the data to return the top clicked facet filters. The dynamic columns are based on your facet codes, which can be any of:

#searchfacettype_age_level_1
#searchfacettype_color_level_2
#searchfacettype_brand_level_2
#searchfacettype_age_level_2

for (String facetField : SearchFacetName.categoryFacetFields) {
    scan.addColumn(HBaseJsonEventSerializer.COLUMFAMILY_FILTERS_BYTES, Bytes.toBytes(facetField));
}

To retrieve the data:

...
hbaseTemplate.find("searchclicks", scan, new RowMapper<String>() {
    @Override
    public String mapRow(Result result, int rowNum) throws Exception {
        for (String facetField : SearchFacetName.categoryFacetFields) {
            byte[] value = result.getValue(HBaseJsonEventSerializer.COLUMFAMILY_FILTERS_BYTES, Bytes.toBytes(facetField));
            if (value != null) {
                String facetValue = new String(value);
                List<String> list = columnData.get(facetField);
                if (list == null) {
                    list = new ArrayList<>();
                    list.add(facetValue);
                    columnData.put(facetField, list);
                } else {
                    list.add(facetValue);
                }
            }
        }
        return null;
    }
});
...

You will get the full list of all facets; you can process the data further to count the top facets and order them. For full details check HBaseServiceImpl.findTopTenSearchFiltersForLastAnHour.

Get recent search query strings for a customer

If we need to check what a customer is currently looking for, we can create a scan across the two column families "client" and "search". Another way is to design the row key so that it gives you the relevant information. In our case the row key design is based on customerId-timestamp-randomuuid. As the row key is the same for all the column families, we can use a Prefix Filter to filter rows relevant to a specific customer only.

final String eventId = customerId + "-" + searchQueryInstruction.getCreatedTimeStampInMillis() + "-" + searchQueryInstruction.getEventIdSuffix();
...
byte[] rowKey = searchQueryInstruction.getEventId().getBytes(CHARSET_DEFAULT);
...
# 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923

To scan the data for a particular customer:

...
Scan scan = new Scan();
scan.addColumn(HBaseJsonEventSerializer.COLUMFAMILY_SEARCH_BYTES, Bytes.toBytes("customerid"));
Filter filter = new PrefixFilter(Bytes.toBytes(customerId + "-"));
scan.setFilter(filter);
...

For details check HBaseServiceImpl.getAllSearchQueryStringsByCustomerInLastOneMonth.

Hope this helps you get hands-on with HBase schema design and handling data.

Reference: HBase: Generating search click events statistics for customer behavior from our JCG partner Jaibeer Malik at the Jai's Weblog blog.
Get recent search query strings for a customer

If we need to check what a customer is currently looking for, we can create a scan across the two column families "client" and "search". Another option is to design the row key so that it already carries the relevant information. In our case the row key is based on customerId_timestamp_randomuuid. Because the row key is the same across all column families, we can use a PrefixFilter to restrict the scan to the rows of a specific customer.

final String eventId = customerId + "-" + searchQueryInstruction.getCreatedTimeStampInMillis() + "-" + searchQueryInstruction.getEventIdSuffix();
...
byte[] rowKey = searchQueryInstruction.getEventId().getBytes(CHARSET_DEFAULT);
...
# 84-1404832902498-7965306a-d256-4ddb-b7a8-fd19cdb99923

To scan the data for a particular customer:

...
Scan scan = new Scan();
scan.addColumn(HBaseJsonEventSerializer.COLUMFAMILY_SEARCH_BYTES, Bytes.toBytes("customerid"));
Filter filter = new PrefixFilter(Bytes.toBytes(customerId + "-"));
scan.setFilter(filter);
...

For details check HBaseServiceImpl.getAllSearchQueryStringsByCustomerInLastOneMonth.

Hope this helps you get hands-on experience with HBase schema design and data handling.

Reference: HBase: Generating search click events statistics for customer behavior from our JCG partner Jaibeer Malik at the Jai's Weblog blog.

Abstraction in Java – The ULTIMATE Tutorial

Table Of Contents

1. Introduction
2. Interfaces
  2.1. Defining Interfaces
  2.2. Implementing Interfaces
  2.3. Using Interfaces
3. Abstract Classes
  3.1. Defining Abstract Classes
  3.2. Extending Abstract Classes
  3.3. Using Abstract Classes
4. A Worked Example – Payments System
  4.1. The Payee Interface
  4.2. The Payment System
  4.3. The Employee Classes
  4.4. The Application
  4.5. Handling Bonuses
  4.6. Contracting Companies
  4.7. Advanced Functionality: Taxation
5. Conclusion

1. Introduction

In this tutorial we will give an introduction to abstraction in Java and define a simple payroll system using interfaces, abstract classes and concrete classes. There are two levels of abstraction in Java: interfaces, used to define expected behaviour, and abstract classes, used to define incomplete functionality. We will now look at these two types of abstraction in detail.

2. Interfaces

An interface is like a contract. It is a promise to provide certain behaviours, and all classes which implement the interface guarantee to implement those behaviours. To define the expected behaviours the interface lists a number of method signatures. Any class which uses the interface can rely on those methods being implemented in the runtime class which implements the interface. This allows anyone using the interface to know what functionality will be provided without having to worry about how that functionality will actually be achieved. The implementation details are hidden from the client; this is a crucial benefit of abstraction.

2.1. Defining Interfaces

You can use the keyword interface to define an interface:

public interface MyInterface {

    void methodA();

    int methodB();

    String methodC(double x, double y);

}

Here we see an interface called MyInterface defined; note that you should use the same case conventions for interfaces that you do for classes. MyInterface defines 3 methods, each with different return types and parameters. You can see that none of these methods has a body; when working with interfaces we are only interested in defining the expected behaviour, not its implementation.

Note: Java 8 introduced the ability to create a default implementation for interface methods, however we will not cover that functionality in this tutorial.

Interfaces can also contain constants declared as member variables. Interface fields are implicitly public, static and final, so they must be initialised and cannot hold per-instance state:

public interface MyInterfaceWithState {

    int SOME_NUMBER = 42;

    void methodA();

}

All the methods in an interface are public by default, and in fact you can't create a method in an interface with an access level other than public.

2.2. Implementing Interfaces

Now that we have defined an interface, we want to create a class which will provide the implementation details of the behaviour we have defined. We do this by writing a new class and using the implements keyword to tell the compiler which interface this class should implement.

public class MyClass implements MyInterface {

    public void methodA() {
        System.out.println("Method A called!");
    }

    public int methodB() {
        return 42;
    }

    public String methodC(double x, double y) {
        return "x = " + x + ", y = " + y;
    }

}

We took the method signatures which we defined in MyInterface and gave them bodies to implement them. We just did some arbitrary silliness in the implementations, but it's important to note that we could have done anything in those bodies, as long as they satisfied the method signatures. We could also create as many implementing classes as we want, each with different implementation bodies for the methods from MyInterface.
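For example, a second implementing class (purely illustrative; it is not part of the original article) might provide completely different bodies behind the same method signatures:

// Illustrative only: another implementation of MyInterface with different
// behaviour behind the same signatures.
public class MyOtherClass implements MyInterface {

    public void methodA() {
        System.out.println("A very different method A.");
    }

    public int methodB() {
        return -1;
    }

    public String methodC(double x, double y) {
        // Report the sum instead of echoing the arguments.
        return "x + y = " + (x + y);
    }

}

Any client written against MyInterface can be handed either MyClass or MyOtherClass without knowing which one it is using.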
We implemented all the methods from MyInterface in MyClass; if we had failed to implement any of them, the compiler would have given an error. This is because the fact that MyClass implements MyInterface means that MyClass guarantees to provide an implementation for each of the methods from MyInterface. This lets any client using the interface rely on there being an implementation in place at runtime for the method it wants to call.

2.3. Using Interfaces

To call the methods of the interface from a client we just use the dot (.) operator, just like with the methods of classes:

MyInterface object1 = new MyClass();
object1.methodA(); // Guaranteed to work

We see something unusual above: instead of something like MyClass object1 = new MyClass(); (which is perfectly acceptable) we declare object1 to be of type MyInterface. This works because MyClass is an implementation of MyInterface; wherever we want to call a method defined in MyInterface we know that MyClass will provide the implementation. object1 is a reference to any runtime object which implements MyInterface; in this case it is an instance of MyClass. If we tried to do MyInterface object1 = new MyInterface() we would get a compiler error, because you can't instantiate an interface. This makes sense, because there are no implementation details in the interface, no code to execute.

When we make the call to object1.methodA() we are executing the method body defined in MyClass, because the runtime type of object1 is MyClass even though the reference is of type MyInterface. We can only call methods on object1 that are defined in MyInterface; for all intents and purposes we can treat object1 as being of type MyInterface even though its runtime type is MyClass. In fact, if MyClass defined another method called methodD() we couldn't call it on object1, because the compiler only knows that object1 is a reference to a MyInterface, not that it is specifically a MyClass. This important distinction is what lets us create different implementation classes for our interfaces without worrying about which specific one is being called at runtime.

Take the following interface:

public interface OneMethodInterface {

    void oneMethod();

}

It defines one void method which takes no parameters. Let's implement it:

public class ClassA implements OneMethodInterface {

    public void oneMethod() {
        System.out.println("Runtime type is ClassA.");
    }

}

We can use this in a client just like before:

OneMethodInterface myObject = new ClassA();
myObject.oneMethod();

Output:
...
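To make the earlier point about methodD() concrete, here is a minimal sketch (methodD() is hypothetical and not declared in any of the code above): a method that exists only on the implementing class is not visible through the interface reference, and reaching it requires a reference of the concrete type or an explicit cast.

// Hypothetical sketch: suppose MyClass also declared a methodD() that is not
// part of MyInterface.
MyInterface object1 = new MyClass();
// object1.methodD();            // Compiler error: MyInterface has no methodD()

MyClass object2 = new MyClass();
object2.methodD();               // Fine: the reference type is MyClass

((MyClass) object1).methodD();   // Also compiles, but the cast ties the client to MyClass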