
Distributed Crawling

Around three months ago, I posted an article explaining our approach and considerations for building a Cloud application. Starting with this article, I will gradually share our practical designs for solving that challenge. As mentioned before, our final goal is to build a SaaS big data analysis application, which will be deployed on AWS servers. To fulfill this goal, we need to build distributed crawling, indexing and distributed training systems. The focus of this article is how to build the distributed crawling system. The fancy name for this system is Black Widow.

Requirements

As usual, let's start with the business requirements for the system. Our goal is to build a scalable crawling system that can be deployed on the cloud. The system should be able to function in an unreliable, high-latency network and recover automatically from partial hardware or network failures.

For the first release, the system can crawl from three kinds of sources: DataSift, the Twitter API and RSS feeds. The data crawled back is called a Comment. The RSS crawlers are supposed to read public sources like websites or blogs, which are free of charge. DataSift and Twitter both provide proprietary APIs to access their streaming services. DataSift charges its users by comment count and by the complexity of CSDL (Curated Stream Definition Language, their own query language). Twitter, on the other hand, offers the free Twitter Sampler stream.

For cost control, we need to implement a mechanism to limit the amount of comments crawled from commercial sources like DataSift. As DataSift also provides Twitter comments, a single comment may arrive from different sources. At the moment, we do not try to eliminate this and accept it as data duplication. However, the problem can be avoided manually through user configuration (avoid choosing both Twitter and DataSift Twitter together).

As a future extension, the system should be able to link up related comments to form a conversation.
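To make the requirements concrete, here is a sketch of the kind of configuration each crawler needs to carry: a source type, search terms and a per-crawler comment limit for cost control. The class and field names are illustrative assumptions, not the actual Black Widow model.

```java
// Illustrative sketch of a crawler configuration, based on the
// requirements above: three source types, search terms, and a
// per-crawler comment limit for cost control.
public class CrawlerConfig {
    public enum Source { DATASIFT, TWITTER, RSS }

    private final Source source;
    private final String searchTerms;
    private final long commentLimit; // cost control for commercial sources

    public CrawlerConfig(Source source, String searchTerms, long commentLimit) {
        this.source = source;
        this.searchTerms = searchTerms;
        this.commentLimit = commentLimit;
    }

    public Source source() { return source; }
    public String searchTerms() { return searchTerms; }
    public long commentLimit() { return commentLimit; }
}
```

A crawler for a free RSS source would simply carry a very high (or unenforced) limit, while a DataSift crawler would carry the user's paid quota.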
Food for Thought

Centralized Architecture

Our first thought upon getting the requirements was to run the crawling on nodes, which we call Spawns, and let a hub, which we call Black Widow, manage the collaboration among nodes. This idea was quickly accepted by team members, as it allows the system to scale well while the hub does limited work.

As with any other centralized system, Black Widow suffers from the single-point-of-failure problem. To ease this problem, we allow a node to function independently for a short period after losing its connection to Black Widow. This gives the support team breathing room to bring up a backup server.

Another bottleneck in the system is data storage. Given the volume of data being crawled (easily a few thousand records per second), NoSQL is clearly the choice for storing the crawled comments. We have experience working with Lucene and MongoDB. However, after research and some minor experiments, we chose Cassandra as the NoSQL database.

With these few thoughts, we visualize the distributed crawling system being built following this prototype:

In the diagram above, Black Widow, the hub, is the only server that has access to the SQL database system, where we store the crawling configuration. Therefore, all the Spawns, or crawling nodes, are fully stateless. A Spawn simply wakes up, registers itself to Black Widow and does the assigned jobs. After collecting comments, the Spawn stores them in the Cassandra cluster and also pushes them to some queues for further processing.

Brainstorming of possible issues

To explain the design to non-technical people, we like to relate the business requirement to a similar problem in real life so that it is easier to understand. The problem we chose is collaboration among volunteers. Imagine we need to do a lot of preparation work for the upcoming Olympics and decide to recruit volunteers all around the world to help.
We do not know the volunteers, but the volunteers know our email address, so they can contact us to register. Only then do we know their email addresses and can send tasks to them by email. We would not want to send one task to two volunteers or leave some tasks unattended, and we want to distribute the tasks evenly so that no volunteer suffers too much. Due to cost, we would not contact them by mobile phone. However, because email is less reliable, when sending out a task to a volunteer, we request a confirmation; the task is considered assigned only when the volunteer replies with a confirmation.

In this example, the volunteers represent Spawn nodes, while email communication represents the unreliable, high-latency network. Here are some problems we need to solve:

1/ Node failure

For this problem, the best approach is to check regularly. If a volunteer stops responding to the regular progress-check emails, the task should be re-assigned to someone else.

2/ Optimization of task assignment

Some tasks are related, so assigning related tasks to the same person helps reduce the total effort. This happens with our crawling as well: some crawling configurations have similar search terms, and grouping them together to share a streaming channel helps reduce the final bill. Another concern is fairness, the ability to distribute the work evenly among volunteers. The simplest strategy we can think of is Round Robin, with a minor tweak: remembering earlier assignments. If a task is similar to a task we assigned before, it is skipped from the Round Robin selection and assigned directly to the same volunteer.

3/ The hub is not working

If for some reason our email server is down and we cannot contact the volunteers any more, it is better to let the volunteers stop working on their assigned tasks. The main concern here is cost overrun or wasted effort.
However, stopping work immediately is too hasty, as a temporary infrastructure issue may be the cause of the communication problem. Hence, we need to find a reasonable amount of time for a node to continue functioning after being detached from the hub.

4/ Cost control

Due to business requirements, there are two kinds of cost control we need to implement: first, the total number of comments crawled per crawler, and second, the total number of comments crawled by all crawlers belonging to the same user. This is where we debated the best approach to implementing cost control.

It is straightforward to implement the limit for each crawler: we can simply pass this limit to the Spawn node, and it automatically stops the crawler when the limit is reached. However, the limit per user is not so straightforward, and we had two possible approaches. The simpler choice is to send all the crawlers of one user to the same node. Then, similar to the earlier problem, the Spawn node knows the number of comments collected and stops all of the user's crawlers when the limit is reached. This approach is simple, but it limits the ability to distribute jobs evenly among nodes. The alternative is to let all the nodes retrieve and update a global counter. This approach creates huge internal network traffic and adds considerable delay to comment processing time. For now, we chose the global counter approach; this can be revisited if performance becomes a serious concern.

5/ Deploy on the cloud

As with any other cloud application, we cannot put too much trust in the network or infrastructure. Here is how we make our application conform to the checklist mentioned in the last article:

Stateless: Our Spawn nodes are stateless, but the hub is not. Therefore, in our design, the nodes do the actual work and the hub only coordinates efforts.

Idempotence: We implement hashCode and equals methods for every crawler configuration.
We store the crawler configurations in a Map or Set. Therefore, a crawler configuration can be sent multiple times without any side effect. Moreover, our node selection approach ensures that a job is always sent to the same node.

Data Access Object: We apply the JsonIgnore filter on every model object to make sure no confidential data flies around the network.

Play Safe: We implement a health-check API for each node and for the hub itself. The first level of support is notified immediately when anything goes wrong.

6/ Recovery

We try our best to make the system heal itself from partial failure. There are some types of failure that we can recover from:

Hub failure: A node registers itself to the hub when it starts up. From then on, communication is one-way: only the hub sends jobs to the node and polls it for status updates. A node considers itself detached if it fails to get any contact from the hub for a pre-defined period. If a node is detached, it clears all its job configurations and starts registering itself to the hub again. If the incident was caused by a hub failure, a new hub fetches the crawling configurations from the database and starts distributing jobs again. All the existing jobs on a Spawn node are cleared when the node goes into detached mode.

Node failure: When the hub fails to poll a node, it does a hard reset, removing all working jobs and re-distributing them from the beginning over the working nodes. This re-distribution process helps ensure an optimized distribution.

Job failure: There are two kinds of failure that can happen while the hub is sending and polling jobs. If a job fails in the polling process but the Spawn node is still working well, Black Widow can re-assign the job to the same node again. The same can be done if sending the job failed.

Implementation

Data Source and Subscriber

In our initial thinking, each crawler opened its own channel to retrieve data, but this no longer made sense on further inspection.
For RSS, we can scan all URLs once and find the keywords that may belong to multiple crawlers. Twitter supports up to 200 search terms in one single query, so it is possible for us to open a single channel that serves multiple crawlers. For DataSift it is quite rare, but due to human mistake or luck, it is possible to have crawlers with identical search terms.

This situation led us to split the crawler into two entities: subscriber and data source. The subscriber is in charge of consuming the comments, while the data source is in charge of crawling them. With this design, if there are two crawlers with similar keywords, a single data source is created to serve two subscribers, each processing the comments in its own way. A data source is created if and only if no similar data source exists. It starts working when the first subscriber subscribes to it and retires when the last subscriber unsubscribes from it. With Black Widow sending similar subscribers to the same node, we can minimize the number of data sources created and, indirectly, minimize the crawling cost.

Data Structure

The biggest concern with the data structures is thread safety. In a Spawn node, we must store all running subscribers and data sources in memory. There are a few scenarios in which we need to modify or access this data:

- A subscriber hits its limit and automatically unsubscribes from its data source, which may lead to deactivation of the data source.
- Black Widow sends a new subscriber to the Spawn node.
- Black Widow sends a request to unsubscribe an existing subscriber.
- The health-check API exposes all running subscribers and data sources.
- Black Widow regularly polls the status of each assigned subscriber.
- The Spawn node regularly checks for and disables orphan subscribers (subscribers no longer polled by Black Widow).

Another concern with the data structures is the idempotence of operations. Any of the operations above can go missing or be duplicated.
To handle this problem, here is our approach:

- Implement hashCode and equals methods for every subscriber and data source.
- Choose a Set or Map to store the collections of subscribers and data sources. For records with an identical hash code, a Map replaces the record on insertion, but a Set skips the new record. Therefore, if we use a Set, we need to ensure new records can replace old records.
- Use synchronized in the data access code.
- If a Spawn node receives a new subscriber that is similar to an existing one, it compares them and prefers to update the existing subscriber rather than replace it. This avoids unsubscribing and re-subscribing identical subscribers, which could interrupt data source streaming.

Routing

As mentioned before, we need a routing mechanism that serves two purposes: distribute the jobs evenly among Spawn nodes, and route similar jobs to the same nodes. We solved this problem by generating a unique representation of each query, named uuid. After that, we can use a simple modulo function to find the node to route to:

int size = activeBwsNodes.size();
// floorMod avoids a negative index when hashCode() is negative
int index = Math.floorMod(uuid.hashCode(), size);
assignedNode = activeBwsNodes.get(index);

With this implementation, subscribers with the same uuid are always sent to the same node, and each node has an equal chance of being selected to serve a subscriber. This whole scheme breaks down when the collection of active Spawn nodes changes. Therefore, Black Widow must clear all running jobs and reassign them from the beginning whenever there is a node change. However, node changes should be quite rare in a production environment.

Handshake

Below is the sequence diagram of Black Widow and node collaboration:

Black Widow does not know the Spawn nodes in advance. It waits for a Spawn node to register itself to Black Widow. From there, Black Widow has the responsibility to poll the node to maintain connectivity.
If Black Widow fails to poll a node, it removes the node from its container. The orphan node eventually goes into detached mode because it is no longer being polled. In this mode, the Spawn node clears its existing jobs and tries to register itself again.

The next diagram is the subscriber life-cycle:

Similar to the above, Black Widow has the responsibility of polling the subscribers it sends to a Spawn node. If a subscriber is no longer being polled by Black Widow, the Spawn node treats the subscriber as an orphan and removes it. This practice helps eliminate the threat of a Spawn node running an obsolete subscriber. On Black Widow, when polling a subscriber fails, it tries to find a new node to assign the job to. If the subscriber's Spawn node is still available, it is likely that the same job will go to the same node again, due to the routing mechanism we use.

Monitoring

In a happy scenario, all the subscribers are running, Black Widow is polling and nothing else happens. However, this is not what happens in real life. There will be changes in Black Widow and the Spawn nodes from time to time, triggered by various events. For Black Widow, changes occur under the following circumstances:

- A subscriber hits its limit
- A new subscriber is found
- An existing subscriber is disabled by the user
- Polling of a subscriber fails
- Polling of a Spawn node fails

To handle changes, the Black Widow monitoring tool offers two services: hard reload and soft reload. A hard reload happens on a node change, while a soft reload happens on a subscriber change. The hard reload process takes back all running jobs and redistributes them from the beginning over the available nodes. The soft reload process removes obsolete jobs, assigns new jobs and re-assigns failed jobs.

Compared to Black Widow, the monitoring of a Spawn node is simpler. The two main concerns are maintaining connectivity to Black Widow and removing orphan subscribers.

Deployment Strategy

The deployment strategy is straightforward. We need to bring up Black Widow and at least one Spawn node.
The Spawn node needs to know the URL of Black Widow. From then on, the health-check API gives us the number of subscribers per node. We can integrate the health check with the AWS API to automatically bring up new Spawn nodes if the existing nodes are overloaded. The Spawn node image needs to have the Spawn application running as a service. Similarly, when the nodes are under-utilized, we can bring down redundant Spawn nodes.

Black Widow needs special treatment due to its importance. If Black Widow fails, we can restart the application. This causes all existing jobs on the Spawn nodes to become orphans, and all the Spawn nodes go into detached mode. Slowly, all the nodes clean themselves up and try to register again. Under the default configuration, the whole restart process happens within 15 minutes.

Threats and possible improvements

In choosing a centralized architecture, we knew that Black Widow would be the biggest risk to the system. While a Spawn node failure only causes a minor interruption for the affected subscribers, a Black Widow failure eventually leads to Spawn node restarts, which take much longer to recover from. Moreover, even though the system can recover from partial failure, there is still an interruption of service during the recovery process. Therefore, if the polling requests fail too often due to unstable infrastructure, operation will be greatly hampered.

Scalability is another concern for a centralized architecture. We do not yet have a concrete maximum number of Spawn nodes that Black Widow can handle. Theoretically, it should be very high, because Black Widow only does minor processing; most of its effort goes into sending out HTTP requests. It is possible that the network is the main limiting factor for this architecture. Because of this, we let Black Widow poll the nodes rather than have the nodes poll Black Widow (as others, like Hadoop, do). With this approach, Black Widow can work at its own pace, not under pressure from the Spawn nodes.
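The detached-mode behavior described above boils down to each node tracking when the hub last polled it. A minimal sketch, assuming illustrative names (DetachmentMonitor, a configurable grace period) rather than the actual Black Widow code:

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the detached-mode check: a Spawn node records the time of
// the hub's last successful poll and considers itself detached once
// the hub has been silent longer than the grace period.
public class DetachmentMonitor {
    private final Duration gracePeriod;
    private volatile Instant lastPolledAt;

    public DetachmentMonitor(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
        this.lastPolledAt = Instant.now();
    }

    // Called whenever the hub successfully polls this node.
    public void onPolled() {
        lastPolledAt = Instant.now();
    }

    // True once the hub has been silent for longer than the grace period;
    // at that point the node clears its jobs and re-registers.
    public boolean isDetached(Instant now) {
        return Duration.between(lastPolledAt, now).compareTo(gracePeriod) > 0;
    }
}
```

With a 15-minute grace period, a node that misses polls for 16 minutes would clear its jobs and re-register, which matches the restart window described above.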
One of the first questions we got was whether this is a MapReduce problem, and the answer is no. Each subscriber in our distributed crawling system processes its own comments and does not report results back to Black Widow. That is why we do not use a MapReduce product like Hadoop. Our monitoring is business-logic aware rather than pure infrastructure monitoring, which is why we chose to build it ourselves rather than use tools like ZooKeeper or Akka.

As a future improvement, it would be better to walk away from the centralized architecture by having multiple hubs collaborating with each other. This should not be too difficult, given that the only time Black Widow accesses the database is when loading subscribers. Therefore, we can slice the data and let each Black Widow load a portion of it.

Another point that leaves me unsatisfied is the checking of the global counter for the user limit. As the check happens on every comment crawled, it greatly increases internal network traffic and limits the scalability of the system. A better strategy would be to divide the quota based on processing speed: Black Widow could regulate and redistribute quota for each subscriber (on different nodes).

Reference: Distributed Crawling from our JCG partner Tony Nguyen at the Developers Corner blog.
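As a closing sketch, the quota-division strategy proposed above could look roughly like this. The class and names are illustrative assumptions, not anything from the actual system: the user's total limit is split among subscribers in proportion to their observed crawling speed, so each node can enforce its share locally instead of hitting a global counter per comment.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of dividing a user's comment quota across
// subscribers in proportion to their observed processing speed,
// instead of updating a shared global counter on every comment.
public class QuotaSplitter {

    // speeds: observed comments per second for each subscriber id
    public static Map<String, Long> split(long totalQuota, Map<String, Double> speeds) {
        double totalSpeed = speeds.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Long> quotas = new HashMap<>();
        for (Map.Entry<String, Double> e : speeds.entrySet()) {
            // Each subscriber gets a share proportional to its speed.
            quotas.put(e.getKey(), (long) (totalQuota * e.getValue() / totalSpeed));
        }
        return quotas;
    }
}
```

The hub would periodically recompute and redistribute these shares as crawling speeds change, so a fast subscriber never starves a slow one of quota.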

Implementing the ‘Git flow’

Git can be used in a variety of ways, which is cool. Still, when working within a team, it is good to have a consensus on a common, shared approach in order to avoid conflicts. This article quickly explains how we implemented the "git flow" pattern in one of our projects.

Git-flow…

…is a popular strategy which works around the master branch, but in a less "aggressive" way (than the GitHub flow pattern, for instance). You have two main branches:

- The master branch contains the latest production code that has been deployed, versioned with appropriate tags for each release.
- The develop branch is branched off master and contains the latest code for the next feature being developed. For each new feature there might be lots of feature branches (always branched off the develop branch).

Beside the main branches, there are so-called supporting branches:

- Feature branches contain the development state for a single feature of the overall product (i.e. a user story). They are branched off the develop branch.
- Hotfix branches are for quick, severe bugfixes. They are usually branched off the master branch; the bug is fixed in the hotfix branch and then merged back into master and develop as well.
- The release branch is for preparing the next release. No new features are developed on this branch; rather, it contains some last fixes (also bugfixes) and adjustments for going into production.

Production branch oriented

Many people prefer to see master as their development branch and instead have a dedicated one for the production environment.
Such a production-oriented branching strategy has:

- A master branch which contains the actual development code (corresponds to the "develop" branch in the git-flow model).
- A production branch which contains the deployed code.

Supporting branches are:

- Feature branches, which contain the development of specific features and are branched off master and merged back into master.
- Hotfix branches (work like in the standard git-flow model).
- The release branch (works like in the standard git-flow model).

Usage

In my opinion, tools are great as they (mostly) give you a productivity boost. Nevertheless, you should always understand what they do behind the scenes. This section lists the commands you'll need to manually implement the production-oriented "git flow" pattern shown above.

First of all, you have to initialize an empty repository and eventually connect it immediately to your remote one. Obviously, feel free to skip this step if you already have one.

$ git init
$ git remote add origin git@.....

Furthermore, I'd suggest also adding a .gitignore file. You may start from an existing one based on your project type: Github .gitignore repository. Then "push" everything up to your remote repo:

$ git push --set-upstream origin master

Create a new feature

From master:

$ git pull
$ git checkout -b userstory/login

Do some commits, and then publish the feature on the remote repo (if it's not a tiny one of a couple of hours):

$ git push origin userstory/login

Update a feature from master

Frequently update from origin/master to get the latest changes that have been pushed to the repo by your peers:

$ git fetch origin master
$ git rebase origin/master

Alternatively, update your master branch and rebase onto it:

$ git checkout master
$ git pull
$ git checkout <yourfeaturebranch>
$ git rebase master

Finish a feature

Merge it back into master:

$ git checkout master
$ git pull
$ git merge --no-ff userstory/login

--no-ff means no fast-forward, to keep track of where certain changes originated.
TIP: In order not to forget the --no-ff flag, you might want to configure it as the default behavior when merging into master by executing the following command:

$ git config branch.master.mergeoptions "--no-ff"

In case of conflicts, resolve them and then push master:

$ git push

and remove the user story branch (locally and remotely):

$ git branch -d userstory/login
$ git push origin :userstory/login

Prepare a release

$ git checkout -b release/0.1.0

Publish to production

$ git checkout production
$ git pull
$ git merge --no-ff release/0.1.0
$ git tag v0.1.0
$ git push --tags origin production

Then delete the release/x.x.x branch.

Create a hotfix

$ git checkout production
$ git checkout -b hotfix/login-does-not-work

After testing it, merge back into production:

$ git checkout production
$ git merge --no-ff hotfix/login-does-not-work
$ git tag v0.1.1
$ git push --tags

Obviously, also merge those changes back into master:

$ git checkout master
$ git merge --no-ff hotfix/login-does-not-work

And then delete the hotfix branch:

$ git branch -d hotfix/login-does-not-work

OK… I'm a Jedi… give me some tools

Git flow CLI tool

Git Flow is a git command-line extension that facilitates the usage of the "git flow" pattern (download and install it, and see the Git flow cheatsheet). So, if you have mastered using the git flow pattern manually, you're ready to go with it.

Haacked's Git Aliases

Phil Haack (former Microsoft employee, now working on GitHub for Windows at GitHub) published an interesting set of 13 git aliases to boost your productivity. You might want to take a look at them: http://haacked.com/archive/2014/07/28/github-flow-aliases/ To install them, simply copy & paste the aliases into your .gitconfig file. You should find it in your user profile directory (~ on Unix systems; C:\users\<yourname>\ on Windows).

Configuring Jenkins

Please refer to my recent blog post "Git flow with Jenkins and GitLab" for further details on how to configure your build environment.
How we use it – our pipeline

We adopted the git flow pattern in one of our projects with a team getting in touch with Git for the first time (they used TFS before). I introduced them to the Git basics, they started straight ahead, and surprisingly the switch was really easy. By using git flow we minimized conflicting merges and thus potential problems in the development flow.

So how did we use it? The team applied some kind of Scrum (we're new to it, thus "some kind of" :)). We have two-week iterations with an initial planning phase (usually on Thursday morning), and we have the tester on the team (yay!).

At the start of the sprint cycle, our devs take their user stories (on Trello) and create corresponding feature branches with the pattern userstory/<trello-card-#>-userstory-title for user stories, task/<trello-card-#>-title for tasks and bug/<trello-card-#>-title for bugs. They develop on the feature branches and frequently update them from master (see git flow usage above). If a story/task/bug's implementation takes longer than a day or two, the branch gets pushed to the remote GitLab server (for backup reasons). Each of these pushes is automatically built and tested by our Jenkins.

Once finished with the implementation, the developer either merges it with master or creates a merge request on GitLab, assigned to another developer for code review. When master gets pushed to GitLab, Jenkins automatically picks it up and publishes it to our dev server instance. Once every night, the master branch is automatically published to our test server instance so that the tester in our team can continue to test the implemented stories and either mark them as done or reject them within our sprint cycle. Furthermore, a series of automated jMeter tests is executed to verify the correct functioning of our REST API as well as the performance of our endpoints.
After the two-week cycle, one of our devs prepares a release (see the commands in the "git flow usage" section above) by merging master onto production. This is automatically detected by Jenkins, which – again – publishes to our preproduction server instance, which is also accessible by our customer. We do not use release branches, as we haven't needed them so far; there is no preparatory work to be done, although that might change in the future.

That's the flow we came up with after a few iterations and discussions within the team and with our tester. What's your approach? I'd be interested to hear in the comments.

Reference: Implementing the 'Git flow' from our JCG partner Juri Strumpflohner at the Juri Strumpflohner's TechBlog blog.

Objects Should Be Immutable

In object-oriented programming, an object is immutable if its state can't be modified after it is created. In Java, a good example of an immutable object is String. Once created, we can't modify its state; we can request that it create new strings, but its own state will never change.

However, there are not many immutable classes in the JDK. Take, for example, the class Date. It is possible to modify its state using setTime(). I don't know why the JDK designers decided to make these two very similar classes differently. However, I believe that the design of the mutable Date has many flaws, while the immutable String is much more in the spirit of the object-oriented paradigm. Moreover, I think that all classes should be immutable in a perfect object-oriented world. Unfortunately, that is sometimes technically impossible due to limitations in the JVM. Nevertheless, we should always aim for the best.

This is an incomplete list of arguments in favor of immutability:

- Immutable objects are simpler to construct, test, and use
- Truly immutable objects are always thread-safe
- They help to avoid temporal coupling
- Their usage is side-effect free (no defensive copies)
- The identity mutability problem is avoided
- They always have failure atomicity
- They are much easier to cache
- They prevent NULL references, which are bad

Let's discuss the most important arguments one by one.

Thread Safety

The first and most obvious argument is that immutable objects are thread-safe. This means that multiple threads can access the same object at the same time without clashing with one another. If no object methods can modify its state, then no matter how many of them are called, or how often, in parallel — they will work in their own memory space on the stack. Goetz et al. explained the advantages of immutable objects in more detail in their very famous book Java Concurrency in Practice (highly recommended).
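A minimal immutable class in this spirit might look as follows (the Money class itself is an illustrative example, not from the article): all fields are final, there are no setters, and "modification" returns a new object, so instances can be shared between threads without any synchronization.

```java
// A minimal immutable class: all fields are final and there are no
// setters, so instances can be shared freely between threads.
// "Modifying" it produces a new object instead of changing this one.
public final class Money {
    private final long cents;
    private final String currency;

    public Money(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    public Money add(long moreCents) {
        return new Money(cents + moreCents, currency); // new object; old one untouched
    }

    public long cents() { return cents; }
    public String currency() { return currency; }
}
```

Calling add() on a shared Money instance can never corrupt it, because the original object's state is fixed at construction time.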
Avoiding Temporal Coupling

Here is an example of temporal coupling (the code makes two consecutive HTTP POST requests, where the second one contains an HTTP body):

Request request = new Request("http://example.com");
request.method("POST");
String first = request.fetch();
request.body("text=hello");
String second = request.fetch();

This code works. However, you must remember that the first request should be configured before the second one may happen. If we decide to remove the first request from the script, we will remove the second and the third line, and won't get any errors from the compiler:

Request request = new Request("http://example.com");
// request.method("POST");
// String first = request.fetch();
request.body("text=hello");
String second = request.fetch();

Now the script is broken, although it compiled without errors. This is what temporal coupling is about — there is always some hidden information in the code that a programmer has to remember. In this example, we have to remember that the configuration for the first request is also used for the second one, and that the second request should always stay together with and be executed after the first one.

If the Request class were immutable, the first snippet wouldn't work in the first place and would have to be rewritten like this:

final Request request = new Request("http://example.com");
String first = request.method("POST").fetch();
String second = request.method("POST").body("text=hello").fetch();

Now these two requests are not coupled. We can safely remove the first one, and the second one will still work correctly. You may point out that there is code duplication. Yes, we should get rid of it and rewrite the code:

final Request request = new Request("http://example.com");
final Request post = request.method("POST");
String first = post.fetch();
String second = post.body("text=hello").fetch();

See, the refactoring didn't break anything, and we still don't have temporal coupling.
The first request can be removed safely from the code without affecting the second one. I hope this example demonstrates that code manipulating immutable objects is more readable and maintainable, because it doesn't have temporal coupling.

Avoiding Side Effects

Let's try to use our Request class in a new method (now it is mutable):

public String post(Request request) {
  request.method("POST");
  return request.fetch();
}

Let's try to make two requests — the first with the GET method and the second with POST:

Request request = new Request("http://example.com");
request.method("GET");
String first = this.post(request);
String second = request.fetch();

Method post() has a "side effect" — it makes changes to the mutable object request. These changes are not really expected in this case. We expect it to make a POST request and return its body. We don't want to read its documentation just to find out that behind the scenes it also modifies the request we're passing to it as an argument. Needless to say, such side effects lead to bugs and maintainability issues. It would be much better to work with an immutable Request:

public String post(Request request) {
  return request.method("POST").fetch();
}

In this case, we don't have any side effects. Nobody can modify our request object, no matter where it is used and how deep down the call stack it is passed by method calls:

Request request = new Request("http://example.com").method("GET");
String first = this.post(request);
String second = request.fetch();

This code is perfectly safe and side-effect free.

Avoiding Identity Mutability

Very often, we want objects to be identical if their internal states are the same. The Date class is a good example:

Date first = new Date(1L);
Date second = new Date(1L);
assert first.equals(second); // true

There are two different objects; however, they are equal to each other because their encapsulated states are the same.
This is made possible through their custom overridden implementations of the equals() and hashCode() methods. The consequence of this convenient approach, when used with mutable objects, is that every time we modify an object's state we change its identity:

Date first = new Date(1L); Date second = new Date(1L); first.setTime(2L); assert first.equals(second); // false

This may look natural, until you start using mutable objects as keys in maps:

Map<Date, String> map = new HashMap<>(); Date date = new Date(); map.put(date, "hello, world!"); date.setTime(12345L); assert map.containsKey(date); // false

When modifying the state of the date object, we don't expect it to change its identity. We don't expect to lose an entry in the map just because the state of its key has changed. However, this is exactly what happens in the example above. When we add an object to the map, its hashCode() returns one value. This value is used by HashMap to place the entry into its internal hash table. When we call containsKey(), the hash code of the object is different (because it is based on its internal state) and HashMap can't find it in the internal hash table. This is a very annoying and difficult-to-debug side effect of mutable objects. Immutable objects avoid it completely.

Failure Atomicity

Here is a simple example:

public class Stack { private int size; private String[] items; public void push(String item) { size++; if (size > items.length) { throw new RuntimeException("stack overflow"); } items[size - 1] = item; } }

It is obvious that an object of class Stack will be left in a broken state if push() throws a runtime exception on overflow. Its size property will have been incremented, while items won't have received a new element. Immutability prevents this problem. An object can never be left in a broken state, because its state is modified only in its constructor.
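For comparison, a failure-atomic, immutable variant of the stack above could look like the following. This is a sketch of the idea, not code from the article: push() returns a new object, so an existing instance can never be observed in a half-updated state.

```java
import java.util.Arrays;

// Hypothetical immutable stack: push() returns a fresh instance instead
// of mutating this one, so a rejected push leaves every object valid.
public class Main {

    static final class ImmutableStack {
        private final String[] items;
        private final int capacity;

        ImmutableStack(int capacity) {
            this(new String[0], capacity);
        }

        private ImmutableStack(String[] items, int capacity) {
            this.items = items;
            this.capacity = capacity;
        }

        ImmutableStack push(String item) {
            if (items.length == capacity) {
                // Nothing has been modified before this throw:
                // the failure is atomic by construction.
                throw new IllegalStateException("stack overflow");
            }
            String[] next = Arrays.copyOf(items, items.length + 1);
            next[items.length] = item;
            return new ImmutableStack(next, capacity);
        }

        int size() {
            return items.length;
        }
    }

    public static void main(String[] args) {
        ImmutableStack empty = new ImmutableStack(1);
        ImmutableStack one = empty.push("a");
        System.out.println(empty.size()); // 0 -- the original is unchanged
        System.out.println(one.size());   // 1
        try {
            one.push("b");
        } catch (IllegalStateException e) {
            // 'one' is still a perfectly valid one-element stack
            System.out.println(one.size()); // 1
        }
    }
}
```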
The constructor will either fail, rejecting object instantiation, or succeed, producing a valid, solid object that never changes its encapsulated state. For more on this subject, read Effective Java, 2nd Edition by Joshua Bloch.

Arguments Against Immutability

There are a number of arguments against immutability.

“Immutability is not for enterprise systems.” Very often I hear people say that immutability is a fancy feature that is absolutely impractical in real enterprise systems. As a counter-argument, I can only show some examples of real-life applications that contain only immutable Java objects: jcabi-http, jcabi-xml, jcabi-github, jcabi-s3, jcabi-dynamo, jcabi-simpledb. The above are all Java libraries that work solely with immutable classes/objects. netbout.com and stateful.co are web applications that work solely with immutable objects.

“It's cheaper to update an existing object than to create a new one.” Oracle thinks that “The impact of object creation is often overestimated and can be offset by some of the efficiencies associated with immutable objects. These include decreased overhead due to garbage collection, and the elimination of code needed to protect mutable objects from corruption.” I agree.

If you have some other arguments, please post them below and I'll try to comment.

Reference: Objects Should Be Immutable from our JCG partner Yegor Bugayenko at the About Programming blog....

JUnit in a Nutshell: Test Runners

The fourth chapter of my multi-part tutorial about JUnit testing essentials explains the purpose of the tool's exchangeable test runner architecture and introduces some of the available implementations. The ongoing example enlarges upon the subject by going through the different possibilities of writing parameterized tests. Since I have already published an introduction to JUnit Rules, I decided to skip the announced sections on that topic. Instead, I gave the latter a minor update.

Test Runners Architecture

Don't be afraid to give up the good to go for the great. John D. Rockefeller

In the previous posts we learned to use some of the xUnit testing patterns [MES] with JUnit. Those concepts are well supported by the default behavior of the tool's runtime. But sometimes there is a need to vary or supplement the latter for particular test types or objectives. Consider, for example, integration tests, which often need to be run in specific environments. Or imagine a set of test cases comprising the specification of a subsystem, which should be composed for common test execution.

JUnit supports the use of various types of test processors for this purpose. It delegates test class instantiation, test execution and result reporting at runtime to such processors, which have to be subtypes of org.junit.runner.Runner. A test case can specify its expected runner type with the @RunWith annotation. If no type is specified, the runtime chooses BlockJUnit4ClassRunner as the default. It ensures that each test runs with a fresh test instance, and it invokes lifecycle methods like implicit setup or teardown handlers (see also the chapter about Test Structure).

@RunWith( FooRunner.class ) public class BarTest {

The code snippet shows how the imaginary FooRunner is specified as the test processor for the equally imaginary BarTest. Usually there is no need to write custom test runners.
But in case you have to, Michael Scharhag has recently written a good explanation of JUnit's runner architecture. The usage of special test runners is straightforward, so let us have a look at a few.

Suite and Categories

Probably one of the best known processors is the Suite. It allows running collections of tests and/or other suites in a hierarchically or thematically structured way. Note that the specifying class itself usually has no body implementation. It is annotated with a list of test classes that get executed by running the suite:

@RunWith(Suite.class) @SuiteClasses( { NumberRangeCounterTest.class, // list of test cases and other suites } ) public class AllUnitTests {}

However, the structuring capabilities of suites are somewhat limited. Because of this, JUnit 4.8 introduced the lesser known Categories concept. This makes it possible to define custom category types, such as unit, integration and acceptance tests, for example. To assign a test case or a method to one of those categories, the Category annotation is provided:

// definition of the available categories public interface Unit {} public interface Integration {} public interface Acceptance {}

// category assignment of a test case @Category(Unit.class) public class NumberRangeCounterTest { [...] }

// suite definition that runs tests // of the category 'Unit' only @RunWith(Categories.class) @IncludeCategory(Unit.class) @SuiteClasses( { NumberRangeCounterTest.class, // list of test cases and other suites } ) public class AllUnitTests {}

Classes annotated with Categories define suites that run only those tests of the class list that match the specified categories. Specification is done via include and/or exclude annotations. Note that categories can be used in Maven or Gradle builds without defining particular suite classes (see the Categories section of the JUnit documentation).
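Categories can also be assigned at the method level, which lets a single test case mix fast and slow tests. A small sketch, assuming JUnit 4.8+ on the classpath; the test names and category interfaces are made up for illustration:

```java
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Hypothetical example of method-level category assignment: a Categories
// suite including only Unit would run the first test but skip the second.
public class MixedCategoriesTest {

    public interface Unit {}
    public interface Integration {}

    @Category(Unit.class)
    @Test
    public void fastInMemoryCheck() {
        // runs in suites that include the Unit category
    }

    @Category(Integration.class)
    @Test
    public void slowStorageRoundTrip() {
        // runs only in suites that include the Integration category
    }
}
```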
For more information on categories, John Ferguson Smart has written a detailed explanation about Grouping tests using JUnit categories. Since maintenance of the suite class list and category annotations is often considered somewhat tedious, you might prefer categorizing via test postfix names à la FooUnitTest instead of FooTest. This allows filtering categories on type scope at runtime. But this filtering is not supported by JUnit itself, which is why you may need a special runner that collects the available matching tests dynamically. A library that provides an appropriate implementation is Johannes Link's ClasspathSuite. If you happen to work with integration tests in an OSGi environment, Rüdiger's BundleTestSuite does something similar for bundles.

After these first impressions of how test runners can be used for test bundling, let us continue the tutorial's example with something more exciting.

Parameterized Tests

The example used throughout this tutorial is about writing a simple number range counter, which delivers a certain amount of consecutive integers, starting from a given value. Additionally, a counter depends on a storage type for preserving its current state. For more information please refer to the previous chapters.

Now assume that our NumberRangeCounter, which is initialized by constructor parameters, should be provided as an API. So it seems reasonable that instance creation checks the validity of the given parameters. We could specify the appropriate corner cases, which should be acknowledged with IllegalArgumentExceptions, with a single test each.
Using the Clean JUnit Throwable-Tests with Java 8 Lambdas approach, such a test, verifying that the storage parameter must not be null, might look like this:

@Test public void testConstructorWithNullAsStorage() { Throwable actual = thrown( () -> new NumberRangeCounter( null, 0, 0 ) ); assertTrue( actual instanceof IllegalArgumentException ); assertEquals( NumberRangeCounter.ERR_PARAM_STORAGE_MISSING, actual.getMessage() ); }

Note that I stick with the JUnit built-in functionality for verification. I will cover the pros and cons of particular matcher libraries (Hamcrest, AssertJ) in a separate post. To keep the post in scope, I also skip the discussion of whether an NPE would be better than the IAE.

In case we have to cover a lot of corner cases of that kind, the approach above might lead to a lot of very similar tests. JUnit offers the Parameterized runner implementation to reduce such redundancy. The idea is to provide various data records for a common test structure. To do so, a public static method annotated with @Parameters is used to create the data records as a collection of object arrays. Furthermore, the test case needs a public constructor with arguments that match the data types provided by the records.

The parameterized processor runs a given test for each record supplied by the parameters method. This means that for each combination of test and record a new instance of the test class is created.
The constructor parameters get stored as fields and can be accessed by the tests for setup, exercise and verification:

@RunWith( Parameterized.class ) public class NumberRangeCounterTest { private final String message; private final CounterStorage storage; private final int lowerBound; private final int range; @Parameters public static Collection<Object[]> data() { CounterStorage dummy = mock( CounterStorage.class ); return Arrays.asList( new Object[][] { { NumberRangeCounter.ERR_PARAM_STORAGE_MISSING, null, 0, 0 }, { NumberRangeCounter.ERR_LOWER_BOUND_NEGATIVE, dummy, -1, 0 }, [...] // further data goes here... } ); } public NumberRangeCounterTest( String message, CounterStorage storage, int lowerBound, int range ) { this.message = message; this.storage = storage; this.lowerBound = lowerBound; this.range = range; } @Test public void testConstructorParamValidation() { Throwable actual = thrown( () -> new NumberRangeCounter( storage, lowerBound, range ) ); assertTrue( actual instanceof IllegalArgumentException ); assertEquals( message, actual.getMessage() ); } [...] }

While the example surely reduces test redundancy, it is at least debatable with respect to readability. In the end this often depends on the amount of tests and the structure of the particular test data. But it is definitely unfortunate that tests which do not use any record values will also be executed multiple times. Because of this, parameterized tests are often kept in separate test cases, which usually feels more like a workaround than a proper solution. Hence a wise guy came up with the idea of providing a test processor that circumvents the described problems.

JUnitParams

The library JUnitParams provides the types JUnitParamsRunner and @Parameters. The params annotation specifies the data records for a given test. Note the difference to the JUnit annotation with the same simple name: the latter marks a method that provides the data records!
The test scenario above could be rewritten with JUnitParams as shown in the following snippet:

@RunWith( JUnitParamsRunner.class ) public class NumberRangeCounterTest { public static Object data() { CounterStorage dummy = mock( CounterStorage.class ); return $( $( ERR_PARAM_STORAGE_MISSING, null, 0, 0 ), $( ERR_LOWER_BOUND_NEGATIVE, dummy, -1, 0 ) ); } @Test @Parameters( method = "data" ) public void testConstructorParamValidation( String message, CounterStorage storage, int lowerBound, int range ) { Throwable actual = thrown( () -> new NumberRangeCounter( storage, lowerBound, range ) ); assertTrue( actual instanceof IllegalArgumentException ); assertEquals( message, actual.getMessage() ); } [...] }

While this is certainly more compact and looks cleaner at first glance, a few constructs need further explanation. The $(...) method is defined in JUnitParamsRunner (static import) and is a shortcut for creating arrays of objects. Once accustomed to it, data definition gets more readable. The $ shortcut is used in the method data to create a nested array of objects as the return value. Although the runner expects a nested data array at runtime, it is able to handle a simple Object return type.

The test itself has an additional @Parameters annotation. The annotation's method declaration refers to the data provider used to supply the test with the declared parameters. The method name is resolved at runtime via reflection. This is the downside of the solution, as it is not compile-time safe. But there are other usage scenarios where you can specify data provider classes or implicit values, which therefore do not suffer from that trade-off. For more information, have a look at the library's quick start guide, for example.

Another huge advantage is that now only those tests annotated with @Parameters run against data records. Standard tests are executed only once.
This in turn means that the parameterized tests can be kept in the unit's default test case.

Wrap Up

The sections above outlined the sense and purpose of JUnit's exchangeable test runner architecture. They introduced suites and categories to show the basic usage and carried on with an example of how test runners can ease the task of writing data-record-related tests. For a list of additional test runners, the pages Test runners and Custom Runners at junit.org might be a good starting point. And if you wonder what the Theories runner of the title picture is all about, you might have a look at Florian Waibel's post JUnit – the Difference between Practice and @Theory. Next time on JUnit in a Nutshell, I will finally cover the various types of assertions available to verify test results.

References

[MES] xUnit Test Patterns, Gerard Meszaros, 2007

Reference: JUnit in a Nutshell: Test Runners from our JCG partner Frank Appel at the Code Affine blog....

How much do you pay per line of code?

Yes, I know, “line of code” (LoC) is a very flawed metric. There are tons of articles written about it, as well as famous books. However, I want to compare two projects in which I have participated recently and discuss some very interesting numbers.

Project #1: Traditionally Co-located

The first project I was a part of was performed by a traditionally co-located group of programmers. There were about 20 of them (I'm not counting managers, analysts, product owners, SCRUM masters, etc.). The project was a web auction site with pretty high traffic numbers (over two million page views per day). The code base size was about 200k lines, of which 150k was PHP, 35k JavaScript and the remainder CSS, XML, Ruby, and something else. I'm counting only non-empty and non-comment lines of code, using cloc.pl. It was a commercial project, so I can't disclose its name.

The team was co-located in one office in Europe, where everybody was working “from nine 'til five”. We had meetings, lunches, desk-to-desk chats and lots of other informal communications. All tasks were tracked in JIRA.

Project #2: Extremely Distributed

The second project was an open source Java product, developed by an extremely distributed team of about 15 developers. We didn't have any chats or other informal communications. We discussed everything in GitHub issues. The code base was significantly smaller, with only about 30k lines, of which about 90% was Java and the rest XML.

Maturity of Development

Both projects hosted their code bases on GitHub. Both teams were developing in feature branches, even for small fixes. Both teams used build automation, continuous integration, pre-flight builds, static analysis and code reviews. This indicates the maturity of the project teams. Both projects satisfied the requirements of their users. I'm mentioning this to emphasize that both projects produced valuable and useful lines of code. There was no garbage and almost no code duplication.
Show Me the Money

In both projects, my role was that of lead architect, and I know their economics and financials. Besides that, I had access to both Git repositories, so I can measure how many new (or changed) lines were introduced by both teams in, say, a three-month period. Now, let's see the numbers.

The first project (the co-located one) was paying approximately €50,000 annually to a good developer, which was about $5,600 per month or $35 per hour. The second one (the extremely distributed project) was paying $20-35 per hour, for completed tasks only, according to one of the principles of XDSD.

The first one, in three months, produced 59k new lines and removed 29k in changes in the master branch, which totals 88k lines of code. The project took about 10,000 man-hours to produce these lines (20 programmers, three months, 170 working hours per month), which equates to about $350k. Therefore, the project cost a whopping:

$3.98 per line

The second project, in the same three-month period, produced 45k new lines and removed 9k, which comes to 54k in all. To complete this work, we spent only $7k (approximately 350 working hours in 650 tasks). Thus, the project cost merely:

13¢ per line

This also means that programmers were writing approximately 270 lines per hour, or over 2,000 per day. The Mythical Man-Month talks about 10 lines per day, which is 200 times less than we saw in our project. $350k vs $7k, $3.98 vs 13¢? What do you think?

How to Validate the Numbers?

If you're curious, I'm using this script to get the numbers from Git:

git log "--since=3 months" --pretty=tformat: --numstat \ | gawk '{ add += $1; subs += $2 } END { printf "added: %s removed: %s\n", add, subs }' -

You can validate the numbers for the second project here on GitHub: jcabi/jcabi-github.

Conclusion

What I'm trying to express with these numbers is that distributed programming is much more effective, money-wise, than a co-located team.
Again, I can hear you saying that “line of code” is not a proper metric. But, come on: $0.13 vs. $3.98? Thirty times more expensive? It's not about metrics any more. It's about preventing wasteful man-hours and the huge waste of money that comes with them.

Can We Do the Same?

Of course, the same results can't be achieved by just telling your programmers to work from home and never come to the office. XDSD is not about that. XDSD is about strict quality principles, which should be followed by the entire team. And when these principles are in place, you pay thirty times less.

By the way, this is what people say about their projects:

$12–103: crazyontap.com
$15–40: betterembsw.blogspot.nl
over $5: joelonsoftware.com

Reference: How much do you pay per line of code? from our JCG partner Yegor Bugayenko at the About Programming blog....

Java Concurrency Tutorial – Locking: Intrinsic locks

In previous posts we reviewed some of the main risks of sharing data between different threads (like atomicity and visibility) and how to design classes in order to be shared safely (thread-safe designs). In many situations, though, we will need to share mutable data, where some threads will write and others will act as readers. It may be the case that you only have one field, independent of the others, that needs to be shared between different threads. In this case, you may go with atomic variables. For more complex situations you will need synchronization.

1. The coffee store example

Let's start with a simple example like a CoffeeStore. This class implements a store where clients can buy coffee. When a client buys coffee, a counter is increased in order to keep track of the number of units sold. The store also registers who was the last client to come to the store.

public class CoffeeStore { private String lastClient; private int soldCoffees; private void someLongRunningProcess() throws InterruptedException { Thread.sleep(3000); } public void buyCoffee(String client) throws InterruptedException { someLongRunningProcess(); lastClient = client; soldCoffees++; System.out.println(client + " bought some coffee"); } public int countSoldCoffees() {return soldCoffees;} public String getLastClient() {return lastClient;} }

In the following program, four clients decide to come to the store to get their coffee:

public static void main(String[] args) throws InterruptedException { CoffeeStore store = new CoffeeStore(); Thread t1 = new Thread(new Client(store, "Mike")); Thread t2 = new Thread(new Client(store, "John")); Thread t3 = new Thread(new Client(store, "Anna")); Thread t4 = new Thread(new Client(store, "Steve")); long startTime = System.currentTimeMillis(); t1.start(); t2.start(); t3.start(); t4.start(); t1.join(); t2.join(); t3.join(); t4.join(); long totalTime = System.currentTimeMillis() - startTime; System.out.println("Sold coffee: " + store.countSoldCoffees());
System.out.println("Last client: " + store.getLastClient()); System.out.println("Total time: " + totalTime + " ms"); }

private static class Client implements Runnable { private final String name; private final CoffeeStore store; public Client(CoffeeStore store, String name) { this.store = store; this.name = name; } @Override public void run() { try { store.buyCoffee(name); } catch (InterruptedException e) { System.out.println("interrupted sale"); } } }

The main thread will wait for all four client threads to finish, using Thread.join(). Once the clients have left, we should obviously count four coffees sold in our store, but you may get unexpected results like the one below:

Mike bought some coffee Steve bought some coffee Anna bought some coffee John bought some coffee Sold coffee: 3 Last client: Anna Total time: 3001 ms

We lost one unit of coffee, and the last client displayed (Anna) is not the actual last client (John). The reason is that, since our code is not synchronized, the threads interleaved. Our buyCoffee operation should be made atomic.

2. How synchronization works

A synchronized block is an area of code which is guarded by a lock. When a thread enters a synchronized block, it needs to acquire its lock, and once acquired, it won't release it until exiting the block or throwing an exception. In this way, when another thread tries to enter the synchronized block, it won't be able to acquire its lock until the owner thread releases it. This is the Java mechanism for ensuring that only one thread at a given time is executing a synchronized block of code, ensuring the atomicity of all actions within that block.

Ok, so you use a lock to guard a synchronized block, but what is a lock? The answer is that any Java object can be used as a lock, which is called an intrinsic lock. We will now see some examples of these locks when using synchronization.

3.
Synchronized methods

Synchronized methods are guarded by one of two types of locks:

Synchronized instance methods: the implicit lock is 'this', which is the object used to invoke the method. Each instance of the class will use its own lock.

Synchronized static methods: the lock is the Class object. All instances of the class will use the same lock.

As usual, this is better seen with some code. First, we are going to synchronize an instance method. This works as follows: we have one instance of the class shared by two threads (Thread-1 and Thread-2), and another instance used by a third thread (Thread-3):

public class InstanceMethodExample { private static long startTime; public void start() throws InterruptedException { doSomeTask(); } public synchronized void doSomeTask() throws InterruptedException { long currentTime = System.currentTimeMillis() - startTime; System.out.println(Thread.currentThread().getName() + " | Entering method. Current Time: " + currentTime + " ms"); Thread.sleep(3000); System.out.println(Thread.currentThread().getName() + " | Exiting method"); } public static void main(String[] args) { InstanceMethodExample instance1 = new InstanceMethodExample(); Thread t1 = new Thread(new Worker(instance1), "Thread-1"); Thread t2 = new Thread(new Worker(instance1), "Thread-2"); Thread t3 = new Thread(new Worker(new InstanceMethodExample()), "Thread-3"); startTime = System.currentTimeMillis(); t1.start(); t2.start(); t3.start(); } private static class Worker implements Runnable { private final InstanceMethodExample instance; public Worker(InstanceMethodExample instance) { this.instance = instance; } @Override public void run() { try { instance.start(); } catch (InterruptedException e) { System.out.println(Thread.currentThread().getName() + " interrupted"); } } } }

Since the doSomeTask method is synchronized, you would expect that only one thread will execute its code at a given time.
But that expectation is wrong: since it is an instance method, different instances use different locks, as the output demonstrates:

Thread-1 | Entering method. Current Time: 0 ms
Thread-3 | Entering method. Current Time: 1 ms
Thread-3 | Exiting method
Thread-1 | Exiting method
Thread-2 | Entering method. Current Time: 3001 ms
Thread-2 | Exiting method

Since Thread-1 and Thread-3 use different instances (and hence different locks), they both enter the block at the same time. On the other hand, Thread-2 uses the same instance (and lock) as Thread-1. Therefore, it has to wait until Thread-1 releases the lock.

Now let's change the method signature and use a static method. StaticMethodExample has the same code except for the following line:

public static synchronized void doSomeTask() throws InterruptedException {

If we execute the main method, we will get the following output:

Thread-1 | Entering method. Current Time: 0 ms
Thread-1 | Exiting method
Thread-3 | Entering method. Current Time: 3001 ms
Thread-3 | Exiting method
Thread-2 | Entering method. Current Time: 6001 ms
Thread-2 | Exiting method

Since the synchronized method is static, it is guarded by the Class object lock. Despite using different instances, all threads need to acquire the same lock. Hence, any thread has to wait for the previous thread to release the lock.

4. Back to the coffee store example

I have now modified the Coffee Store example in order to synchronize its methods.
The result is as follows:

public class SynchronizedCoffeeStore { private String lastClient; private int soldCoffees; private void someLongRunningProcess() throws InterruptedException { Thread.sleep(3000); } public synchronized void buyCoffee(String client) throws InterruptedException { someLongRunningProcess(); lastClient = client; soldCoffees++; System.out.println(client + " bought some coffee"); } public synchronized int countSoldCoffees() {return soldCoffees;} public synchronized String getLastClient() {return lastClient;} }

Now, if we execute the program, we won't lose any sale:

Mike bought some coffee Steve bought some coffee Anna bought some coffee John bought some coffee Sold coffee: 4 Last client: John Total time: 12005 ms

Perfect! Or is it, really? The program's execution time is now 12 seconds. You have surely noticed a someLongRunningProcess method executing during each sale. It can be an operation which has nothing to do with the sale itself, but since we synchronized the whole method, each thread now has to wait for it to execute. Could we leave this code out of the synchronized block? Sure! Have a look at synchronized blocks in the next section.

5. Synchronized blocks

The previous section showed us that we may not always need to synchronize the whole method. Since all synchronized code forces a serialization of all thread executions, we should minimize the length of the synchronized block. In our coffee store example, we could leave the long-running process out of it.
In this section's example, we are going to use synchronized blocks. In SynchronizedBlockCoffeeStore, we modify the buyCoffee method to move the long-running process out of the synchronized block:

public void buyCoffee(String client) throws InterruptedException { someLongRunningProcess(); synchronized(this) { lastClient = client; soldCoffees++; System.out.println(client + " bought some coffee"); } }

public synchronized int countSoldCoffees() {return soldCoffees;}

public synchronized String getLastClient() {return lastClient;}

In the previous synchronized block, we use 'this' as its lock. It's the same lock as in synchronized instance methods. Beware of using another lock here, since we are using this lock in other methods of the class (countSoldCoffees and getLastClient).

Let's see the result of executing the modified program:

Mike bought some coffee John bought some coffee Anna bought some coffee Steve bought some coffee Sold coffee: 4 Last client: Steve Total time: 3015 ms

We have significantly reduced the duration of the program while keeping the code synchronized.

6. Using private locks

The previous section used a lock on the instance object, but you can use any object as a lock. In this section we are going to use a private lock and see what the risk of using it is. In PrivateLockExample, we have a synchronized block guarded by a private lock (myLock):

public class PrivateLockExample { private Object myLock = new Object(); public void executeTask() throws InterruptedException { synchronized(myLock) { System.out.println("executeTask - Entering..."); Thread.sleep(3000); System.out.println("executeTask - Exiting..."); } } }

When a thread enters the executeTask method, it will acquire the myLock lock. Any other thread entering other methods of this class guarded by the same myLock lock will have to wait in order to acquire it.
But now let's imagine that someone wants to extend this class in order to add their own methods, and these methods also need to be synchronized because they need to use the same shared data. Since the lock is private in the base class, the extended class won't have access to it. If the extended class synchronizes its methods, they will be guarded by 'this'. In other words, it will use another lock.

MyPrivateLockExample extends the previous class and adds its own synchronized method executeAnotherTask:

public class MyPrivateLockExample extends PrivateLockExample { public synchronized void executeAnotherTask() throws InterruptedException { System.out.println("executeAnotherTask - Entering..."); Thread.sleep(3000); System.out.println("executeAnotherTask - Exiting..."); } public static void main(String[] args) { MyPrivateLockExample privateLock = new MyPrivateLockExample(); Thread t1 = new Thread(new Worker1(privateLock)); Thread t2 = new Thread(new Worker2(privateLock)); t1.start(); t2.start(); } private static class Worker1 implements Runnable { private final MyPrivateLockExample privateLock; public Worker1(MyPrivateLockExample privateLock) { this.privateLock = privateLock; } @Override public void run() { try { privateLock.executeTask(); } catch (InterruptedException e) { e.printStackTrace(); } } } private static class Worker2 implements Runnable { private final MyPrivateLockExample privateLock; public Worker2(MyPrivateLockExample privateLock) { this.privateLock = privateLock; } @Override public void run() { try { privateLock.executeAnotherTask(); } catch (InterruptedException e) { e.printStackTrace(); } } } }

The program uses two worker threads that execute executeTask and executeAnotherTask respectively. The output shows how the threads are interleaved, since they are not using the same lock:

executeTask - Entering... executeAnotherTask - Entering... executeAnotherTask - Exiting... executeTask - Exiting...

7.
Conclusion We have reviewed the use of intrinsic locks, Java’s built-in locking mechanism. The main point is that synchronized blocks which access the same shared data have to use the same lock. This post is part of the Java Concurrency Tutorial series. Check here to read the rest of the tutorial. You can find the source code at GitHub. Reference: Java Concurrency Tutorial – Locking: Intrinsic locks from our JCG partner Xavier Padro at the Xavier Padró’s Blog blog....

jUnit: Rules

Rules add special handling around tests, test cases or test suites. They can perform additional validations common to all tests in the class, concurrently run multiple test instances, set up resources before each test or test case and tear them down afterwards. A rule gets complete control over what will be done with the test method, test case or test suite it is applied to: it decides what to do before and after running it, and how to deal with thrown exceptions. The first chapter shows how to use rules and the second shows what the built-in rules can do. The third chapter describes third-party rules libraries I found, and the last one explains how to create new rules. Using Rules This chapter shows how to declare and use rules inside a test case. Most rules can be applied to each test method separately, once to the whole test case or once to the whole test suite. Rules run separately for each test are called test rules, and rules applied to the whole test case or suite are called class rules. We will use the temporary folder rule as an example, so the first subchapter explains what it does. The second subchapter declares it as a test rule and the third one as a class rule. The last subchapter shows how to access the folder from inside the tests. Example Rule – Temporary Folder The temporary folder rule creates a new empty folder, runs the test or test case and then deletes the folder. You can either specify where to create the new folder, or let it be created in the system temporary file directory. The temporary folder can be used as both a test rule and a class rule. Declaring Test Rules Test rules, i.e., rules that run for each test method separately, have to be declared in a public field annotated with the @Rule annotation. Declare a test rule:

public class SomeTestCase {
    @Rule
    public TemporaryFolder folder = new TemporaryFolder();
}

The above folder rule creates a new folder before every test method and destroys it afterwards.
All tests are able to use that directory, but they are not able to share files through it. Since we used the constructor with no parameters, the folder will be created in the system temporary file directory. A test rule does its work before methods annotated with @Before and after those annotated with @After. Therefore, those methods will have access to the temporary folder too. Declaring Class Rules Class rules, i.e., rules that run once for the whole test case or test suite, have to be declared in a public static field annotated with the @ClassRule annotation. Declare a test case rule:

public class SomeTestCase {
    @ClassRule
    public static TemporaryFolder folder = new TemporaryFolder();
}

The above folder rule creates a new folder before running the first test method and destroys it after the last one. All tests are able to use that directory, and they are able to see files created by previously run tests. Class rules run before anything else inside that class, e.g. methods annotated with @BeforeClass or @AfterClass will have access to the temporary folder too; the rule runs before and after them. Using Rules Inside Tests Rules are classes like any other, and tests are free to call their public methods and use their public fields. Those calls are used to add test-specific configuration to the rule or to read data out of it. For example, the temporary folder can be accessed using the newFile, newFolder or getRoot methods. The first two create a new file or folder inside the temporary folder, and the getRoot method returns the temporary folder itself. Create a temporary file and folder:

@Test
public void test1() {
    // Create a new folder inside the temporary directory. Depending on how you
    // declared the folder rule, the directory will be deleted either
    // right after this test or when the last test in the test case finishes.
    File file = folder.newFolder("folder");
}

@Test
public void test2() {
    // Create a new file inside the temporary folder.
// Depending on how you declared the folder rule, the file will be deleted
    // either right after this test or when the last test in the test case finishes.
    File file = folder.newFile("file.png");
}

Default Rules JUnit comes with five directly usable rules: temporary folder, expected exception, timeout, error collector and test name. The temporary folder has been explained in the previous chapter, so we will briefly explain only the remaining four rules. Expected Exception The expected exception rule runs the test and catches any exception it throws. The rule is able to check whether the exception contains the right message, the right cause and whether it was thrown by the right line. ExpectedException has a private constructor and must be initialized using the static none method. Each exception-throwing test has to configure the expected exception parameters and then call the expect method of the rule. The rule fails if: the test throws any exception before the expect method call, the test does not throw an exception after the expect method call, or the thrown exception does not have the right message, class or cause. The last test line throws an exception. The expected exception rule is configured right before causing the exception:

@Rule
public ExpectedException thrown = ExpectedException.none();

@Test
public void testException() {
    // Any exception thrown here causes failure
    doTheStuff();
    // From now on, the rule expects a NullPointerException
    // to be thrown. If the test finishes without an exception or if it
    // throws the wrong one, the rule will fail.
    thrown.expect(NullPointerException.class);
    // We will also check the message
    thrown.expectMessage("Expected Message.");
    // this line is supposed to throw the exception
    theCodeThatThrowsTheException();
}

Bonus: the expectMessage method also accepts a Hamcrest matcher argument. That allows you to test the message prefix, the suffix, whether it matches some regular expression or anything else. Timeout The timeout rule can be used as both a test rule and a class rule.
If it is declared as a test rule, it applies the same timeout limit to each test in the class. If it is declared as a class rule, it applies the timeout limit to the whole test case or test suite. Error Collector The error collector allows you to run multiple checks inside a test and then report all their failures at once after the test ends. Expected-vs-actual value assertions are evaluated using the checkThat method exposed by the rule. It accepts a Hamcrest matcher as an argument and thus can be used to check anything. Unexpected exceptions can be reported directly using the addError(Throwable error) method. Alternatively, if you have an instance of Callable to be run, you can call it through the checkSucceeds method, which adds any thrown exception into the errors list. Test Name The test name rule exposes the test name inside the test. It might be useful when you need to create custom error reporting. Third Party Rules Libraries Rules are decoupled from the test class, so it is easy to write libraries of general-purpose rules and share them between projects. This chapter describes three such libraries. System Rules is a collection of rules for testing code that uses java.lang.System. It is well documented, available in Maven and released under Common Public License 1.0 (the same as jUnit). System Rules allows you to easily: test the content of System.err and System.out, simulate input in System.in, configure system properties and revert their values back, test System.exit() calls – whether it was called and what the return value was, customize the Java SecurityManager and revert it back. A big set of useful rules is available on the aisrael account on GitHub. Its documentation is somewhat limited, but you can always look at the code. All rules are released under the MIT license: starting and stopping an in-memory Derby database, starting and stopping the default Java HttpServer, starting and stopping a Jetty server, running a stub JNDI, some support for dbUnit tests. Another undocumented set of rules is on GitHub.
I will not list them here, because their names are self-explanatory and they do not have a specified license. Look at the rules directory to see their list. Custom Rule This chapter shows how to create new rules. They can be implemented from scratch by implementing the TestRule interface, or by extending one of two convenience classes available in jUnit: ExternalResource and Verifier. We will create a new rule from scratch and then rewrite it using the ExternalResource class. New Rule The new rule ensures that all files created by tests are properly deleted after each test finishes its work. The tests themselves have only one responsibility: report all new files using the ensureRemoval(file) method exposed by the rule. How to declare and use the DeleteFilesRule rule:

@Rule
public DeleteFilesRule toDelete = new DeleteFilesRule();

@Test
public void example() throws IOException {
    // output.css will be deleted whether the test passes, fails or throws an exception
    toDelete.ensureRemoval("output.css");
    // the compiler is configured to create the output.css file
    compileFile("input.less");
    checkCorrectess("output.css");
}

From Scratch Each rule, including class rules, must implement the TestRule interface. The interface has exactly one method:

public interface TestRule {
    Statement apply(Statement base, Description description);
}

Our job is to take the statement supplied in the base parameter and turn it into another statement. The statement represents a set of actions, e.g. a test, test case or test suite to be run. It might have already been modified by other declared rules, and it includes before and after test or class methods. The second parameter, description, describes the input statement. It can tell the test class name, the test name, the annotations placed on it, whether we are dealing with a test or a test suite, etc. We will not need it. We need to create a new statement which will do three things: Empty the list of files to be deleted.
Run the underlying test, test case or test suite represented by the base parameter. Delete all files reported by tests inside the previously run statement. The statement is a class with one abstract method:

public abstract class Statement {
    public abstract void evaluate() throws Throwable;
}

Since the underlying statement can throw an exception, the code to delete all files must run from a finally block:

public class DeleteFilesRule implements TestRule {
    public Statement apply(final Statement base, final Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                emptyFilesList(); // clean the list of files
                try {
                    base.evaluate(); // run underlying statement
                } finally {
                    removeAll(); // delete all new files
                }
            }
        };
    }
}

Both referenced methods, emptyFilesList and removeAll, are declared outside of the new statement, directly inside the DeleteFilesRule class:

public class DeleteFilesRule implements TestRule {

    private List<File> toDelete;

    private void emptyFilesList() {
        toDelete = new ArrayList<File>();
    }

    private void removeAll() {
        for (File file : toDelete) {
            if (file.exists())
                file.delete();
        }
    }

    /* ... the apply method ... */
}

The last thing we need is a public method able to add files to be deleted:

public void ensureRemoval(String... filenames) {
    for (String filename : filenames) {
        toDelete.add(new File(filename));
    }
}

Full Class

public class DeleteFilesRule implements TestRule {

    private List<File> toDelete;

    public void ensureRemoval(String...
filenames) {
        for (String filename : filenames) {
            toDelete.add(new File(filename));
        }
    }

    private void emptyFilesList() {
        toDelete = new ArrayList<File>();
    }

    private void removeAll() {
        for (File file : toDelete) {
            if (file.exists())
                file.delete();
        }
    }

    public Statement apply(final Statement base, final Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                emptyFilesList(); // clean the list of files
                try {
                    base.evaluate(); // run underlying statement
                } finally {
                    removeAll(); // delete all new files
                }
            }
        };
    }
}

Extending Built-in Classes JUnit contains two convenience classes, ExternalResource and Verifier, meant to simplify the above process even more. External Resource The ExternalResource class helps when you need to do some kind of preprocessing and postprocessing around the underlying test statement. If you need preprocessing, override the before method. If you need postprocessing, override the after method. The after method is called from a finally block, so it will run no matter what. Our DeleteFilesRule could be rewritten like this:

public class DeleteFilesRule2 extends ExternalResource {

    /* ... list, ensureRemoval and removeAll methods ... */

    @Override
    protected void before() throws Throwable {
        toDelete = new ArrayList<File>();
    }

    @Override
    protected void after() {
        removeAll();
    }
}

Verifier The Verifier class has only one method to override: verify. That method runs after the wrapped test has finished its work, and only if it did not throw an exception. As the name suggests, the Verifier is good if you want to run additional checks after the test. More About jUnit Previous post about jUnit 4 features: jUnit: Dynamic Tests Generation. Reference: jUnit: Rules from our JCG partner Maria Jurcovicova at the This is Stuff blog....
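The Verifier class mentioned above has no code sample in the post; here is a minimal, hypothetical sketch (assuming JUnit 4 on the classpath; the error-log field is my own invention) of a verifier that fails any test which leaves entries in a shared error log:

```java
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.List;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.Verifier;

public class VerifierExampleTest {

    // Hypothetical shared error log that tests may write into
    private final List<String> errorLog = new ArrayList<String>();

    @Rule
    public Verifier noErrorsLogged = new Verifier() {
        @Override
        protected void verify() {
            // Runs after each test that finished without throwing an exception
            assertTrue("Errors were logged: " + errorLog, errorLog.isEmpty());
        }
    };

    @Test
    public void passesBecauseNothingWasLogged() {
        // The verify() check above runs after this method returns
    }
}
```

If a test adds an entry to errorLog, the rule reports it as a failure even though the test body itself completed normally.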

EJB 3.x : Lifecycle and Concurrency models (part 2)

This is the second post of the two-part series. The first part covered the life cycle and the concurrency behavior of Stateful and Stateless EJBs. I’ll cover Singleton EJBs in this post. The Singleton pattern is arguably the most used (sometimes misused!) pattern out there. Java EE frees us from writing explicit code to implement the Singleton pattern. Singleton EJBs were introduced in EJB 3.1, which itself was part of Java EE 6. All that’s required is a @javax.ejb.Singleton (class level) annotation (and a few more if you want to refine other aspects – read on) on a bean class to designate it as a Singleton session bean. There is one and only one instance of a Singleton EJB in a JVM – no matter how many clients access it. It’s not like a Stateful SB – one bean instance attached to a single client throughout its life cycle – nor like a Stateless SB – a new instance for each client request. What are the distinct states in the life cycle of a Singleton Session Bean? The life cycle for Singleton beans is the same as for Stateless session beans – in fact it is one of the simpler aspects of this bean type: Does Not Exist, Ready. How do the states change? What triggers them? Here is a quick tabular snapshot:

State Transition | Trigger | Callback
DNE to R | The instance is first accessed via JNDI/DI, or automatically instantiated by the container due to @Startup or @DependsOn | @PostConstruct
R to DNE | The container shuts down and destroys the bean instance, or an exception occurs in the @PostConstruct annotated method | @PreDestroy

Note: DNE – Does Not Exist, R – Ready. As stated earlier, the life cycle is one of the simpler features of Singleton beans. It’s critical to understand their concurrency aspects. Singleton Session Beans: Concurrency Management As stated – a Singleton has just one instance in the JVM.
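For comparison, a minimal sketch of the explicit, hand-rolled singleton boilerplate that the @Singleton annotation spares us from writing (class name is mine, for illustration):

```java
// Classic hand-rolled singleton: the boilerplate a @Singleton EJB replaces.
public class ClassicSingleton {

    // Eagerly created single instance, comparable to an EJB marked @Startup
    private static final ClassicSingleton INSTANCE = new ClassicSingleton();

    // Private constructor so no one else can instantiate the class
    private ClassicSingleton() {
    }

    public static ClassicSingleton getInstance() {
        return INSTANCE;
    }
}
```

With the EJB flavor, the container takes over instance creation, injection and thread safety, so none of this plumbing is needed.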
In a Java EE environment, concurrent access is inevitable – that’s why we are using a technology like Java EE in the first place! One needs to make sure that the concurrency (locking) strategies for Singleton beans are well thought through, depending upon the use case and requirements. Singleton bean concurrency falls into 2 major categories: Container Managed (default) and Bean Managed.

Container Managed Concurrency
- As the name suggests, the container applies sensible default configurations for the bean.
- Can be controlled using annotations as well as XML (deployment descriptors).
- Explicitly declared using the @javax.ejb.ConcurrencyManagement annotation on the bean class itself; the default value is javax.ejb.ConcurrencyManagementType.CONTAINER.
- Two possible locking strategies are provided by the container, applicable to both the bean class and its individual methods:
  - @javax.ejb.Lock with a value of javax.ejb.LockType.READ – allows concurrent access when no write locks are held.
  - @javax.ejb.Lock with a value of javax.ejb.LockType.WRITE (default) – guarantees exclusive access – only a single thread can execute a bean method at a given point.
- @javax.ejb.AccessTimeout can be specified on a bean class or method to ensure that a thread does not block or hold a lock for an indefinite time span.

Bean Managed Concurrency
- The name clearly indicates that the concurrency aspects of the bean are left to the developer.
- Makes sense when finer concurrency control is required than what is offered by the container via the aforementioned constructs.
- Requires usage of appropriate Java concurrency constructs, e.g.
synchronized, volatile etc. Hard to get right! Code example Let’s look at a simple code snippet in order to make better sense of the above stated facts. Scenario one – container managed concurrency (default, locking type not explicitly specified):

package com.abhirockzz.wordpress.ejb.lifecycle.singleton;

import com.abhirockzz.wordpress.ejb.lifecycle.stateful.MyStatefulBean;
import java.util.Date;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.ejb.Singleton;
import javax.ejb.Startup;

@Singleton
@Startup
public class MySingletonBean {

    public void act() {
        System.out.println("Entered MySingletonBean/act() on " + new Date().toString() + " . Singleton instance " + this.hashCode() + " Thread : " + Thread.currentThread().getName());
        try {
            Thread.sleep(2000);
        } catch (InterruptedException ex) {
            Logger.getLogger(MyStatefulBean.class.getName()).log(Level.SEVERE, null, ex);
        }
        System.out.println("Exit MySingletonBean/act() on " + new Date().toString() + " . Singleton instance " + this.hashCode() + " Thread : " + Thread.currentThread().getName());
    }
}

package com.abhirockzz.wordpress.ejb.lifecycle.singleton;

import java.io.IOException;
import java.util.Date;
import javax.inject.Inject;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name = "SingletonTestServlet", urlPatterns = {"/SingletonTestServlet"})
public class SingletonTestServlet extends HttpServlet {

    public SingletonTestServlet() {
    }

    @Inject
    MySingletonBean mySingleton;

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        System.out.println("Entered SingletonTestServlet/doGet() on " + new Date().toString() + " . 
Servlet instance " + this.hashCode() + " Thread : " + Thread.currentThread().getName());
        mySingleton.act();
    }
}

Using Apache JMeter, I fired 2 concurrent threads at SingletonTestServlet (yes, just two.. this is more of a demonstration, not a load testing competition!). Observations Looking at the logs, one can easily make out the following: The Servlet of course is not thread safe, hence two threads enter at the same time. One of the threads enters the method in the Singleton bean class (marked in red) and further access is forbidden due to the default WRITE lock type enforced by the container. As soon as the first thread finishes execution, the second thread (marked in green), which was initially blocked, gets a chance to execute the Singleton bean method. Pretty simple! Scenario two – sticking with container managed concurrency, but changing the explicit lock type from WRITE to READ:

import com.abhirockzz.wordpress.ejb.lifecycle.stateful.MyStatefulBean;
import java.util.Date;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.ejb.ConcurrencyManagement;
import javax.ejb.ConcurrencyManagementType;
import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Singleton;
import javax.ejb.Startup;

@Singleton
@Startup
@ConcurrencyManagement(ConcurrencyManagementType.CONTAINER)
public class MySingletonBean {

    @Lock(LockType.READ)
    public void act() {
        System.out.println("Entered MySingletonBean/act() on " + new Date().toString() + " . Singleton instance " + this.hashCode() + " Thread : " + Thread.currentThread().getName());
        try {
            Thread.sleep(2000);
        } catch (InterruptedException ex) {
            Logger.getLogger(MyStatefulBean.class.getName()).log(Level.SEVERE, null, ex);
        }
        System.out.println("Exit MySingletonBean/act() on " + new Date().toString() + " . Singleton instance " + this.hashCode() + " Thread : " + Thread.currentThread().getName());
    }
}

What happens when the application is bombarded (pun intended!) with 2 concurrent threads. . . 
? Two threads enter the Servlet at the same time – as expected. One of the threads enters the method in the Singleton bean class (marked in red). The second thread (marked in green) also manages to enter the Singleton bean method at the same instant (check the time stamp). Again – pretty simple! Bean Managed concurrency is not something which I am depicting right now. As stated above, using BMC for a Singleton transfers the onus onto the developer, who is free to code concurrency features into the bean – this can be done simply by using synchronized on each method, or with other mechanisms, e.g. from the java.util.concurrent API. Suggested Reading: EJB (3.2) Specification. Cheers! Reference: EJB 3.x : Lifecycle and Concurrency models (part 2) from our JCG partner Abhishek Gupta at the Object Oriented.. blog....
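As a footnote to the bean-managed concurrency option above, here is a plain-Java sketch (no EJB annotations; class and method names are my own) of the kind of hand-rolled locking a bean-managed singleton would rely on:

```java
// Sketch of developer-managed locking, as a BMC singleton would need.
// In a real bean this class would also carry @Singleton and
// @ConcurrencyManagement(ConcurrencyManagementType.BEAN).
public class BeanManagedCounter {

    private int invocations;

    // The developer, not the container, serializes access
    public synchronized void act() {
        invocations++;
    }

    public synchronized int getInvocations() {
        return invocations;
    }
}
```

With CONTAINER management the same mutual exclusion would come for free from the default WRITE lock; with BEAN management, forgetting synchronized here would silently introduce a race condition.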

Why you should control Visibility of Class and Interface in Java

One of the important aspects of software development is maintenance, and it’s proven by experience that software which keeps the visibility of its components low is more maintainable than software which exposes its components more. You won’t realize it upfront, but you will miss it badly while redesigning your application. Since maintaining backwards compatibility is a “must have” requirement for many apps, you end up patching and repeating the same mistakes. You cannot do much, because lots of other applications are tightly integrated with your classes and interfaces. Java has always put encapsulation first, providing support for access modifiers from the very beginning. It provides three ways to control the visibility of any type, e.g. a class or interface: by making it public, package-private or private. What happened to protected – can’t we use protected with a class or interface? No, you can’t. You can use only two access modifiers with top-level types; protected is not a legal modifier for a class or an interface. Also, a top-level class (a class whose name is the same as that of the Java source file which contains it) can be either public or package-private (without any access modifier); it cannot be private. Only a nested class can be private, public or package-private. A public class is accessible to everyone, and it is the most visible. Try to keep only key interfaces public; never let your implementation go public until you think it’s complete and mature. On the other hand, a private type is the least visible, and only nested classes or interfaces can be private in Java. Since it’s the least visible, you have full control of such a class to alter its behaviour with experience, new technologies, tools and redesigns. A clever midway option is package-private visibility, which is also the default visibility; there is no such keyword as package-private.
Instead, if you don’t provide any access modifier, then Java assumes the type is package-private, and subsequently makes it visible only within the same package. If your classes and interfaces are shared only between other classes in the same package, make them package-private. Since a client cannot access them, they are also relatively safe to change. How to control Visibility of Class or Interface in Java Apart from reducing the visibility of a class or interface using access modifiers, there are a couple of other ways to do it, depending upon your runtime environment as well. At the component level, such as in an application server like WebSphere, WebLogic or JBoss, an implementation class can be proxied or wrapped to minimize external exposure. No matter what you do, there will always be some types which need to be exposed to the external world, but with a proxy or wrapper you can still manage them. Even though client programs can load the proxied implementation class, they will mostly get an immutable proxy or wrapper. For example, getServletContext() from the Java Servlet API (javax.servlet) returns an implementation of javax.servlet.ServletContext, which is usually an immutable proxy that fulfils the promises made in the ServletContext interface. It’s most likely that the application server is running with a different implementation of the javax.servlet.ServletContext interface. A similar pattern can be used in the implementation of other externally exposed interfaces, e.g. ServletRequest, ServletResponse, javax.ejb.EJBContext, javax.ejb.TimerService etc. Different application servers may use different implementations to support these global interfaces. Writing open source libraries is also a nice way to understand the need for controlling the visibility of classes and interfaces. Another interesting case is a component-based Java application server, e.g. JBoss, WebLogic or WebSphere. These servers provide low-level services, e.g. transaction management, security, persistence, object pooling etc.
In short, a production system uses both the application server’s code and the application’s code to work properly. In order to be maintainable, e.g. when switching between different application servers, your application code and server code should be loosely coupled and should maintain a safe distance. The application server’s internal implementation classes and interfaces should be completely hidden from the user applications for security purposes. If the application packages the same library that the server contains, care must be taken that the server does not inadvertently load the application’s version via the thread context classloader. JDK Example of Controlling Visibility of a Java Class One more interesting example of controlling visibility is my favourite class, EnumSet. Java designers made it an abstract class to avoid instantiation, and provided factory methods as the only way to create an instance of that class, e.g. the EnumSet.of() or EnumSet.noneOf() methods. Internally there are two separate implementations, in the form of RegularEnumSet and JumboEnumSet, which are automatically chosen by the static factory methods depending upon the size of the key universe. For example, if the number of values in a given Enum is 64 or fewer, then RegularEnumSet is used; otherwise an instance of JumboEnumSet is returned. The beauty of this design is that both of these implementations are package-private, which means that clients have no idea about them. They are completely transparent to users, and there is additional security enforced by making the base class abstract, because you cannot create an instance of an abstract class. This not only allows the JDK to choose the most appropriate implementation, but also makes it very easy to replace either of them with a newer and better implementation. RegularEnumSet is a special class in that it stores the enum constants in a single long value. IMHO, this is a fantastic example of controlling visibility of classes from the JDK itself.
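The EnumSet behaviour described above can be observed directly, since the name of the hidden implementation class leaks through reflection even though the class itself is package-private. A small sketch:

```java
import java.util.EnumSet;

public class EnumSetVisibilityDemo {

    enum Size { SMALL, MEDIUM, LARGE }

    public static void main(String[] args) {
        // The static factory is the only way in: EnumSet itself is abstract
        EnumSet<Size> set = EnumSet.of(Size.SMALL, Size.LARGE);

        // With 64 or fewer enum constants, the factory picks the
        // package-private RegularEnumSet implementation behind the scenes
        System.out.println(set.getClass().getSimpleName()); // prints RegularEnumSet
        System.out.println(set.contains(Size.SMALL));       // prints true
    }
}
```

Client code only ever names EnumSet, so the JDK remains free to swap RegularEnumSet for something better without breaking anyone.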
In short, minimizing visibility also leverages the benefits of encapsulation: well-encapsulated code is both more secure and more maintainable. With the pace of technology, whatever you write today becomes outdated in a couple of years, so following basic principles of class design can help you get the most from updated tools, libraries and JDK implementations. Reference: Why you should control Visibility of Class and Interface in Java from our JCG partner Javin Paul at the Javarevisited blog....
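As a footnote, the same EnumSet-style hiding pattern is easy to apply in application code; a minimal sketch (all names are mine, not from the article) of a private nested implementation behind a public static factory:

```java
public final class Greeters {

    // Public API: the only type clients should depend on
    public interface Greeter {
        String greet(String name);
    }

    // Private implementation: invisible to clients, free to change later
    private static class SimpleGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Static factory: the only way to obtain an instance
    public static Greeter create() {
        return new SimpleGreeter();
    }

    private Greeters() {
    }
}
```

Because callers see only the Greeter interface, SimpleGreeter can be rewritten, split into several implementations, or replaced entirely without touching client code.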

OptaPlanner – Vehicle routing with real road distances

In the real world, vehicles in a Vehicle Routing Problem (VRP) have to follow the roads: they can’t travel in a straight line from customer to customer. Most VRP research papers and demos happily ignore this implementation detail. As did I, in the past. Although using road distances (instead of air distances) doesn’t impact the NP-hard nature of a VRP much, it does result in a few extra challenges. Let’s take a look at those challenges. Datasets with road distances First off, we need realistic datasets. Unfortunately, public VRP datasets with road distances are scarce in the VRP research community. The VRP Web has a few small ones, such as a dataset of Bavaria with 29 locations, but nothing serious. So I had to generate some realistic datasets myself, with the following requirements: Use Google Maps-like roads with real distances in km between every pair of locations in the dataset; for example, use highways when reasonable over small roads. For every dataset, generate an air distance variant and a road distance variant, to compare results. Generate a similar dataset in multiple orders of magnitude, to compare scalability. Add reasonable vehicle capacities and customer demands, for the vehicle capacity constraint in VRP. I ended up generating datasets of Belgium with a location for cities, towns and subtowns. The biggest one has 2750 locations. I might add a road variant of the USA datasets later; those go up to 100 000 locations. By using the excellent Java library GraphHopper, based on OpenStreetMap, querying practical road distances was relatively easy. It’s also fast, as long as the entire road network (only 200MB for Belgium) can be loaded into memory. Loading the entire road network of North America (6GB) is a bit more challenging. I’ll submit these datasets to the VRP Web, so other researchers can use them too. All this happens before OptaPlanner’s VRP example starts solving. During solving, the distances are already available in a lookup table.
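A distance lookup table like the one mentioned above can be as simple as a pre-filled 2D array, indexed by location id; a tiny sketch (the numbers are invented for illustration):

```java
// Minimal pre-calculated distance matrix, indexed by location id.
// In OptaPlanner's VRP example the real table is filled from road
// distances queried up front; the values here are made up.
public class DistanceLookup {

    private final double[][] distanceKm;

    public DistanceLookup(double[][] distanceKm) {
        this.distanceKm = distanceKm;
    }

    // O(1) lookup during solving: no routing query on the hot path
    public double distance(int from, int to) {
        return distanceKm[from][to];
    }

    public static void main(String[] args) {
        double[][] km = {
                {0.0, 12.5, 30.1},
                {12.5, 0.0, 21.7},
                {30.1, 21.7, 0.0},
        };
        DistanceLookup lookup = new DistanceLookup(km);
        System.out.println(lookup.distance(0, 2)); // prints 30.1
    }
}
```

The trade-off is memory: an n-by-n double matrix grows quadratically with the number of locations, which is exactly the scaling issue the next paragraph hints at.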
Once we start generating datasets with 1000 locations or more, pre-calculating all distances between every location pair can introduce memory and performance issues. I’ll explain those and the remedies in my next blog. Air distance vs Road distance For clarity, I’ll focus on the dataset belgium-n50-k10.vrp, which has 50 locations and 10 vehicles with a capacity of 125 each. OptaPlanner was given 5 minutes to solve both variants (air and road distance). Using air distances (which calculates the Euclidean distance based on latitude and longitude) results in a solution with a total distance of 22.99. That number doesn’t mean much, because it’s not in a common unit of measurement and because our vehicles can’t fly from point to point anyway. We need to apply this air distance solution on the real road network to know the real distance. Now, let’s compare that air distance solution with the road distance solution. The road distance solution takes 108.45 km less, so it’s almost 5% better! And that’s on one of the most dense road networks in the world (Belgium’s): on sparser road networks the gain might be bigger. Conclusion Using real distances instead of air distances does matter. Solving a VRP with air distances and then applying road distances is suboptimal. But can we really pre-calculate every location pair in big datasets? Stay tuned. Reference: OptaPlanner – Vehicle routing with real road distances from our JCG partner Geoffrey De Smet at the OptaPlanner blog....
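For reference, an air distance like the one used above can be approximated from latitude and longitude; a rough sketch (an equirectangular approximation of my own, not necessarily the exact formula the example uses):

```java
public class AirDistance {

    // Rough equirectangular approximation: fine for comparing nearby points,
    // not for precise geodesy. 111.32 km is roughly one degree of latitude.
    public static double airDistanceKm(double lat1, double lon1,
                                       double lat2, double lon2) {
        double kmPerDegree = 111.32;
        double dLat = (lat2 - lat1) * kmPerDegree;
        // Longitude degrees shrink with latitude, hence the cosine correction
        double midLatRad = Math.toRadians((lat1 + lat2) / 2.0);
        double dLon = (lon2 - lon1) * kmPerDegree * Math.cos(midLatRad);
        return Math.sqrt(dLat * dLat + dLon * dLon);
    }

    public static void main(String[] args) {
        // Brussels to Antwerp: roughly 41 km as the crow flies
        System.out.println(airDistanceKm(50.85, 4.35, 51.22, 4.40));
    }
}
```

Such a formula is cheap to evaluate everywhere, which is exactly why air-distance solutions are tempting, and, as the benchmark above shows, why they mislead once vehicles have to drive real roads.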
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact