Ubuntu: Installing Apache Portable Runtime (APR) for Tomcat

After reading the “Introducing Apache Tomcat 6” presentation by Mladen Turk, I decided to enable the Apache Portable Runtime (APR) native library for Tomcat. It was supposed to be as easy as:

sudo ./configure
sudo make
sudo make install

but, as you may guess, it was a little bit more than that.

1. Installing Apache APR.

“Most Linux distributions will ship packages for APR” – those of Linode don’t. I had a barebones Ubuntu 10.10 box without even “gcc” and “make”, let alone Apache APR. Thank God networking was not an issue, unlike last time.

wget http://apache.spd.co.il/apr/apr-1.4.5.tar.gz
tar -xzf apr-1.4.5.tar.gz
rm apr-1.4.5.tar.gz
cd apr-1.4.5/
sudo apt-get install make
sudo ./configure
sudo make
sudo make install

2. Installing Tomcat Native.

wget http://off.co.il/apache//tomcat/tomcat-connectors/native/1.1.20/source/tomcat-native-1.1.20-src.tar.gz
tar -xzf tomcat-native-1.1.20-src.tar.gz
rm tomcat-native-1.1.20-src.tar.gz
cd tomcat-native-1.1.20-src/jni/native
sudo ./configure --with-apr=/usr/local/apr

The result was:

checking build system type... x86_64-unknown-linux-gnu
..
checking for APR... yes
..
checking for JDK location (please wait)...
checking Try to guess JDK location...
configure: error: can't locate a valid JDK location

Ouch! “Can’t locate a valid JDK location”? On my machine?

$ which java
/home/user/java/jdk/bin/java
$ echo $JAVA_HOME
/home/user/java/jdk
$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

But for some reason the “tomcat-native-1.1.20-src/jni/native/configure” script didn’t see my “JAVA_HOME” variable no matter what, and even installing “sun-java6-jdk” didn’t help much. After patching the “configure” script to dump the locations where it was looking for a “valid JDK”, I had:

..
configure: [/usr/local/1.6.1]
configure: [/usr/local/IBMJava2-1.6.0]
configure: [/usr/local/java1.6.0]
configure: [/usr/local/java-1.6.0]
configure: [/usr/local/jdk1.6.0]
configure: [/usr/local/jdk-1.6.0]
configure: [/usr/local/1.6.0]
configure: [/usr/local/IBMJava2-1.6]
configure: [/usr/local/java1.6]
configure: [/usr/local/java-1.6]
configure: [/usr/local/jdk1.6]
configure: [/usr/local/jdk-1.6]
..

Ok then, here you have it now:

sudo ln -s ~/java/jdk/ /usr/local/jdk-1.6
sudo ./configure --with-apr=/usr/local/apr
sudo make
sudo make install

And with:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/apr/lib"

(double quotes, so that the existing $LD_LIBRARY_PATH is actually expanded) I now had a beautiful log message in “catalina.out”:

Mar 7, 2011 11:51:02 PM org.apache.catalina.core.AprLifecycleListener init
INFO: Loaded APR based Apache Tomcat Native library 1.1.20.
Mar 7, 2011 11:51:02 PM org.apache.catalina.core.AprLifecycleListener init
INFO: APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true].
Mar 7, 2011 11:51:03 PM org.apache.coyote.AbstractProtocolHandler init
..

As soon as “evgeny-goldin.org” moves to its new location on the brand-new Linode box, it will benefit from this performance optimization. I’ll describe the migration process and the reasons for it a bit later, once it is done. Reference: Ubuntu: Installing Apache Portable Runtime (APR) for Tomcat from our JCG partner Evgeny Goldin at the Goldin++ blog....

Open Source Rules of Engagement

The Eclipse Development Process (EDP) defines, in its section on principles, three open source rules of engagement: openness, transparency, and meritocracy:

Open – Eclipse is open to all; Eclipse provides the same opportunity to all. Everyone participates with the same rules; there are no rules to exclude any potential contributors, which include, of course, direct competitors in the marketplace.

Transparent – Project discussions, minutes, deliberations, project plans, plans for new features, and other artifacts are open, public, and easily accessible.

Meritocracy – Eclipse is a meritocracy. The more you contribute, the more responsibility you will earn. Leadership roles in Eclipse are also merit-based and earned by peer acclaim.

In more concise terms, transparency is about inviting participation; openness is about actually accepting it; and meritocracy is a means of limiting participation to those individuals who have demonstrated the desire and means to actually participate. Transparency is one of those things that I believe most people understand in principle. Do everything in public: bug reports are public, along with all discussion; mailing lists are public; team meetings are public, and minutes are captured (and disseminated). By operating transparently, the community around the project can gain insight into the direction the project is moving and adjust their plans accordingly. In practice, however, transparency is difficult to motivate in the absence of openness. What is the value to the community of discussing every little detail of an implementation in public? Does anybody really care? The fact of the matter is that a lot of people really don’t care. Most users of Eclipse are blissfully unaware that we even have bug tracking software and mailing lists. But some people do care, and transparency is a great way to hook those people who do care and get them to participate. A lot of open source projects understand transparency.
A lot, however, don’t understand openness. They’re not the same thing. To be “open” means that a project is open to participation. More than that, an open project invites and actively courts participation. Participation in an open source project takes many forms. It starts with creating bug reports and providing patches, tests, and other bits of code. Over time, contribution increases, and eventually some contributors become full-blown members of the project. Courting the community for contributors should be one of the first-class goals of every open source project. But openness isn’t just about getting more help to implement your evil plans for world domination. It’s also about allowing participants to change your evil plans for world domination. Openness is about being open to new ideas, even, as the EDP states, if those new ideas come from your direct competitors in the marketplace. A truly open project actively courts diversity. Having different interests working together is generally good for the overall health of an open source project and the community that forms around it. Reference: Open Source Rules of Engagement from our JCG partner Wayne Beaton at the Eclipse Hints, Tips, and Random Musings blog....

Groovy 1.8.0 – meet JsonBuilder!

Groovy 1.8.0, released in April, brought a lot of new features to the language; one of them is native JSON support through JsonSlurper for reading JSON and JsonBuilder for writing it. I recently used JsonBuilder in one of my projects and initially experienced some difficulties in understanding how it operates. My assumption was that JsonBuilder works similarly to MarkupBuilder but, as I quickly found out, it really doesn’t. Let’s take a simple example. Assume we have a class Message that we would like to serialize to XML markup and to JSON.

@groovy.transform.Canonical
class Message {
    long   id
    String sender
    String text
}

assert 'Message(23, me, some text)' == new Message( 23, 'me', 'some text' ).toString()

Here I used the Groovy 1.8.0 @Canonical annotation, which provides automatic toString(), equals() and hashCode() implementations and a tuple (ordered) constructor. Let’s serialize a number of messages to XML.

def messages = [ new Message( 23, 'me', 'some text' ),
                 new Message( 24, 'me', 'some other text' ),
                 new Message( 25, 'me', 'same text' )]

def writer = new StringWriter()
def xml    = new groovy.xml.MarkupBuilder( writer )

xml.messages() {
    messages.each { Message m -> message( id : m.id, sender : m.sender, text : m.text )}
}

assert writer.toString() == """
<messages>
  <message id='23' sender='me' text='some text' />
  <message id='24' sender='me' text='some other text' />
  <message id='25' sender='me' text='same text' />
</messages>""".trim()

Well, that was pretty straightforward. Let’s try to do the same with JSON.

def json = new groovy.json.JsonBuilder()

json.messages() {
    messages.each { Message m -> message( id : m.id, sender : m.sender, text : m.text )}
}

assert json.toString() == '{"messages":{"message":{"id":25,"sender":"me","text":"same text"}}}'

Wow, where did all the other messages go? Why was only the last message in the list serialized?
How about this:

json = new groovy.json.JsonBuilder()

json.messages() {
    message {
        id     23
        sender 'me'
        text   'some text'
    }
    message {
        id     24
        sender 'me'
        text   'some other text'
    }
}

assert json.toString() == '{"messages":{"message":{"id":24,"sender":"me","text":"some other text"}}}'

Same story. Initially I was puzzled, but then the JsonBuilder source code showed that every invocation overrides the previous content:

JsonBuilder(content = null) {
    this.content = content
}

def call(Map m) {
    this.content = m
    return content
}

def call(List l) {
    this.content = l
    return content
}

def call(Object... args) {
    this.content = args.toList()
    return this.content
}

def call(Closure c) {
    this.content = JsonDelegate.cloneDelegateAndGetContent(c)
    return content
}

As you see, one should invoke JsonBuilder exactly once, passing it a Map, a List, varargs or a Closure. This makes JsonBuilder very different from MarkupBuilder, which can be updated as many times as needed. It could be caused by JSON itself, whose format is stricter than free-form XML markup: something that started as a JSON map with a single Message cannot be turned into an array of Messages all of a sudden. The argument passed to JsonBuilder (Map, List, varargs or Closure) can also be specified in the constructor, so there’s no need to invoke the builder at all. You can simply initialize it with the corresponding data structure and call toString() right away. Let’s try this!

def listOfMaps = messages.collect{ Message m -> [ id     : m.id,
                                                  sender : m.sender,
                                                  text   : m.text ]}

assert new groovy.json.JsonBuilder( listOfMaps ).toString() ==
       '''[{"id":23,"sender":"me","text":"some text"},
           {"id":24,"sender":"me","text":"some other text"},
           {"id":25,"sender":"me","text":"same text"}]'''.
       readLines()*.trim().join()

Now it works :) After converting the list of messages to a list of Maps and sending them to the JsonBuilder in one go, the generated String contains all messages from the list.
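The “initialize once with the complete data structure, then serialize” pattern translates to other languages too. Here is a rough, hand-rolled Java sketch of the same principle; the toJson helper below is my own illustration, not any library’s API, and it handles only the simple string/number values used here:

```java
import java.util.*;

public class OneShotJson {
    // Hypothetical minimal serializer: converts a list of maps to JSON in one pass.
    // Only handles String and numeric values; purely illustrative.
    static String toJson(List<Map<String, Object>> list) {
        StringBuilder sb = new StringBuilder("[");
        for (Iterator<Map<String, Object>> it = list.iterator(); it.hasNext(); ) {
            Map<String, Object> m = it.next();
            sb.append('{');
            for (Iterator<Map.Entry<String, Object>> e = m.entrySet().iterator(); e.hasNext(); ) {
                Map.Entry<String, Object> entry = e.next();
                sb.append('"').append(entry.getKey()).append("\":");
                Object v = entry.getValue();
                sb.append(v instanceof String ? "\"" + v + "\"" : v);
                if (e.hasNext()) sb.append(',');
            }
            sb.append('}');
            if (it.hasNext()) sb.append(',');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Build the COMPLETE structure first, then serialize once,
        // exactly like new JsonBuilder(listOfMaps) in the Groovy example.
        List<Map<String, Object>> messages = new ArrayList<>();
        for (int i = 23; i <= 25; i++) {
            Map<String, Object> m = new LinkedHashMap<>();
            m.put("id", i);
            m.put("sender", "me");
            m.put("text", "text " + i);
            messages.add(m);
        }
        System.out.println(toJson(messages));
    }
}
```

The point is the shape of the call, not the serializer itself: collect everything into one data structure, hand it over once.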
All the code above is available in the Groovy web console, so you are welcome to try it out. By the way, for viewing JSON online I recommend the excellent “JSON Visualization” application made by Chris Nielsen. “Online JSON Viewer” is another popular option, but I much prefer the first one. And for offline use, “JSON Viewer” makes a good Fiddler plugin.

P.S. If you need to read this JSON on the client side by sending, say, an Ajax GET request, this can easily be done with jQuery.get():

<script type="text/javascript">
var j = jQuery;
j( function() {
    j.get( 'url',
           { timestamp: new Date().getTime() },
           function ( messages ) {
               j.each( messages, function( index, m ) {
                   alert( "[" + m.id + "][" + m.sender + "][" + m.text + "]" );
               });
           },
           'json' );
});
</script>

Here I use a neat trick of a j shortcut to avoid typing jQuery too many times when using $ is not an option. Reference: Groovy 1.8.0 – meet JsonBuilder! from our JCG partner Evgeny Goldin at the Goldin++ blog....

Best Practices for JavaFX Mobile Applications, Part 1

As everybody who is interested in JavaFX will know by now, JavaFX Mobile was released a short while ago. It was a hell of a ride, that’s for sure. I felt so exhausted, I did not even have the energy to blog during the release… But by now I feel recovered and want to start a little series about lessons we learned while preparing the release, and give some hints on how to improve the performance of JavaFX Mobile applications.

WARNING: The tips I am giving here are true for the current version of JavaFX Mobile, which is part of the JavaFX 1.1 SDK. In future versions the behavior will change: the current bad performance of the mentioned artifacts will be optimized away or at least significantly improved. Everything I am writing about here is a snapshot; nothing should be understood as final!

Item 1: Avoid unnecessary bindings

Bindings are very convenient, without any doubt one of the most valuable innovations in JavaFX Script. Unfortunately they come with a price. The generated boiler-plate code is usually not as small and fast as a manual implementation would be. Especially complex dependency-structures tend to impose a severe penalty on performance and footprint. For this reason it is recommended to avoid bindings as much as possible. Often the same functionality can be implemented with triggers. One should not use bindings merely to avoid the hassle of dealing with initialization order. And it certainly makes no sense to bind to a constant value. Lazy bindings are most of the time (but not always!) faster if a bound variable is updated more often than it is read, but they are still not as fast as manual implementations.

Example

A common use-case is a number of nodes whose positions and sizes depend on the stage-size. A typical implementation uses bindings to achieve that. Here we will look at a simple example which resembles such a situation. The scene consists of three rectangles which are laid out diagonally from the top-left to the bottom-right.
The size of each rectangle is a quarter of the screen-size. Code Sample 1 shows an implementation with bindings.

def rectangleWidth:  Number = bind stage.width  * 0.25;
def rectangleHeight: Number = bind stage.height * 0.25;

def stage: Stage = Stage {
    scene: Scene {
        content: for (i in [0..2])
            Rectangle {
                x: bind stage.width  * (0.125 + 0.25*i)
                y: bind stage.height * (0.125 + 0.25*i)
                width:  bind rectangleWidth
                height: bind rectangleHeight
            }
    }
}

Code Sample 1: Layout calculated with bindings

The first question one should think about is whether the bindings are really necessary. On a real device the screen-size changes only when the screen orientation is switched (provided that the device supports this functionality). If our application does not support screen rotation, the layout can be defined as constant. One possible solution to reduce the number of bindings is shown in Code Sample 2. Two variables, width and height, are introduced and bound to stage.width and stage.height respectively. Their only purpose is to provide triggers for stage.width and stage.height, since we do not want to override the original triggers. Position and size of the rectangles are calculated manually in the triggers.

def r = for (i in [0..2]) Rectangle {}

def stage = Stage { scene: Scene {content: r} }

def height = bind stage.height on replace {
    def rectangleHeight = height * 0.25;
    for (i in [0..2]) {
        r[i].height = rectangleHeight;
        r[i].y = height * (0.125 + 0.25*i)
    }
}

def width = bind stage.width on replace {
    def rectangleWidth = width * 0.25;
    for (i in [0..2]) {
        r[i].width = rectangleWidth;
        r[i].x = width * (0.125 + 0.25*i)
    }
}

Code Sample 2: Layout calculated in triggers

Without any doubt, the code in Code Sample 1 is more elegant. But measuring the performance of both snippets in the emulator, it turned out the code in Code Sample 2 is almost twice as fast. Below we will look at the second tip to increase the performance of JavaFX Mobile applications.
I think this and the previous one are the most important ones.

Item 2: Keep the scenegraph as small as possible

Behind the scenes of the runtime a lot of communication takes place to update the variables of the nodes in a scenegraph. The more elements a scenegraph has, the more communication is required. Therefore it is critical to keep the scenegraph as small as possible. Animations especially tend to suffer from a large scenegraph. It is bad practice to keep a node in the scenegraph at all times and control its visibility via the visible-flag or its opacity. Invisible nodes in the scenegraph are still part of the communication-circus in the background. Instead, one should remove nodes from the scenegraph and add them only when required. This approach has one drawback, though: adding or removing nodes takes longer than setting the visibility. Therefore it might not be appropriate in situations where immediate responses are critical.

Example 1

Often one has a set of nodes of which only one is visible. These can be, for example, different pages, or nodes to visualize different states of an element. One might be tempted to add all nodes to the scenegraph and set only the current one as visible. Code Sample 1 shows a simplified version of this approach. Three colored circles are created to visualize some kind of state (red, yellow, green). Only one node is visible at any time. (Let’s ignore for a second that this could simply be achieved by changing the fill-color of a single circle.
In real life applications one would probably have images or more complex shapes for visualizations, and simply changing the color would not work.)

def colors = [Color.GREEN, Color.YELLOW, Color.RED];

var state: Integer;

Stage {
    scene: Scene {
        content: for (i in [0..2])
            Circle {
                centerX: 10
                centerY: 10
                radius: 10
                fill: colors[i]
                visible: bind state == i
            }
    }
}

Code Sample 1: Using visibility to switch between nodes

This results in three nodes in the scenegraph although only one is shown. This should be refactored to ensure that only the visible node is in the scenegraph. Code Sample 2 shows one possible implementation.

def colors = [Color.GREEN, Color.YELLOW, Color.RED];

var state: Integer on replace oldValue {
    insert nodes[state] into stage.scene.content;
    delete nodes[oldValue] from stage.scene.content;
}

def nodes = for (i in [0..2]) Circle {
    centerX: 10
    centerY: 10
    radius: 10
    fill: colors[i]
}

def stage = Stage {scene: Scene{}}

Code Sample 2: Adding and removing nodes when required

The code in Code Sample 1 is more compact, but Code Sample 2 reduced the number of nodes in the scenegraph from three to one. While tuning some of the demos for the JavaFX Mobile release, we were able to reduce the number of nodes in the scenegraph by 50% and more, simply by ensuring that only visible nodes are part of it.

Example 2

If nodes are shown and hidden with some kind of animation, adding and removing the node to the scenegraph becomes extremely simple. One only needs to implement an action at the beginning of the fadeIn-animation and at the end of the fadeOut-animation to add, respectively remove, the node. Code Sample 3 shows such a usage, where a simple message-box is shown and hidden by changing the opacity.
def msgBox = Group {
    opacity: 0.0
    content: [
        Rectangle {width: 150, height: 40, fill: Color.GREY},
        Text {x: 20, y: 20, content: "Hello World!"}
    ]
}

def fadeIn = Timeline {
    keyFrames: [
        KeyFrame {
            action: function() {insert msgBox into stage.scene.content}
        },
        at (1s) {msgBox.opacity => 1.0 tween Interpolator.LINEAR}
    ]
}

def fadeOut = Timeline {
    keyFrames: KeyFrame {
        time: 1s
        values: msgBox.opacity => 0.0 tween Interpolator.LINEAR
        action: function() {delete msgBox from stage.scene.content}
    }
}

def stage = Stage {scene: Scene{}}

Code Sample 3: Using fadeIn- and fadeOut-animations to add and remove nodes

Reference: Best Practices for JavaFX Mobile Applications & Best Practices for JavaFX Mobile Applications 2 from our JCG partner Michael Heinrichs at Mike’s Blog....

Simple but powerful DSL using Groovy

In one of my projects we had a very complicated domain model, which included more than a hundred different domain object types. It was a pure Java project and, honestly, Java is very verbose with respect to object instantiation, initialization and setting properties. Suddenly a new requirement came up: allow users to define and use their own object models. So … the journey began. We ended up with the idea that some kind of domain language for describing all those object types and relations was required. Here Groovy came to the rescue. In this post I would like to demonstrate how powerful and expressive a simple DSL written using Groovy builders can be. As always, let’s start with the POM file for our sample project:

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>dsl</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
        </dependency>
        <dependency>
            <groupId>org.codehaus.groovy</groupId>
            <artifactId>groovy-all</artifactId>
            <version>1.8.4</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.codehaus.gmaven</groupId>
                <artifactId>gmaven-plugin</artifactId>
                <version>1.4</version>
                <configuration>
                    <providerSelection>1.8</providerSelection>
                </configuration>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.1</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

I will use the latest Groovy version, 1.8.4. Our domain model will include three classes: Organization, User and Group. Each Organization has a mandatory name, some users and some groups. Each group can have some users as members. Pretty simple, so here are our Java classes.
Organization.java

package com.example;

import java.util.ArrayList;
import java.util.Collection;

public class Organization {
    private String name;
    private Collection<User> users = new ArrayList<User>();
    private Collection<Group> groups = new ArrayList<Group>();

    public String getName() {
        return name;
    }

    public void setName( final String name ) {
        this.name = name;
    }

    public Collection<Group> getGroups() {
        return groups;
    }

    public void setGroups( final Collection<Group> groups ) {
        this.groups = groups;
    }

    public Collection<User> getUsers() {
        return users;
    }

    public void setUsers( final Collection<User> users ) {
        this.users = users;
    }
}

User.java

package com.example;

public class User {
    private String name;

    public String getName() {
        return name;
    }

    public void setName( final String name ) {
        this.name = name;
    }
}

Group.java

package com.example;

import java.util.ArrayList;
import java.util.Collection;

public class Group {
    private String name;
    private Collection<User> users = new ArrayList<User>();

    public void setName( final String name ) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public Collection<User> getUsers() {
        return users;
    }

    public void setUsers( final Collection<User> users ) {
        this.users = users;
    }
}

Now we have our domain model. Let’s think about the way a regular user can describe their own organization with users, groups and relations between all these objects. Primarily, we are talking about some kind of human-readable language, simple enough for a regular user to understand. Meet Groovy builders.
package com.example.dsl.samples

class SampleOrganization {
    def build() {
        def builder = new ObjectGraphBuilder(
            classLoader: SampleOrganization.class.classLoader,
            classNameResolver: "com.example" )

        return builder.organization( name: "Sample Organization" ) {
            users = [
                user( id: "john", name: "John" ),
                user( id: "samanta", name: "Samanta" ),
                user( id: "tom", name: "Tom" )
            ]

            groups = [
                group( id: "administrators", name: "administrators", users: [ john, tom ] ),
                group( id: "managers", name: "managers", users: [ samanta ] )
            ]
        }
    }
}

And here is a small test case which verifies that our domain model is as expected:

package com.example.dsl

import static org.junit.Assert.assertEquals
import static org.junit.Assert.assertNotNull

import org.junit.Test

import com.example.dsl.samples.SampleOrganization

class BuilderTestCase {
    @Test
    void 'build organization and verify users, groups' () {
        def organization = new SampleOrganization().build()

        assertEquals 3, organization.users.size()
        assertEquals 2, organization.groups.size()
        assertEquals "Sample Organization", organization.name
    }
}

I am using this simple DSL again and again across many projects. It really simplifies the creation of complex object models. Reference: Simple but powerful DSL using Groovy from our JCG partner Andrey Redko at the Andriy Redko {devmind} blog...
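For contrast with the Groovy DSL, here is a condensed plain-Java sketch of assembling the same organization by hand (public fields instead of the article’s getters/setters, purely for brevity); this is exactly the kind of ceremony the builder hides:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class VerboseJavaModel {
    // Condensed versions of the article's domain classes, public fields for brevity.
    static class User { public String name; }
    static class Group { public String name; public Collection<User> users = new ArrayList<>(); }
    static class Organization {
        public String name;
        public Collection<User> users = new ArrayList<>();
        public Collection<Group> groups = new ArrayList<>();
    }

    static Organization build() {
        // Every object must be created, named, and wired up one statement at a time.
        User john = new User();       john.name = "John";
        User samanta = new User();    samanta.name = "Samanta";
        User tom = new User();        tom.name = "Tom";

        Group administrators = new Group();
        administrators.name = "administrators";
        administrators.users.addAll(List.of(john, tom));

        Group managers = new Group();
        managers.name = "managers";
        managers.users.add(samanta);

        Organization org = new Organization();
        org.name = "Sample Organization";
        org.users.addAll(List.of(john, samanta, tom));
        org.groups.addAll(List.of(administrators, managers));
        return org;
    }

    public static void main(String[] args) {
        Organization org = build();
        System.out.println(org.name + ": " + org.users.size() + " users, " + org.groups.size() + " groups");
    }
}
```

Even in this stripped-down form, the wiring code dwarfs the declarative builder version above, which is the whole argument for the DSL.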

The problems in Hadoop – When does it fail to deliver?

Hadoop is a great piece of software. It is not original, but that certainly does not take away its glory. It builds on parallel processing, a concept that’s been around for decades. Although conceptually unoriginal, Hadoop shows the power of being free and open (as in beer!) and, most of all, shows what usability is all about. It succeeded where most other parallel processing frameworks failed. So, now you know that I’m not a hater. On the contrary, I think Hadoop is amazing. But that does not justify some blatant failures on the part of Hadoop, be they architectural, conceptual or even documentation-wise. Hadoop’s popularity should not shield it from the need to re-engineer and re-work problems in the Hadoop implementation. The points below are based on months of exploring and hacking around Hadoop. Do dig in.

Did I hear someone say “Data Locality”?

Hadoop harps over and over again on data locality. In some workshops conducted by Hadoop milkers, they just went on and on about this. They say that, whenever possible, Hadoop will attempt to start a task on a block of data that is stored locally on that node via HDFS. This sounds like a super feature, doesn’t it? It saves so much bandwidth without having to transfer TBs of data, right? Hell, no. It does not. What this means is that first you have to figure out a way of getting data into HDFS, the Hadoop Distributed File System. This is non-trivial, unless you live in the last decade and all your data exists as files. Assuming that you do, let’s transfer the TBs of data over to HDFS. Now, it will start doing its whole “data locality” thing. Ermm, OK. Am I hit by a wave of brilliance, or isn’t this what it is supposed to do anyway? Let’s get our facts straight. To use Hadoop, our problem should be able to execute in parallel. If the problem, or at least a sub-problem, can’t be parallelized, it won’t gain much from Hadoop. This means the task algorithm is independent of any specific part of the data it processes.
Further simplifying this: any task can process any section of the data. So, doesn’t that mean the “data locality” thing is the obvious thing to do? Why would the Hadoop developers even write code that would make a task process data on another node, unless something went horribly wrong? The feature would be if it did otherwise! If a task that had finished operating on the node’s local data would then transfer data from another node and process that too, that would be a feature worthy of the noise.

Would you please put everything back into files

Do you have nicely structured data in databases? Maybe you became a bit fancy and used the latest and greatest NoSQL data store? Now let me write down what you are thinking. “OK, let’s get some Hadoop jobs to run on this, because I want to find all those hidden gold mines in my data that will get me a front page on Forbes.” I hear you. Let’s get some Hadoop jobs rolling. But wait! What the …..? Why are all the samples in text files? A plethora of examples using CSV files, tab-delimited files, space-delimited files, and all other kinds of neat files. Why is everyone going back a few decades and using files again? Haven’t all these guys heard of DBs and all that fancy stuff? It seems that you were too early an adopter of data stores. Files are the heroes of the Hadoop world. If you want to use Hadoop quickly and easily, the best path for you right now is to export your data neatly into files and run all those snazzy word count samples (pun intended!). Because without files, Hadoop can’t do all that cool “data locality” shit. Everything has to be in HDFS first. So, what would you do to analyze your data in the hypothetical FUHadoopDB? First of all, implement the 10+ classes necessary to split and transfer data into the Hadoop nodes and run your tasks. Hadoop needs to know how to get data from FUHadoopDB, so let’s assume this is acceptable.
Now, if you don’t store it in HDFS, you won’t get the data locality shit. If this is the case, when the tasks run, they themselves will have to pull data from FUHadoopDB to process it. But if you want the snazzy data locality shit, you need to pull data from FUHadoopDB and store it in HDFS. You will not incur the penalty of pulling data while the tasks are running, but you pay it at the preparation stage of the job, in the form of transferring the data into HDFS. Oh, and did I mention the additional disk space you would need to store the same data twice, once in HDFS? I wanted to save that disk space, so I chose to make my tasks pull data while running. The choice is yours.

Java is OS independent, isn’t it?

Java has its flaws, but for the most part it runs smoothly on most OSs. Even if there are some OS issues, they can be ironed out easily. The Hadoop folks have issued documentation mostly based on Linux environments. They say Windows is supported, but they ignored those ignorant people by not providing adequate documentation. Windows didn’t even make it to the recommended production environments. It can be used as a development platform, but then you will have to deploy on Linux. I’m certainly not a Windows fan. But if I write a Java program, I’d bother to make it run on Windows. If not, why the hell are you using Java? Why go to the trouble of coming up with freaking bytecode? Oh, the sleepless nights of all those good people who came up with bytecode and JVMs and whatnot have gone to waste.

CS 201: Object Oriented Programming

If you are trying to integrate Hadoop into your platform, think again. Let me take the liberty of typing your thoughts. “Let’s just extend a few interfaces and plug in my authentication mechanism. It should be easy enough. I mean, these guys designed the world’s greatest software that will end world hunger.” I hear you again. If you are planning to do this, don’t. It’s like OOP anti-patterns 101 in there.
There are so many places that effectively say “if (kerberos)” and execute some security-specific function. One of my colleagues went through this pain, and finally decided that it was easier to write Kerberos-based authentication for his software and then make it work with Hadoop. With great power comes great responsibility. Hadoop fails to fulfil this responsibility. Even with these issues, Hadoop’s popularity seems to be catching significant attention, and it is rightfully deserved. Its ability to commoditize big data analytics should be exalted. But it is my opinion that it got way too popular way too fast. The Hadoop community needs to have another go at revamping this great piece of software. Reference: The problems in Hadoop – When does it fail to deliver? from our JCG partner Mackie Mathew at the dev_religion blog...
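The “your problem must parallelize” point above can be made concrete on a single machine. Here is a toy Java sketch of the map/reduce shape that Hadoop distributes across nodes; no Hadoop APIs are involved, this is only meant to show the independence property that makes the distribution possible:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToyMapReduce {
    // Single-machine sketch of the map/reduce shape:
    // "map" splits each line into words, "reduce" counts occurrences per word.
    // Each line can be processed independently of every other line, which is
    // exactly the property a problem needs before Hadoop (or any parallel
    // framework) can speed it up.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()   // lines processed in parallel
                    .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                    .filter(w -> !w.isEmpty())
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(List.of("hadoop is great", "hadoop is popular"));
        System.out.println(counts.get("hadoop")); // prints 2
    }
}
```

If the per-line step depended on global state or on other lines, the parallelStream (and, at scale, the Hadoop job) would buy you nothing; that is the whole point of the data-locality discussion.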

15 Tenets For The Software Engineer

Many people talk about the things a software engineer needs to know in order to be successful in their job. Other people talk about the traits needed to be successful. These posts may read differently, but there are many similarities between the two. In reality, a software engineer can never really be successful without looking at both types of posts. The list of 15 tenets below is my attempt to consolidate the ideas into one handy list for your review.

1. Remember the basics. If you forget the basics of a programming language, you lose your foundational knowledge. That is never a good thing.
2. Always assume the worst case. If you had a formal computer science education, you learned about big-O notation. Knowing why an algorithm has no chance of performing well is a good thing. Figuring out why a particular use case seems much slower than others is how you stay successful.
3. Test your code. Ensure you have tests for your code, whether you follow TDD or any other method. Depending on the type of test, you may want to target a different level of coverage, but you should still write as many tests as you can.
4. Do not employ new technologies because they are new; use them because they solve a problem. As technologists, we tend to follow the hot new tools in the hope of finding a silver bullet. Utility is the key, not coolness.
5. Read, a lot. If you are not reading about our industry, you will fall behind, and that could have career-threatening complications.
6. Try new techniques and technologies, a lot. Yes, I said not to use new technologies just because they are new, but you do need to try new things in order to determine whether something new is useful. Trying new things also helps you learn and keep current in your industry.
7. Fail; you will learn something. At the minimum, you will learn what does not work and you can refine your solutions. In some cases, you can even consider the failure a small success.
8. Ship the damn software. Sometimes you just need to get the job done, but you must be aware of technical debt. If you continuously ship software without removing technical debt, you are well on your way to creating a nightmare when a major production issue arises.
9. Do it the “right way”. Most developers have an idea of the “right way” to implement a design, but that may not always be what project management wants. This almost contradicts the previous “ship the damn software” rule, but there is a balance that needs to be struck.
10. Leave the code better than you found it. Instead of preaching the benefits of refactoring, think of whether you want to maintain the pile of code that keeps getting worse. If you clean it up a little each time you modify it, it will not become a terrible mess.
11. Think about concurrent access. If you are building a web application (and I don’t mean at the scale of Facebook), weird issues may arise under load. Even an application with 100 concurrent users can start to see strange behavior when there are concurrent reads and writes on things like HashMaps. And this is just the start of the problems.
12. Storage may be free, but I/O sucks. You may think that writing everything to disk is a great way to persist data. Generally it is, but if you use disk storage as a temporary storage area, your application could quickly grind to a crawl. Physical storage should be limited to data that needs to persist for long periods of time, or data that cannot reside in memory.
13. Memory does not go as far as you may think. To start, many people will have their application and database residing on the same server. This is perfectly acceptable until both require a lot of RAM. As an example, you can easily run a Java application in Tomcat in 528MB. However, once you have to deal with scale of any sort and you add in the RAM required by the persistent storage (RDBMS, NoSQL, etc.), you can quickly jump to 8GB. Obviously, this is highly dependent on the number of users hitting the system and how much data you store in memory.
14. Caching fixes everything until it crashes the server. If you are looking for ways to avoid a lot of database queries, you end up using some form of caching. The problem is that caching requires much more memory than your typical application usage, especially when dealing with data that scales with the number of users (see the previous point on memory). The worst problem with caching is that it can chew up so much memory that you run into an OutOfMemoryError in Java, or similar errors in other languages. At that point, your server will either crash or become unresponsive, and caching no longer helps because it has become part of the problem.
15. Think like a consultant. As an employee, there tends to be an unwritten rule that the company can do things it would not do with consultants. Deadlines may be moved, scope may be increased, and the developer needs to find a way to meet these new constraints. As an employee, you need to use your power to state that the deadline cannot move given the amount of work required, or that scope cannot be increased without increasing the number of resources. Consultants tend to be allowed to manage a project differently than employees, and it is our job to change that.

I know there are a bunch of other ideas that keep running through my head, but this is the best list I can create for now. What other rules would you include for software engineers? Reference: 15 Tenets For The Software Engineer from our JCG partner Rob Diana at the Regular Geek blog...
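The concurrent-access tenet is easy to demonstrate. The sketch below (class and method names are mine, not from the article) hammers a single key from several threads. With a ConcurrentHashMap the atomic merge keeps the count exact; swapping in a plain HashMap under the same load can silently lose updates, or worse, corrupt the table during a resize:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

public class ConcurrentAccessDemo {

    // Increment one counter key from several threads and return the final count.
    // ConcurrentHashMap.merge is an atomic read-modify-write, so the result is
    // always threads * perThread. A plain HashMap gives no such guarantee.
    static int hammer(ConcurrentMap<String, Integer> counts,
                      int threads, int perThread) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic increment
                }
                done.countDown();
            }).start();
        }
        done.await(); // wait until every writer thread has finished
        return counts.get("hits");
    }

    public static void main(String[] args) throws InterruptedException {
        // 8 threads x 10,000 increments: the exact total survives the contention.
        System.out.println(hammer(new ConcurrentHashMap<>(), 8, 10_000));
    }
}
```

This is the cheap fix for the “100 concurrent users” case described above; past that, you still need to think about visibility and compound operations, not just the map itself.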

Shame driven development

I always aspire to write well-crafted code. During my day job, where all production code is paired on, I think our quality is pretty high. But it’s amazing how easily you forgive yourself and slip into bad habits while coding alone. Is shame the driving force behind quality while pairing? We have a number of archaic unit tests written using EasyMock; all our more recent unit tests use JMock. This little piece of technical debt means that if you’re changing code where the only coverage is EasyMock tests, you first have to decide: do you fix up the tests, or can you hold your nose and live with / tweak the existing tests for your purposes? This is not only distracting, but it means doing the right thing can be substantially slower. Changing our EasyMock tests to JMock is, in principle, a relatively simple task. EasyMock declares mocks in a simple way:

private PricesService prices = createMock(PricesService.class);

These can easily be converted into JMock style:

private Mockery context = new Mockery();
...
private final PricesService prices = context.mock(PricesService.class);

EasyMock has a slightly different way of declaring expectations:

prices.prefetchFor(asset);
expect(prices.pricesFor(asset)).andReturn(
    Lists.newListOf("1.45", "34.74"));

These need to be translated to JMock expectations:

context.checking(new Expectations() {{
    allowing(prices).prefetchFor(asset);
    allowing(prices).pricesFor(asset);
    will(returnValue(Lists.newListOf("1.45", "34.74")));
}});

This process is pretty mechanical, so as part of 10% time I started using my scripted refactoring tool – Rescripter – to mechanically translate our EasyMock tests into JMock. Rescripter lets you run code that modifies your Java source. But this is more than just simple search & replace or regular expressions: by using Eclipse’s powerful syntax tree parsing, you have access to a fully parsed representation of your source file – meaning you can find references to methods, locate method calls, names, parameter lists etc.
This is exactly what you need given the nature of the translation from one library to another. This was inevitably fairly exploratory coding. I wasn’t really sure what would be possible or how complex the translation process would eventually become, so I started out with some simple examples, like those above. But over time the complexity grew, as the many differences between the libraries made me work harder and harder to complete the translation. After a couple of 10% days on this I’d managed to cobble together something awesome: I’d translated a couple of hundred unit tests; but this was done by 700 lines of the most grotesque code you’ve ever had the misfortune to lay your eyes upon! And then… and then last week, I got a pair partner for the day. He had to share this horror. I spent 10 minutes explaining the problem to him, and then 15 minutes explaining why it was throwaway, one-use code and so didn’t have any unit tests. I was embarrassed. We started trying to make some small changes; but without a test framework, it was difficult to be sure that what we were doing would work. To make matters worse, we needed to change core functions used in numerous places. This made me nervous, because there was no test coverage – so we couldn’t be certain we wouldn’t break what was already there. Frankly, this was an absolute nightmare. I’m so used to having test coverage and writing tests – the thought of writing code without unit tests brings me out in cold sweats. But here I was, with a mess of untested code entirely of my own creation. Why? Because I’d forgiven myself for not “doing it right”. After all, it’s only throwaway code, isn’t it? It’s exploratory, more of a spike than production code. Anyway, once it’s done and the tests are migrated this code will be useless – so why make it pretty? I’ll just carry on hacking away… It’s amazing how reasonable it all sounds. Until you realise you’re being a total and utter fucktard.
Even if it’s one-use code, even if it has a relatively short shelf-life, the only way to go fast is to go well. So I did what any reasonable human being would do: I spent my lunch hour fixing this state of affairs. The end result? I could now write unit tests in Jasmine to verify the refactoring I was writing. Not only could I properly test-drive new code, I could also write tests to cover my existing legacy code, so I could refactor it properly. Amazing. And all of a sudden, the pace of progress jumped. Instead of long debug cycles and trying to manually find and trigger test scenarios, I had an easy-to-run, repeatable, automated test suite that gave me confidence in what I was doing. None of this is new to me: it’s what I do day-in, day-out. And yet… and yet… somehow I’d forgiven myself while coding alone. The only conclusion I can draw is that we can’t be trusted to write code of any value alone. The shame of letting another human being see your sorry excuse for code is what drives up quality when pairing: if you’re not pair programming, the code you’re writing must be shameful. References: Shame driven development from our JCG partner David Green at the Actively Lazy blog....

How should REST services be documented?

REST has become the standard way of creating APIs and exposing resources on the internet. Traditional web services (using SOAP and various sets of WS-* standards) are still used a lot within enterprises, but have pretty much disappeared from the public API area and are replaced (or deprecated in favor of) REST based APIs. REST based APIs are generally easier to use and get started with than SOAP based services, and usually don’t require all kinds of code generation to create the messages and the envelopes. However, one thing that is often missing or overlooked when creating REST based APIs or services is the documentation part. REST doesn’t have a WSDL that is used to describe the service (see the section on WADL further down in the article), and it is often said that REST resources should be self-documenting. Even though this is easy to say, it’s generally a good idea to provide additional documentation. In this article I’ll show you what you should document and how you can provide this documentation together with your resource. Let’s first look at an example resource. This resource, shown next, represents a report you can make to your local municipality: for instance, you notice that a traffic light isn’t functioning, or there is a hole in the road. Your very modern municipality offers a REST based API you can use to report such events.
Content-Type: application/vnd.opengov.org.report+json;charset=UTF-8

{"report": {
    "self": "report-1",
    "status": "New",
    "location": "Corner of ninth street",
    "x-coordinate": 52.34,
    "y-coordinate": 4.34,
    "description": "There is ugly graffiti sprayed on the mailbox at the corner on ninth street",
    "date": "25-11-2010",
    "time": "15:46",
    "images": [
        {"href": "images/image1.png"},
        {"href": "images/image1.png"}
    ],
    "related": [
        {"href": "../report-4"},
        {"href": "../report-7"},
        {"href": "../report-9"}
    ],
    "links": [
        {"relation": "invalidation", "href": "http://localhost:9002/opengov/invalidations/"},
        {"relation": "duplication", "href": "http://localhost:9002/opengov/duplications/"},
        {"relation": "relation", "href": "http://localhost:9002/opengov/relations/"}
    ],
    "comments": []
}}

REST services are often said to be self-documenting, and if you look at this resource you can pretty much already understand what it represents. It contains some general information about the resource:

"self": "report-1", "status": "New", "location": "Corner of ninth street", "x-coordinate": 52.34, "y-coordinate": 4.34, "description": "There is ugly graffiti sprayed on the mailbox at the corner on ninth street", "date": "25-11-2010", "time": "15:46"

It shows where you can find related resources, such as images belonging to this report:

"images": [ {"href": "images/image1.png"}, {"href": "images/image1.png"} ]

Or other reports related to this one:

"related": [ {"href": "../report-4"}, {"href": "../report-7"}, {"href": "../report-9"} ]

Finally, from this resource you can also see how to traverse the links it contains to report a duplication, invalidate the report, or add a related report:

"links": [ {"relation": "invalidation", "href": "http://localhost:9002/opengov/invalidations/"}, {"relation": "duplication", "href": "http://localhost:9002/opengov/duplications/"}, {"relation": "relation", "href": "http://localhost:9002/opengov/relations/"} ]

As you can see, this REST resource explains
itself pretty well. For instance, if we want to add a comment to this report we could just use a PUT with a comment message to the /reports/report-1/comments/ URL. But what does this comment message look like? How can we use the links? For this we do need some additional documentation to make our intent clear to the users of our service. What you should describe are the following items: the URLs used to access or search for a report; the link relations that describe how various resources are linked together; and the media types that are used by this service. Let’s make such a description for this service. The first thing we describe is the URL on which this service can be accessed:

URLs:

http://localhost:9002/opengov/reports?location=xPos,yPos&radius=r

Submit a GET request to this URL to search for reports. You can optionally specify a location and a radius to only return reports for a specific area. If no location and radius are specified, the first 100 reports, sorted by date (newest first), are returned. The reports that are returned have the application/vnd.opengov.org.report+json media type.

xPos: x-coordinate of the location. Accepts GPS coordinates.
yPos: y-coordinate of the location. Accepts GPS coordinates.
r: radius to search within, in meters.

This piece of documentation describes how to use our search service. We also explicitly define the media type that this service returns; this way consumers already know how to work with the results from the search service. Another important aspect of the documentation is the links we’ve defined:

Links:

self: identifies the current resource. This (relative) URL can be used to directly access or modify a report.
http://localhost:9002/opengov/invalidations/: this URL can be used to invalidate a report. Use an HTTP PUT operation on this URL with media type application/vnd.opengov.org.invalidation+json.
http://localhost:9002/opengov/duplications/: this URL can be used to mark a report as a duplicate.
Use an HTTP PUT operation on this URL with media type application/vnd.opengov.org.duplication+json.
http://localhost:9002/opengov/relations/: this URL can be used to relate two reports to each other. Use an HTTP PUT operation on this URL with media type application/vnd.opengov.org.relation+json.

Here we describe what a specific link implies, how to use it, and what kind of media types the link expects. That leaves us with describing the resources themselves. If you’re doing REST with XML, you should describe the messages using an XSD. If you’re doing JSON you could use JSON Schema, but I prefer to just describe the fields per media type:

Media types:

application/vnd.opengov.org.report+json
- status: the status of this report
- location: readable description of the location of this report
- etc.

If there are any restrictions on these values, this is a good place to describe them. Remember, we write this documentation for human consumption; we don’t require it to be parsable. With the items described above, you’ve got a very good starting point for creating documentation for REST services. What you should keep in mind are the following items: follow the basic REST principles for the HTTP PUT, GET, DELETE and POST operations; use href/links when linking to other resources (it doesn’t really matter whether you use relative or absolute links, although relative links are more flexible should you relocate your resource); use media types to inform your consumers of the type of resource they are dealing with; use links with a specific relation property to tell your consumers what they can do with this resource; and add a simple description of the URLs, media types and links that are supported by your service. Since I expect someone to come up with this question, I’ll answer it beforehand: why not use a WADL to describe your REST API? A couple of reasons. A WADL focuses more on machine-to-machine communication.
REST services are more often created to be explored through human interaction, not by generating a client stub. WADL is a good concept but leans too much towards WSDLs. WSDLs are often used to create RPC based services that hide the underlying protocol; REST APIs are used in a different manner, where this approach doesn’t fit that well. An old article, but it explains the why/why not of a WADL pretty well: http://bitworking.org/news/193/Do-we-need-WADL Reference: How should REST services be documented? from our JCG partner Jos Dirksen at the Smart Java blog....
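On the consumer side, the documented URL and media type are all a client needs to get started. The sketch below (the class name and the use of Java 11’s HttpRequest builder are my own; the URL pattern and media type come from the service description above) shows how the documented contract shows up explicitly in client code via the Accept header:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ReportSearchRequest {

    // Build a GET request for the documented report-search URL.
    // The Accept header states which media type this client understands,
    // matching the application/vnd.opengov.org.report+json contract above.
    static HttpRequest searchNear(double xPos, double yPos, int radiusMeters) {
        String url = String.format(
            "http://localhost:9002/opengov/reports?location=%s,%s&radius=%d",
            xPos, yPos, radiusMeters);
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Accept", "application/vnd.opengov.org.report+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = searchNear(52.34, 4.34, 500);
        System.out.println(req.uri());
        System.out.println(req.headers().firstValue("Accept").orElse(""));
    }
}
```

Sending it is one more line with java.net.http.HttpClient; the point here is that a well-documented media type lets the client declare its expectations up front instead of guessing at the response format.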

Git Tutorial – Getting Started

I was a long-time Subversion user, so when I got introduced to Git a few months back, I was *really* confused. First of all, I couldn’t visualize some of the concepts that Git talked about. But then, as I started using Git in my day-to-day work, it got much easier to use and understand. Now there isn’t a single day (except some of the weekends!) in which I don’t use a Git command. It has become an inseparable tool at my work. So I thought of writing up what I’ve learned as a Git tutorial series, to help my fellow developers who want to switch over to Git. Let’s start. Say Hello to Git. First of all, if you are coming from a Subversion/CVS background – mark my words – forget everything you learned about version control, because Git has a completely different approach to it. Let’s see how it differs from other systems. Git is distributed, which means that when you clone a Git repo, you get your own copy of that repo to work with on your local machine. In Git, you get your own code base, you make changes, you commit as many times as you want without the fear of polluting the central repo, and once you are confident you push the code to the central repository. Before diving much further, let’s look at this awesome diagram which explains the Git workflow (thanks to osteele.com, I have a printout of this pasted at my desk). As you can see in the diagram, in Git code lives in 4 different places:

remote repository – think of it as a GitHub repository or a remote server hosted by your company. As the name suggests, this code base does not live on your local machine. You don’t talk with the remote repository often: only when you initially pull the code, and when you push your changes once you are done.
local repository – when you clone a remote Git repo or create a new repo, the code base is created here, in the local repository. All commits you make come here first. This lives on your local machine.
index – one of the most confusing things you’ll ever hear about in Git. It is an intermediate place between your working copy of the code and your local repository – a staging area for your code. You use it to stage which files you want to track and commit. In my next post, about my Git workflow, you’ll see how the index is used. This too lives on your local machine.
workspace – your working directory, where you create/edit/delete your files. This code resides on your local machine.

I hope you now get the basic concepts of Git; it is very important to understand them well before you start using Git. In my next post, I’ll be writing about my Git workflow. Till then, feel free to set up your Git environment. Reference: Git Tutorial – Getting Started from our JCG partner Veera Sundar at the Veera Sundar blog....
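The four places described above map directly onto everyday Git commands. A minimal sketch in a throwaway repository (the directory, file name, and commit message are mine, just for illustration):

```shell
set -e
# Create a throwaway local repository to walk through the workflow.
dir=$(mktemp -d)
cd "$dir"
git init -q .                                   # create a local repository

echo "hello git" > README.md                    # edit a file in the workspace
git add README.md                               # workspace -> index (staging)
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "First commit"                 # index -> local repository

git log --oneline                               # the commit is recorded locally
# git push origin master                        # local -> remote (needs a configured remote)
# git clone <url>                               # remote -> a fresh local copy
```

Nothing leaves your machine until the push, which is exactly the freedom the distributed model gives you: commit as often as you like locally, and publish only when you are confident.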
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.