

WhateverOrigin – Combat the Same Origin Policy with Heroku and Play! Framework

A little while ago, while coding Bitcoin Pie, I found the need to overcome the notorious Same Origin Policy that limits the domains javascript running on a client's browser can access. Via Stack Overflow I found a site called Any Origin, which is basically the easiest way to defeat the Same Origin Policy without setting up a dedicated server. All was well, until about a week ago, when Any Origin stopped working for some (but not all) https requests. It just so happened that in that time I had gained some experience with Play! and Heroku, which enabled me to quickly build an open source clone of Any Origin called Whatever Origin (.org!) (on github).

For those unfamiliar with Play! and Heroku, let me give a short introduction. Heroku is one of the leading PaaS providers. PaaS is just a fancy way of saying "Let us manage your servers, scalability, and security ... you just focus on writing the application." Heroku started as a Ruby shop, but they now support a variety of programming languages and platforms including Python, Java, Scala and JavaScript/Node.js. What's extra cool about them is that they offer a huge set of addons ranging from simple stuff like Custom Domains and Logging through scheduling, email, SMS, and up to more powerful addons like Redis, Neo4j and Memcached.

Now for the application part. I had recently found Play! Framework. Play is a Java/Scala framework for writing web applications that borrows from the Ruby on Rails / Django idea of providing you with a complete pre-built solution, letting you focus on writing your actual business logic, while allowing you to customize everything later if needed. I encourage you to watch the 12 minute video on Play!'s homepage; it shows how to achieve powerful capabilities literally from scratch. Play! is natively supported at Heroku, so really all you need to do to get a production app running is:

- play new
- Write some business logic (Controllers/Views/whatnot)
- git init ... git commit
- "heroku apps add" to create a new app (don't forget to add "--stack cedar" to use the latest generation Cedar stack)
- "git push heroku master" to upload a new version of your app ... it's automatically built and deployed.

Armed with these tools (which really took me only a few days to learn), I set out to build Whatever Origin. Handling JSONP requests is an IO-bound task – your server basically does an HTTP request, and when it completes, it sends the response to your client wrapped in some javascript/JSON magic. Luckily Play!'s support for async IO is really sweet and simple. Just look at my single get method:

public static void get(final String url, final String callback) {
    F.Promise<WS.HttpResponse> remoteCall = WS.url(url).getAsync();
    await(remoteCall, new F.Action<WS.HttpResponse>() {
        public void invoke(WS.HttpResponse result) {
            // code for getResponseStr() not included in this snippet to hide some ugly irrelevant details
            // http://blog.altosresearch.com/supporting-the-jsonp-callback-protocol-with-jquery-and-java/
            String responseStr = getResponseStr(result, url);
            if (callback != null) {
                response.contentType = "application/x-javascript";
                responseStr = callback + "(" + responseStr + ")";
            } else {
                response.contentType = "application/json";
            }
            renderJSON(responseStr);
        }
    });
}

The first line initiates an async fetch of the requested URL, followed by registration to the completion event, and releasing the thread. You could almost think this is Node.js! What actually took me the longest time to develop and debug was JSONP itself.
Information about it, and about jQuery's client-side support, was a little tricky to find, and I spent a few hours struggling with overly escaped JSON and other fun stuff. After that was done, I simply pushed it to github, registered the whateverorigin.org domain for a measly $7 a year, replaced anyorigin.com with whateverorigin.org in Bitcoin Pie's code, and voila – the site was back online.

I really like developing websites in 2011 – there are entire industries out there that have set out to make it easy for individuals / small startups to build amazing products.

Reference: WhateverOrigin – Combat the Same Origin Policy with Heroku and Play! Framework from our JCG partner Ron Gross at the A Quantum Immortal blog...

if – else coding style best practices

The following post is going to be an advanced curly-braces discussion with no right or wrong answer, just a "matter of taste". It is about whether to put "else" (and other keywords, such as "catch", "finally") on a new line or not.

Some may write:

if (something) {
    doIt();
} else {
    dontDoIt();
}

I, however, prefer:

if (something) {
    doIt();
}
else {
    dontDoIt();
}

That looks silly, maybe. But what about comments? Where do they go? This somehow looks wrong to me:

// This is the case when something happens and blah
// blah blah, and then, etc...
if (something) {
    doIt();
} else {
    // This happens only 10% of the time, and then you
    // better think twice about not doing it
    dontDoIt();
}

Isn't the following much better?

// This is the case when something happens and blah
// blah blah, and then, etc...
if (something) {
    doIt();
}

// This happens only 10% of the time, and then you
// better think twice about not doing it
else {
    dontDoIt();
}

In the second case, I'm really documenting the "if" and the "else" case separately. I'm not documenting the call to "dontDoIt()". This can go further:

// This is the case when something happens and blah
// blah blah, and then, etc...
if (something) {
    doIt();
}

// Just in case
else if (somethingElse) {
    doSomethingElse();
}

// This happens only 10% of the time, and then you
// better think twice about not doing it
else {
    dontDoIt();
}

Or with try-catch-finally:

// Let's try doing some business
try {
    doIt();
}

// IOExceptions don't really occur
catch (IOException ignore) {}

// SQLExceptions need to be propagated
catch (SQLException e) {
    throw new RuntimeException(e);
}

// Clean up some resources
finally {
    cleanup();
}

It looks tidy, doesn't it? As opposed to this:

// Let's try doing some business
try {
    doIt();
} catch (IOException ignore) {
    // IOExceptions don't really occur
} catch (SQLException e) {
    // SQLExceptions need to be propagated
    throw new RuntimeException(e);
} finally {
    // Clean up some resources
    cleanup();
}

I'm curious to hear your thoughts...

Reference: if – else coding style best practices from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog....

REST Pagination in Spring

This is the seventh of a series of articles about setting up a secure RESTful Web Service using Spring 3.1 and Spring Security 3.1 with Java based configuration. This article will focus on the implementation of pagination in a RESTful web service.

The REST with Spring series:
Part 1 – Bootstrapping a web application with Spring 3.1 and Java based Configuration
Part 2 – Building a RESTful Web Service with Spring 3.1 and Java based Configuration
Part 3 – Securing a RESTful Web Service with Spring Security 3.1
Part 4 – RESTful Web Service Discoverability
Part 5 – REST Service Discoverability with Spring
Part 6 – Basic and Digest authentication for a RESTful Service with Spring Security 3.1

Page as resource vs Page as representation

The first question when designing pagination in the context of a RESTful architecture is whether to consider the page an actual resource or just a representation of resources. Treating the page itself as a resource introduces a host of problems, such as no longer being able to uniquely identify resources between calls. This, coupled with the fact that outside the RESTful context the page cannot be considered a proper entity but a holder that is constructed when needed, makes the choice straightforward: the page is part of the representation.

The next question in the pagination design in the context of REST is where to include the paging information:
- in the URI path: /foo/page/1
- in the URI query: /foo?page=1

Keeping in mind that a page is not a resource, encoding the page information in the URI path is no longer an option.

Page information in the URI query

Encoding paging information in the URI query is the standard way to solve this issue in a RESTful service. This approach does however have one downside – it cuts into the query space for actual queries: /foo?page=1&size=10

The Controller

Now, for the implementation – the Spring MVC Controller for pagination is straightforward:

@RequestMapping( value = "admin/foo", params = { "page", "size" }, method = GET )
@ResponseBody
public List< Foo > findPaginated( @RequestParam( "page" ) int page, @RequestParam( "size" ) int size,
   UriComponentsBuilder uriBuilder, HttpServletResponse response ){
   Page< Foo > resultPage = service.findPaginated( page, size );
   if( page > resultPage.getTotalPages() ){
      throw new ResourceNotFoundException();
   }
   eventPublisher.publishEvent( new PaginatedResultsRetrievedEvent< Foo >
      ( Foo.class, uriBuilder, response, page, resultPage.getTotalPages(), size ) );

   return resultPage.getContent();
}

The two query parameters are defined in the request mapping and injected into the controller method via @RequestParam; the HTTP response and the Spring UriComponentsBuilder are injected into the Controller method to be included in the event, as both will be needed to implement discoverability.

Discoverability for REST pagination

Within the scope of pagination, satisfying the HATEOAS constraint of REST means enabling the client of the API to discover the next and previous pages based on the current page in the navigation. For this purpose, the Link HTTP header will be used, coupled with the official "next", "prev", "first" and "last" link relation types.

In REST, Discoverability is a cross cutting concern, applicable not only to specific operations but to types of operations. For example, each time a Resource is created, the URI of that resource should be discoverable by the client. Since this requirement is relevant for the creation of ANY Resource, it should be dealt with separately and decoupled from the main Controller flow.
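Before moving on, note that the entries that will end up in the Link header follow the standard web linking format, <uri>; rel="relation". A minimal helper for producing a single entry could look like the sketch below; this is an illustration only, not necessarily the article's actual createLinkHeader implementation:

// Sketch only: renders one Link header entry such as
// <http://localhost:8080/rest/api/admin/foo?page=1&size=10>; rel="next"
public static String createLinkHeader(final String uri, final String rel) {
    return "<" + uri + ">; rel=\"" + rel + "\"";
}

A comma-separated sequence of such entries is what the listener shown next accumulates in its StringBuilder.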
With Spring, this decoupling is achieved with events, as was thoroughly discussed in the previous article focusing on Discoverability of a RESTful service. In the case of pagination, the event – PaginatedResultsRetrievedEvent – was fired in the Controller, and discoverability is achieved in a listener for this event:

void addLinkHeaderOnPagedResourceRetrieval( UriComponentsBuilder uriBuilder, HttpServletResponse response,
   Class clazz, int page, int totalPages, int size ){
   String resourceName = clazz.getSimpleName().toLowerCase();
   uriBuilder.path( "/admin/" + resourceName );

   StringBuilder linkHeader = new StringBuilder();
   if( hasNextPage( page, totalPages ) ){
      String uriForNextPage = constructNextPageUri( uriBuilder, page, size );
      linkHeader.append( createLinkHeader( uriForNextPage, REL_NEXT ) );
   }
   if( hasPreviousPage( page ) ){
      String uriForPrevPage = constructPrevPageUri( uriBuilder, page, size );
      appendCommaIfNecessary( linkHeader );
      linkHeader.append( createLinkHeader( uriForPrevPage, REL_PREV ) );
   }
   if( hasFirstPage( page ) ){
      String uriForFirstPage = constructFirstPageUri( uriBuilder, size );
      appendCommaIfNecessary( linkHeader );
      linkHeader.append( createLinkHeader( uriForFirstPage, REL_FIRST ) );
   }
   if( hasLastPage( page, totalPages ) ){
      String uriForLastPage = constructLastPageUri( uriBuilder, totalPages, size );
      appendCommaIfNecessary( linkHeader );
      linkHeader.append( createLinkHeader( uriForLastPage, REL_LAST ) );
   }

   response.addHeader( HttpConstants.LINK_HEADER, linkHeader.toString() );
}

In short, the listener logic checks whether the navigation allows for next, previous, first and last pages and, if it does, adds the relevant URIs to the Link HTTP header. It also makes sure that the link relation type is the correct one – "next", "prev", "first" and "last". This is the single responsibility of the listener (the full code is here).

Test Driving Pagination

Both the main logic of pagination and discoverability should be extensively covered by small, focused integration tests; as in the previous article, the rest-assured library is used to consume the REST service and to verify the results. These are a few examples of pagination integration tests; for a full test suite, check out the github project (link at the end of the article):

@Test
public void whenResourcesAreRetrievedPaged_then200IsReceived(){
   Response response = givenAuth().get( paths.getFooURL() + "?page=1&size=10" );

   assertThat( response.getStatusCode(), is( 200 ) );
}
@Test
public void whenPageOfResourcesAreRetrievedOutOfBounds_then404IsReceived(){
   Response response = givenAuth().get( paths.getFooURL() + "?page=" + randomNumeric( 5 ) + "&size=10" );

   assertThat( response.getStatusCode(), is( 404 ) );
}
@Test
public void givenResourcesExist_whenFirstPageIsRetrieved_thenPageContainsResources(){
   restTemplate.createResource();

   Response response = givenAuth().get( paths.getFooURL() + "?page=1&size=10" );

   assertFalse( response.body().as( List.class ).isEmpty() );
}

Test Driving Pagination Discoverability

Testing Discoverability of Pagination is relatively straightforward, although there is a lot of ground to cover.
The tests are focused on the position of the current page in the navigation and the different URIs that should be discoverable from each position:

@Test
public void whenFirstPageOfResourcesAreRetrieved_thenSecondPageIsNext(){
   Response response = givenAuth().get( paths.getFooURL()+"?page=0&size=10" );

   String uriToNextPage = extractURIByRel( response.getHeader( LINK ), REL_NEXT );
   assertEquals( paths.getFooURL()+"?page=1&size=10", uriToNextPage );
}
@Test
public void whenFirstPageOfResourcesAreRetrieved_thenNoPreviousPage(){
   Response response = givenAuth().get( paths.getFooURL()+"?page=0&size=10" );

   String uriToPrevPage = extractURIByRel( response.getHeader( LINK ), REL_PREV );
   assertNull( uriToPrevPage );
}
@Test
public void whenSecondPageOfResourcesAreRetrieved_thenFirstPageIsPrevious(){
   Response response = givenAuth().get( paths.getFooURL()+"?page=1&size=10" );

   String uriToPrevPage = extractURIByRel( response.getHeader( LINK ), REL_PREV );
   assertEquals( paths.getFooURL()+"?page=0&size=10", uriToPrevPage );
}
@Test
public void whenLastPageOfResourcesIsRetrieved_thenNoNextPageIsDiscoverable(){
   Response first = givenAuth().get( paths.getFooURL()+"?page=0&size=10" );
   String uriToLastPage = extractURIByRel( first.getHeader( LINK ), REL_LAST );

   Response response = givenAuth().get( uriToLastPage );

   String uriToNextPage = extractURIByRel( response.getHeader( LINK ), REL_NEXT );
   assertNull( uriToNextPage );
}

These are just a few examples of integration tests consuming the RESTful service. (A rough sketch of the extractURIByRel helper used here follows at the end of this article.)

Getting All Resources

On the same topic of pagination and discoverability, a choice must be made whether a client is allowed to retrieve all the Resources in the system at once, or whether the client MUST ask for them paginated. If the choice is made that the client cannot retrieve all Resources with a single request, and pagination is not optional but required, then several options are available for the response to a "get all" request.

One option is to return a 404 (Not Found) and use the Link header to make the first page discoverable:

Link=<http://localhost:8080/rest/api/admin/foo?page=0&size=10>; rel="first", <http://localhost:8080/rest/api/admin/foo?page=103&size=10>; rel="last"

Another option is to return a redirect – 303 (See Other) – to the first page of the pagination. A third option is to return a 405 (Method Not Allowed) for the GET request.

REST Paging with Range HTTP headers

A relatively different way of doing pagination is to work with the HTTP Range headers – Range, Content-Range, If-Range, Accept-Ranges – and HTTP status codes – 206 (Partial Content), 413 (Request Entity Too Large), 416 (Requested Range Not Satisfiable). One view on this approach is that the HTTP Range extensions were not intended for pagination and that they should be managed by the Server, not by the Application. Implementing pagination based on the HTTP Range header extensions is nevertheless technically possible, although not nearly as common as the implementation discussed in this article.

Conclusion

This article covered the implementation of Pagination in a RESTful service with Spring, discussing how to implement and test Discoverability. For a full implementation of pagination, check out the github project. If you read this far, you should follow me on twitter here.

Reference: REST Pagination in Spring from our JCG partner Eugen Paraschiv at the baeldung blog...
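As promised above, here is a rough sketch of what the extractURIByRel helper used in the tests could look like. This is not the implementation from the article's codebase, just an illustration of pulling the URI with a given relation type out of a Link header value:

// Sketch only: given a Link header value such as
// <http://host/foo?page=1&size=10>; rel="next", <http://host/foo?page=0&size=10>; rel="prev"
// return the URI whose rel matches, or null if it is not present.
static String extractURIByRel(final String linkHeader, final String rel) {
    if (linkHeader == null) {
        return null;
    }
    for (final String entry : linkHeader.split(",")) {
        final String[] parts = entry.split(";");
        if (parts.length == 2 && parts[1].contains("rel=\"" + rel + "\"")) {
            return parts[0].trim().replaceAll("^<|>$", "");
        }
    }
    return null;
}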

Public key infrastructure

Some time ago I was asked to create a presentation for my colleagues describing Public Key Infrastructure: its components, functions, how it generally works, and so on. To create that presentation I collected some material on the topic, and it would be a waste to simply throw it out. That presentation wasn't technical at all, and this post is not going to be technical either. It gives just the concept, the high-level picture, which I believe can be a good base of knowledge before starting to look at the details.

I will start with cryptography itself. Why do we need it? There are at least three reasons – Confidentiality, Authentication and Integrity. Confidentiality is the most obvious one: it's crystal clear that we need cryptography to hide information from others. Authentication confirms that a message is sent by a subject we can identify and that our claims about it are true. And finally, Integrity ensures that the message wasn't modified or corrupted during the transfer.

We may try to use Symmetric Cryptography to help us achieve these aims. It uses just one shared key, which is also called the secret. The secret is used both for encryption and for decryption of data. Let's have a look at how it can help us achieve our aims. Does it encrypt messages? Yes – so Confidentiality is solved, as long as nobody except the communicating parties knows the secret. Does it provide Authentication? Mmm... I would say no. If there are just two parties in the conversation it seems ok, but if there are hundreds, then there would have to be hundreds of secrets, which are hard to manage and distribute. What about Integrity? Yes, that works fine – it's very hard to modify an encrypted message.

As you can guess, symmetric cryptography has one big problem, and that problem is the "shared secret". These two words... they don't even fit together. If something is known by more than one person, it is not a secret any more. Moreover, to be shared, that secret somehow has to be transferred, and during that process there are too many ways for the secret to be stolen. This means that this type of cryptography hardly solves our problems. But it is still in use and works quite well for its purposes. It's very fast and can be used for encryption/decryption of big amounts of data, e.g. your hard drive. Also, since it is hundreds or even thousands of times faster than asymmetric cryptography, it's used in hybrid schemes (like TLS aka SSL), where asymmetric cryptography is used just for transferring the symmetric key and the actual encryption/decryption is done by a symmetric algorithm.

Let's have a look at Asymmetric Cryptography. It was invented relatively recently, about 40 years ago. The first paper ("New Directions in Cryptography") was published in 1976 by Whitfield Diffie and Martin Hellman. Their work was influenced by Ralph Merkle, who is believed to be the one who created the idea of Public Key Cryptography in 1974 (http://www.merkle.com/1974/) and suggested it as a project to his mentor, Lance Hoffman, who rejected it. "New Directions in Cryptography" describes the key exchange algorithm known as "Diffie–Hellman key exchange". Interestingly, the same key exchange algorithm had been invented earlier, in 1974, at the Government Communications Headquarters (GCHQ) in the UK by Malcolm J. Williamson, but that information was classified and the fact was only disclosed in 1997.

Asymmetric Cryptography uses a pair of keys – one Private Key and one Public Key. The Private Key has to be kept secret and not shared with anybody. The Public Key can be available to the public; it doesn't need to be secret.
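To make the key pair idea concrete, here is a minimal Java sketch using the JDK's built-in RSA support (java.security and javax.crypto). It is purely illustrative and leaves out everything you would need for production use:

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;

public class AsymmetricDemo {
    public static void main(String[] args) throws Exception {
        // Generate a 2048-bit RSA key pair: one public key, one private key.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();

        // Anyone holding the public key can encrypt...
        Cipher cipher = Cipher.getInstance("RSA");
        cipher.init(Cipher.ENCRYPT_MODE, pair.getPublic());
        byte[] encrypted = cipher.doFinal("hello".getBytes(StandardCharsets.UTF_8));

        // ...but only the owner of the matching private key can decrypt.
        cipher.init(Cipher.DECRYPT_MODE, pair.getPrivate());
        byte[] decrypted = cipher.doFinal(encrypted);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));
    }
}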
Information encrypted with the public key can be decrypted only with the corresponding private key. Since the Private Key is not shared, there is no need to distribute it, and there is a reasonably small chance that it will be compromised. So this way of exchanging information can solve the Confidentiality problem. What about Authentication and Integrity? These problems are solvable as well and utilise a mechanism called a Digital Signature. The simplest variant of a Digital Signature uses the following scenario – the subject creates a hash of the message, encrypts that hash with the Private Key and attaches it to the message. Now if the recipient wants to verify the subject who created the message, he decrypts the attached hash using the subject's public key (that's Authentication) and compares it with a hash generated on the recipient's side (Integrity). In reality the hash is not exactly encrypted; instead it is used in a special signing algorithm, but the overall concept is the same. It's important to notice that in Asymmetric Cryptography each pair of keys serves just one purpose, e.g. if a pair is used for signing, it can't be used for encryption.

The Digital Signature is also the basis for the Digital Certificate, AKA Public Key Certificate. A certificate is pretty much the same as your passport. It has identity information, which is similar to the name, date of birth, etc. in a passport. The owner of a certificate has to have the Private Key which matches the Public Key stored in the certificate, similar to how a passport has a photo of the owner, which matches the owner's face. And, finally, a certificate has a signature, whose meaning is the same as the meaning of the stamp in a passport: the signature proves that the certificate was issued by the organization which made that signature. In the Public Key Infrastructure world such organizations are called Certificate Authorities. If a system discovers that a certificate is signed by a "trusted" Certificate Authority, that system will trust the information in the certificate.

The last paragraph may not be obvious, especially the "trust" part of it. What does "trust" mean in that context? Let's have a look at a simple example. Every site on the Web which makes use of an encrypted connection does it via the TLS (SSL) protocol, which is based on certificates. When you go to https://www.amazon.co.uk, it sends its certificate back to your browser. In that certificate there is information about the website and a reference to the Certificate Authority who signed that certificate. First, the browser will look at the name in the certificate – it has to be exactly the same as the website domain name, in our case "www.amazon.co.uk". Then the browser will verify that the certificate is signed by a Trusted Certificate Authority, which is VeriSign in the case of Amazon. Your browser already has a list of Certificate Authorities (this is just a list of certificates with public keys) which are known as trusted ones, so it can verify that the certificate is issued by one of them. There are some other verification steps, but these two are the most important ones. Assume in our case verification was successful (if it's not, the browser will show a big red warning message) – the certificate has the proper name in it and was signed by a Trusted Certificate Authority. What does it give us? Just one thing – we know that we are on www.amazon.co.uk and the server behind that name is an Amazon server, not some dodgy website which just looks like Amazon. When we enter our credit card details, we can be relatively sure that they will be sent to Amazon and not to a hacker's database.
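The certificate checks described above can be mimicked with the JDK's java.security.cert API. The following is a minimal sketch – the file names are made up for illustration, and a real client would also check the host name, the full certificate chain and revocation:

import java.io.FileInputStream;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class CertCheckDemo {
    public static void main(String[] args) throws Exception {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");

        // Hypothetical file names, just for illustration.
        X509Certificate serverCert;
        X509Certificate caCert;
        try (FileInputStream server = new FileInputStream("server.crt");
             FileInputStream ca = new FileInputStream("ca.crt")) {
            serverCert = (X509Certificate) factory.generateCertificate(server);
            caCert = (X509Certificate) factory.generateCertificate(ca);
        }

        // 1. Is the certificate within its validity period?
        serverCert.checkValidity();

        // 2. Was it really signed by the CA we trust?
        //    verify() throws an exception if the signature does not match.
        serverCert.verify(caCert.getPublicKey());

        // 3. Does the name in the certificate match the host we think we are talking to?
        System.out.println("Subject: " + serverCert.getSubjectX500Principal());
    }
}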
Our hope here is based on the assumption that Certificate Authorities like VeriSign do not issue dodgy certificates and that the Amazon server is not compromised. Well, better than nothing :)

Another example is servers within an organization, which use certificates to verify that they can trust each other. The scheme there is very similar to the browser one, except for two differences:
- Mutual authentication. Certificates are usually verified by both sides, not just by the client; the client has to send his certificate to the server.
- The Certificate Authority is hosted inside the company.

When the CA is inside the company we can be almost sure that certificates are going to be issued only to properly validated subjects. It gives some confidence that a hacker can't inject his own server, even if he has access to the network infrastructure. An attack is possible only if the CA is compromised or some server's Private Key is compromised.

We already know that the Certificate Authority is the organization which issues certificates; on the Internet, an example of such an organization is VeriSign. If a certificate is created to be used just inside an organization (intranet), it can be issued by the Information Security Department, which can act as a Certificate Authority. When someone wants to have a certificate, he has to send a certificate request, called a Certificate Signing Request, to the Certificate Authority. That request consists of the subject's identity information, the subject's public key and a signature created by the subject's private key, to ensure that the subject who sent the request holds the matching private key. Before signing, the Certificate Authority passes the request to a Registration Authority, who verifies all details, ensures that the proper process is followed, etc. It's possible that the Certificate Authority also acts as the Registration Authority. Finally, if everything is ok, the Certificate Authority creates a new certificate signed by its private key and sends it back to the subject which requested the certificate.

I've already mentioned the certificate validation process. Here are some details of it, though it's worth mentioning these details are still high-level. Validation consists of several steps which, broadly speaking, can be described as:
- Certificate data validation – validity date, presence of required fields, their values, etc.
- Verify that the certificate is issued by a Trusted Certificate Authority. If you are browsing the internet, that list is already built into your browser. If it's communication between two systems, each system has a list of trusted Certificate Authorities; usually that is just a file with certificates.
- The certificate's signature is valid and made by the Certificate Authority who signed that certificate.
- Verify that the certificate is not revoked.
- Key verification – proves that the server can decode messages encrypted with the certificate's Public Key.

The certificate revocation mentioned above can happen for many reasons – the certificate could be compromised, or, in the corporate world, the employee who owned the certificate left the company, or the server which had the certificate was decommissioned, etc. In order to verify certificate revocation, a browser or any other piece of software has to use one or both of the following techniques:
- Certificate Revocation List (CRL). That's just a file, which can be hosted on an http server. It contains a list of revoked certificate IDs. This method is simple and straightforward and doesn't require a lot of implementation effort, but it has three disadvantages: it's just a file, which means it's not real-time; it can use significant network traffic; and it's not checked by default by most browsers (I would even say by all browsers), even if the certificate has a link to a CRL.
- Online Certificate Status Protocol (OCSP). That is the preferable solution. It utilizes a dedicated server implementing a protocol that returns the revocation status of a certificate by its id. If the browser (at least Firefox > v3.0) finds a link to such a server in the certificate, it will make a call to verify that the certificate is not revoked. The only disadvantage is that the OCSP server has to be very reliable and be able to answer requests all the time.

On the Internet a certificate usually contains links to the CRL or OCSP server inside it. When certificates are used in a corporate network these links are usually known by all parties and there is no need to have them in the certificate.

So, finally, what is Public Key Infrastructure? It is the infrastructure which supports everything described above and generally consists of the following elements:
- Subscribers. Users of certificates: clients and the ones who own certificates.
- Certificates.
- Certificate Authority and Registration Authority.
- Certificate Revocation Infrastructure. A server with the Certificate Revocation List or an OCSP server.
- Certificate Policy and Practices documents. These describe the format of certificates, the format of certificate requests, when certificates have to be revoked, etc. – basically all procedures related to the infrastructure.
- Hardware Security Modules, which are usually used to protect the Root CA's private key.

And that entire infrastructure supports the following functions, which we've just discussed:
- Public Key Cryptography.
- Certificate issuance.
- Certificate validation.
- Certificate revocation.

And that's it. It appeared to be not such a big topic ;)

References: Public key infrastructure from our JCG partner Stanislav Kobylansky at the Stas's blog ....

Streaming Files from MongoDB GridFS

Not too long ago I tweeted what I felt was a small triumph on my latest project: streaming files from MongoDB GridFS for downloads (rather than pulling the whole file into memory and then serving it up). I promised to blog about this, but unfortunately my specific usage was a little coupled to the domain of my project, so I couldn't just show it off as is. So I've put together an example node.js + GridFS application, shared it on github, and will use this post to explain how I accomplished it.

GridFS module

First off, special props go to tjholowaychuk, who responded in the #node.js irc channel when I asked if anyone has had luck with using GridFS from mongoose. A lot of my resulting code is derived from a gist he shared with me. Anyway, to the code. I'll describe how I'm using GridFS and, after setting the groundwork, illustrate how simple it is to stream files from GridFS. I created a gridfs module that basically accesses GridStore through mongoose (which I use throughout my application) and that can also share the db connection created when connecting mongoose to the mongodb server.

mongoose = require "mongoose"
request = require "request"

GridStore = mongoose.mongo.GridStore
Grid = mongoose.mongo.Grid
ObjectID = mongoose.mongo.BSONPure.ObjectID

We can't get files from mongodb if we cannot put anything into it, so let's create a putFile operation.

exports.putFile = (path, name, options..., fn) ->
  db = mongoose.connection.db
  options = parse(options)
  options.metadata.filename = name
  new GridStore(db, name, "w", options).open (err, file) ->
    return fn(err) if err
    file.writeFile path, fn

parse = (options) ->
  opts = {}
  if options.length > 0
    opts = options[0]
  if !opts.metadata
    opts.metadata = {}
  opts

This really just delegates to the putFile operation that exists in GridStore as part of the mongodb module. I also have a little logic in place to parse options, providing defaults if none were provided. One interesting feature to note is that I store the filename in the metadata, because at the time I ran into a funny issue where files retrieved from GridFS had the id as the filename (even though a look in mongo reveals that the filename is in fact in the database).

Now the get operation. The original implementation of this simply passed the contents as a buffer to the provided callback by calling store.readBuffer(), but this is now changed to pass the resulting store object to the callback. The value in this is that the caller can use the store object to access metadata, contentType, and other details. The caller can also determine how they want to read the file (either into memory or using a ReadableStream).

exports.get = (id, fn) ->
  db = mongoose.connection.db
  id = new ObjectID(id)
  store = new GridStore(db, id, "r", root: "fs")
  store.open (err, store) ->
    return fn(err) if err
    # band-aid
    if "#{store.filename}" == "#{store.fileId}" and store.metadata and store.metadata.filename
      store.filename = store.metadata.filename
    fn null, store

This code just has a small blight in that it checks to see if the filename and fileId are equal. If they are, it then checks to see if metadata.filename is set and sets store.filename to the value found there. I've tabled the issue to investigate further later.

The Model

In my specific instance, I wanted to attach files to a model. In this example, let's pretend that we have an Application for something (a job, a loan application, etc.) that we can attach any number of files to. Think of tax receipts, a completed application, other scanned documents.
ApplicationSchema = new mongoose.Schema(
  name: String
  files: [ mongoose.Schema.Mixed ]
)

ApplicationSchema.methods.addFile = (file, options, fn) ->
  gridfs.putFile file.path, file.filename, options, (err, result) =>
    @files.push result
    @save fn

Here I define files as an array of Mixed object types (meaning they can be anything) and a method addFile, which basically takes an object that at least contains a path and filename attribute. It uses this to save the file to GridFS and stores the resulting gridstore file object in the files array (this contains stuff like an id, uploadDate, contentType, name, size, etc).

Handling Requests

This all plugs into the request handler to handle form submissions to /new. All this entails is creating an Application model instance, adding the uploaded file from the request (in this case we named the file field "file", hence req.files.file) and saving it.

app.post "/new", (req, res) ->
  application = new Application()
  application.name = req.body.name
  opts = content_type: req.files.file.type
  application.addFile req.files.file, opts, (err, result) ->
    res.redirect "/"

Now the sum of all this work allows us to reap the rewards by making it super simple to download a requested file from GridFS.

app.get "/file/:id", (req, res) ->
  gridfs.get req.params.id, (err, file) ->
    res.header "Content-Type", file.type
    res.header "Content-Disposition", "attachment; filename=#{file.filename}"
    file.stream(true).pipe(res)

Here we simply look up a file by id and use the resulting file object to set the Content-Type and Content-Disposition fields, and finally make use of ReadableStream::pipe to write the file out to the response object (which is an instance of WritableStream). This is the piece of magic that streams data from MongoDB to the client side.

Ideas

This is just a humble beginning. Other ideas include completely encapsulating gridfs within the model. Taking things further, we could even turn the gridfs model into a mongoose plugin to allow completely blackboxed usage of gridfs. Feel free to check the project out and let me know if you have ideas to take it even further. Fork away!

Reference: Streaming Files from MongoDB GridFS from our JCG partner James Carr at the Rants and Musings of an Agile Developer blog...

Arquillian with NetBeans, GlassFish embedded, JPA and a MySQL Datasource

This is an, let’s call it accidental post. I was looking into transactional CDI observers and playing around with GlassFish embedded to run some integration tests against it. But surprisingly this did not work too well and I am still figuring out, where exactly the problems are while using the plain embedded GlassFish for that. In the meantime I switched to Arquillian. After I looked at the Arquillian 1.0.0.Alpha4 a bit more detailed last year, it was time to see, what the guys prepared for 1.0.0 final. Some stuff changed a bit and I was surprised to only find some basic examples on the net. That was the main reason I started writing this post. To have a more complete example of all the three technologies working together. Getting Started First of all, get yourself a fresh copy of latest NetBeans 7.1. It’s out since a few weeks and it really looks good. If you like, you can download a copy of the Java EE version, which comes with latest GlassFish OSS Edition 3.1.1 pre bundled. Install it and fire it up. Also grep a copy of MySQL Community Edition and install it. Back in NetBeans, setup a new Maven Webapplication via the new Project Wizard. Let’s call it “simpleweb” and we tell it to run on GF 3.1.1 (if you haven’t done so, you should create a Services>Server>GlassFish instance before or do it during the project setup). Implement your Application Even if this isn’t test-driven, I like to use this approach because I still believe, that this is the most common case you find out there in your projects. You have a lot of stuff ready to run and you are looking for some kind of automated integration tests. So, let’s assume you have a very basic entity, we call it “com.mycompany.simpleweb.entities.AuditLog”. You can create it with the NetBeans new Entity wizard. The third step during the Entity creation is the provider and database setup. Define a new Datasource and name it “jdbc/auditlog”. As a connection specification use MySQL J driver and I assume you have a database up and running (let’s assume this is called auditlog). Test the connection and finish the wizard.If you are using the wizard, you get some free gifts. Beside the fact, that you now have your AuditLog entity in the source tree, you also find a META-INF/persistence.xml file in your src/main/resources and a glassfish-resources.xml in src/main/setup. This is needed later on, keep it in mind. Add some additional properties to your entity. For now I add “String account”. And don’t forget to define a Version field “Timestamp timestamp”. And it’s also nice to have a little named query to get a list of all AuditLogs @NamedQuery(name = "findAllAuditLogs", query = "SELECT OBJECT (e) FROM AuditLog e") @Version private Timestamp timestamp;If you are using the wizard, make sure to check your pom.xml. The wizard is adding some eclipselink dependencies in scope provided, so this shouldn’t make a big difference here. Next is to add a com.mycompany.simpleweb.service.AuditRepositoryService EJB. This should be responsible for all CRUD operations on the AuditLog entity. Add some code to it to insert an AuditLog: @PersistenceContext private EntityManager em; public void writeLog(String account) { AuditLog log = new AuditLog(); log.setAccount(account); em.persist(log); }And some more code to find out the total number of entries in your table: public int findAll() { TypedQuery<AuditLog> query = em.createNamedQuery("AuditLog.findAllAuditLogs", AuditLog.class); return query.getResultList().size(); } That’s all for now. 
Adding Basic Test Dependencies

Next we are going to add some very basic test dependencies. Open your project's pom.xml file and add the following sections for Arquillian to your project:

<repository>
    <id>JBoss</id>
    <name>JBoss Repository</name>
    <url>https://repository.jboss.org/nexus/content/groups/public/</url>
</repository>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.jboss.arquillian</groupId>
            <artifactId>arquillian-bom</artifactId>
            <version>1.0.0.Final-SNAPSHOT</version>
            <scope>import</scope>
            <type>pom</type>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.jboss.arquillian.container</groupId>
        <artifactId>arquillian-glassfish-embedded-3.1</artifactId>
        <version>1.0.0.Final-SNAPSHOT</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.jboss.arquillian.junit</groupId>
        <artifactId>arquillian-junit-container</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

Beside that, you also need the embedded GlassFish dependency:

<dependency>
    <groupId>org.glassfish.extras</groupId>
    <artifactId>glassfish-embedded-all</artifactId>
    <version>3.1</version>
    <scope>test</scope>
</dependency>

We also need the MySQL J driver:

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.18</version>
    <scope>test</scope>
</dependency>

Configuring Arquillian

After we have all the needed dependencies in place, we need to further configure Arquillian. This is done via the arquillian.xml, which has to be placed in the src/test/resources folder (you might need to create it outside NetBeans first) and should look like this:

<arquillian xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://jboss.org/schema/arquillian http://jboss.org/schema/arquillian/arquillian-1.0.xsd">
    <engine>
        <property name="deploymentExportPath">target/arquillian</property>
    </engine>
    <container default="true" qualifier="glassfish">
        <configuration>
            <property name="sunResourcesXml">src/main/setup/glassfish-resources.xml</property>
        </configuration>
    </container>
</arquillian>

The engine parameter tells Arquillian to place a packaged version of your test archive into a target/arquillian folder. This is quite useful for hunting down problems. The container qualifier points the test runner to the glassfish-resources.xml which was created by the entity creation wizard. All done.

One single thing I would like to suggest is to make a copy of your persistence.xml, place it in the test/resources folder and rename it to something like test-persistence.xml. I consider it a best practice to have an option to configure your test JPA setup a bit differently than the production one. To give a simple example, we would like to see some more logging output during the tests, so the copied version should additionally contain the needed parameters. I also like to change the table generation strategy for testing to drop-and-create-tables:

<property name="eclipselink.ddl-generation" value="drop-and-create-tables" />
<property name="eclipselink.logging.level.sql" value="FINEST" />
<property name="eclipselink.logging.parameters" value="true" />

Let's take a look at the tests.

Add a Testcase

Let's add a test. This is easy with NetBeans: right click on your EJB and select "Tools>Create JUnit Tests". Select JUnit 4.x and accept the name proposal "com.mycompany.simpleweb.service.AuditRepositoryServiceTest". Now your project has a new "Test Packages" folder. As you can see, the test is in error.
NetBeans assumes that you want to do a test based on the embedded EJBContainer. Nice guess, but we would like to add some Arquillian here. Remove the EJBContainer import and strip the class down to this:

@RunWith(Arquillian.class)
public class AuditRepositoryServiceTest {
}

Now it's time to define the deployment archive for the test using ShrinkWrap. The deployment archive for the test is defined using a static method annotated with Arquillian's @Deployment annotation:

@Deployment
public static JavaArchive createTestArchive() {
    return ShrinkWrap.create(JavaArchive.class, "test.jar")
            .addPackage(AuditLog.class.getPackage())
            .addPackage(AuditRepositoryService.class.getPackage())
            .addAsManifestResource(new ByteArrayAsset("<beans>".getBytes()), ArchivePaths.create("beans.xml"))
            .addAsManifestResource("test-persistence.xml", ArchivePaths.create("persistence.xml"));
}

After the packages which should be contained are added, an empty beans.xml (which should be enough for now) is added, and the test-persistence.xml is added as a manifest resource named persistence.xml. Great. One last thing is to define the test itself:

@EJB
AuditRepositoryService repository;

@Test
public void insertLog() throws Exception {
    repository.writeLog("Markus");
    int numberOfLogs = repository.findAll();
    Assert.assertEquals(1, numberOfLogs);
}

We are inserting a simple test entity and getting the count back from the database, which is checked via assertEquals. That's all. Fire up your tests (right click on the AuditRepositoryServiceTest class and select "Test File" (Ctrl+F6)).

Examining what's happening

The output window shows the Std.out of a starting GlassFish. If you examine the output further you see that the JDBC connection pool and the JDBC resource are created:

INFO: command add-resources result: PlainTextActionReporterSUCCESSDescription: add-resources AdminCommandnull
JDBC connection pool mysql_auditlog_rootPool created successfully.
JDBC resource jdbc/auditlog created successfully.

and the "test" application was deployed:

INFO: WEB0671: Loading application [test] at [/test]
17.01.2012 10:12:39 org.glassfish.deployment.admin.DeployCommand execute
INFO: test was successfully deployed in 6.461 milliseconds.

Scanning through the output does point you to some EclipseLink stuff, but the additional SQL logging doesn't seem to be in effect. This is because EclipseLink needs to know which logger to point the output to. Normally the log output is redirected to the server logger, which is auto-discovered. We didn't do any logging configuration until now and simply relied on the defaults for Java Logging. So, let's add a little logging configuration. Put an empty logging.properties file into src/test/resources and add some simple lines to it:

handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.level=FINEST

Then add the Maven Surefire plugin to the build section of your pom.xml:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.11</version>
    <configuration>
        <includes>
            <include>com/mycompany/simpleweb/service/*</include>
        </includes>
        <systemPropertyVariables>
            <java.util.logging.config.file>${basedir}/src/test/resources/logging.properties</java.util.logging.config.file>
        </systemPropertyVariables>
    </configuration>
</plugin>

If you now issue a clean and build, you see that the desired log output is shown in the NetBeans build output:
FEIN: INSERT INTO AUDITLOG (ID, ACCOUNT, TIMESTAMP) VALUES (?, ?, ?)
bind => [1, Markus, 2012-01-17 11:02:54.0]

In fact, if you use Ctrl+F6 you still only see INFO level messages. To fix this, you need to change your NetBeans run action settings. Right click your project and select "Properties". Select "Actions" and "Test file". Add the following as a new line within the "Set Properties" area:

java.util.logging.config.file=src/test/resources/logging.properties

Now you also see the desired log-level output with a single test run. That was it. You can download the complete sample maven project (as ZIP file) with the mentioned classes and play around a bit. Have fun!

Even though the above post was intended to be a more complete demo, it turns out that I missed a big issue with the setup. Everything works fine up to the point where you start introducing enhanced features to your entities. To name but a few: lazy loading, change tracking, fetch groups and so on. JPA providers like to call this "enhancing", and it is most often referred to as "weaving". Weaving is a technique of manipulating the byte-code of compiled Java classes. The EclipseLink JPA persistence provider uses weaving to enhance JPA entities for the mentioned things and to do internal optimizations. Weaving can be performed either dynamically at runtime, when entities are loaded, or statically at compile time by post-processing the entity .class files. Dynamic weaving is mostly recommended as it is easy to configure and does not require any changes to a project's build process. You may have seen some finer log output from EclipseLink like the following:

[...]–Begin weaver class transformer processing class [com/mycompany/simpleweb/entities/AuditLog].
[...]–Weaved persistence (PersistenceEntity) [com/mycompany/simpleweb/entities/AuditLog].
[...]–Weaved change tracking (ChangeTracker) [com/mycompany/simpleweb/entities/AuditLog].
[...]–Weaved lazy (ValueHolder indirection) [com/mycompany/simpleweb/entities/AuditLog].
[...]–Weaved fetch groups (FetchGroupTracker) [com/mycompany/simpleweb/entities/AuditLog].
[...]–End weaver class transformer processing class [com/mycompany/simpleweb/entities/AuditLog].

The problem with Arquillian and Embedded GlassFish

Imagine you take the example from yesterday's blog post and change the simple String account property to something like this:

@ManyToOne(cascade = CascadeType.PERSIST, fetch = FetchType.LAZY)
private Person person;

That's exactly one of the mentioned cases where your JPA provider would need to do some enhancements to your class files before executing. Without modifying the project, it would lead to some very nasty exceptions:

Exception Description: A NullPointerException would have occurred accessing a non-existent weaved _vh_ method [_persistence_get_person_vh]. The class was not weaved properly – for EE deployments, check the module order in the application.xml deployment descriptor and verify that the module containing the persistence unit is ahead of any other module that uses it.
[...]
Internal Exception: java.lang.NoSuchMethodException: com.mycompany.simpleweb.entities.AuditLog._persistence_get_person_vh()
Mapping: org.eclipse.persistence.mappings.ManyToOneMapping[person]
Descriptor: RelationalDescriptor(com.mycompany.simpleweb.entities.AuditLog --> [DatabaseTable(AUDITLOG)])
at org.eclipse.persistence.exceptions.DescriptorException.noSuchMethodWhileInitializingAttributesInMethodAccessor(DescriptorException.java:1170)
at org.eclipse.persistence.internal.descriptors.MethodAttributeAccessor.initializeAttributes(MethodAttributeAccessor.java:200)
[...]

indicating that something is missing. And that missing method is introduced by the weaving process. If you decompile a weaved entity you can see what the JPA provider is complaining about. This is how your enhanced entity class should look, and this is only one of the enhanced methods a weaving process introduces into your code:

public WeavedAttributeValueHolderInterface _persistence_get_person_vh() {
    _persistence_initialize_person_vh();
    if (_persistence_person_vh.isCoordinatedWithProperty() || _persistence_person_vh.isNewlyWeavedValueHolder()) {
        Person person1 = _persistence_get_person();
        if (person1 != _persistence_person_vh.getValue())
            _persistence_set_person(person1);
    }
    return _persistence_person_vh;
}

Dynamic vs. Static Weaving

Obviously the default dynamic weaving doesn't work with the described setup. Why? Weaving is a spoiled child. It only works when the entity classes to be weaved exist only in the application classloader. The combination of embedded GlassFish, Arquillian and the Maven Surefire plugin mixes this up a bit, and the end of the story is that exactly none of your entities are enhanced at all. See this nice discussion for a more detailed explanation.

If dynamic weaving doesn't work, we have to use the fallback called static weaving. Static means: post-processing the entities during the build. Having the maven project at hand, this sounds like a fairly easy job. Let's look for something like this. The first thing you probably find is the StaticWeaveAntTask. The second thing may be Craig's eclipselink-staticweave-maven-plugin.

Let's start with the StaticWeaveAntTask. You would have to use the maven-antrun-plugin to get this introduced, copy classes from left to right and do an amazing lot of wrangling to get your classpath right. Laird Nelson did a great job to archetype-ize an example configuration for all 3 big JPA providers (EclipseLink, OpenJPA, Hibernate), and you could give this a try. A detailed explanation about what is happening can be found on his blog. Thanks Laird for the pointers! Don't get me wrong: this is a valid approach, but I simply don't like it, mainly because it introduces massive complexity to the build, and having seen far too many projects without the right skills for managing even normal maven projects, this simply isn't a solution for me. So I tried the static weaving plugin done by Craig Day.

Adding static weaving to simpleweb

So, let's open the pom.xml from yesterday's project and introduce the new plugin:

<plugin>
    <artifactId>eclipselink-staticweave-maven-plugin</artifactId>
    <groupId>au.com.alderaan</groupId>
    <version>1.0.1</version>
    <executions>
        <execution>
            <goals>
                <goal>weave</goal>
            </goals>
            <phase>process-classes</phase>
        </execution>
    </executions>
</plugin>

Done. Now your classes are weaved, and if you introduce some logging via the plugin configuration you can actually see what happens to your entity classes. The plugin is available via repo1.maven.org.
The only issue I came across is that the introduced dependency on EclipseLink 2.2.0 is (of course) not available via the same repo, so you would probably need to build the plugin yourself with the right repositories and dependencies. You can get the source code via the plugin's google code page. Don't forget to add the weaving property to your test-persistence.xml:

<property name="eclipselink.weaving" value="static" />

[UPDATE: 19.01.2012] Craig released a new 1.0.2 version of the plugin which solves the issues with the EclipseLink dependency. You can now simply include the needed EclipseLink version as a dependency of the plugin. Also needed is the correct EclipseLink maven repository. A complete example with a configured log level looks like this:

<repository>
    <id>eclipselink</id>
    <name>Repository hosting the eclipselink artifacts</name>
    <url>http://www.eclipse.org/downloads/download.php?r=1&nf=1&file=/rt/eclipselink/maven.repo/</url>
</repository>
[...]
<plugin>
    <artifactId>eclipselink-staticweave-maven-plugin</artifactId>
    <groupId>au.com.alderaan</groupId>
    <version>1.0.2</version>
    <executions>
        <execution>
            <goals>
                <goal>weave</goal>
            </goals>
            <phase>process-classes</phase>
            <configuration>
                <logLevel>ALL</logLevel>
            </configuration>
        </execution>
    </executions>
    <dependencies>
        <dependency>
            <groupId>org.eclipse.persistence</groupId>
            <artifactId>eclipselink</artifactId>
            <version>2.3.1</version>
        </dependency>
    </dependencies>
</plugin>

References: Arquillian with NetBeans, GlassFish embedded, JPA and a MySQL Datasource & Arquillian with NetBeans, GlassFish embedded, JPA and a MySQL Datasource – Part II from our JCG partner Markus Eisele at the Enterprise Software Development with Java blog....

Tips for Testing Database Code

Almost everybody understands that source code belongs into version control. Many people understand we need to test our code. Quite a few do that automatically. But everything seems to change when it comes to databases. A lot of stuff in and around databases goes untested. Heck some of the scripts don’t even live in version control. If you can’t believe that, because you haven’t seen it, you are a damn lucky bastard. At least one reason for this state of affairs is obviously: It hurts. Database tests tend to be slow, interdependent and hard to maintain. Granted. But you know what: If it hurts, you should do it more often. It will teach you what exactly the things are that cause pain when testing and eventually you’ll find approaches to make it less painful. Here are some ideas I found helpful when testing database related code:Give every developer her own database. This forces you to find a way to set up the database fast, easy and reliable. If your application lives in a single user/schema/namespace it is sufficient for each developer to have his own user/schema/namespace in a single database. For this to work though … … the application should be user/schema/namespace agnostic. It makes it much easier to create multiple instances one a single server. Let the application live in a single user/schema/namespace. If you have multiple interdependent namespaces (e.g. for modules) you’ll have a hard time making them agnostic of the names. Have separate instances for CI, Demos, QA and so on. Actually ideally it should be trivial to create a fresh instance. Stay away from any tool that comes with its own repository. If have seen about a dozen of such tools and although some looked promising in the beginning, they all completely failed to integrate with other tools on the development side of things. Examples of such tools are tools for code generation from UML or ER models and ETL tools.The previous points where about the setup of your database and infrastructure. Lets have a look at the code:Favor a proper Language (like Java, C, PHP …) over database specific languages like T-SQL or PL/SQL. If you are wondering why, make a comparison between your favorite DB language and your all purpose language. For which do you get the better IDE, Unit testing support, code analysis, code coverage and so on. Reconsider your main language if it doesn’t win in that comparison. Have a small layer that does all interaction with the database. Make sure no SQL or other database dependent code leaks out. Inject that layer as a dependency into anybody in need of it. This will allow you to test almost everything without worrying about the database. Only the tests for that integration layer actually needs a database for testing. Favor database independent SQL or a library abstracting away the differences of various databases. Back in the time people claimed they needed that in case they have to switch database vendors, which never happened. Now it does. See below.The next points will deal with the actual tests:Consider an in-memory-database for testing. Many databases can run in an in-memory-mode. They are great for testing, because you can throw them away after the test, and they are way faster then any database writing to disk. This of course is only possible when you work with a database system that can run as a in-memory-database or if your code is database independent. Hence the previous point. If you can’t use your database as in memory database, consider putting it on a RAM disk. 
We got a huge performance gain for our tests with this approach; a solid state disk might be the next best thing, although I’m not sure how it would react to the heavy load of continuous database tests.

- Make test failure messages so explicit that you don’t have to look into the database to analyze a test failure.
- Use code for setting up your test data, and make it nice and concise. If you need a row in a table without special requirements for its values, you should be able to create it with a single trivial line of code, no matter how many foreign keys the table contains. In other words, you should have a little DSL for your test data (a sketch follows at the end of this article). Doing it in plain code gives you all the refactoring power of your IDE for your tests. For load and performance tests, other approaches like loading production data or large amounts of generated data might be more suitable.
- Make sure your tests clean up after themselves. There are two approaches I found usable in most cases: recreate the schema for every test, which is slow but really safe; or do a rollback after each test, which only works when there is no commit inside the test. The mean thing is: if a test tries to roll back but fails because there was a commit inside the test, some completely different test will fail, and it can be really frustrating to find the bad test in such a case.

We covered the testing of database related code inside your application. But there is another kind of code we need to deal with: scripts for deploying (or upgrading) your application. The scripts that change your database schema from one version to the next are source code just like everything else. Therefore they belong under version control and should be tested continuously. The testing process is really simple: create a database as it looks now, apply your change scripts, and verify that the result looks as desired. For verifying the resulting schema it is useful to have a script that creates your target database (or at least the schema) from scratch, so you can compare it with the result of the test. For performance reasons you might want to test this with an empty database first, but don’t forget to run it on an instance with realistic data as well. Hint: adding a column to huge tables can take loooooong. Happy testing everybody. Reference: Tips for Testing Database Code from our JCG partner Jens Schauder at the Schauderhaft blog...
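As a minimal sketch (not from the original article) of what several of these ideas can look like together, here is a JUnit 4 test using plain JDBC and H2 as one example of an in-memory database, a rollback-per-test cleanup, and a tiny one-line test data helper. The class, table and column names are made up for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class CustomerRepositoryTest {

    private Connection connection;

    @Before
    public void setUp() throws Exception {
        // Private, unnamed in-memory H2 database: it exists only for this connection
        // and disappears when the connection is closed. Needs the H2 jar on the test classpath.
        connection = DriverManager.getConnection("jdbc:h2:mem:", "sa", "");
        connection.setAutoCommit(false); // so the test can be cleaned up with a rollback
        try (Statement ddl = connection.createStatement()) {
            ddl.execute("create table customer (id bigint primary key, name varchar(100) not null)");
        }
    }

    @After
    public void tearDown() throws Exception {
        connection.rollback(); // cleanup by rollback; only works if the test never commits
        connection.close();
    }

    // A one-line "DSL" for test data: harmless defaults, the test only states what it cares about.
    private void aCustomer(long id) throws Exception {
        try (PreparedStatement insert =
                connection.prepareStatement("insert into customer (id, name) values (?, ?)")) {
            insert.setLong(1, id);
            insert.setString(2, "any name " + id);
            insert.executeUpdate();
        }
    }

    @Test
    public void countsCustomers() throws Exception {
        aCustomer(1);
        aCustomer(2);
        try (Statement query = connection.createStatement();
             ResultSet rs = query.executeQuery("select count(*) from customer")) {
            rs.next();
            assertEquals(2, rs.getInt(1));
        }
    }
}

The helper gives each row harmless default values, so a test only has to mention the data it actually cares about; everything else is taken care of once, in one place.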

Book Review: JavaFX 2.0: Introduction by Example

Although Oracle’s changes to JavaFX at JavaOne 2010 and JavaOne 2011 have converted me from a skeptic to a believer when it comes to JavaFX, the shift in JavaFX vision has not been without its downsides. In particular, the JavaFX book market has been tricky because nearly all available JavaFX books have been about version 1.x. In this post, I review the only book that I’m aware of at the time of this writing that is completely focused on JavaFX 2.0: Carl Dea’s JavaFX 2.0: Introduction by Example. I’ll start my review by stating the most important observation about JavaFX 2.0: Introduction by Example: it provided me exactly what I was looking for exactly when I needed it. There are some attributes of the book that might be considered negative by some readers but that I felt were positives in my use of the book. I’ll attempt in this post to articulate the finer points of these attributes so that prospective readers can make up their own minds. JavaFX 2.0: Introduction by Example does exactly what the title implies: it introduces JavaFX 2.0 via numerous and varied examples. This code-heavy book is roughly similar to a “recipes” or “cookbook” style book, with each individual item covered (AKA a “recipe”) featuring subsections on the problem to be solved, the solution or solutions to that problem, and how those solutions work. Like the best recipes-oriented or cookbook-oriented software development books, this one is constructed so that Chapter 1 (“JavaFX Fundamentals”) covers some of the basics of JavaFX early on. In other words, the reader is not dropped into JavaFX without first getting some examples of how to write and deploy basic “Hello World” style JavaFX applications. Although JavaFX 2.0: Introduction by Example does provide introductory examples early on, I really appreciated the author not spending significant time discussing esoteric features of the language, delving into the history of JavaFX in tedious detail, or providing pages worth of explanation on why JavaFX is the greatest thing since sliced bread. I’m usually in a hurry and I have come to resent books wasting my time on such things, and this book doesn’t do that. In this case, I was already familiar with these aspects of JavaFX (at least its history and why I might be interested in learning more about it), so I was especially appreciative of Dea not wasting paper and my time on that subject. In the book’s concise “Introduction,” Dea covers in a page and a half some advantages of JavaFX and “some history” of JavaFX, along with a simple table articulating the features of each release of JavaFX. It’s a thing of beauty to be able to read all of this in less than two pages and in the introduction! Dea covers some more background on JavaFX in the first chapter, but again limits that discussion to a single page. This page is more detailed than the introductory section and is a nice, brief segue into the technical meat of the book. This first page also contains the sentences that I think best sum up the value of this book: Although this book doesn’t go through an exhaustive study of all of JavaFX 2.0’s capabilities, you will find common use cases that can help you build richer applications. Hopefully, these recipes can lead you in the right direction by providing practical and real-world examples. This is exactly what JavaFX 2.0: Introduction by Example has done for me. It has provided me with a fast start into the world of JavaFX.
Although I have since used several aspects of JavaFX not covered in this book, the book gave me the start that I needed and I was able to use the JavaFX documentation for the areas not covered in this book. JavaFX 2.0: Introduction by Example gets to the point quickly. Besides common things like the very brief introduction and the index, the book contains four chapters (32 “recipes”) spanning 174 pages of text, images, and code. Dea doesn’t even waste time with a conclusion, but ends the book with “recipe” 4.5 (“Displaying Content from the Database”). Although some readers may need a conclusion to bring closure to their reading experience, I usually find little value in this as a reader and I didn’t miss it here. I typically don’t read these types of books cover-to-cover anyway (instead focusing on sections or recipes I am most interested in), so the conclusion is often unnecessary. Lack of a conclusion is another example of how Dea’s book focuses on what I want most: the technical meat. The four chapters in JavaFX 2.0: Introduction by Example are “JavaFX Fundamentals,” “Graphics with JavaFX,” “Media with JavaFX,” and “JavaFX on the Web.” The first chapter was most useful for quickly immersing myself in the basics of JavaFX and how to apply it. The examples in that chapter tend to be simple and easy to follow. The examples in the other three chapters tend to be more sophisticated because the functionality being covered tends to be more sophisticated. There are numerous lengthy code listings in the book. Although code listings may not be the easiest to read, I like to see actual code in any book on a language. Dea typically follows each code sample with descriptive text on any new feature shown in the code sample that had not been covered previously in the book. The code samples can be downloaded from Apress’s site. The book also features numerous screen snapshots, which I consider a must for a book focused on user interfaces. The concise and introductory approach of JavaFX 2.0: Introduction by Example that appealed to me may not appeal to everyone. This book, as advertised in the above-cited quotation from the first chapter, is intended to be introductory (hence the title) and not exhaustive. Some topics that I have not seen covered in this book include subjects such as FXML, the JavaFX charting functionality, GroovyFX, and ScalaFX. Deployment is only lightly covered (and mostly via NetBeans), but Dea does reference Deploying JavaFX Applications for more details on JavaFX deployment. All of these areas, however, are fairly approachable given the JavaFX basics provided in this book. Dea recommends that readers reference the forthcoming (mid-February 2012, Apress) Pro JavaFX 2.0 Platform for an “invaluable resource” that provides “further insight into JavaFX.” Although a small number of the items/recipes covered in JavaFX 2.0: Introduction by Example are based on and assume use of NetBeans, most examples are in no way specific to any tool or IDE. Rather, most examples provide “raw” code that can be used in any IDE or favorite text editor. Indeed, many of the examples can be compiled with the javac compiler and executed with the java application launcher. I appreciated that in at least one NetBeans-oriented recipe, Dea took the page or two necessary to provide the code listing for the code generated by NetBeans. This is invaluable for those who do not use NetBeans or who want to understand the code itself rather than simply understanding how to use NetBeans to generate the code.
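Since most of the book’s examples are, as noted, plain code that can be compiled with javac and run with the java launcher (typically with jfxrt.jar from the JavaFX 2.0 SDK on the classpath), a generic minimal JavaFX 2.0 application looks roughly like the following; this sketch is not taken from the book, and the class name is arbitrary.

import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.control.Label;
import javafx.scene.layout.StackPane;
import javafx.stage.Stage;

public class HelloJavaFX extends Application {

    @Override
    public void start(Stage stage) {
        StackPane root = new StackPane();            // simple layout container as the scene root
        root.getChildren().add(new Label("Hello, JavaFX 2.0"));
        stage.setScene(new Scene(root, 300, 100));
        stage.setTitle("Hello");
        stage.show();
    }

    public static void main(String[] args) {
        launch(args); // starts the JavaFX runtime and calls start() on the JavaFX application thread
    }
}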
JavaFX 2.0: Introduction by Example was exactly what I needed for an efficient and effective start to my investigation of JavaFX. It may not provide quite the level of soft introduction someone completely unfamiliar with JavaFX might want (especially if that person’s basic Java skills are a little rusty) and it isn’t an “exhaustive” JavaFX 2.0 reference either. It falls in between these extremes and is a fast-start, example-based introduction to JavaFX for those who want to get into the core of JavaFX as quickly as possible. That’s what I wanted when I purchased this book and I was happy to find out that’s exactly what this book provides. It’s completely JavaFX 2.x-oriented and there is no sign of the deprecated JavaFX Script in any of the code examples. Reference: Book Review: JavaFX 2.0: Introduction by Example from our JCG partner Dustin Marx at the Inspired by Actual Events blog...

Intellij vs. Eclipse

Choosing the right IDE can make or break you as a coder. Most developers would be lost without the comfort of their preferred IDE, which takes care of the classpath, build files, command line arguments, etc. This dependence on the IDE, beneficial as the IDE itself may be, is a problem of its own, but that is off topic and a discussion for another post. We concentrate on two main platforms, Eclipse and Intellij Community Edition, comparing them mainly in the Java SE context. Disclosure: Nadav uses Intellij on a regular basis, and Roi uses Eclipse.

A little history first: Eclipse has been around since 2001, while the first real major release was Eclipse 3.0 in 2004. It began as an IBM project, but current members of the Eclipse Foundation range from Oracle to Google. The current release is Eclipse Indigo 3.7, licensed under the Eclipse Public License. Intellij comes from JetBrains, a private company founded in 2000. Intellij for Java was first released in 2001; the Community Edition supports Java, Groovy and Scala, and it’s free and open source under the Apache 2.0 license.

We use Java as our main development language, and each developer chooses his own IDE. The war between the IDEs is raging around us, starting in our school and university days and extending to our current workplace. While each side is certain of its own righteousness, we believe there is no right or wrong answer; it is rather a matter of choosing the right platform for your needs and challenges, taking into account the kind of programmer you are. We would like to share our own experience on when to use each. So here we go:

- Plugins: The Eclipse marketplace offers 1,276 plugins, and the Intellij Plugin Repository offers 727. This difference is not to be taken lightly, since plugins for new technologies are usually developed mainly for Eclipse (e.g. Android, Drools, Activiti, etc.). Moreover, Eclipse is easier to extend. When working on a specific technology, chances are that if a plugin exists, it will be an Eclipse plugin.
- Multiple projects: This is an Eclipse win for sure. It can open multiple projects in the same window, giving the coder control over dependencies and relations. Intellij has an option to open one project with multiple modules, but we found it cumbersome and at times a little buggy. If you are going to use a lot of projects together and hate switching windows, Eclipse is your choice.
- Multiple languages: We have stated that we only examine the Intellij Community Edition, which supports Java, Groovy and Scala. However, if you plan to create a Python server combined with Ajax & HTML joined to a Java web server, or any other exotic language combination, then Eclipse is your choice.
- Code completion & inspection: While Eclipse can add plugins such as Checkstyle, this one definitely goes to Intellij. The default code completion and assistance in Intellij is faster and better. If you are a rookie developer, Intellij can improve your code.
- Usability: The Intellij user experience is much easier to grasp and the learning curve is far shorter. Using Intellij makes developing feel easier and more natural. Dropdowns, code completion, quick view, project wizards, etc. are all available in both Eclipse and Intellij, but the experience in Intellij is much more satisfying.
- Performance: The more plugins are installed in the IDE, the heavier it is on your computer. Having said that, Eclipse handles very large projects faster. Moreover, both of the IDEs seem to be RAM junkies.
Projects usually open faster in Eclipse, as Intellij indexes the entire project on startup, but while working on an existing project Intellij runs more smoothly. For example, we have a huge SOAP project that is impossible to work on with Intellij, so some of us even learned Eclipse just for that.
- Repository integration: Both IDEs have SVN/Git/etc. plugins. No doubt Intellij’s plugin is more reliable, has a better GUI and is easier to use.
- GUI builder: We found the built-in Intellij GUI builder more comfortable and, as mentioned above, usability-wise it is easier to learn and more enjoyable to develop with.

In conclusion, a programmer should be able to find the right tool for a specific task. This means being acquainted with both IDEs, in order to face each challenge with the right tool. Reference: Intellij vs. Eclipse from our JCG partner Nadav Azaria & Roi Gamliel at the DeveloperLife blog...

Practical Garbage Collection, part 1 – Introduction

This is the first part of a series of blog posts I intend to write, whose aim will be to explain how garbage collection works in the real world (in particular, with the JVM). I will cover some theory that I believe is necessary to understand garbage collection enough for practical purposes, but will keep it to a minimum. The motivation is that garbage collection related questions keep coming up in a variety of circumstances, including (for example) on the Cassandra mailing list. The problem when trying to help is that explaining the nuances of garbage collection is too much of an effort to do ad hoc in a mailing list reply tailored to one specific situation, and you rarely have enough information about the situation to tell someone what is causing their particular problem. I hope that this guide will be something I can point to in answering these questions. I hope that it will be detailed enough to be useful, yet easy to digest and non-academic enough for a broad audience. I very much appreciate any feedback on what I need to clarify, improve, rip out completely, etc. Much of the information here is not specific to Java. However, in order to avoid constantly invoking generic and abstract terminology, I am going to speak in concrete terms of the Hotspot JVM wherever possible.

Why should anyone have to care about the garbage collector? That is a good question. The perfect garbage collector would do its job without a human ever noticing that it exists. Unfortunately, there exists no known perfect (whatever perfection means) garbage collection algorithm. Further, the selection of garbage collectors practically available to most people is limited to the subset of garbage collection algorithms that are actually implemented. (Similarly, malloc is not perfect either and has its issues, with multiple implementations available with different characteristics. However, this post is not trying to contrast automatic and explicit memory management, although that is an interesting topic.) The reality is that, as with many technical problems, there are some trade-offs involved. As a rule of thumb, if you’re using the freely available Hotspot-based JVMs (Oracle/Sun, OpenJDK), you mostly notice the garbage collector if you care about latency. If you do not, chances are the garbage collector will not be a bother – other than possibly to select a maximum heap size different from the default. By latency, in the context of garbage collection, I mean pause times. The garbage collector needs to pause the application sometimes in order to do some of its work; this is often referred to as a stop-the-world pause (the “world” being the observable universe from the perspective of the Java application, or mutator in GC speak, because it is mutating the heap while the garbage collector is trying to collect it). It is important to note that while all practically available garbage collectors impose stop-the-world pauses on the application, the frequency and duration of these pauses vary greatly with the choice of garbage collector, garbage collector settings, and application behavior. As we shall see, there exist garbage collection algorithms that attempt to avoid the need to ever collect the entire heap in a stop-the-world pause. The reason this is an important property is that if at any point (even if infrequently) you stop the application for a complete collection of the heap, the pause times suffered by the application scale proportionally to the heap size.
This is typically the main thing you want to avoid when you care about latency. There are other concerns as well, but this is usually the big one.

Tracing vs. reference counting

You may have heard of reference counting being used (for example, cPython uses a reference counting scheme for most of its garbage collection work). I am not going to talk much about it because it is not relevant to JVMs, except to say two things:

- One property of reference counting garbage collection is that an object is known to be unreachable immediately at the point where the last reference to it is removed.
- Reference counting will not detect cyclic data structures as unreachable, and it has some other problems that keep it from being the end-all be-all garbage collection choice.

The JVM instead uses what is known as a tracing garbage collector. It is called tracing because, at least at an abstract level, the process of identifying garbage involves taking the root set (things like the local variables on your stack or global variables) and tracing a path from those objects to all objects that are directly or indirectly reachable from that root set. Once all reachable (live) objects have been identified, the objects eligible for being freed by the garbage collector have been identified by a process of elimination.

Basic stop-the-world, mark, sweep, resume

A very simple tracing garbage collector works using the following process (a toy sketch of it follows below):

- Pause the application completely.
- Mark all objects that are reachable (from the root set, see above) by tracing the object graph (i.e., following references recursively).
- Free all objects that were not reachable.
- Resume the application.

In a single-threaded world this is pretty easy to imagine: the call that is responsible for allocating a new object will either return the new object immediately, or, if the heap is full, initiate the above process to free up space, followed by completing the allocation and returning the object. None of the JVM garbage collectors work like this. However, it is good to understand this basic form of a garbage collector, as the available garbage collectors are essentially optimizations of the above process. The two main reasons why the JVM does not implement garbage collection like this are:

- Every single garbage collection pause will be long enough to collect the entire heap; in other words, it has very poor latency.
- For almost all real-world applications, it is by far not the most efficient way to perform garbage collection (it has a high CPU overhead).
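To make those four steps concrete, here is a toy sketch of a stop-the-world mark-and-sweep pass over a hand-rolled object graph. It is an illustration of the idea, not JVM code, and all class and field names are made up.

import java.util.ArrayList;
import java.util.List;

class ToyObject {
    boolean marked;                                  // the "mark" bit
    final List<ToyObject> references = new ArrayList<ToyObject>();
}

class ToyCollector {
    final List<ToyObject> heap = new ArrayList<ToyObject>();   // every allocated object
    final List<ToyObject> roots = new ArrayList<ToyObject>();  // stand-in for stacks and globals

    void collect() {
        // (The application is assumed to be paused while this runs.)
        for (ToyObject root : roots) {
            mark(root);                              // mark: trace everything reachable from the roots
        }
        List<ToyObject> live = new ArrayList<ToyObject>();
        for (ToyObject obj : heap) {                 // sweep: whatever was never marked is garbage
            if (obj.marked) {
                obj.marked = false;                  // reset for the next cycle
                live.add(obj);
            }
        }
        heap.clear();
        heap.addAll(live);                           // "free" the unmarked objects by dropping them
    }

    private void mark(ToyObject obj) {
        if (obj.marked) {
            return;                                  // already visited; also keeps cycles from looping
        }
        obj.marked = true;
        for (ToyObject ref : obj.references) {
            mark(ref);
        }
    }
}

Everything the real collectors add on top of this – generations, compaction, parallelism, concurrency – is essentially an optimization of that basic process, as described in the rest of the post.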
Compacting vs. non-compacting garbage collection

An important distinction between garbage collectors is whether or not they are compacting. Compacting refers to moving objects around (in memory) so as to collect them in one dense region of memory, instead of leaving them spread out sparsely over a larger region. Real-world analogy: consider a room full of things on the floor in random places. Taking all these things and stuffing them tightly in a corner is essentially compacting them, freeing up floor space. Another way to remember what compaction is, is to envision one of those machines that take something like a car and compact it into a block of metal, thus taking up less space than the original car by eliminating all the space occupied by air (but as someone has pointed out, while the car is destroyed, objects on the heap are not!). By contrast, a non-compacting collector never moves objects around. Once an object has been allocated in a particular location in memory, it remains there forever, or until it is freed. There are some interesting properties of both:

- The cost of performing a compacting collection is a function of the amount of live data on the heap. If only 1% of the data is live, only 1% of the data needs to be compacted (copied in memory). By contrast, in a non-compacting collector, objects that are no longer reachable still imply bookkeeping overhead, as their memory locations must be tracked as free so they can be used for future allocations.
- In a compacting collector, allocation is usually done via a bump-the-pointer approach. You have some region of space, and maintain your current allocation pointer. If you allocate an object of n bytes, you simply bump that pointer by n (I am eliding complications like multi-threading and the optimizations that this implies). A minimal sketch of this follows below.
- In a non-compacting collector, allocation involves finding somewhere to allocate, using some mechanism that depends on how the availability of free memory is tracked. In order to satisfy an allocation of n bytes, a contiguous region of n bytes of free space must be found. If one cannot be found (because the heap is fragmented, meaning it consists of a mixed bag of free and allocated space), the allocation will fail.

Real-world analogy: consider your room again. Suppose you are a compacting collector. You can move things around on the floor freely at your leisure. When you need to make room for that big sofa in the middle of the floor, you move other things around to free up an appropriately sized chunk of space for the sofa. On the other hand, if you are a non-compacting collector, everything on the floor is nailed to it and cannot be moved. A large sofa might not fit, despite the fact that you have plenty of floor space available – there is just no single space large enough to fit the sofa.
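Here is a minimal sketch of the bump-the-pointer idea mentioned above, ignoring multi-threading, object headers and alignment; the class name and the offset-based interface are made up for illustration.

class BumpThePointerRegion {
    private final byte[] space;   // the region this allocator hands out memory from
    private int top;              // everything below 'top' is allocated, everything above is free

    BumpThePointerRegion(int sizeInBytes) {
        this.space = new byte[sizeInBytes];
        this.top = 0;
    }

    /** Returns the start offset of the new object, or -1 if the region is full (time to collect). */
    int allocate(int sizeInBytes) {
        if (top + sizeInBytes > space.length) {
            return -1;            // no room left at the top; a compacting collection would reset 'top'
        }
        int objectStart = top;
        top += sizeInBytes;       // the actual "bump"
        return objectStart;
    }
}

Compare this with a free-list allocator, which has to search for a hole of the right size and can fail even when plenty of total space is free – the nailed-down-furniture situation from the analogy.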
Generational garbage collection

Most real-world applications tend to perform a lot of allocation of short-lived objects (in other words, objects that are allocated, used for a brief period, and then no longer referenced). A generational garbage collector attempts to exploit this observation in order to be more CPU efficient (in other words, to have higher throughput). (More formally, the hypothesis that most applications have this behavior is known as the weak generational hypothesis.) It is called “generational” because objects are divided up into generations. The details vary between collectors, but a reasonable approximation at this point is to say that objects are divided into two generations:

- The young generation is where objects are initially allocated. In other words, all objects start off in the young generation.
- The old generation is where objects “graduate” to when they have spent some time in the young generation.

The reason why generational collectors are typically more efficient is that they collect the young generation separately from the old generation. Typical behavior of an application in steady state doing allocation is frequent short pauses as the young generation is being collected – punctuated by infrequent but longer pauses as the old generation fills up and triggers a full collection of the entire heap (old and new). If you look at a heap usage graph of a typical application, it will look similar to this:

[Figure: typical sawtooth behavior of heap usage with the throughput collector.]

The ongoing sawtooth look is a result of young generation garbage collections. The large dip towards the end is when the old generation became full and the JVM did a complete collection of the entire heap. The amount of heap usage at the end of that dip is a reasonable approximation of the actual live set at that point in time. (Note: this is a graph from running a stress test against a Cassandra instance configured to use the default JVM throughput collector; it does not reflect out-of-the-box behavior of Cassandra.) Note that simply picking the “current heap usage” at an arbitrary point in time on that graph will not give you an idea of the memory usage of the application. I cannot stress that point enough. What is typically considered the memory “usage” is the live set, not the heap usage at any particular time. The heap usage is much more a function of the implementation details of the garbage collector; the only effect the application’s memory usage has on heap usage is that it provides a lower bound on it.

Now, back to why generational collectors are typically more efficient. Suppose our hypothetical application is such that 90% of all objects die young; in other words, they never survive long enough to be promoted to the old generation. Further, suppose that our collection of the young generation is compacting (see previous sections) in nature. The cost of collecting the young generation is now roughly that of tracing and copying the 10% of the objects it contains that are still live; the cost associated with the remaining 90% is quite small. Collection of the young generation happens when it becomes full, and is a stop-the-world pause. The 10% of objects that survived may be promoted to the old generation immediately, or they may survive for another round or two in the young generation (depending on various factors). The important overall behavior to understand, however, is that objects start off in the young generation and are promoted to the old generation as a result of surviving in the young generation. (Astute readers may have noticed that collecting the young generation completely separately is not possible – what if an object in the old generation has a reference to an object in the new generation? This is indeed something a garbage collector must deal with; a future post will talk about this.)

The optimization is quite dependent on the size of the young generation. If the size is too large, the pause times associated with collecting it may become a noticeable problem. If the size is too small, even objects that die young may not die quickly enough to still be in the young generation when they die. Recall that the young generation is collected when it becomes full; this means that the smaller it is, the more often it will be collected. Further recall that when objects survive the young generation, they get promoted to the old generation. If most objects, despite dying young, never have a chance to die in the young generation because it is too small, they will get promoted to the old generation, the optimization that the generational garbage collector is trying to make will fail, and you will take the full cost of collecting those objects later on in the old generation (plus the up-front cost of having copied them from the young generation).

Parallel collection

The point of having a generational collector is to optimize for throughput; in other words, the total amount of work the application gets to do in a particular amount of time. As a side-effect, most of the pauses incurred due to garbage collection also become shorter.
However, no attempt is made to eliminate the periodic full collections, which will imply a pause time of whatever is necessary to complete a full collection. The throughput collector does do one thing worth mentioning in order to mitigate this: it is parallel, meaning it uses multiple CPU cores simultaneously to speed up garbage collection. This does lead to shorter pause times, but there is a limit to how far you can go – even in an unrealistically perfect situation of a linear speed-up (meaning double the CPU count -> half the collection time) you are limited by the number of CPU cores on your system. If you are collecting a 30 GB heap, that is going to take some significant time even if you do so with 16 parallel threads. In garbage collection parlance, the word parallel is used to refer to a collector that does work on multiple CPU cores at the same time.

Incremental collection

Incremental in a garbage collection context refers to dividing up the work that needs to be done into smaller chunks, often with the aim of pausing the application for multiple brief periods instead of a single long pause. The behavior of the generational collector described above is partially incremental in the sense that the young generation collections constitute incremental work. As a whole, however, the collection process is not incremental, because of the full heap collections incurred when the old generation becomes full. Other forms of incremental collection are possible; for example, a collector can do a tiny bit of garbage collection work for every allocation performed by the application. The concept is not tied to a particular implementation strategy.

Concurrent collection

Concurrent in a garbage collection context refers to performing garbage collection work concurrently with the application (mutator). For example, on an 8 core system, a garbage collector might keep two background threads that do garbage collection work while the application is running. This allows significant amounts of work to be done without incurring an application pause, usually at some cost in throughput and implementation complexity (for the garbage collector implementor).

Available Hotspot garbage collectors

The default choice of garbage collector in Hotspot is the throughput collector, which is a generational, parallel, compacting collector. It is entirely optimized for throughput: the total amount of work achieved by the application in a given time period. The traditional alternative for situations where latency/pause times are a concern is the CMS collector. CMS stands for Concurrent Mark & Sweep and refers to the mechanism used by the collector. Its purpose is to minimize or even eliminate long stop-the-world pauses, limiting garbage collection work to shorter stop-the-world (often parallel) pauses, in combination with longer work performed concurrently with the application. An important property of the CMS collector is that it is not compacting, and thus suffers from fragmentation concerns (more on this in a later blog post). As of later versions of JDK 1.6 and JDK 1.7, there is a new garbage collector available called G1 (which stands for Garbage First). Its aim, like that of the CMS collector, is to mitigate or eliminate the need for long stop-the-world pauses, and it does most of its work in parallel in short stop-the-world incremental pauses, with some work also being done concurrently with the application.
In contrast to CMS, G1 is a compacting collector and does not suffer from fragmentation concerns – but it has other trade-offs instead (again, more on this in a later blog post).

Observing garbage collector behavior

I encourage readers to experiment with the behavior of the garbage collector. Use jconsole (which comes with the JDK) or VisualVM (which produced the graph earlier in this post) to visualize the behavior of a running JVM. But, in particular, start getting familiar with garbage collection log output, by running your JVM with (updated with jbellis’ feedback – thanks!):

-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure

Also useful but verbose (meaning explained in later posts):

-XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:PrintFLSStatistics=1

The output is pretty easy to read for the throughput collector. For CMS and G1, the output is more opaque to analysis without an introduction; I hope to cover this in a later update. In the meantime, the take-away is that the options above are probably the first things you want to use whenever you suspect that you have a GC related problem. It is almost always the first thing I tell people when they start to hypothesize GC issues: have you looked at the GC logs? If you have not, you are probably wasting your time speculating about GC.

Conclusion

I have tried to produce a crash-course introduction that I hope was enlightening in and of itself, but that is primarily intended as background for later posts. I welcome any feedback, particularly if things are unclear or if I am making too many assumptions. I want this series to be approachable by a broad audience, as I said in the beginning, though I certainly do assume some level of expertise. Intimate garbage collection knowledge should not be required; if it is, I have failed – please let me know. Reference: Practical Garbage Collection, part 1 – Introduction from our JCG partner Peter Schuller at the (mod :world :scode) blog...
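For orientation, the logging flags from the post might be combined on a command line roughly like this; the heap size, the choice of CMS and the application jar name are illustrative placeholders, not recommendations from the article:

java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -jar myapp.jar

Swapping -XX:+UseConcMarkSweepGC for -XX:+UseG1GC selects G1 (on a sufficiently recent JDK), and omitting the collector flag entirely leaves you with the default throughput collector.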