
What's New Here?


Connect to RabbitMQ (AMQP) using Scala, Play and Akka

In this article we’ll look at how you can connect from Scala to RabbitMQ so you can support the AMQP protocol from your applications. In this example I’ll use the Play Framework 2.0 as container (for more info on this see my other article on this subject) to run the application in, since Play makes developing with Scala a lot easier. This article will also use Akka actors to send and receive the messages from RabbitMQ. What is AMQP First, a quick introduction into AMQP. AMQP stands for “Advanced Message Queueing Protocol” and is an open standard for messaging. The AMQP homepage states their vision as this: “To become the standard protocol for interoperability between all messaging middleware”. AMQP defines a transport level protocol for exchanging messages that can be used to integrate applications from a number of different platform, languages and technologies. There are a number of tools implementing this protocol, but one that is getting more and more attention is RabbitMQ. RabbitMQ is an open source, erlang based message broker that uses AMQP. All application that can speak AMQP can connect to and make use of RabbitMQ. So in this article we’ll show how you can connect from your Play2/Scala/Akka based application to RabbitMQ. In this article we’ll show you how to do implement the two most common scenarios:Send / recieve: We’ll configure one sender to send a message every couple of seconds, and use two listeners that will read the messages, in a round robin fashion, from the queue. Publish / subscribe: For this example we’ll create pretty much the same scenario, but this time, the listeners will both get the message at the same time.I assume you’ve got an installation of RabbitMQ. If not follow the instructions from their site. Setup basic Play 2 / Scala project For this example I created a new Play 2 project. Doing this is very easy: jos@Joss-MacBook-Pro.local:~/Dev/play-2.0-RC2$ ./play new Play2AndRabbitMQ _ _ _ __ | | __ _ _ _| | | '_ \| |/ _' | || |_| | __/|_|\____|\__ (_) |_| |__/   play! 2.0-RC2, http://www.playframework.org   The new application will be created in /Users/jos/Dev/play-2.0/PlayAndRabbitMQ   What is the application name? > PlayAndRabbitMQ   Which template do you want to use for this new application?   1 - Create a simple Scala application 2 - Create a simple Java application 3 - Create an empty project   > 1   OK, application PlayAndRabbitMQ is created.   Have fun! I am used to work from Eclipse with the scala-ide pluging, so I execute play eclipsify and import the project in Eclipse. The next step we need to do is set up the correct dependencies. Play uses sbt for this and allows you to configure your dependencies from the build.scala file in your project directory. The only dependency we’ll add is the java client library from RabbitMQ. Even though Lift provides a scala based AMQP library, I find using the RabbitMQ one directly just as easy. After adding the dependency my build.scala looks like this: import sbt._ import Keys._ import PlayProject._   object ApplicationBuild extends Build {   val appName = "PlayAndRabbitMQ" val appVersion = "1.0-SNAPSHOT"   val appDependencies = Seq( "com.rabbitmq" % "amqp-client" % "2.8.1" )   val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA).settings( ) } Add rabbitMQ configuration to the config file For our examples we can configure a couple of things. The queue where to send the message to, the exchange to use, and the host where RabbitMQ is running. 
In a real world scenario we would have more configuration options to set, but for this case we’ll just have these three. Add the following to your application.conf so that we can reference it from our application. #rabbit-mq configuration rabbitmq.host=localhost rabbitmq.queue=queue1 rabbitmq.exchange=exchange1 We can now access these configuration files using the ConfigFactory. To allow easy access create the following object: object Config { val RABBITMQ_HOST = ConfigFactory.load().getString("rabbitmq.host"); val RABBITMQ_QUEUE = ConfigFactory.load().getString("rabbitmq.queue"); val RABBITMQ_EXCHANGEE = ConfigFactory.load().getString("rabbitmq.exchange"); } Initialize the connection to RabbitMQ We’ve got one more object to define before we’ll look at how we can use RabbitMQ to send and receive messages. to work with RabbitMQ we require a connection. We can get a connection to a server by using a ConnectionFactory. Look at the javadocs for more information on how to configure the connection. object RabbitMQConnection {   private val connection: Connection = null;   /** * Return a connection if one doesn't exist. Else create * a new one */ def getConnection(): Connection = { connection match { case null => { val factory = new ConnectionFactory(); factory.setHost(Config.RABBITMQ_HOST); factory.newConnection(); } case _ => connection } } } Start the listeners when the application starts We need to do one more thing before we can look at the RabbitMQ code. We need to make sure our message listeners are registered on application startup and our senders start sending. Play 2 provides a GlobalSettings object for this which you can extend to execute code when your application starts. For our example we’ll use the following object (remember, this needs to be stored in the default namespace: import play.api.mvc._ import play.api._ import rabbitmq.Sender   object Global extends GlobalSettings {   override def onStart(app: Application) { Sender.startSending } } We’ll look at this Sender.startSending operation, which initializes all the senders and receivers in the following sections. Setup send and receive scenario Let’s look at the Sender.startSending code that will setup a sender that sends a msg to a specific queue. For this we use the following piece of code: object Sender {   def startSending = { // create the connection val connection = RabbitMQConnection.getConnection(); // create the channel we use to send val sendingChannel = connection.createChannel(); // make sure the queue exists we want to send to sendingChannel.queueDeclare(Config.RABBITMQ_QUEUE, false, false, false, null);   Akka.system.scheduler.schedule(2 seconds, 1 seconds , Akka.system.actorOf(Props( new SendingActor(channel = sendingChannel, queue = Config.RABBITMQ_QUEUE))) , "MSG to Queue"); } }   class SendingActor(channel: Channel, queue: String) extends Actor {   def receive = { case some: String => { val msg = (some + " : " + System.currentTimeMillis()); channel.basicPublish("", queue, null, msg.getBytes()); Logger.info(msg); } case _ => {} } } In this code we take the following steps:Use the factory to retrieve a connection to RabbitMQ Create a channel on this connection to use in communicating with RabbitMQ Use the channel to create the queue (if it doesn’t exist yet) Schedule Akka to send a message to an actor every second.This all should be pretty straightforward. The only (somewhat) complex part is the scheduling part. What this schedule operation does is this. We tell Akka to schedule a message to be sent to an actor. 
We want a 2 seconds delay before it is fired, and we want to repeat this job every second. The actor that should be used for this is the SendingActor you can also see in this listing. This actor needs access to a channel to send a message and this actor also needs to know where to send the message it receives to. This is the queue. So every second this Actor will receive a message, append a timestamp, and use the provided channel to send this message to the queue: channel.basicPublish(“”, queue, null, msg.getBytes());. Now that we send a message each second it would be nice to have listeners on this queue that can receive messages. For receiving messages we’ve also created an Actor that listens indefinitely on a specific queue. class ListeningActor(channel: Channel, queue: String, f: (String) => Any) extends Actor {   // called on the initial run def receive = { case _ => startReceving }   def startReceving = {   val consumer = new QueueingConsumer(channel); channel.basicConsume(queue, true, consumer);   while (true) { // wait for the message val delivery = consumer.nextDelivery(); val msg = new String(delivery.getBody());   // send the message to the provided callback function // and execute this in a subactor context.actorOf(Props(new Actor { def receive = { case some: String => f(some); } })) ! msg } } } This actor is a little bit more complex than the one we used for sending. When this actor receives a message (kind of message doesn’t matter) it starts listening on the queue it was created with. It does this by creating a consumer using the supplied channel and tells the consumers to start listening on the specified queue. The consumer.nextDelivery() method will block until a message is waiting in the configured queue. Once a message is received, a new Actor is created to which the message is sent. This new actor passes the message on to the supplied method, where you can put your business logic. To use this listener we need to supply the following arguments:Channel: Allows access to RabbitMQ Queue: The queue to listen to for messages f: The function that we’ll execute when a message is received.The final step for this first example is glueing everything together. We do this by adding a couple of method calls to the Sender.startSending method. def startSending = { ... val callback1 = (x: String) => Logger.info("Recieved on queue callback 1: " + x);   setupListener(connection.createChannel(),Config.RABBITMQ_QUEUE, callback1);   // create an actor that starts listening on the specified queue and passes the // received message to the provided callback val callback2 = (x: String) => Logger.info("Recieved on queue callback 2: " + x);   // setup the listener that sends to a specific queue using the SendingActor setupListener(connection.createChannel(),Config.RABBITMQ_QUEUE, callback2); ... }   private def setupListener(receivingChannel: Channel, queue: String, f: (String) => Any) { Akka.system.scheduler.scheduleOnce(2 seconds, Akka.system.actorOf(Props(new ListeningActor(receivingChannel, queue, f))), ""); } In this code you can see that we define a callback function, and use this callback function, together with the queue and the channel to create the ListeningActor. We use the scheduleOnce method to start this listener in a separate thread. Now with this code in place we can run the application (play run) open up localhost:9000 to start the application and we should see something like the following output. [info] play - Starting application default Akka system. 
[info] play - Application started (Dev) [info] application - MSG to Exchange : 1334324531424 [info] application - MSG to Queue : 1334324531424 [info] application - Recieved on queue callback 2: MSG to Queue : 1334324531424 [info] application - MSG to Exchange : 1334324532522 [info] application - MSG to Queue : 1334324532522 [info] application - Recieved on queue callback 1: MSG to Queue : 1334324532522 [info] application - MSG to Exchange : 1334324533622 [info] application - MSG to Queue : 1334324533622 [info] application - Recieved on queue callback 2: MSG to Queue : 1334324533622 [info] application - MSG to Exchange : 1334324534722 [info] application - MSG to Queue : 1334324534722 [info] application - Recieved on queue callback 1: MSG to Queue : 1334324534722 [info] application - MSG to Exchange : 1334324535822 [info] application - MSG to Queue : 1334324535822 [info] application - Recieved on queue callback 2: MSG to Queue : 1334324535822 Here you can clearly see the round-robin way messages are processed. Setup publish and subscribe scenario Once we’ve got the above code running, adding publish / subscribe functionality is very trivial. Instead of the SendingActor we now use a PublishingActor: class PublishingActor(channel: Channel, exchange: String) extends Actor {   /** * When we receive a message we sent it using the configured channel */ def receive = { case some: String => { val msg = (some + " : " + System.currentTimeMillis()); channel.basicPublish(exchange, "", null, msg.getBytes()); Logger.info(msg); } case _ => {} } } An exchange is used by RabbitMQ to allow multiple recipients to receive the same message (and a whole lot of other advanced functionality). The only change in the code from the other actor is that this time we send the message to an exchange instead of to a queue. The listener code is exactly the same, the only thing we need to do is connect a queue to a specific exchange. So that listeners on that queue receive the messages sent to to the exchange. We do this, once again, from the setup method we used earlier. ... // create a new sending channel on which we declare the exchange val sendingChannel2 = connection.createChannel(); sendingChannel2.exchangeDeclare(Config.RABBITMQ_EXCHANGEE, "fanout");   // define the two callbacks for our listeners val callback3 = (x: String) => Logger.info("Recieved on exchange callback 3: " + x); val callback4 = (x: String) => Logger.info("Recieved on exchange callback 4: " + x);   // create a channel for the listener and setup the first listener val listenChannel1 = connection.createChannel(); setupListener(listenChannel1,listenChannel1.queueDeclare().getQueue(), Config.RABBITMQ_EXCHANGEE, callback3);   // create another channel for a listener and setup the second listener val listenChannel2 = connection.createChannel(); setupListener(listenChannel2,listenChannel2.queueDeclare().getQueue(), Config.RABBITMQ_EXCHANGEE, callback4);   // create an actor that is invoked every two seconds after a delay of // two seconds with the message "msg" Akka.system.scheduler.schedule(2 seconds, 1 seconds, Akka.system.actorOf(Props( new PublishingActor(channel = sendingChannel2 , exchange = Config.RABBITMQ_EXCHANGEE))), "MSG to Exchange"); ... We also created an overloaded method for setupListener, which, as an extra parameter, also accepts the name of the exchange to use. 
private def setupListener(channel: Channel, queueName : String, exchange: String, f: (String) => Any) { channel.queueBind(queueName, exchange, "");   Akka.system.scheduler.scheduleOnce(2 seconds, Akka.system.actorOf(Props(new ListeningActor(channel, queueName, f))), ""); } In this small piece of code you can see that we bind the supplied queue (which is a random name in our example) to the specified exchange. After that we create a new listener as we’ve seen before. Running this code now will result in the following output: [info] play - Application started (Dev) [info] application - MSG to Exchange : 1334325448907 [info] application - MSG to Queue : 1334325448907 [info] application - Recieved on exchange callback 3: MSG to Exchange : 1334325448907 [info] application - Recieved on exchange callback 4: MSG to Exchange : 1334325448907 [info] application - MSG to Exchange : 1334325450006 [info] application - MSG to Queue : 1334325450006 [info] application - Recieved on exchange callback 4: MSG to Exchange : 1334325450006 [info] application - Recieved on exchange callback 3: MSG to Exchange : 1334325450006 As you can see, in this scenario both listeners receive the same message. That pretty much wraps it up for this article. As you’ve seen using the Java based client api for RabbitMQ is more than sufficient, and easy to use from Scala. Note though that this example is not production ready, you should take care to close connections, nicely shutdown listeners and actors. All this shutdown code isn’t shown here. Reference: Connect to RabbitMQ (AMQP) using Scala, Play and Akka from our JCG partner Jos Dirksen at the Smart Java blog....

AOP made easy with AspectJ and Spring

I recently started looking at Aspect Oriented Programming (AOP) and I’m finding it exciting to say the least. Of course I was acquainted with it, since I saw it used for transaction management within Spring but I have never looked at it in depth. In this article I want to show how quick it is to get up to speed with AOP and Spring thanks to AspectJ. The material in this article is based on the excellent AOP book AspectJ in Action by Ramnivas Laddad. AOP is not a language, but rather an approach to software engineering. Like any methodology it has got different implementations and AspectJ is currently the richest and most complete of all. Since AspectJ and AspectWerkz merged, it is now possible to create aspects using annotations. The reason developers write code is to provide functionality of some sort. The kind of functioniality is not important for this discussion: some might want to deliver business functionality, others might write code for research purposes, other for sheer fun. The point is that any information system has got a core motive, a key functionality which it wants to deliver. For instance, I recently wrote PODAM, a testing tool which has as its ultimate goal that of automatically fill POJO / JavaBean properties. Every information system has also got needs for orthogonal services (what AOP calls crosscutting concerns); for instance logging, security, auditing, exception management and so on. While an information system can be divided into discrete pieces of functionality (what AOP defines join points), orthogonal services are required across the board. For instance, if one wanted to log how long the execution of every single public method took, each public method should have something like the following pseudo-code: public void someBusinessMethod() {long start = System.currentTimeInMilliseconds();doTheBusinessFunctionality();long end = System.currentTimeInMilliseconds();log.debug("The execution of someBusinessMethod took " + (end - start) + " milliseconds");}In the above method, the core functionality is identified solely by someBusinessMethod() whereas everything else is just logging activity. It would be nice to have something like: //Some external magic happens before the invocation of this method to take the start time public void someBusinessMethod() {doTheBusinessFunctionality();} //Some external magic happens after the invocation of this method to take the end time and logs how long the execution took.Developers typically want logging, security, etc. throughout their application, not for a single method; AOP allows developers to achieve this goal by defining somewhere externally (called an Aspect) the behaviour to apply to all code matching some pattern (AOP actually allows for a broader set of functionalities, such as the possibility to add interfaces, instance variables, methods, etc to a class just to name one). This empowered behaviour is then somewhat added to the final executing code by what AOP calls a Weaver. There are various ways that this can be achieved: weaving can happen at the source level, at the binary level and at load time. You could think of the weaver as the linker in C and C++; sources and libraries are linked together to create an executable; the weaver combines together Java code and aspects to create empowered behaviour. Spring achieves this empowered behaviour by creating an AOP proxy around the code whose behaviour must be enriched. 
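To make the earlier timing example concrete: the "external magic" around someBusinessMethod() maps naturally onto an around advice. The following is a hedged sketch (the aspect class name and the pointcut pattern are invented for illustration, not taken from the article), written with the same AspectJ annotation style used in the example further down:

package uk.co.jemos.aop;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class TimingAspect {

    // Matches the execution of any public method in the (hypothetical) business package
    @Around("execution(public * uk.co.jemos.aop..*.*(..))")
    public Object logExecutionTime(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.currentTimeMillis();
        try {
            return pjp.proceed(); // run the actual business method
        } finally {
            long end = System.currentTimeMillis();
            System.out.println("The execution of " + pjp.getSignature()
                    + " took " + (end - start) + " milliseconds");
        }
    }
}

Registered as a Spring bean next to <aop:aspectj-autoproxy />, an aspect like this wraps every matched business method, so the timing code never appears in the business class itself.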
The code that follows shows a very simple example based on AspectJ; the example surrounds the execution of a simple method with some Authentication service. The Authentication services looks very simple (the point is not how the functionality has been implemented but rather that an authentication service is available): /** * */ package uk.co.jemos.aop;/** * A simple authenticator service. * * @author mtedone * */ public class Authenticator {public void authenticate() { System.out.println("Authenticated"); } }Now let's have a look at the business logic:/** * */ package uk.co.jemos.aop;/** * A simple service which delivers messages * @author mtedone * */ public class MessageCommunicator {public void deliver(String message) { System.out.println(message); }public void deliver(String person, String message) { System.out.println(person + ", " + message); }}What we would like is for the Authenticator to be invoked before the invocation of any of the business methods of MessageCommunicator. Using AspectJ annotation syntax, we write in Aspect in pure Java: package uk.co.jemos.aop;import org.aspectj.lang.annotation.Aspect; import org.aspectj.lang.annotation.Before; import org.aspectj.lang.annotation.Pointcut;@Aspect public class SecurityAspect {private Authenticator authenticator = new Authenticator();@Pointcut("execution(* uk.co.jemos.aop.MessageCommunicator.deliver(..))") public void secureAccess() { };@Before("secureAccess()") public void secure() {System.out.println("Checking and authenticating user..."); authenticator.authenticate();}}   The code above is a bit more interesting. An Aspect is marked with the @Aspect annotation. A Pointcut is some point of interest in our code, where we would like our Aspect to kick in. The syntax @Pointcut(“execution(* uk.co.jemos.aop.MessageCommunicator.deliver(..))”) public void secureAccess() { }; means: “Define a Pointcut named secureAccess which applies to all deliver methods within the MessageCommunicator class, regardless of the return type of such method”. What follows is called an advice, and it’s where AOP empowers the behaviour of our class: @Before("secureAccess()") public void secure() {System.out.println("Checking and authenticating user..."); authenticator.authenticate();}The code above says: “Before any match of the secureAccess() Pointcut apply the code within the block”. All of the above is pure Java, although the annotations belong to the AspectJ runtime. To use the above aspect with Spring, I defined a Spring context file: <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:aop="http://www.springframework.org/schema/aop" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd"><aop:aspectj-autoproxy /> <bean id="messageCommunicator" /> <bean id="securityAspect" /></beans>The XML element: <aop:aspectj-autoproxy /> instructs Spring to create a proxy around every aspect. 
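As a side note, if you prefer Java configuration over XML, later Spring releases (3.1 and newer) offer an annotation-based equivalent of <aop:aspectj-autoproxy />. A hedged sketch, mirroring the two beans declared in the XML above:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.EnableAspectJAutoProxy;

@Configuration
@EnableAspectJAutoProxy
public class AopConfig {

    @Bean
    public MessageCommunicator messageCommunicator() {
        return new MessageCommunicator();
    }

    @Bean
    public SecurityAspect securityAspect() {
        return new SecurityAspect();
    }
}

Such a configuration would be loaded with AnnotationConfigApplicationContext instead of ClassPathXmlApplicationContext; the XML variant is what the rest of this article uses.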
Now when I use the MessageCommunicator from a client: /** * @param args */ public static void main(String[] args) { ApplicationContext ctx = new ClassPathXmlApplicationContext( "classpath:aop-appContext.xml");MessageCommunicator communicator = ctx.getBean("messageCommunicator", MessageCommunicator.class); communicator.deliver("Hello World"); communicator.deliver("Marco", "Hello World"); } I get the following output: INFO: Loading XML bean definitions from class path resource [aop-appContext.xml] 15-May-2011 11:51:41 org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@21b64e6a: defining beans [org.springframework.aop.config.internalAutoProxyCreator,messageCommunicator,securityAspect]; root of factory hierarchy Checking and authenticating user… Authenticated Hello World  Checking and authenticating user… Authenticated Marco, Hello World AOP substantially changes the way we think software engineering, by allowing us to externalise crosscutting concerns in external components which are then weaved into our code when needed.This allows for cleaner and more maintainable code and the implementations are limitless. Additionally, if we are careful in writing our Aspects by making them reusable, we can quickly come up with a library of general-purpose, reusable aspects which add functionality to our code in an injected way. There are obviously drawbacks in the adoption of AOP, mainly the learning curve which is required by developers to get acquainted with the technology. AspectJ defines its own language and syntax, as the example above demonstrates); the @Before annotation is just one possibility: advices can be applied before, after, around objects; additionally the syntax to define Pointcuts is not Java but rather script-like. AspectJ aspects also have keywords and native objects to capture the context of the join points they advice, and this syntax needs to be learned. However, the potential gains outweight by large the extra effort required in learning this new and exciting technology. Reference: AOP made easy with AspectJ and Spring from our JCG partner Marco Tedone at the Marco Tedone’s blog....

Open Source Culture and Ideals

Watching the commercial software industry frenetically trying to make process work occasionally feels similar to watching tragicomedy. This constant struggle of trying to force their feet into these tailor-made development processes left and right: agile, lean, scrum, kanban, even waterfall, whatever, just because some manager or “tech-lead” read a blog post. The Emperor’s New Groove, eh? It can be a really shocking experience to observe this rather bizarre circus sometimes. But what is even more astounding is that these corporations often seem ignorant towards open source cultures and ideals. Do they know they exist? Are they ignorant? Or is decentralization and meritocracy so scary (even threatening) to leaders of the organization? Maybe. Who knows. Go figure. But in the context of corporations, I cannot help thinking about people like Jim Whitehurst, President and CEO of Red Hat. This man seems to be a humble and smart guy, wanting the best for his people and company. What has Jim seen that so many others have not? Well, if you work at a company that struggle with process without getting any real work done, you first have to realize that “Culture is King”. You will never get process right without culture, ideals and beliefs. One important difference between culture and process is that culture is never forced. Culture is more like a garden or family. Treat it carefully, give it freedom and nourishment to grow and eventually (with good seeds of course) it will flourish and process emerge in a strong and natural way. I’m not saying that open source is the answer per se; not everything must be open source, but open source culture accumulates so much timeless experience, healthy principles and working ethics. On the contrary, corporations often lose invaluable information along with people leaving them and they constantly struggle to re-build expertise and educate their workforce. Anyway, here is a short reading list (in no particular order) for those of who want to know more about open source culture, community collaboration and (in opinion) the most appealing and practical way of doing software development. (Comments on each reading is not mine, but from its author(s) or other source(s)) Open Advice – Misc Open Advice is a knowledge collection from a wide variety of Free Software projects. It answers the question what 42 prominent contributors would have liked to know when they started so you can get a head-start no matter how and where you contribute. The Open Source Way – Red Hat Community Architecture team Guide for helping people to understand how to and how not to engage with community over projects such as software, content, marketing, art, infrastructure, standards, and so forth. It contains knowledge distilled from years of Red Hat experience.Open Source Community Values – Jeff Cohen Welcome to Our Community. Here Are the Ground Rules. The Art of Community: Building the New Age of Participation – Jono Bacon Will help you develop the broad range of talents you need to recruit members to your community, motivate and manage them, and help them become active participants. Producing Open Source Software – Karl Fogel A book about the human side of open source development. It describes how successful projects operate, the expectations of users and developers, and the culture of free software.Open Sources: Voices from the Open Source Revolution – Misc Leaders of Open Source come together for the first time to discuss the new vision of the software industry they have created. 
The essays in this volume offer insight into how the Open Source movement works, why it succeeds, and where it is going. Debian Constitution – The Debian Project This document describes the organisational structure for formal decision-making in the Debian Project. The Cathedral and the Bazaar – Eric Steven Raymond Surprising theories about software engineering suggested by the history of Linux. The Art of Unix Programming – Eric Steven Raymond This book has a lot of knowledge in it, but it is mainly about expertise. It is going to try to teach you the things about Unix development that Unix experts know, but aren’t aware that they know. How To Ask Questions The Smart Way – Eric Steven Raymond In the world of hackers, the kind of answers you get to your technical questions depends as much on the way you ask the questions as on the difficulty of developing the answer. This guide will teach you how to ask questions in a way more likely to get you a satisfactory answer. How the ASF works – Apache Software Foundation Will give you everything you always wanted to know about ASF but were afraid to ask. Apache Subversion Community Guide – Subversion Community Subversion community participation guidelines. Python Community Diversity Statement – Python Community The Python Software Foundation and the global Python community welcome and encourage participation by everyone. Our community is based on mutual respect, tolerance, and encouragement, and we are working to help each other live up to these principles. Eclipse Development Process – The Eclipse Foundation This document describes the Development Process for the Eclipse Foundation. Ubuntu Code of Conduct – Ubuntu Community This Code of Conduct covers our behaviour as members of the Ubuntu Community, in any forum, mailing list, wiki, web site, IRC channel, install-fest, public meeting or private correspondence. Mozilla Code of Conduct (draft) – Mozilla Foundation This Code of Conduct covers our behaviour as members of the Mozilla Community, in any forum, mailing list, wiki, web site, IRC channel, bug, event, public meeting or private correspondence. Reference: Open Source Culture and Ideals from our JCG partner Kristoffer Sjögren at the deephacks blog....

Java Enum puzzler

Let’s suppose we have the following code:

enum Case {
    CASE_ONE, CASE_TWO, CASE_THREE;

    private static final int counter;
    private int valueDependsOnCounter;

    static {
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            sum += i;
        }
        counter = sum;
    }

    Case() {
        this.valueDependsOnCounter = counter * counter;
    }
}

What do you think is the result of compiling and running the code?

a) Compiler error
b) Runtime error
c) Runs OK, but valueDependsOnCounter has a strange value
d) Runs OK

Give it a second of thought. (Spoiler ahead.) The answer is the 8th letter in the following sequence: bdcadcbabcad. To shed light on this it’s necessary to review the following:

A. The order of static initialization inside a class: static variables in the order they appear, static blocks in the order they appear, instance variables in the order they appear, constructors.

B. The order of constructor calls (this applies to statics as well): super classes first, then the local class.

C. The way an enum object is represented in Java:

1) An enum named E is a class that, among other things, has an *implicit* public static final field of type E for every member of the enum. More specifically, the Case class could be written in the following way:

enum Case {
    public static final Case CASE_ONE;
    public static final Case CASE_TWO;
    public static final Case CASE_THREE;
    ...
}

2) These members appear in the order they are declared and are located above all the other static members of the enum (that means they are the first ones to be initialized).

3) The enum constant is said to be created when the corresponding field is initialized.

So the compiler gives an error along the lines of ”It is illegal to access static member counter from enum or instance initializer.”. This is because of the order in which the enum is initialized:

1) public static final Case CASE_ONE;
2) public static final Case CASE_TWO;
3) public static final Case CASE_THREE;
4) private static final int counter;
5) static { .. counter = something; }
6) Case() { this.valueDependsOnCounter = counter*counter; }

The first thing that needs to be done is to initialize CASE_ONE, but that has to call the Case() constructor, which in turn depends on counter, which is only assigned in the static {} block (and that block hasn’t been executed yet). Now, forbidding access to any static field from a constructor would be a huge limitation, and that is what this flow seems to suggest: that you cannot use statics in the constructor of an enum. Luckily, this is not quite right. What the error is actually telling us is that ”It is a compile-time error to reference a static field of an enum type that is not a *compile-time constant* from constructors, instance initializer blocks, or instance variable initializer expressions of that type.”. The compiler does in fact allow access to static fields in an enum constructor, but only to those it can compute statically (as an optimization mechanism). If we had:

enum Case {
    CASE_ONE, CASE_TWO, CASE_THREE;

    private static final int counter = 0;
    private int valueDependsOnCounter;

    Case() {
        this.valueDependsOnCounter = counter * counter;
    }
}

all would have been fine, since the compiler could have predicted the initialization of counter, used it in the constructor, built the enum instance, and assigned it to the static final CASE_ONE field. But since counter depends on a computation that is not a compile-time constant, an error is raised.
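Before looking at the workarounds below, here is a minimal, self-contained sketch (not from the original puzzle) that makes this initialization order visible by logging from the constructor and the static block:

public class EnumInitOrder {

    enum Sample {
        ONE, TWO, THREE;

        static {
            System.out.println("static block of Sample runs");
        }

        Sample() {
            // name() is safe here; referencing a non-constant static field would not compile
            System.out.println("constructor runs for " + name());
        }
    }

    public static void main(String[] args) {
        System.out.println("about to touch the enum");
        Sample first = Sample.ONE;   // triggers class initialization
        System.out.println("first constant: " + first);
    }
}

Running it prints the three constructor lines before the static block line, which is exactly why the Case() constructor above cannot rely on counter.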
There are two solutions for this problem, in order to still have the code work:

1) Put the statics that you need in a nested class and access them from there:

class Nested {
    private static final int counter;

    static {
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            sum += i;
        }
        counter = sum;
    }
}

enum Case {
    CASE_ONE, CASE_TWO, CASE_THREE;

    private int valueDependsOnCounter;

    Case() {
        this.valueDependsOnCounter = Nested.counter * Nested.counter;
    }
}

2) Initialize the instance field in a static block, not in the constructor (recommended):

enum Case {
    CASE_ONE, CASE_TWO, CASE_THREE;

    private static final int counter;
    private int valueDependsOnCounter;

    static {
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            sum += i;
        }
        counter = sum;
        for (Case c : Case.values()) {
            c.valueDependsOnCounter = counter * counter;
        }
    }
}

The error discussed here is even spelled out in the Java Language Specification. Reference: Enum puzzler from our JCG partner Attila-Mihaly Balazs at the Transylvania JUG blog. ...

Application Lifecycle Management at Eclipse

The Eclipse Foundation has evolved a pretty impressive application lifecycle management story. Based on what I’ve observed over the years, I believe that it’s completely reasonable to say that our ALM story is one of the best in the world: the envy of hundreds of open source projects and closed source development shops around the world. I believe that our success comes from a combination of great people, well-defined process, and a powerful stack of open-source tools.We started from humble beginnings: issue tracking with Bugzilla, CVS for source code management, PDE Build and Ant scripts for build, cron for orchestration, and the Eclipse Development Process for guidance. Over time, all the the pieces have evolved resulting in the world-class whole that we have today. Process The Eclipse Development Process describes the structure of projects and teams at Eclipse. It provides guidance to help projects run in an open, transparent, and vendor-neutral manner. It provides a framework for processes around releases and other important stages in project lifecycle (e.g. creation, graduation, and termination). From a process point of view, it’s a pretty high-level document that provides a framework for day-to-day work; individual projects, however, are given flexibility to decide how they run their day-to-day development. Eclipse projects have the benefit of the most comprehensive IP management and due diligence process available to open source projects. In fact–based on my experience working with many dozens of companies over the years–it is one of the most comprehensive IP management processes available to anybody. IP Management is important when you care about adoption of your open source project. Adopters need to know that they can safely use the output of your open source project in their own projects and products. Tools The tools story has evolved considerably since those humble beginnings.We still use Bugzilla for issue tracking. We have a second Bugzilla instance, called IPZilla, that we use for tracking intellectual property contributions and use of third-party libraries. And we’ve added a few new pieces:Subversion (SVN) was added as an alternative to CVS for source code management, but both are now being phased out in favour of Git. In fact, CVS is meeting its end-of-life at Eclipse on December 21/2012 and Subversion is no longer an option for new Eclipse projects. Moving forward, it’s Git or nothing. Hudson provides build orchestration. As of EclipseCon 2012, we have 337 build jobs that have run a total of 86,000 times on Hudson: ninety-eight run daily, and 218 have run in the last month. Gerrit was recently implemented by our noble Webmaster team to provide code review for projects that opt to use it. Using Gerrit streamlines the contribution workflow: contributors can push their commits directly to Gerrit where project committers can quickly and easily process them. Gerrit has a lot of very cool tricks including the ability to invoke Hudson builds to confirm that new contributions will actually compile (Hudson, in effect, gets a vote on whether or not a contribution should be accepted). With the introduction and subsequent development of Tycho–technology that lets Maven understand and build Eclipse plug-ins and OSGi bundles–Maven-based builds are quickly becoming the gold standard for Eclipse projects. Tying it all together is, of course, Eclipse. The Mylyn project provides integration from Eclipse to Bugzilla (along with many dozens of other issue trackers), Hudson, and Gerrit. 
The EGit project provides integration with Git, and the m2e project provides support for maintaining Maven build artifacts. People None of this would be possible, of course, without people. The implementation of Git at Eclipse was not a trivial matter. We had a lot of questions that needed to be be answered, and the Eclipse Committer community stepped up to help us answer those questions. There are similar stories for the other technologies that have been adopted at Eclipse. We’re still working on our Maven story. Like everything else at Eclipse, this isn’t being driven from the top-down, but rather it is being driven by the developer community. One of the most important things that keeps all this technology and process moving forward is communication. We have a strong ethic of transparent communication and open discussion across all 250+ Eclipse projects. Of course people drive both the evolution of technology and process. The Eclipse Architecture Council–a group of old and grizzled veterans of open source development–provides assistance to projects who are just learning the process, and are responsible for evolving the process as necessary. The Eclipse Development Process is considered a living document and so is subject to change. We tweak and adapt our processes and practices on an ongoing basis. No discussion of the Eclipse ALM story is complete without discussing the simultaneous release. The simultaneous release, which is coordinated by the Eclipse Planning Council, is as much about people as it is technology. Every year, many of projects join together to coordinate their development schedules and combine their releases into a single mega-event. This year’s Juno release includes 71 separate Eclipse projects and (total SWAG) 60 million lines of code.The size of the simultaneous release has been growing over the years. Something as massive as the simultaneous release can only happen with free-flowing lines of communication among developers working together with common goals. Evolution Are we there yet? No. There is no “there”. The ALM story at Eclipse continues to evolve. It will always evolve. We still need to solve the Maven question, and projects are pushing us into the continuous integration space. And there may be more changes ahead. Who knows what the future will bring? Reference: Application Lifecycle Management at Eclipse from our JCG partner Wayne Beaton at the Eclipse Hints, Tips, and Random Musings blog....

JNDI and JPA without J2EE Container

We wanted to test some JPA code with as simple setup as possible. The plan was to use only Java and Maven, without an application server or other J2EE container. Our JPA configuration needs two things to run successfully:database to store data, JNDI to access the database.This post has two parts. First part shows how to use standalone JNDI and an embedded in-memory database in test. Remaining chapters explain how the solution works. All used code is available on Github. If you are interested in the solution but do not want to read the explanations, download the project from Github and read only the first chapter. JPA Test This chapter shows how to use our code to enable standalone JNDI and embedded in-memory database in tests. How and why the solution works is explained in the rest of this post. The solution has three ‘API’ classes:JNDIUtil – JNDI initialization, clean up and some convenience methods, InMemoryDBUtil – database and data source creation/removal, AbstractTestCase – database clean up before first test and JNDI clean up before each test.We use Liquibase to maintain the database structure. If you do not wish to use Liquibase, you will have to customize the class InMemoryDBUtil. Tweak the method createDatabaseStructure to do what you need. Liquibase keeps list of all needed database changes in a file named changelog. Unless configured otherwise, each change runs only once. Even if the changelog file is applied multiple times to the same database. Usage Any test case extended from the AbstractTestCase will:drop the database before first test, install standalone JNDI or delete all data stored in it before each test, run Liquibase changelog against the database before each test.JPA test case must extend AbstractTestCase and override the getInitialChangeLog method. The method should return changelog file location. public class DemoJPATest extends AbstractTestCase {private static final String CHANGELOG_LOCATION = "src/test/java/org/meri/jpa/simplest/db.changelog.xml"; private static EntityManagerFactory factory;public DemoJPATest() { }@Override protected String getInitialChangeLog() { return CHANGELOG_LOCATION; }@Test @SuppressWarnings("unchecked") public void testJPA() { EntityManager em = factory.createEntityManager();Query query = em.createQuery("SELECT x FROM Person x"); List<Person> allUsers = query.getResultList(); em.close();assertFalse(allUsers.isEmpty()); }@BeforeClass public static void createFactory() { factory = Persistence.createEntityManagerFactory("Simplest"); }@AfterClass public static void closeFactory() { factory.close(); }}Note: it would be cleaner to drop the database before each test. However, drop and recreate db structure is costly operation. It would slow down the test case too much. Doing it only before the class seems to be reasonable compromise. While the database is dropped only once, the changelog is run before each test. It may seems like a waste, but this solution has some advantages. First, the getInitialChangeLog method does not have to be static and may be overridden in each test. Second, changes configured to ‘runAlways’ will run before each test and thus may contain some cheap clean up or other initialization. JNDI This chapter explains what JNDI is, how it is used and how to configure it. If you are not interested in theory, skip to the next chapter. Standalone JNDI is created there. Basic Usage JNDI allows clients to store and look up data and objects via a name. The data store is accessed through an implementation of the interface Context. 
The following code shows how to store data in JNDI: Context ctx = new InitialContext(); ctx.bind("jndiName", "value"); ctx.close();Second piece of code shows how to look for things in JNDI: Context ctx = new InitialContext(); Object result = ctx.lookup("jndiName"); ctx.close();Try to run the above without a J2EE container and you will get an error: javax.naming.NoInitialContextException: Need to specify class name in environment or system property, or as an applet parameter, or in an application resource file: java.naming.factory.initial at javax.naming.spi.NamingManager.getInitialContext(Unknown Source) at javax.naming.InitialContext.getDefaultInitCtx(Unknown Source) at javax.naming.InitialContext.getURLOrDefaultInitCtx(Unknown Source) at javax.naming.InitialContext.bind(Unknown Source) at org.meri.jpa.JNDITestCase.test(JNDITestCase.java:16) at ...The code does not work because InitialContext class is not a real data store. The InitialContext class is only able to find another instance of the Context interface and delegates all work to it. It is not able to store data nor to find them. Context Factories The real context, the one that does all the work and is able to store/find data, has to be created by a context factory. This section shows how to create a context factory and how to configure InitialContext to use it. Each context factory must implement InitialContextFactory interface and must have a no-argument constructor: package org.meri.jpa.jndi;public class MyContextFactory implements InitialContextFactory {@Override public Context getInitialContext(Hashtable environment) throws NamingException { return new MyContext(); } }Our factory returns a simple context called MyContext. Its lookup method always returns a string “stored value”: class MyContext implements Context {@Override public Object lookup(Name name) throws NamingException { return "stored value"; }@Override public Object lookup(String name) throws NamingException { return "stored value"; }.. the rest ... }JNDI configuration is passed between classes in a hash table. The key always contains property name and the value contains the property value. As the initial context constructor InitialContext() has no parameter, an empty hash table is assumed. The class has also an alternative constructor which takes configuration properties hash table as a parameter. Use the property "java.naming.factory.initial" to specify context factory class name. The property is defined in Context.INITIAL_CONTEXT_FACTORY constant. Hashtable env = new Hashtable(); env.put(Context.INITIAL_CONTEXT_FACTORY, "className");Context ctx = new InitialContext(environnement);Next test configures MyContextFactory and checks whether created initial context returns “stored value” no matter what: @Test @SuppressWarnings({ "unchecked", "rawtypes" }) public void testDummyContext() throws NamingException { Hashtable environnement = new Hashtable(); environnement.put(Context.INITIAL_CONTEXT_FACTORY, "org.meri.jpa.jndi.MyContextFactory");Context ctx = new InitialContext(environnement); Object value = ctx.lookup("jndiName"); ctx.close();assertEquals("stored value", value); }Of course, this works only if you can supply hash table with custom properties to the initial context constructor. That is often impossible. Most libraries use no-argument constructor shown in the beginning. They assume that initial context class has default context factory available and that no-argument constructor will use that one. 
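If you cannot pass an environment hash table, one option is to set the factory globally as a JVM system property, which the no-argument InitialContext() also consults. A minimal sketch reusing the MyContextFactory above (the article itself takes a different route, via a context factory builder, described next):

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class SystemPropertyJndiDemo {

    public static void main(String[] args) throws NamingException {
        // Point the no-arg InitialContext() at our factory for the whole JVM
        System.setProperty(Context.INITIAL_CONTEXT_FACTORY,
                "org.meri.jpa.jndi.MyContextFactory");

        Context ctx = new InitialContext();     // no environment hashtable needed now
        Object value = ctx.lookup("jndiName");  // MyContext always answers "stored value"
        System.out.println(value);
        ctx.close();
    }
}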
Naming Manager Initial context uses NamingManager to create a real context. Naming manager has a static method getInitialContext(Hashtable env) which returns an instance of a context. The parameter env contains a configuration properties used to build the context. By default, naming manager reads Context.INITIAL_CONTEXT_FACTORY from the env hash table and creates an instance of specified initial context factory. The factory method then creates a new context instance. If that property is not set, naming manager throws an exception. It is possible to customize naming managers behavior. The class NamingManager has method setInitialContextFactoryBuilder. If the initial context factory builder is set, naming manager will use it to create context factories. You can use this method only once. Installed context factory builder can not be changed. try { MyContextFactoryBuilder builder = new MyContextFactoryBuilder(); NamingManager.setInitialContextFactoryBuilder(builder); } catch (NamingException e) { // handle exception }Initial context factory builder must implement InitialContextFactoryBuilder interface. The interface is simple. It has only one method InitialContextFactory createInitialContextFactory(Hashtable env). Summary In short, initial context delegates a real context initialization to naming manager which delegates it to context factory. Context factory is created by an instance of initial context factory builder.Standalone JNDI We will create and install standalone JNDI implementation. The entry point to our standalone JNDI implementation is the class JNDIUtil. Three things are needed to enable JNDI without an application server:an implementation of Context and InitialContextFactory interfaces, an implementation of InitialContextFactoryBuilder interface, initial context factory builder installation and ability to clean all stored data.Context and Factory We took SimpleJNDI implementation from osjava project and modified it to suit our needs better. The project uses a new BSD license. Add SimpleJNDI maven dependency into pom.xml: simple-jndi simple-jndi 0.11.4.1SimpleJNDI comes with a MemoryContext context which lives exclusively in the memory. It requires almost no configuration and its state is never saved of loaded. It does almost what we need, except two things:its close() method deletes all stored data, each instance uses its own storage by default.Most libraries assume that the close method optimizes resources. They tend to call it each time they load or store data. If the close method deletes all data right after they have been stored, the context is useless. We have to extend the MemoryContext class and override the close method: @SuppressWarnings({"rawtypes"}) public class CloseSafeMemoryContext extends MemoryContext {public CloseSafeMemoryContext(Hashtable env) { super(env); }@Override public void close() throws NamingException { // Original context lost all data on close(); // That made it unusable for my tests. }}By convention, the builder/factory system creates new context instance for each use. If they do not share data, JNDI can not be used to transfer data between different libraries. Fortunately, this problem has also an easy solution. If the environnement hash table contains property "org.osjava.sj.jndi.shared" with value "true", created memory context will use common static storage. 
Therefore, our initial context factory creates CloseSafeMemoryContext instances and configures them to use common storage: public class CloseSafeMemoryContextFactory implements InitialContextFactory {private static final String SHARE_DATA_PROPERTY = "org.osjava.sj.jndi.shared";public Context getInitialContext(Hashtable environment) throws NamingException {// clone the environnement Hashtable sharingEnv = (Hashtable) environment.clone();// all instances will share stored data if (!sharingEnv.containsKey(SHARE_DATA_PROPERTY)) { sharingEnv.put(SHARE_DATA_PROPERTY, "true"); } return new CloseSafeMemoryContext(sharingEnv);; }}Initial Context Factory Builder Our builder acts almost the same way as the original naming manager implementation. If the property Context.INITIAL_CONTEXT_FACTORY is present in the incoming environment, specified factory is created. However, if this property is missing, the builder creates an instance of CloseSafeMemoryContextFactory. The original naming manager would throw an exception. Our implementation of the InitialContextFactoryBuilder interface: public InitialContextFactory createInitialContextFactory(Hashtable env) throws NamingException { String requestedFactory = null; if (env!=null) { requestedFactory = (String) env.get(Context.INITIAL_CONTEXT_FACTORY); }if (requestedFactory != null) { return simulateBuilderlessNamingManager(requestedFactory); } return new CloseSafeMemoryContextFactory(); }The method simulateBuilderlessNamingManager uses class loader to load requested context factory: private InitialContextFactory simulateBuilderlessNamingManager(String requestedFactory) throws NoInitialContextException { try { ClassLoader cl = getContextClassLoader(); Class requestedClass = Class.forName(className, true, cl); return (InitialContextFactory) requestedClass.newInstance(); } catch (Exception e) { NoInitialContextException ne = new NoInitialContextException(...); ne.setRootCause(e); throw ne; } }private ClassLoader getContextClassLoader() { return (ClassLoader) AccessController.doPrivileged(new PrivilegedAction() { public Object run() { return Thread.currentThread().getContextClassLoader(); } }); }Builder Installation and Context Cleaning Finally, we have to install context factory builder. As we wanted to use standalone JNDI in tests, we needed also a method to clean up all stored data between tests. Both is done inside the method initializeJNDI which will run before each test: public class JNDIUtil {public void initializeJNDI() { if (jndiInitialized()) { cleanAllInMemoryData(); } else { installDefaultContextFactoryBuilder(); } }}JNDI is initialized if the default context factory builder has been set already: private boolean jndiInitialized() { return NamingManager.hasInitialContextFactoryBuilder(); }Installation of the default context factory builder: private void installDefaultContextFactoryBuilder() { try { NamingManager.setInitialContextFactoryBuilder(new ImMemoryDefaultContextFactoryBuilder()); } catch (NamingException e) { //We can not solve the problem. We will let it go up without //having to declare the exception every time. 
throw new ConfigurationException(e); } }Use the original implementation of the method close in MemoryContext class to clean up stored data: private void cleanAllInMemoryData() { CleanerContext cleaner = new CleanerContext(); try { cleaner.close(); } catch (NamingException e) { throw new RuntimeException("Memory context cleaning failed:", e); } }class CleanerContext extends MemoryContext { private static Hashtable environnement = new Hashtable(); static { environnement.put("org.osjava.sj.jndi.shared", "true"); }public CleanerContext() { super(environnement); }}In-Memory Database Apache Derby is an open source relational database implemented in Java. It is available under Apache License, Version 2.0. Derby is able to run in embedded mode. Embedded database data are stored either on the filesystem or in the memory. Maven dependency for Derby: org.apache.derby derby 10.8.2.2Create DataSource Use an instance of the EmbeddedDatasource class to connect to the database. The data source will use an in-memory instance whenever the database name starts with “memory:”. Following code creates data source pointing to an instance of in-memory database. If the database does not exist yet, it will be created: private EmbeddedDataSource createDataSource() { EmbeddedDataSource dataSource = new EmbeddedDataSource(); dataSource.setDataSourceName(dataSourceJndiName); dataSource.setDatabaseName("memory:" + databaseName); dataSource.setCreateDatabase("create");return dataSource; }Drop Database The easiest way to clean up the database is to drop and recreate it. Create an instance of embedded data source, set the connection attribute “drop” to “true” and call its getConnection method. It will drop the database and throw an exception. private static final String DATABASE_NOT_FOUND = "XJ004";private void dropDatabase() { EmbeddedDataSource dataSource = createDataSource(); dataSource.setCreateDatabase(null); dataSource.setConnectionAttributes("drop=true");try { //drop the database; not the nicest solution, but works dataSource.getConnection(); } catch (SQLNonTransientConnectionException e) { //this is OK, database was dropped } catch (SQLException e) { if (DATABASE_NOT_FOUND.equals(e.getSQLState())) { //attempt to drop non-existend database //we will ignore this error return ; } throw new ConfigurationException("Could not drop database.", e); } }Database Structure We used Liquibase to create the database structure and test data. The database structure is kept in a so called changelog file. It is an xml file, but you can include DDL or SQL code if you do not feel like learning yet another xml language. Liquibase and its advantages are out of scope of this article. The most relevant advantage for this demo is its ability to run the same changelog against the same database multiple times. Each run applies only new changes to the database. If the file did not changed, nothing happens. You can add the changelog to the jar or war and run it on each application start up. That will ensure that the database is always updated to the latest version. No configuration or installation scripts are necessary. 
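Before adding the Liquibase pieces, it is worth spelling out how the Derby data source and the standalone JNDI meet: the persistence unit looks its data source up by JNDI name, so during test setup the data source has to be bound into the in-memory context. A hedged sketch of that glue (the JNDI name jdbc/demoDS and the helper class are assumptions for illustration, not the project's actual InMemoryDBUtil):

import javax.naming.InitialContext;
import javax.naming.NamingException;

import org.apache.derby.jdbc.EmbeddedDataSource;

public class DataSourceBinder {

    // Bind an in-memory Derby data source under a JNDI name that persistence.xml can reference
    public static EmbeddedDataSource bindDataSource(String jndiName, String databaseName)
            throws NamingException {
        EmbeddedDataSource dataSource = new EmbeddedDataSource();
        dataSource.setDatabaseName("memory:" + databaseName);
        dataSource.setCreateDatabase("create");

        InitialContext ctx = new InitialContext(); // served by the standalone JNDI installed above
        ctx.bind(jndiName, dataSource);
        ctx.close();
        return dataSource;
    }
}

With a binding like this in place, a persistence.xml entry such as <non-jta-data-source>jdbc/demoDS</non-jta-data-source> lets the "Simplest" persistence unit from the test above resolve its database entirely in memory.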
Add Liquibase dependency to the pom.xml: org.liquibase liquibase-core 2.0.3Following changelog creates one table named Person and puts one entry ‘slash – Simon Worth’ into it: <databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog/1.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog/1.9 http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd"><changeSet id="1" author="meri"> <comment>Create table structure for users and shared items.</comment><createTable tableName="person"> <column name="user_id" type="integer"> <constraints primaryKey="true" nullable="false" /> </column> <column name="username" type="varchar(1500)"> <constraints unique="true" nullable="false" /> </column> <column name="firstname" type="varchar(1500)"/> <column name="lastname" type="varchar(1500)"/> <column name="homepage" type="varchar(1500)"/> <column name="about" type="varchar(1500)"/> </createTable> </changeSet><changeSet id="2" author="meri" context="test"> <comment>Add some test data.</comment> <insert tableName="person"> <column name="user_id" valueNumeric="1" /> <column name="userName" value="slash" /> <column name="firstName" value="Simon" /> <column name="lastName" value="Worth" /> <column name="homePage" value="http://www.slash.blogs.net" /> <column name="about" value="I like nature and writing my blog. The blog contains my opinions about everything." /> </insert> </changeSet></databaseChangeLog>Liquibase use is pretty straightforward. Use data source to create new Liquibase instance, run its update method and handle all declared exceptions: private void initializeDatabase(String changelogPath, DataSource dataSource) { try { //create new liquibase instance Connection sqlConnection = dataSource.getConnection(); DatabaseConnection db = new DerbyConnection(sqlConnection); Liquibase liquibase = new Liquibase(changelogPath, new FileSystemResourceAccessor(), db);//update the database liquibase.update("test");} catch (SQLException e) { // We can not solve the problem. We will let it go up without // having to declare the exception every time. throw new ConfigurationException(DB_INITIALIZATION_ERROR, e); } catch (LiquibaseException e) { // We can not solve the problem. We will let it go up without // having to declare the exception every time. throw new ConfigurationException(DB_INITIALIZATION_ERROR, e); } }End Both standalone JNDI and embedded in-memory database are up and running each time we run our tests. While the JNDI set up is probably universal, the database construction will probably need project specific modifications. Feel free to download sample project from Github and use/modify whatever you find useful. Reference: Running JNDI and JPA Without J2EE Container from our JCG partner Maria Jurcovicova at the This is Stuff blog....
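As a small addendum to the article above: once the changelog has run, a quick way to confirm that everything is wired up is to look the data source up from JNDI and query the person table with plain JDBC. This sketch relies on the assumptions made earlier (the hypothetical "jdbc/testDS" binding and the "test" context data) and is not part of the original article.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class PersonTableSmokeTest {

    public void printTestUsers() throws Exception {
        // hypothetical JNDI name used when the data source was bound
        DataSource dataSource = (DataSource) new InitialContext().lookup("jdbc/testDS");

        try (Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("SELECT username FROM person")) {
            while (rs.next()) {
                // with the 'test' context enabled, this should print the single 'slash' row
                System.out.println(rs.getString("username"));
            }
        }
    }
}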

Lucene Overview Part One: Creating the Index

Introduction I’ve recently been working with the open source search engine Lucene. I’m no expert, but since I have just pored through some rather sparse documentation and migrated an application from a very old version of Lucene to the latest version, 2.4, I’m pretty clear on the big picture. The documentation for Lucene leaves a bit to the imagination, so I thought I’d take this opportunity to share a high level overview of Lucene while it’s fresh in my mind. If you find this page looking for introductory material to Lucene, good for you! That’s what it’s for. Don’t expect to find best practices, code samples or advanced topics. You will find a clear introduction to the conceptual architecture of Lucene, with which you will be able to productively approach the FAQs and tutorials on the project web site. I’m using the Java implementation of Lucene, but all of this high level stuff would apply equally to any of the other Lucene flavors. The first thing you should understand is what Lucene actually does. Lucene really only does two things: it creates search indexes, and it searches for content in those indexes. An index is an efficiently navigable representation of whatever data you need to make searchable. Your data might be as simple as a set of Word documents in a content management system, or it might be records from a database, HTML pages, or any kind of data object in your system. It’s up to you to decide what entities you want to make searchable. For our discussion, we’ll assume that we are working with a set of Word documents. Create the Index So, step one is to create the index for our set of Word documents. To do this, we need to write some code that takes the information from the Word documents and turns it into a searchable index. The only way to do this is by brute force. We’ll have to iterate over each of the Word documents, examining each and converting each into the pieces that Lucene needs to work with when it creates the index. What are the pieces that Lucene needs to create the index? There are two: Documents and Fields. These two abstractions are so key to Lucene that Lucene represents them with two top level Java classes, Document and Field. A Document, not to be confused with our actual Word documents, is a Java class that represents a searchable item in Lucene. By searchable item, we mean that a Document is the thing that you find when you search. It’s up to you to create these Documents. Lucky for us, it’s a pretty clear step from an actual Word document to a Lucene Document. I think anyone would agree that it will be the Word documents that our users will want to find when they conduct a search. This makes our processing rather simple: we will simply create a single Lucene Document for each of our actual Word documents. Create the Document and its Fields But how do we do that? It’s actually very easy. First, we make the Document object, with the new operator — nothing more. But at this point the Document is meaningless. We now have to decide what Fields to add to the Document. This is the part where we have to think. A Document is made of any number of Fields, and each Field has a name and a value. That’s all there is to it. Two fields are created almost universally by developers creating Lucene indexes. The most important field will be the “content” field. This is the Field that holds the content of the Word document for which we are creating the Lucene Document. 
Bear in mind, the name of the Field is entirely arbitrary, but most people call one of the Fields “content” and they stick the actual content of the real world searchable object, the Word document in our case, into the value of that Field. In essence, a Field is just a name:value pair. Another very common Field that developers create is the “title” Field. This field’s value will be the title of the Word document. What other information about the Word document might we want to keep in our index? Other common fields are things like “author”, “creation_date”, “keywords”, etc. The identification of the fields that you will need is entirely driven by your business requirements. So, for each Word document that we want to make searchable, we will have to create a Lucene Document, with Fields such as those we outlined above. Once we have created the Document with those Fields, we then add it to the Lucene index writer and ask it to write our index. That’s it! We now have a searchable index. This is true, but we may have glossed over a couple of Field details. Let’s take a closer look at Fields. Field Details: Stored or Indexed? A Field may be kept in the index in more than one way. The most obvious way, and perhaps the only way whose existence you might at first suspect, is the searchable way. In our example, we fully expect that if the user types in a word that exists in the contents of one of the Word documents, then the search will return that Word document in the search results. To do this, Lucene must index that Field. The nomenclature is a bit confusing at first, but, note, it is entirely possible to “store” a Field in the index without making it searchable. In other words, it’s possible to “store” a Field but not “index” it. Why? You’ll see shortly. The first distinction that Lucene makes in the way it can keep a Field in the index is whether it is stored or indexed. If we expect a match on a Field’s value to cause the Document to be hit by the search, then we must index the Field. If we only store the Field, its value can’t be reached by the search queries. Why then store a Field? Simple: when we hit the Document, via one of the indexed fields, Lucene will return us the entire Document object. All stored Fields will be available on that Document object; indexed-only Fields will not be on that object. An indexed Field is information used to find a Document; a stored Field is information returned with the Document. Two different things. This means that while we might not make searches based upon the contents of a given Field, we might still be able to make use of that Field’s value when the Document is returned by the search. The most obvious use case I can think of is a “url” Field for a web based Document. It makes no sense to search for the value of a URL, but you will definitely want to know the URL for the documents that your search returns. How else would your results page be able to steer the user to the hit page? This is a very important point: a stored Field’s value will be available on the Document returned by a search, but only an indexed Field’s value can actually be used as the target of a search. Technically, stored Fields are kept within the Lucene index. But we must keep track of the fact that an indexed Field is different from a stored Field. Unfortunate nomenclature. This is why words matter. They can save a lot of confusion. Indexed Fields: Analyzed or Not Analyzed? 
For the next wrinkle, we must point out that an indexed Field can be indexed in two different fashions. First, we can index the value of the Field in a single chunk. In other words, we might have a “phone number” Field. When we search for phone numbers, we need to match the entire value or nothing. This makes perfect sense. So, for a Field like phone number, we index the entire value ATOMICALLY into the Lucene index. But let’s consider the “content” Field of the Word document. Do we want the user to have to match that entire Field? Certainly not. We want the contents of the Word document to be broken down into searchable tokens. This process is known as analysis. We can start by throwing out all of the unimportant words like “a”, “the”, “and”, etc. There are many other optimizations we can make, but the bottom line is that the content of a Field like “contents” should be analyzed by Lucene. This produces a targeted, lightweight index. This is how search becomes efficient and powerful. In the APIs, this comes down to the fact that when we create a Field, we must specify: whether to STORE it or not, whether to INDEX it or not, and, if indexing, whether to ANALYZE it or not. Now, you should be clear on the details of Fields. Importantly, we can both store and index a given Field. It’s not an either-or choice. Creating the Index When we have added all the Documents to the index, we simply tell the index writer to create the index. From this point on we can search according to the indexed Fields for any of our Documents. Look for an upcoming entry giving a high level overview of searching for things in a Lucene index. Parting Note Recall that we said it would be simpler to assume that our target data was a set of Word documents. Now that we’ve finished, consider that your target data can be anything. In reality, it’s the Lucene Documents that are searched. And you can create these from anything you want. They can, and frequently do, come from an aggregation of real world data objects. Again, what data will go into your Lucene Documents is up to your business requirements. It can be as simple as a one-to-one mapping of Word documents to Lucene Documents, or each Lucene Document can be the aggregate of a variety of database queries and anything else you might find lying around. Happy indexing! Reference: Lucene Overview Part One: Creating the Index from our W4G partner Chad Davis at the zeroInsertionForce blog....
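The overview above deliberately stays away from code, but for readers who want to see the shape of the API it describes, here is a minimal sketch against the Lucene 2.4 Java API that ties Documents, Fields and the index writer together. The field names, the index path and the WordDocument wrapper are assumptions made for illustration; they are not part of the original article.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class WordDocumentIndexer {

    public void buildIndex(Iterable<WordDocument> wordDocuments) throws Exception {
        // true = create a new index at this (hypothetical) location, replacing any old one
        IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/tmp/lucene-index"),
                new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            for (WordDocument wordDoc : wordDocuments) {
                Document doc = new Document();
                // indexed and analyzed: broken into tokens and searchable, but not stored
                doc.add(new Field("content", wordDoc.getText(), Field.Store.NO, Field.Index.ANALYZED));
                // stored and indexed: searchable and also returned with the hit
                doc.add(new Field("title", wordDoc.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
                // stored only: never searched, but available on the Document a search returns
                doc.add(new Field("path", wordDoc.getPath(), Field.Store.YES, Field.Index.NO));
                writer.addDocument(doc);
            }
            writer.optimize(); // optional: merge segments for faster searching
        } finally {
            writer.close();    // flush and commit the index
        }
    }

    // placeholder for whatever object wraps your real Word documents
    interface WordDocument {
        String getText();
        String getTitle();
        String getPath();
    }
}

Note how each Field constructor spells out the store/index/analyze decisions discussed above: “content” is searchable but not stored, “title” is both, and “path” is stored only so it can be shown on a results page.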

MOXy as Your JAX-RS JSON Provider – Client Side

Recently I posted how to leverage EclipseLink JAXB (MOXy)‘s JSON binding to create a RESTful service.  In this post I will demonstrate how easy it is to take advantage of MOXy’s JSON binding on the client side.MOXy as Your JAX-RS JSON Provider – Server Side MOXy as Your JAX-RS JSON Provider – Client SideURI This post will focus on the following URI from the service we declared in the previous post.  The following call will return a list of customers that live in “Any Town”. http://localhost:8080/CustomerService/rest/customers/findCustomersByCity/Any%20TownJava SE Client APIs In the first example we will use the standard Java SE 6 APIs.  Some interesting items to note:MOXy can directly marshal (line 35) and unmarshal (line 28) collections to/from JSON arrays without requiring a wrapper object. There are no compile time dependencies on MOXy (it is a run time dependency). The eclipselink.media-type property is used to enable JSON binding on the unmarshaller (line 25) and marshaller (line 33). The eclipselink.json.include-root property is used to indicate that the @XmlRootElement annotation should be ignored in the JSON binding (lines 26 and 34).package example;import java.io.InputStream; import java.net.*; import java.util.List; import javax.xml.bind.*; import javax.xml.transform.stream.StreamSource; import org.example.Customer;public class JavaSEClient {private static final String MEDIA_TYPE = "application/json";public static void main(String[] args) throws Exception { String uri = "http://localhost:8080/CustomerService/rest/customers/findCustomersByCity/Any%20Town"; URL url = new URL(uri); HttpURLConnection connection = (HttpURLConnection) url.openConnection(); connection.setRequestMethod("GET"); connection.setRequestProperty("Accept", MEDIA_TYPE);JAXBContext jc = JAXBContext.newInstance(Customer.class);Unmarshaller unmarshaller = jc.createUnmarshaller(); unmarshaller.setProperty("eclipselink.media-type", MEDIA_TYPE); unmarshaller.setProperty("eclipselink.json.include-root", false); InputStream xml = connection.getInputStream(); List<Customer> customers = (List<Customer>) unmarshaller.unmarshal(new StreamSource(xml), Customer.class).getValue(); connection.disconnect();Marshaller marshaller = jc.createMarshaller(); marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true); marshaller.setProperty("eclipselink.media-type", MEDIA_TYPE); marshaller.setProperty("eclipselink.json.include-root", false); marshaller.marshal(customers, System.out); }} Output Below is the output from running the Java SE client.  For those that may have used a JAXB ( JSR-222) implementation with something like Jettison to produce/consume JSON, the following are some interesting items to note:MOXy renders collections as JSON arrays. MOXy represents the numeric values correctly without quotes (line 26). MOXy surrounds collections of size 1 correctly with square brackets (lines 28 and 32).[ { "address" : { "city" : "Any Town", "id" : 1, "street" : "1 A Street" }, "firstName" : "Jane", "id" : 1, "lastName" : "Doe", "phoneNumbers" : [ { "id" : 2, "num" : "555-2222", "type" : "HOME" }, { "id" : 1, "num" : "555-1111", "type" : "WORK" } ] }, { "address" : { "city" : "Any Town", "id" : 10, "street" : "456 Another Road" }, "firstName" : "Sue", "id" : 10, "lastName" : "Jones", "phoneNumbers" : [ { "id" : 10, "num" : "555-3333", "type" : "WORK" } ] } ]Jersey Client APIs JAX-RS 2.0 ( JSR-339) is working on standardizing the client APIs.  With JAX-RS 1.0 many of the implementations provide their own version.  
Below is an example using the client APIs provided by Jersey.  Note how we can leverage the exact same MessageBodyReader/ Writer that we used on the server side (line 14, refer to MOXy as Your JAX-RS JSON Provider – Server Side).  I have also specified the LoggingFilter (line 17) so we can take a closer look at the message. package example;import java.util.List; import org.example.Customer; import org.example.MOXyJSONProvider; import com.sun.jersey.api.client.*; import com.sun.jersey.api.client.config.*; import com.sun.jersey.api.client.filter.LoggingFilter;public class JerseyClient {public static void main(String[] args) { ClientConfig cc = new DefaultClientConfig(); cc.getClasses().add(MOXyJSONProvider.class);Client client = Client.create(cc); client.addFilter(new LoggingFilter());WebResource resource = client.resource("http://localhost:8080/CustomerService/rest/customers"); List<Customer> customers = resource.path("/findCustomersByCity/Any%20Town").accept("application/json").get(new GenericType<List<Customer>>(){});for(Customer customer : customers) { System.out.println(customer.getFirstName()); } }}Output Below is the output from running the Jersey client. 14-Mar-2012 4:08:12 PM com.sun.jersey.api.client.filter.LoggingFilter log INFO: 1 * Client out-bound request 1 > GET http://localhost:8080/CustomerService/rest/customers/findCustomersByCity/Any%20Town 1 > Accept: application/json 1 >14-Mar-2012 4:08:12 PM com.sun.jersey.api.client.filter.LoggingFilter log INFO: 1 * Client in-bound response 1 < 200 1 < Transfer-Encoding: chunked 1 < Date: Wed, 14 Mar 2012 20:08:12 GMT 1 < Content-Type: application/json 1 < X-Powered-By: Servlet/3.0 JSP/2.2 (GlassFish Server Open Source Edition 3.1.1 Java/Oracle Corporation/1.7) 1 < Server: GlassFish Server Open Source Edition 3.1.1 1 < [{"address" : {"city" : "Any Town", "id" : 1, "street" : "1 A Street"}, "firstName" : "Jane", "id" : 1, "lastName" : "Doe", "phoneNumbers" : [{"id" : 1, "num" : "555-1111", "type" : "WORK"}, {"id" : 2, "num" : "555-2222", "type" : "HOME"}]}, {"address" : {"city" : "Any Town", "id" : 10, "street" : "456 Another Road"}, "firstName" : "Sue", "id" : 10, "lastName" : "Jones", "phoneNumbers" : [{"id" : 10, "num" : "555-3333", "type" : "WORK"}]}]Doe, Jane Jones, SueFurther Reading If you enjoyed this post then you may also be interested in:RESTful ServicesMOXy as Your JAX-RS JSON Provider  – Server Side Creating a RESTful ServicePart 1 – The DatabasePart 2 – Mapping the Database to JPA EntitiesPart 3 – Mapping JPA entities to XML (using JAXB)Part 4 – The RESTful ServicePart 5 – The ClientMOXy’s XML Metadata in a JAX-RS Service JSON BindingJSON Binding with EclipseLink MOXy – Twitter Example Binding to JSON & XML – Geocode ExampleReference: MOXy as Your JAX-RS JSON Provider – Client Side from our JCG partner Blaise Doughan at the Java XML & JSON Binding blog....
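For completeness, sending JSON back to a service works the same way with the same provider registered. The sketch below is an illustration only: it assumes a hypothetical POST endpoint on the customer resource (the server-side post referenced above does not necessarily expose one) and the usual bean setters on Customer. It simply shows the Jersey 1.x client call shape.

import org.example.Customer;
import org.example.MOXyJSONProvider;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;
import com.sun.jersey.api.client.config.ClientConfig;
import com.sun.jersey.api.client.config.DefaultClientConfig;

public class JerseyPostClient {

    public static void main(String[] args) {
        ClientConfig cc = new DefaultClientConfig();
        cc.getClasses().add(MOXyJSONProvider.class); // same provider as the GET example

        Client client = Client.create(cc);
        WebResource resource = client.resource("http://localhost:8080/CustomerService/rest/customers");

        Customer customer = new Customer();
        customer.setFirstName("Jane");
        customer.setLastName("Doe");

        // hypothetical endpoint: POST a customer as JSON and inspect the HTTP status
        ClientResponse response = resource
                .type("application/json")
                .accept("application/json")
                .post(ClientResponse.class, customer);
        System.out.println("Status: " + response.getStatus());
    }
}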

Cost of Ownership Analysis: Oracle WebLogic Server vs. JBoss

A very interesting whitepaper by Crimson Consulting Group comparing the cost of ownership between WebLogic and JBoss. Although JBoss is free, the whitepaper makes serious claims that in the long run WebLogic is cheaper. Although this study was sponsored by Oracle, it seems to be serious work and is definitely worth a look. Below are some interesting parts from the whitepaper: JBoss costs more than WebLogic Server after 2 years and as much as 35% more over 5 years. Key Takeaways: JBoss is 35% more costly than WebLogic Server over 5 years, despite its free license. Oracle WebLogic Server becomes less expensive on a TCO basis within two years from acquisition – an advantage that continues to grow with every year of operation. Software licensing is a small portion of the total cost of ownership; people costs in operations drive the bulk of long-term costs. Other issues, such as performance, time-to-value, and customized infrastructure, can have a significant impact on the overall business ROI of an application server deployment. Figure 1 illustrates how small the initial costs of Acquisition and Implementation are with respect to the total 5-year costs of an application server deployment. The savings created by not paying for a software license are more than offset by having to invest in employees and consultants for implementation, development of custom scripts and utilities, configuring and testing other open source components, and managing and monitoring the JBoss environment. Description of Cost Categories Included in Research and Analysis:
Acquisition – the hard costs for purchase of the application server software and for the hardware platform(s) to run it.
Implementation – the labor costs for implementation, installation, configuration, and testing of the application servers and the related infrastructure.
Ongoing Application Deployment & Testing Costs – the ongoing labor costs for deploying custom applications from test and staging environments to production environments, plus the ongoing interoperability testing and periodic testing for new releases and updates to the application servers and other infrastructure components.
Ongoing Vendor Support Costs – the hard costs for annual subscription support or maintenance agreements for the application server software, as well as for any additional software required.
Ongoing Administration & Management Costs – the ongoing labor costs to configure, manage, and maintain the application servers and the related infrastructure.
Ongoing Monitoring, Diagnostics, Tuning Costs – the ongoing labor costs to monitor, tune, and optimize the application servers.
Other Cost Considerations – cost considerations identified in the study but not necessarily included in the cost of ownership model, including the cost of unplanned downtime, time to market, and backward compatibility considerations.
Table 3 outlines the pro forma costs for a typical application server deployment, consisting of 5 server hosts (server blades with two dual-core processors each), running an average of 4 application server instances per host (one instance per core). 
The acquisition and on-going costs in Table 3 reflect current list prices for hardware, software and support, less an average discount of 25%, while the people costs for implementation, deployment, testing, administration, and management are based on the results of Crimson’s primary research and resulting cost model.Table 4 shows the total costs of implementing, configuring, and customizing the two application servers. Key takeaways are:JBoss implementation costs more than twice as much as WebLogic Server implementation. By the end of the implementation phase, the cost of JBoss (inclusive of acquisition cost) is within 33% of the cost of WebLogic Server and operations haven’t started yet. Though we haven’t tried to quantify the business cost of the delay in time-to-value associated with an extra 8.5 weeks of effort, it could clearly be substantial.Check the whitepaper for more like Ongoing Operations which involve:Application Deployment and Infrastructure Testing Costs Application Server Administration, Management, Monitoring, and Tuning Costs Monitoring, Diagnostics, and Tuning Costs.Conclusion“Out-of-the-box” configuration and implementation tools are more mature, robust, and efficient for WebLogic Server than for JBoss, with the result that time-to-value is faster, the customization needs lower, and the costs lesser than with JBoss. Similarly, out-of-the-box administration, management, and tuning tools have been through as many development cycles as the core software and are consequently more complete and more productive than their equivalents in the JBoss environment. Oracle takes on the responsibilities and costs of maintaining performance and backwards-integration as the software evolves; users of JBoss take on those responsibilities for themselves.All these factors combine, with additional software-specific performance issues, to give a very different picture of the total cost of ownership in comparison to the initial acquisition costs. In fact, Crimson’s analysis indicates Oracle WebLogic Server becomes less expensive on a TCO basis within two years from acquisition – an advantage that continues to grow with every year of operation. Over a 3-to-5 year time horizon, the TCO of Red Hat JBoss becomes as much as 35 percent more than WebLogic Server, in spite of its lower acquisition cost. Reference: Cost of Ownership Analysis: Oracle WebLogic Server vs. JBoss from our JCG partner Spyros Doulgeridis at the ADF & Weblogic How To blog....

Can you get more out of Static Analysis?

When it comes to static analysis, Bill Pugh, software researcher and the father of Findbugs (the most popular static analysis tool for Java), is one of the few experts who is really worth listening to. He’s not out to hype the technology for commercial gain (Findbugs is a free, Open Source research project), and he provides a balanced perspective based on real experience working with lots of different code bases, including implementing Findbugs at Google. His recent presentation on the effective use of static analysis provides some useful lessons: Development is a zero sum game Any time spent reviewing and fixing bugs is time taken away from designing and implementing new features, or improving performance, or working with customers to understand the business better, or whatever else may be important. In other words: “you shouldn’t try to fix everything that is wrong with your code” At Google, they found thousands of real bugs using Findbugs, and the developers fixed a lot of them. But none of these bugs caused significant production problems. Why? Static analysis tools are especially good at finding stupid mistakes, but not all of these mistakes matter. What we need to fix is the small number of very scary bugs, at the “intersection of stupid and important”. Working with different static analysis tools over the past 5+ years, we’ve found some real bugs, some noise, and a lot of other “problems” that didn’t turn out to be important. Like everyone else, we’ve tuned the settings and filtered out checks that aren’t important or relevant to us. Each morning a senior developer reviews the findings (there aren’t many), tossing out any false positives and “who cares” and intentional (“the tool doesn’t like it but we do it on purpose and we know it works”) results. All that is left are a handful of real problems that do need to be fixed each month, and a few more code cleanup issues that we agree are worth doing (the code works, but it could be written better). Another lesson is that finding old bugs isn’t all that important or exciting. If the code has been running for a long time without any serious problems, or if people don’t know about or are willing to put up with the problems, then there’s no good reason to go back and fix them – and maybe some good reasons not to. Fixing old bugs, especially in legacy systems that you don’t understand well, is risky: there’s a 5-30% chance of introducing a new bug while trying to fix the old one. And then there’s the cost and risks of rolling out patches. There’s no real pay back. Unless of course, you’ve been looking for a “ghost in the machine” for a long time and the tool might have found it. Or the tool found some serious security vulnerabilities that you weren’t aware of. The easiest way to get developers to use static analysis is to focus on problems in the code that they are working on now – helping them to catch mistakes as they are making them. It’s easy enough to integrate static analysis checking into Continuous Integration and to report only new findings (all of the commercial tools that I have looked at can do this, and Findbugs does this as well). But it’s even better to give immediate feedback to developers – this is why commercial vendors like Klocwork and Coverity are working on providing close-to-immediate feedback to developers in the IDE, and why built-in checkers in IDEs like IntelliJ are so useful. Getting more out of static analysis Over a year ago my team switched to static analysis engines for commercial reasons. 
We haven’t seen a fundamental difference between using one tool and the other, other than adapting to minor differences in workflow and presentation – each tool has its own naming and classification scheme for the same set of problems. The new tool finds some bugs the previous one didn’t, and it’s unfortunately missing a few checks that we used to rely on, but we haven’t seen a big difference in the number or types of problems found. We still use Findbugs as well, because Findbugs continues to find problems that the commercial engines don’t, and it doesn’t add to the cost of checking – it’s easy to see and ignore any duplicate findings. Back in 2010 I looked at the state of static analysis tools for Java and concluded that the core technology had effectively matured – that vendors had squeezed as much as they could from static analysis techniques, and that improvements from this point on would be in better packaging, feedback and workflow, making the tools easier to use and understand. Over the past couple of years that’s what has happened. The latest versions of the leading tools provide better reporting and management dashboards, make it easier to track bugs across branches and integrate with other development tools, and just about everything is now available in the Cloud. Checking engines are getting much faster, which is good when it comes to providing feedback to developers. But the tools are checking for the same problems, with a few tweaks here and there. Speed changes how the tools can be used by developers, but doesn’t change what the tools can do. Based on what has happened over the past 2 or 3 years, I don’t expect to see any significant improvements in static analysis bug detection for Java going forward, in the kinds of problems that these tools can find – at least until/if Oracle makes some significant changes to the language in Java 8 or 9 or something and we’ll need new checkers for new kinds of mistakes. Want more? Do it yourself… Bill Pugh admits that Findbugs at least is about as good as it is going to get. In order to find more bugs or find bugs more accurately, developers will need to write their own custom rule checkers. Most if not all of the static analysis tools let you write your own checkers, using different analysis functions of their engines. Gary McGraw at Cigital agrees that a lot of the real power in static analysis comes from writing your own detectors: In our experience, organizations obtain the bulk of the benefit in static analysis implementations when they mature towards customization. For instance, imagine using your static analysis tool to remind developers to use your secure-by-default web portal APIs and follow your secure coding standards as part of their nightly build feedback. (Unfortunately, the bulk of the industry’s experience remains centered around implementing the base tool.) If tool providers can make it simple and obvious for programmers to write their own rules, it opens up possibilities for writing higher-value, project-specific and context-specific checks. To enforce patterns and idioms and conventions. Effectively, more design-level checking than code-level checking. Another way that static analysis tools can be extended and customized is by annotating the source code. 
Findbugs, Intellij, Coverity, Fortify, Klocwork (and other tools I’m sure) allow you to improve the accuracy of checkers by annotating your source code to include information that the tools can use to help track control flow or data flow, or to suppress checks on purpose. If JSR-305 gets momentum ( it was supposed to make it into Java 7, but didn’t) and tool suppliers all agree to follow common annotation conventions, it might encourage more developers to try it out. Otherwise you need to make changes to your code base tied to a particular tool, which is not a good idea. But is it worth it? It takes a lot of work to get developers to use static analysis tools and fix the bugs that the tools find. Getting developers to take extra time to annotate code or to understand and write custom code checkers is much more difficult, especially with the state of this technology today. It demands a high level of commitment and discipline and strong technical skills, and I am not convinced that the returns will justify the investment. We’re back to the zero sum problem. Yes, you will probably catch some more problems with customized static analysis tools, and you’ll have less noise to filter through. But you will get a much bigger return out of getting the team to spend that time on code reviews or pairing, or more time on design and prototyping, or writing better tests. Outside of high-integrity environments and specialist work done by consultants, I don’t see these ideas being adopted or making a real impact on software quality or software security. Reference: Can you get more out of Static Analysis? from our JCG partner Jim Bird at the Building Real Software blog....
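To make the annotation idea above a little more concrete, here is a minimal sketch using JSR-305 style annotations from the javax.annotation package (shipped, for example, in the jsr305 jar that Findbugs can consume). The class and method names are hypothetical; the point is simply that recording nullness expectations in the code gives the checker, and the next developer, less to guess about.

import javax.annotation.CheckForNull;
import javax.annotation.Nonnull;

public class CustomerLookup {

    // callers must not pass null; a tool that understands JSR-305 can flag call sites that might
    @Nonnull
    public String normalizeName(@Nonnull String name) {
        return name.trim().toLowerCase();
    }

    // callers are expected to check the result; dereferencing it blindly becomes a finding
    @CheckForNull
    public String findNickname(@Nonnull String name) {
        return "slash".equals(name) ? "Slash" : null;
    }
}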