
SQL tooling, the ranking

When you need to get up and running quickly with your database, the tooling becomes very important. When developing jOOQ and adding integrations for new databases, I really love those databases that provide me with simple ways to create new databases, schemata, users, roles, grants, whatever is needed, using simple dialogs where I can click next, next, next. After all, we're in the 21st century, and I don't want to configure my software with punchcards anymore.

Database tooling categories

So with jOOQ development, I've seen a fair share of databases and their tooling. I'd like to divide them into three categories. Please note that this division is subjective, from the point of view of jOOQ development. With most of these databases, I have no production experience (except Oracle and MySQL). Things may change drastically when you go into production. So here are the categories:

The "all-you-can-wish-for" ones

These friends of mine ship with excellent tooling already integrated into their standard deliverable for free. It is easy to start the tooling and use it right away, without any configuration. The tooling is actually an intuitive rich client, and I don't have to read thousands of manual pages and google all around, or pay extra license fees to get the add-on. This category contains (in alphabetical order):

- CUBRID, with its Eclipse-RCP based CUBRID Manager. This is a very nice tool for a newcomer.
- DB2, with its Eclipse-RCP based IBM Data Studio. IBM created Eclipse. It would've been a shame if they hadn't created the Data Studio.
- Postgres, with pgAdmin III. Very nice looking and fast.
- SQL Server, with its SQL Server Management Studio. This is probably the most complete of all. You can lose yourself in its myriads of properties and configuration popups.
- Sybase SQL Anywhere and Sybase ASE, which both share the same tooling, called Sybase Central. It looks a bit out of date, but all administrative operations can be done easily.

The ones with sufficient tooling

These databases have tooling that is "sufficient". This means that they ship with some integrated scripting-enabled console. Some of them are also generally popular, such that there exist free open source tools to administer those databases. This includes MySQL and Oracle. Here are the "OK" ones:

- H2. Its web-based console is actually quite nice-looking. It features DHTML-based auto-completion and scripting. I can live with that.
- Ingres. This dinosaur seems not to have upgraded its UI components since Windows 95, but it works as well as it needs to.
- MySQL, with phpMyAdmin. This is a very nice, independent, open source PHP application for MySQL administration. You can install it easily along with MySQL using XAMPP, a nice Apache, MySQL, PHP, Perl distribution. Yes, I like installing complete things using the next-next-next pattern!
- Oracle. It has sql*plus for scripting, and there are many commercial and open source products with user interfaces. My favourite ones are Toad and Toad Extensions, a really nice and free Eclipse plugin. It is worth mentioning that if you pay the extra license fee, you will have access to Oracle Enterprise Manager and other very fancy tools. With money, you clearly can't complain here.

The other ones…

Here, you're back to loading *.sql files with DDL all along. No help from the vendors here.

- Derby. I'm not aware of any tooling. Correct me if I'm wrong.
- HSQLDB. Its integrated console can execute SQL, but it doesn't provide syntax highlighting, checking, autocompletion, etc. I'm probably better off using SQuirreL SQL or any other generic SQL tool.
- SQLite. Good luck there! This database is really minimal!

Screenshots (ordered by database, alphabetically) accompany the original post.

Reference: SQL tooling, the ranking from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog....

Better looking HTML test reports for TestNG with ReportNG – Maven guide

TestNG is a testing framework created as an annotation-driven alternative to JUnit 3 in times when "extends TestCase" was an indispensable part of writing tests. Even now it provides some interesting features like data providers, parallel tests or test groups. When our tests are not executed from the IDE, it's often useful to take a look at the test results in an HTML report. The original TestNG reports look… raw. What is more, they are not very intuitive or readable. There is an alternative – ReportNG. It provides better looking and more lucid HTML test reports. More information about ReportNG can be found at its webpage, but when I tried to use it for my AppInfo library in Maven builds running from a CI server, I had a problem finding any at-a-glance guide on how to use it with Maven. Fortunately there are samples for Ant and Gradle, so I was able to figure it out, but I hope that with this post everyone wanting to use ReportNG with Maven will be able to achieve it without any problem within a few minutes.

First, an additional dependency has to be added to pom.xml:

<dependencies>
  <dependency>
    <groupId>org.uncommons</groupId>
    <artifactId>reportng</artifactId>
    <version>1.1.2</version>
    <scope>test</scope>
    <exclusions>
      <exclusion>
        <groupId>org.testng</groupId>
        <artifactId>testng</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  (...)
</dependencies>

Usually a newer TestNG version is used in our project, so the TestNG dependency pulled in by ReportNG should be excluded.

Next, the Surefire plugin has to be configured:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.5</version>
      <configuration>
        <properties>
          <property>
            <name>usedefaultlisteners</name>
            <value>false</value>
          </property>
          <property>
            <name>listener</name>
            <value>org.uncommons.reportng.HTMLReporter, org.uncommons.reportng.JUnitXMLReporter</value>
          </property>
        </properties>
        <workingDirectory>target/</workingDirectory>
      </configuration>
    </plugin>
    (...)
  </plugins>
</build>

ReportNG uses two reporters pluggable into TestNG. JUnitXMLReporter generates an XML summary of the executed tests, intended for tools (like a CI server). HTMLReporter creates the human-readable HTML report. The default TestNG listeners should be disabled. I also added the workingDirectory property, which causes velocity.log (a file created by the Velocity engine used internally by ReportNG) to be placed in target instead of the main project directory (and therefore deleted by the "mvn clean" command).

One more thing. Unfortunately the ReportNG jar isn't available in the Maven Central Repository, so it could be required to add the java.net repository in your settings.xml:

<repositories>
  <repository>
    <id>java-net</id>
    <url>http://download.java.net/maven/2</url>
  </repository>
  (...)
</repositories>

That's all. Now "mvn clean test" should generate a nice looking HTML report for the lots of tests covering our project.

Reference: Better looking HTML test reports for TestNG with ReportNG – Maven guide from our JCG partner Marcin Zajaczkowski at the Solid Soft blog....
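For completeness, here is a minimal, hypothetical TestNG test class (the class, method and group names are made up for illustration) whose results would show up in the generated ReportNG pages after "mvn clean test":

import static org.testng.Assert.assertEquals;

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class CalculatorTest {

    // Data provider feeding several input/expected triples into the test method below.
    @DataProvider(name = "additions")
    public Object[][] additions() {
        return new Object[][] { { 1, 2, 3 }, { -1, 1, 0 }, { 0, 0, 0 } };
    }

    @Test(dataProvider = "additions", groups = "fast")
    public void shouldAddNumbers(int a, int b, int expected) {
        assertEquals(a + b, expected);
    }
}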

Joins with Map Reduce

I have been reading up on the join implementations available for Hadoop for the past few days. In this post I recap some techniques I learnt during the process. Joins can be done on the map side or the reduce side, depending on the nature of the data sets to be joined.

Reduce Side Join

Let's take the following tables containing employee and department data. Let's see how the join query below can be achieved using a reduce side join.

SELECT Employees.Name, Employees.Age, Department.Name
FROM Employees
INNER JOIN Department ON Employees.Dept_Id = Department.Dept_Id

The map side is responsible for emitting the join predicate value (the department id) along with the corresponding record from each table, so that records having the same department id in both tables end up at the same reducer, which then joins the records having that department id. However, it is also required to tag each record to indicate which table the record originated from, so that the join happens between records of the two different tables. The following diagram illustrates the reduce side join process. Here is the pseudo code for the map function in this scenario.

map (K table, V rec) {
    dept_id = rec.Dept_Id
    tagged_rec.tag = table
    tagged_rec.rec = rec
    emit(dept_id, tagged_rec)
}

At the reduce side, the join happens between records having different tags.

reduce (K dept_id, list<tagged_rec> tagged_recs) {
    for (tagged_rec : tagged_recs) {
        for (tagged_rec1 : tagged_recs) {
            if (tagged_rec.tag != tagged_rec1.tag) {
                joined_rec = join(tagged_rec, tagged_rec1)
                emit(tagged_rec.rec.Dept_Id, joined_rec)
            }
        }
    }
}

Map Side Join (Replicated Join)

Using Distributed Cache on the Smaller Table

For this implementation to work, one relation has to fit into memory. The smaller table is replicated to each node and loaded into memory. The join happens at the map side without reducer involvement, which significantly speeds up the process since it avoids shuffling all the data across the network, even though most of the non-matching records would later be dropped anyway. The smaller table can be loaded into a hash table so that look-ups by Dept_Id can be done. The pseudo code is outlined below.

map (K table, V rec) {
    list recs = lookup(rec.Dept_Id) // get smaller table records having this Dept_Id
    for (small_table_rec : recs) {
        joined_rec = join(small_table_rec, rec)
        emit(rec.Dept_Id, joined_rec)
    }
}

Using Distributed Cache on a Filtered Table

If the smaller table doesn't fit in memory, it may be possible to prune its contents if a filtering expression has been specified in the query. Consider the following query.

SELECT Employees.Name, Employees.Age, Department.Name
FROM Employees
INNER JOIN Department ON Employees.Dept_Id = Department.Dept_Id
WHERE Department.Name = "Eng"

Here a smaller data set can be derived from the Department table by filtering out records having department names other than "Eng". Now it may be possible to do a replicated map side join with this smaller data set.

Replicated Semi-Join (Reduce Side Join with Map Side Filtering)

Even if the filtered data of the small table doesn't fit into memory, it may be possible to include just the Dept_Ids of the filtered records in the replicated data set. Then at the map side this cache can be used to filter out records which would otherwise be sent over to the reduce side, thus reducing the amount of data moved between the mappers and reducers. The map side logic would look as follows.
map (K table, V rec) {
    // check if this record needs to be sent to the reducer
    boolean sendToReducer = check_cache(rec.Dept_Id)
    if (sendToReducer) {
        dept_id = rec.Dept_Id
        tagged_rec.tag = table
        tagged_rec.rec = rec
        emit(dept_id, tagged_rec)
    }
}

The reducer side logic would be the same as in the reduce side join case.

Using a Bloom Filter

A Bloom filter is a construct which can be used to test the containment of a given element in a set. A smaller representation of the filtered Dept_Ids can be derived if the Dept_Id values are added to a Bloom filter. Then this Bloom filter can be replicated to each node. At the map side, for each record fetched from the larger table the Bloom filter can be used to check whether the Dept_Id in the record is present in the filter, and only if so to emit that particular record to the reduce side. Since a Bloom filter is guaranteed not to produce false negatives, the result is accurate (false positives only cause a few extra records to be shuffled, and the reduce side join discards them). Reference: Joins with Map Reduce from our JCG partner Buddhika Chamith at the Source Open blog....
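To make the reduce side join described above concrete, here is a minimal sketch using the Hadoop MapReduce Java API. The input file names, the comma-separated record layout and the position of Dept_Id are assumptions made for illustration; a production version would typically use a custom Writable for the tagged records and a driver class to wire the job together.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class ReduceSideJoin {

    // Tags each record with its source table and emits (dept_id, tagged record).
    public static class JoinMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Assumption: CSV rows, Dept_Id is the last field in both tables.
            String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
            String tag = fileName.startsWith("employees") ? "EMP" : "DEP";
            String[] fields = value.toString().split(",");
            String deptId = fields[fields.length - 1].trim();
            context.write(new Text(deptId), new Text(tag + "|" + value));
        }
    }

    // Joins records carrying different tags that share the same dept_id.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text deptId, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> employees = new ArrayList<String>();
            List<String> departments = new ArrayList<String>();
            for (Text value : values) {
                String[] parts = value.toString().split("\\|", 2);
                if ("EMP".equals(parts[0])) {
                    employees.add(parts[1]);
                } else {
                    departments.add(parts[1]);
                }
            }
            for (String employee : employees) {
                for (String department : departments) {
                    context.write(deptId, new Text(employee + "," + department));
                }
            }
        }
    }
}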

Spring & JSF integration: Dynamic Navigation

Often your JSF application will need to move beyond basic static navigation and start to make dynamic navigation decisions. For example, you may want to redirect users based on their age. Most JSF tutorials recommend that dynamic navigation is implemented by binding the action attribute of a command to a backing bean:

<h:commandButton action="#{bean.actionBasedOnAge}"/>

public String actionBasedOnAge() {
    if (age < 12) {
        return "fetchadult";
    } else {
        return "ok";
    }
}

The example above shows how anyone under twelve is directed to 'fetchadult' instead of the usual 'ok'. Both the 'fetchadult' and 'ok' outcomes will need to have navigation rules defined in faces-config.xml so that JSF knows what actual page to display.

When working with Spring MVC it is often more natural to have the navigation logic contained in the @Controller bean. To help with this, implicit 'controller' and 'handler' variables are available when rendering JSF from MVC. The 'controller' variable provides access to the controller bean that was mapped to the original request, and the 'handler' variable to the underlying MVC handler. In Spring 3.0 'controller' and 'handler' are generally the same object. In Spring 3.1, however, the underlying MVC architecture is changing and 'handler' will generally be a org.springframework.web.method.HandlerMethod instance. Here is a submit button that references the someNavigation() method of the @Controller:

<h:commandButton action="#{controller.someNavigation}"/>

Whilst accessing the controller bean is useful, it is not the ideal solution. I prefer to use logical names in my JSF pages and map those to Java methods. I also want an easy way to get at data from the underlying model. The @NavigationMapping annotation provides another, more flexible approach to handling navigation. It works in a very similar way to @RequestMapping. The annotation can be placed on any public method in your @Controller to map navigation outcomes to destinations.

<h:commandButton action="submit"/>

@NavigationMapping
public String onSubmit() {
    return "redirect:http://www.springsource.org";
}

If you need access to a backing bean, the standard Spring @Value annotation can be used. Any EL expression that the page can resolve can also be used on a navigation method parameter.

@NavigationMapping
public String onSubmit(@Value("#{person.age}") int age) {
    ...
}

Accessing model elements is even easier. As long as you only have a single object of the type you want to access in your model, and it is not a simple type (int, String, etc.), you don't need any annotations:

@NavigationMapping
public String onSubmit(Person p) {
    ...
}

Other argument types can also be used (see the JavaDoc for a complete list). For example, here is a navigation mapping that handles the 'submit', 'cancel' and 'save' outcomes. The injected arguments tell us which of the three outcomes was clicked and provide access to the source UIComponent.

@NavigationMapping({ "submit", "cancel", "save" })
public String handleNavigation(String outcome, UIComponent source) {
    ...
}

Return types are equally flexible. You can return view names as Strings, and you can also use the same "@hotelsController.show" notation that I have previously blogged about. You can also return View objects directly, or you can use NavigationOutcome if you want to include implicit model items. Finally, if you just want to render an immediate response you can use the @ResponseBody annotation or return an HttpEntity. This works in exactly the same way as in Spring MVC.
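Putting the pieces together, a @Controller using these annotations might look roughly like the following sketch. The view names, the Person type and its getAge() accessor are assumptions made for illustration, and the import for @NavigationMapping comes from the Spring/JSF integration project described in this post (its package is deliberately omitted here).

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class HotelsController {

    // Standard Spring MVC mapping that renders the JSF view named "booking".
    @RequestMapping("/hotels/booking")
    public String booking() {
        return "booking";
    }

    // Handles the "submit" outcome fired by <h:commandButton action="submit"/>;
    // the Person argument is resolved from the model as described above.
    @NavigationMapping
    public String onSubmit(Person person) {
        return person.getAge() < 12 ? "fetchadult" : "@hotelsController.show";
    }

    // Handles the "cancel" outcome with a plain redirect.
    @NavigationMapping("cancel")
    public String onCancel() {
        return "redirect:/hotels";
    }
}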
Reference: Integrating Spring & JavaServer Faces : Dynamic Navigation from our JCG partner Phillip Webb at the Phil Webb’s Blog blog....

Implementing the State Machine Pattern as a Stream Processor

In my last blog, I said that I really thought that some of the Gang of Four (GOF) patterns were becoming somewhat obsolete, and if not obsolete then certainly unpopular. In particular I said that State Machine wasn't that useful, as you can usually think of another, simpler way of doing whatever it is you're doing rather than using it. In order to make amends, both for preaching obsolescence and for the hideous 'C' code that I attached to the end of my last blog, I thought that I'd demonstrate the use of State Machine in converting Twitter tweets into HTML.

The scenario, just for once, isn't contrived or far-fetched, but something that I had to do the other day. In this scenario I have an app that has just downloaded a bunch of timeline tweets for an authenticated Twitter user. Having parsed the XML (or JSON) and got hold of the tweets, I needed to format them for display. The problem was that they're in plain text, and I needed to convert them into HTML, adding anchor tags along the way to produce something similar to the way Twitter formats the same thing on your Twitter home page. Just for reference, a user's tweets can be retrieved using the Twitter API via the following URL:

https://api.twitter.com/1/statuses/user_timeline.xml?include_entities=true&include_rts=true&screen_name=BentleyMotors&count=2

…where the username in this case is "BentleyMotors". If you specify XML formatting in the URL, then a tweet is returned in the text tag and looks something like this:

Deputy PM Nick Clegg visits #Bentley today to tour Manufacturing facilities. #RegionalGrowthFund http://t.co/kX81aZmY http://t.co/Eet31cCA

…and this needed converting into something like this:

Deputy PM Nick Clegg visits <a href=\"https://twitter.com/#!/search/%23Bentley\">#Bentley</a> today to tour Manufacturing facilities. <a href=\"https://twitter.com/#!/search/%23RegionalGrowthFund\">#RegionalGrowthFund</a> <a href=\"http://t.co/kX81aZmY\">t.co/kX81aZmY</a> <a href=\"http://t.co/Eet31cCA\">t.co/Eet31cCA</a>

The big idea in solving this problem1 is to use a state machine that reads an input stream a byte at a time to find the hashtags, user names and URLs and convert them into HTML anchor tags. For example, from the complete tweet above, #Bentley becomes <a href=\"https://twitter.com/#!/search/%23Bentley\">#Bentley</a> and http://t.co/Eet31cCA becomes <a href=\"http://t.co/Eet31cCA\">t.co/Eet31cCA</a>. This means that the code has to find every word that begins with either '#' or '@', or a URL that begins with 'http://'. The UML diagram for this state machine looks something like this:

This implementation does differ from the GOF diagram below in that, for this application, I've separated the state from the event/action. This has the benefits of improved decoupling and that actions can be associated with multiple states.

Gathering Up Your States

The first thing to do when building any state machine is to gather together your states. In the original GOF pattern states were abstract classes; however, I prefer to use more modern enums for simplicity.
The states for this state machine are:

public enum TweetState {

    OFF("Off - not yet running"), //
    RUNNING("Running - happily processing any old bytes"), //
    READY("Ready - found a space, so there's maybe something to do, but that depends upon the next byte"), //
    HASHTAG("#HashTag has been found - process it"), //
    NAMETAG("@Name has been found - process it"), //
    HTTPCHECK("Checking for a URL starting with http://"), //
    URL("http:// has been found so capture the rest of the URL");

    private final String description;

    TweetState(String description) {
        this.description = description;
    }

    @Override
    public String toString() {
        return "TweetState: " + description;
    }
}

Reading the Bytes

The next thing that's needed is a class that reads an input stream a byte at a time, gets hold of the action class that's associated with the machine's current state, and then processes the byte using the action. This is done by the StateMachine class shown below:

public class StateMachine<T extends Enum<?>> {

    private final byte[] inputBuffer = new byte[32768];
    private T currentState;
    private final Map<T, AbstractAction<T>> stateActionMap = new HashMap<T, AbstractAction<T>>();

    public StateMachine(T startState) {
        this.currentState = startState;
    }

    /**
     * Main method that loops around and processes the input stream
     */
    public void processStream(InputStream in) {
        // Outer loop - continually refill the buffer until there's nothing
        // left to read
        try {
            processBuffers(in);
            terminate();
        } catch (Exception ioe) {
            throw new StateMachineException("Error processing input stream: " + ioe.getMessage(), ioe);
        }
    }

    private void processBuffers(InputStream in) throws Exception {
        for (int len = in.read(inputBuffer); (len != -1); len = in.read(inputBuffer)) {
            // Inner loop - process the contents of the buffer
            for (int i = 0; i < len; i++) {
                processByte(inputBuffer[i]);
            }
        }
    }

    /**
     * Deal with each individual byte in the buffer
     */
    private void processByte(byte b) throws Exception {
        // Get the action associated with this state
        AbstractAction<T> action = stateActionMap.get(currentState);
        // do the action, get the next state
        currentState = action.processByte(b, currentState);
    }

    /**
     * The buffer is empty. Make sure that we tidy up
     */
    private void terminate() throws Exception {
        AbstractAction<T> action = stateActionMap.get(currentState);
        action.terminate(currentState);
    }

    /**
     * Add an action to the machine and associate it with a state. A state
     * can have more than one action associated with it
     */
    public void addAction(T state, AbstractAction<T> action) {
        stateActionMap.put(state, action);
    }

    /**
     * Remove an action from the state machine
     */
    public void removeAction(AbstractAction<T> action) {
        stateActionMap.remove(action); // Remove the action - if it's there
    }
}

The key method here is processByte(...):

/**
 * Deal with each individual byte in the buffer
 */
private void processByte(byte b) throws Exception {
    // Get the action associated with this state
    AbstractAction<T> action = stateActionMap.get(currentState);
    // do the action, get the next state
    currentState = action.processByte(b, currentState);
}

For every byte, this method gets hold of the action that's associated with the current state from the stateActionMap. The action is then called, updating the current state ready for the next byte. Having sorted out the states and the state machine, the next step is to write the actions.
At this point I follow the GOF pattern more closely by creating an AbstractAction class that processes each event with… public abstract T processByte(byte b, T currentState) throws Exception; This method, given the current state, processes a byte of information and uses that byte to return the next state. The full implementation of the AbstractAction is: public abstract class AbstractAction<T extends Enum<?>> { /*** This is the next action to take - See the Chain of Responsibility Pattern*/protected final AbstractAction<T> nextAction;/** Output Stream we're using */protected final OutputStream os;/** The output buffer */protected final byte[] buff = new byte[1];public AbstractAction(OutputStream os) {this(null, os);}public AbstractAction(AbstractAction<T> nextAction, OutputStream os) {this.os = os;this.nextAction = nextAction;}/*** Call the next action in the chain of responsibility** @param b* The byte to process* @param state* The current state of the machine.*/protected void callNext(byte b, T state) throws Exception {if (nextAction != null) { nextAction.processByte(b, state); }}/*** Process a byte using this action** @param b* The byte to process* @param currentState* The current state of the state machine** @return The next state*/public abstract T processByte(byte b, T currentState) throws Exception;/*** Override this to ensure an action tides up after itself and returns to a* default state. This may involve processing any data that's been captured** This method is called when the input stream terminates*/public void terminate(T currentState) throws Exception {// blank}protected void writeByte(byte b) throws IOException {buff[0] = b; // Write the data to the output directoryos.write(buff); }protected void writeByte(char b) throws IOException {writeByte((byte) b); }}Building the State Machine So far all the code that I’ve written has been generic and can be reused time and time again 2, all of which means that the next step is to write some domain specific code. From the UML diagram above, you can see that the domain specific actions are: DefaultAction, ReadyAction and CaptureTags. Before I go on to describe what they do, you may have guessed that some I need to inject the actions in to the StateMachine and associate them with a TweetState. The JUnit code below shows how this is done… StateMachine<TweetState> machine = new StateMachine<TweetState>(TweetState.OFF); // Add some actions to the statemachine// Add the default actionmachine.addAction(TweetState.OFF, new DefaultAction(bos));machine.addAction(TweetState.RUNNING, new DefaultAction(bos));machine.addAction(TweetState.READY, new ReadyAction(bos));machine.addAction(TweetState.HASHTAG, new CaptureTag(bos, new HashTagStrategy()));machine.addAction(TweetState.NAMETAG, new CaptureTag(bos, new UserNameStrategy()));machine.addAction(TweetState.HTTPCHECK, new CheckHttpAction(bos));machine.addAction(TweetState.URL, new CaptureTag(bos, new UrlStrategy()));From the code above you can see that DefaultAction is linked the OFF and RUNNING states, the ReadyAction is linked to the READY state, the CaptureTag action is linked to the HASHTAG, NAMETAG and URL states and the HttpCheckAction is linked to the HTTPCHECK state. You may have noticed that the CaptureTag action is linked to more that one state. This is fine because the CaptureTag employs the Strategy pattern to change its behaviour on the fly; hence I have one action with some common code that, after injecting a strategy object, can do three things. 
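As a quick usage sketch following on from the JUnit snippet above (the sample tweet text and the ByteArray streams are purely illustrative; bos is the ByteArrayOutputStream that the actions write to):

String tweet = "Deputy PM Nick Clegg visits #Bentley today http://t.co/kX81aZmY";
ByteArrayInputStream in = new ByteArrayInputStream(tweet.getBytes());
machine.processStream(in);      // drives the actions a byte at a time
String html = bos.toString();   // plain text with anchor tags around #Bentley and the URL
System.out.println(html);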
Writing Actions Getting back to writing actions, the first action to write is usually the DefaultAction, which is the action that’s called when nothing interesting is happening. This action happily takes input characters and puts them into the output stream, whilst looking out for certain characters or character/state combinations. The heart of the DefaultAction is the switch statement in the processByte(...) method. public class DefaultAction extends AbstractAction<TweetState> { public DefaultAction(OutputStream os) {super(os);}/*** Process a byte using this action** @param b* The byte to process* @param currentState* The current state of the state machine*/@Override public TweetState processByte(byte b, TweetState currentState) throws Exception {TweetState retVal = TweetState.RUNNING;// Switch state if a ' ' charif (isSpace(b)) {retVal = TweetState.READY; writeByte(b); } else if (isHashAtStart(b, currentState)) { retVal = TweetState.HASHTAG; } else if (isNameAtStart(b, currentState)) { retVal = TweetState.NAMETAG; } else if (isUrlAtStart(b, currentState)) { retVal = TweetState.HTTPCHECK;} else { writeByte(b);}return retVal;}private boolean isSpace(byte b) {return b == ' ';}private boolean isHashAtStart(byte b, TweetState currentState) {return (currentState == TweetState.OFF) && (b == '#');}private boolean isNameAtStart(byte b, TweetState currentState) {return (currentState == TweetState.OFF) && (b == '@');}private boolean isUrlAtStart(byte b, TweetState currentState) {return (currentState == TweetState.OFF) && (b == 'h');}} From the code above you can see that the central switch statement is checking each byte. If the byte is a space, then the next byte maybe a special character: ‘#’ for the start of a hashtag, ‘@’ for the start of a name tag and ‘h’ for the start of a URL; hence, if a space is found then the DefaultAction returns the READY state as there may be more work to do. If a space isn’t found then it returns a RUNNING state which tells StateMachine to call the DefaultAction when the next byte is read. The DefaultAction also checks for special characters at the start of a line as the first character of a tweet maybe a ‘#’, ‘@’ or ‘h’. Control has now been passed back to the StateMachine object, which reads the next byte from the input stream. As the state is now READY, the next call to processByte(...) retrieves the ReadyAction. @Override public TweetState processByte(byte b, TweetState currentState) throws Exception {TweetState retVal = TweetState.RUNNING;switch (b) {case '#': retVal = TweetState.HASHTAG; break;case '@': retVal = TweetState.NAMETAG; break;case 'h': retVal = TweetState.HTTPCHECK; break;default: super.writeByte(b); break; }return retVal;} From ReadyAction’s switch statement you can see that its responsibility is to confirm that the code has found a hashtag, name or URL by checking for a ‘#’, ‘@’ and ‘h’ respectively. 
If it finds one then it returns one of the following states: HASHTAG, NAMETAG or HTTPCHECK to the StateMachine Assuming that the ReadyAction found a ‘#’ character and returned a HASHTAG state, then StateMachine, when it reads the next byte, will pull the CaptureTag class with the injected HashTagStrategy class from the stateActionMap public class CaptureTag extends AbstractAction<TweetState> {private final ByteArrayOutputStream tagStream; private final byte[] buf; private final OutputStrategy output; private boolean terminating;public CaptureTag(OutputStream os, OutputStrategy output) {super(os); tagStream = new ByteArrayOutputStream(); buf = new byte[1]; this.output = output; }/*** Process a byte using this action * @param b * The byte to process * @param currentState * The current state of the state machine */@Override public TweetState processByte(byte b, TweetState currentState) throws Exception { TweetState retVal = currentState;if (b == ' ') {retVal = TweetState.READY; // fix 1 output.build(tagStream.toString(), os); if (!terminating) { super.writeByte(' '); }reset();} else { buf[0] = b; tagStream.write(buf); }return retVal;}/*** Reset the object ready for processing*/public void reset() { terminating = false; tagStream.reset();}@Override public void terminate(TweetState state) throws Exception {terminating = true; processByte((byte) ' ', state);}} The idea behind the CaptureTag code is that it captures characters adding them to a ByteArrayOutputStream until it detects a space or the input buffer is empty. When a space is detected, the CaptureTag call its OutputStrategy interface, which in this case is implemented by HashTagStrategy. public class HashTagStrategy implements OutputStrategy { /** * @see state_machine.tweettohtml.OutputStrategy#build(java.lang.String, * java.io.OutputStream) */@Override public void build(String tag, OutputStream os) throws IOException {String url = "<a href=\"https://twitter.com/#!/search/%23" + tag + "\">#" + tag + "</a>"; os.write(url.getBytes());} }The HashTagStrategy builds a hashtag search URL and writes it to the output stream. Once the URL has been written to the stream, the CaptureTag returns a state of READY – as a space has been detected and returns control to the StateMachine. The StateMachine reads the next byte and so the process continues. Processing a hashtag is only one of several possible scenarios that this code can handle and in demonstrating this scenario I’ve tried to demonstrate how a state machine can be used to process an input stream a byte at a time in order to realize some predefined solution. If you’re interested in how the other scenarios are handled take a look at the source code on github In Summary In summary, this isn’t a technique that you’d want to use on a regular basis; it’s complex, pretty hard to implement and prone to error, plus there’s usually a simpler way of parsing incoming data. However, there are those odd few times when it is useful, when, despite its complexity, it is a good solution, so I’d recommend keeping it in your metaphorical toolbox and saving it for a rainy day. 1There are several ways of solving this puzzle some of which may be simpler and less complex than State Machine 2This version of StateMachine was written in 2006 to process XML. In this scenario the code had to unzip some Base 64 XML fields and as the pattern was re-usable I just dug it out of my toolbox of code samples for the Tweet to HTML case. 
3The complete project is available on github … Reference: Implementing the State Machine Pattern as a Stream Processor from our JCG partner Roger Hughes at the Captain Debug’s Blog blog....

The Future of NoSQL with Java EE

I've been following the recent NoSQL momentum for some time now, and it seems as if this buzzword is also drawing some kind of attention in the enterprise Java world. Namely, EclipseLink 2.4 started supporting MongoDB and Oracle NoSQL. With EclipseLink being the JPA reference implementation, you might wonder what this means for Java EE 7. A short side-note here: even though I am part of the JSR-342 EG, this isn't meant to be an official statement. In the following I simply try to summarize my own personal experiences and feelings towards NoSQL support in future Java EE versions. A big thank you goes out to Emmanuel Bernard for providing early feedback! Happy to discuss what follows.

What is NoSQL?

NoSQL is a classification of database systems that do not conform to the relational database or SQL standard. Most often they are categorized according to the way they store the data and fall under categories such as key-value stores, BigTable implementations, document store databases, and graph databases. In general the term isn't well enough defined to reduce it to a single supporting JSR or technology. So the only way to find suitable integration technologies is to dig through every single category.

Key/Value Stores

Key/value stores allow data storage in a schema-less way. Data could be stored in a datatype of a programming language or as an object. Because of this, there is no need for a fixed data model. This is obviously comparable to parts of JSR 338 (Java Persistence 2.1) and JSR 347 (Data Grids for the Java Platform), and also to what is done with JSR 107 (JCACHE – Java Temporary Caching API).

With native JPA2

Also primarily aimed at caching is the JPA L2 cache. The JPA Cache API is good for basic cache operations, while the L2 cache shares the state of an entity (which is accessed with the help of the entity manager factory) across various persistence contexts. The Level 2 cache underlies the persistence context and is highly transparent to the application. When the Level 2 cache is enabled, the persistence provider will look for the entities in the persistence context first. If it does not find them there, the persistence provider will look in the Level 2 cache next instead of sending a query to the database. The drawback here obviously is that, as of today, this only works with NoSQL as some kind of "cache", and not as a replacement for the RDBMS data store. Given the scope of this spec it would be a good fit, but I strongly believe that JPA is designed to be an abstraction over RDBMSs and nothing else. If there has to be some kind of support for non-relational databases, we might end up having a more high-level abstraction layer in place with tons of different persistence modes and features (maybe something like Spring Data). Generally, mapping at the object level has many advantages, including the ability to think in objects and let the underlying engine drive the de-normalization if needed. So reducing JPA to the caching features is probably the wrong decision.

With JCache

JCache has a CacheManager that holds and controls a collection of Caches, and every Cache has its entries. The basic API can be thought of as map-like with additional features (compare Greg's blog); a minimal sketch is shown below. With JCache being designed as a "cache", using it as a standardised interface against NoSQL data stores isn't a good fit at first sight. But given the nature of the use cases for unstructured key/value based data with enterprise Java, this might be the right kind of integration.
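Here is that minimal sketch of the map-like API, assuming the javax.cache API shape of JSR 107 (which was still in flux at the time of writing) and a cache named "users" that is assumed to be configured elsewhere:

import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;

public class JCacheSketch {
    public static void main(String[] args) {
        // Obtain the default CacheManager from the caching provider on the classpath.
        CacheManager manager = Caching.getCachingProvider().getCacheManager();
        // Look up a cache that is assumed to be configured elsewhere.
        Cache<String, String> users = manager.getCache("users", String.class, String.class);
        users.put("42", "Jane Doe");        // PUT
        String name = users.get("42");      // GET
        users.remove("42");                 // REMOVE
        System.out.println(name);
    }
}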
The NoSQL concept also allows for the "key-value cache in RAM" category, which is an exact fit for both JCache and data grids.

With DataGrids

This JSR proposes an API for interacting with in-memory and disk-based distributed data grids. The API aims to allow users to perform operations on the data grid (PUT, GET, REMOVE) in an asynchronous and non-blocking manner, returning java.util.concurrent.Futures rather than the actual return values. The process here is not really visible at the moment (at least to me), so there aren't any examples or concepts for the integration of a NoSQL key/value store available as of today. Besides this, the same reservations as for the JCache API apply.

With EclipseLink

EclipseLink's NoSQL support is based on the EIS support offered since EclipseLink 1.0. EclipseLink's EIS support allowed persisting objects to legacy and non-relational databases. EclipseLink's EIS and NoSQL support uses the Java Connector Architecture (JCA) to access the data source, similar to how EclipseLink's relational support uses JDBC. EclipseLink's NoSQL support is extendable to other NoSQL databases through the creation of an EclipseLink EISPlatform class and a JCA adapter. At the moment it supports MongoDB (document oriented) and Oracle NoSQL (BigData). It's interesting to see that Oracle doesn't address the key/value DBs first. That might be because of the possible confusion with the cache features (e.g. Coherence).

Column based DBs

Reads and writes are done using columns rather than rows. The best known examples are Google's BigTable and the likes of HBase and Cassandra that were inspired by BigTable. The BigTable paper says that BigTable is a sparse, distributed, persistent, multidimensional sorted map. GAE for example works only with BigTable. It offers a variety of APIs: from a "native" low-level API to "native" high-level ones (JDO and JPA). With the older DataNucleus version used by Google there seem to be a lot of limitations in place which could be removed (see comments) but still are in place.

Document-oriented DBs

The document-oriented DBs are most obviously best addressed by JSR 170 (Content Repository for Java) and JSR 283 (Content Repository for Java Technology API Version 2.0). With Jackrabbit as the reference implementation, that's a strong sign :) The support for other NoSQL document stores is non-existent as of today. Even Apache's CouchDB doesn't provide a JSR 170/283 compliant way of accessing the documents. The only drawback is that both JSRs aren't sexy or bleeding edge, but for me this would be the right bucket in which to put support for document-oriented DBs.

The flip side of the medal? The content repository API isn't exactly a natural model for an application. Does an app really want to deal with nodes and attributes in Java? The notion of a domain model works nicely for many apps, and if there is no chance to use it, you would probably be better off going native and using the MongoDB driver directly.

Graph oriented DBs

This kind of database is intended for data whose relations are well represented in a graph style (elements interconnected with an undetermined number of relations between them). Aiming primarily at any kind of network topology, the recently rejected JSR 357 (Social Media API) would have been a good place to put support, at least from a use-case point of view. If those graph-oriented DBs are considered as a data store, there are a couple of options.
If Java EE persistence is steering in the direction of a more general data abstraction layer, JSR 338 or its successors would be the right place to put support. If you know a little bit about how Coherence works internally and what had to be done to put JPA on top of it, you could also consider JSR 347 a good fit for it, with all the drawbacks already mentioned. Another alternative would be to have a separate JSR for it. The most prominent representative of this category is Neo4j, which itself has an easy API available to simply include everything you need directly in your project. There is additional stuff to consider if you need to control the Neo4j instance via the application server.

Conclusion

To sum it up: we already have a lot in place for the so-called "NoSQL" DBs, and the groundwork for integrating this into new Java EE standards is promising. Control of embedded NoSQL instances should be done via JSR 322 (Java EE Connector Architecture), as this is the only place allowed to spawn threads and open files directly from a filesystem. I'm not a big supporter of having a more general data abstraction JSR for the platform comparable to what Spring is doing with Spring Data. To me the concepts of the different NoSQL categories are too different to have a one-size-fits-all approach. The main pain point of NoSQL, besides the lack of a standard API, is that users are forced to denormalize and maintain that de-normalization by hand. What I would like to see are some smaller changes, both to the products to make them more Java EE ready and also to the way the integration into the specs is done. It might be a good idea to simply define the different persistence types, identify the JSRs which could be influenced by this, and "NoSQL" those accordingly. For users willing to work with a domain model (i.e. a higher level of abstraction compared to the raw NoSQL API), JPA might be the best vehicle for that at the moment. The feedback from both EclipseLink and Hibernate OGM users is needed to evaluate what is working and what is not. From a political point of view it might also make sense to pursue JSR 347, especially since the main big players are present there already. The really hard part is querying. Should there be standardised query APIs for each family? Within Java EE? Or would that better be placed within the NoSQL space? Would love to read your feedback on this! Reference: The Future of NoSQL with Java EE from our JCG partner Markus Eisele at the Enterprise Software Development with Java blog....

Java 7: Closing NIO.2 file channels without losing data

Closing an asynchronous file channel can be very difficult. If you submitted I/O tasks to the asynchronous channel, you want to be sure that the tasks are executed properly. This can actually be a tricky requirement on asynchronous channels, for several reasons. The default channel group uses daemon threads as worker threads, which isn't a good choice, because these threads just abandon their work if the JVM exits. If you use a custom thread pool executor with non-daemon threads, you need to manage the lifecycle of your thread pool yourself. If you don't, the threads simply stay alive when the main thread exits; hence the JVM does not exit at all, and all you can do is kill it.

Another issue when closing asynchronous channels is mentioned in the javadoc of AsynchronousFileChannel: "Shutting down the executor service while the channel is open results in unspecified behavior." This is because the close() operation on AsynchronousFileChannel issues tasks to the associated executor service that simulate the failure of pending I/O operations (in that same thread pool) with an AsynchronousCloseException. Hence, you'll get a RejectedExecutionException if you call close() on an asynchronous file channel instance after you have closed the associated executor service.

That all being said, the proposed way to safely configure the file channel and shut down that channel goes like this:

public class SimpleChannelClose_AsynchronousCloseException {

    private static final String FILE_NAME = "E:/temp/afile.out";
    private static AsynchronousFileChannel outputfile;
    private static AtomicInteger fileindex = new AtomicInteger(0);
    private static ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());

    public static void main(String[] args) throws InterruptedException, IOException, ExecutionException {
        outputfile = AsynchronousFileChannel.open(
                Paths.get(FILE_NAME),
                new HashSet<StandardOpenOption>(Arrays.asList(StandardOpenOption.WRITE,
                        StandardOpenOption.CREATE, StandardOpenOption.DELETE_ON_CLOSE)), pool);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            futures.add(outputfile.write(ByteBuffer.wrap("Hello".getBytes()), fileindex.getAndIncrement() * 5));
        }
        outputfile.close();
        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS);
        for (Future<Integer> future : futures) {
            try {
                future.get();
            } catch (ExecutionException e) {
                System.out.println("Task wasn't executed!");
            }
        }
    }
}

The custom thread pool executor service is defined in the pool field, and the file channel is opened with that pool in main(). At the end of main() the asynchronous channel is closed in an orderly manner: first the channel itself is closed, then the executor service is shut down, and last but not least the thread awaits termination of the thread pool executor. Although this is a safe way to close a channel with a custom executor service, a new issue is introduced. The clients submitted asynchronous write tasks and may want to be sure that, once they've been submitted successfully, those tasks will definitely be executed. Always waiting for Future.get() to return isn't an option, because in many cases this would lead *asynchronous* file channels ad absurdum. The snippet above will print lots of "Task wasn't executed!" messages, because the channel is closed immediately after the write operations were submitted to it.
To avoid such 'data loss' you can implement your own CompletionHandler and pass it to the requested write operation.

public class SimpleChannelClose_CompletionHandler {
    ...
    public static void main(String[] args) throws InterruptedException, IOException, ExecutionException {
        ...
        outputfile.write(ByteBuffer.wrap("Hello".getBytes()), fileindex.getAndIncrement() * 5, "",
                defaultCompletionHandler);
        ...
    }

    private static CompletionHandler<Integer, String> defaultCompletionHandler = new CompletionHandler<Integer, String>() {
        @Override
        public void completed(Integer result, String attachment) {
            // NOP
        }

        @Override
        public void failed(Throwable exc, String attachment) {
            System.out.println("Do something to avoid data loss ...");
        }
    };
}

The CompletionHandler.failed() method catches any runtime exception during task processing. You can implement compensation code there to avoid data loss. When you work on mission-critical data, it may be a good idea to use CompletionHandlers. But *still* there's another issue. The clients can submit tasks, but they don't know whether the pool will successfully process these tasks. Successful in this context means that the submitted bytes actually reach their destination (the file on the hard disk). If you want to be sure that all submitted tasks are actually processed before closing, it gets a little trickier. You need a 'graceful' closing mechanism that waits until the work queue is empty *before* it actually closes the channel and the associated executor service (this isn't possible using the standard lifecycle methods).

Introducing GracefulAsynchronousChannel

My last snippets introduce the GracefulAsynchronousFileChannel. You can get the complete code here in my Git repository. The behaviour of that channel is as follows: guarantee to process all successfully submitted write operations, and throw a NonWritableChannelException if the channel prepares shutdown. It takes two things to implement that behaviour. Firstly, you'll need to implement afterExecute() in an extension of ThreadPoolExecutor that sends a signal when the queue is empty. This is what DefensiveThreadPoolExecutor does.

private class DefensiveThreadPoolExecutor extends ThreadPoolExecutor {

    public DefensiveThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit,
            LinkedBlockingQueue<Runnable> workQueue, ThreadFactory factory, RejectedExecutionHandler handler) {
        super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, factory, handler);
    }

    /**
     * "Last" task issues a signal that the queue is empty after task processing was completed.
     */
    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        if (state == PREPARE) {
            closeLock.lock(); // only one thread will pass when the closer thread is awaiting the signal
            try {
                if (getQueue().isEmpty() && state < SHUTDOWN) {
                    System.out.println("Issueing signal that queue is empty ...");
                    isEmpty.signal();
                    state = SHUTDOWN; // -> no other thread can issue the empty-signal
                }
            } finally {
                closeLock.unlock();
            }
        }
        super.afterExecute(r, t);
    }
}

The afterExecute() method is executed after each processed task, by the thread that processed that given task. The implementation sends the isEmpty signal once the work queue is empty. The second part you need to gracefully close a channel is a custom implementation of the close() method of AsynchronousFileChannel.

/**
 * Method that closes this file channel gracefully without losing any data.
 */
@Override
public void close() throws IOException {
    AsynchronousFileChannel writeableChannel = innerChannel;
    System.out.println("Starting graceful shutdown ...");
    closeLock.lock();
    try {
        state = PREPARE;
        innerChannel = AsynchronousFileChannel.open(Paths.get(uri),
                new HashSet<StandardOpenOption>(Arrays.asList(StandardOpenOption.READ)), pool);
        System.out.println("Channel blocked for write access ...");
        if (!pool.getQueue().isEmpty()) {
            System.out.println("Waiting for signal that queue is empty ...");
            isEmpty.await();
            System.out.println("Received signal that queue is empty ... closing");
        } else {
            System.out.println("Don't have to wait, queue is empty ...");
        }
    } catch (InterruptedException e) {
        Thread.interrupted();
        throw new RuntimeException("Interrupted on awaiting Empty-Signal!", e);
    } catch (Exception e) {
        throw new RuntimeException("Unexpected error" + e);
    } finally {
        closeLock.unlock();
        writeableChannel.force(false);
        writeableChannel.close(); // close the writable channel
        innerChannel.close(); // close the read-only channel
        System.out.println("File closed ...");
        pool.shutdown(); // allow clean up tasks from the previous close() operation to finish safely
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.interrupted();
            throw new RuntimeException("Could not terminate thread pool!", e);
        }
        System.out.println("Pool closed ...");
    }
}

Study that code for a while. The interesting bit is where the innerChannel gets replaced by a read-only channel. That causes any subsequent asynchronous write requests to fail with a NonWritableChannelException. The close() method then waits for the isEmpty signal to happen. When this signal is sent after the last write task, the close() method continues with an orderly shutdown procedure. Basically, the code adds a shared lifecycle state across the file channel and the associated thread pool. That way both objects can communicate during the shutdown procedure and avoid data loss. Here is a logging client that uses the GracefulAsynchronousFileChannel.

public class MyLoggingClient {

    private static AtomicInteger fileindex = new AtomicInteger(0);
    private static final String FILE_URI = "file:/E:/temp/afile.out";

    public static void main(String[] args) throws IOException {
        new Thread(new Runnable() { // arbitrary thread that writes stuff into an asynchronous I/O data sink
            @Override
            public void run() {
                try {
                    for (;;) {
                        GracefulAsynchronousFileChannel.get(FILE_URI).write(ByteBuffer.wrap("Hello".getBytes()),
                                fileindex.getAndIncrement() * 5);
                    }
                } catch (NonWritableChannelException e) {
                    System.out.println("Deal with the fact that the channel was closed asynchronously ... " + e.toString());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }).start();

        Timer timer = new Timer(); // asynchronous channel closer
        timer.schedule(new TimerTask() {
            public void run() {
                try {
                    GracefulAsynchronousFileChannel.get(FILE_URI).close();
                    long size = Files.size(Paths.get("E:/temp/afile.out"));
                    System.out.println("Expected file size (bytes): " + (fileindex.get() - 1) * 5);
                    System.out.println("Actual file size (bytes): " + size);
                    if (size == (fileindex.get() - 1) * 5) System.out.println("No write operation was lost!");
                    Files.delete(Paths.get("E:/temp/afile.out"));
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }, 1000);
    }
}

The client starts two threads: one thread issues write operations in an infinite loop, and the other thread closes the file channel asynchronously after one second of processing.
If you run that client, then the following output is produced:

Starting graceful shutdown ...
Deal with the fact that the channel was closed asynchronously ... java.nio.channels.NonWritableChannelException
Channel blocked for write access ...
Waiting for signal that queue is empty ...
Issueing signal that queue is empty ...
Received signal that queue is empty ... closing
File closed ...
Pool closed ...
Expected file size (bytes): 400020
Actual file size (bytes): 400020
No write operation was lost!

The output shows the orderly shutdown procedure of the participating threads. The logging thread needs to deal with the fact that the channel was closed asynchronously. After the queued tasks are processed, the channel resources are closed. No data was lost; everything that the client issued was really written to the file destination. No AsynchronousCloseExceptions or RejectedExecutionExceptions occur in such a graceful closing procedure. That's all in terms of safely closing asynchronous file channels. The complete code is here in my Git repository. I hope you've enjoyed it a little. Looking forward to your comments. Reference: "Java 7: Closing NIO.2 file channels without losing data" from our JCG partner Niklas....

NetBeans 7.1: Create a Custom Hint

I have talked about some of my favorite NetBeans hints in the posts Seven NetBeans Hints for Modernizing Java Code and Seven Indispensable NetBeans Java Hints. The fourteen hints covered in those two posts are a small fraction of the total number of hints that NetBeans supports “out of the box.” However, even greater flexibility is available to the NetBeans user because NetBeans 7.1 makes it possible to write custom hints. I look at a simple example of this in this post. Geertjan Wielenga‘s post Custom Declarative Hints in NetBeans IDE 7.1 begins with coverage of NetBeans’s “Inspect and Transform” (AKA “Inspect and Refactor“) dialog, which is available from the “Refactor” menu (which in turn is available via the dropdown “Refactor” menu along the menu bar or via right-click in the NetBeans editor). The following screen snapshot shows how this looks.The “Inspect” field of the “Inspect and Transform” dialog allows the NetBeans user to tailor which project or file should be inspected. The “Use” portion of the “Inspect and Transform” dialog allows that NetBeans user to specify which hints to inspect for. In this case, I am inspecting using custom hints and I can see that by clicking on the “Manage” button and selecting the “Custom” checkbox. Note that if “Custom” is not an option when you first bring this up, you probably need to click the “New” button in the bottom left corner. When I click on “Manage” and check the “Custom” box, it expands and I can see the newly created “Inspection” hint. If I click on this name, I can rename it and do so in this case. The renamed inspection (“CurrentDateDoesNotNeedSystemCurrentMillis”) is shown in the next screen snapshot.To create the hint and provide the description seen in the box, I can click on the “Edit Script” button. Doing so leads to the small editor window shown in the next screen snapshot.If more space is desired for editing the custom inspection/hint, the “Open in Editor” button will lead to the text being opened in the NetBeans text editor in which normal Java code and XML code is edited.With the custom inspection/hint in place, it’s time to try it out on some Java code. The following code listing uses an extraneous call to System.currentTimeMillis() and passes its result to the java.util.Date single long argument constructor. This is unnecessary because Date’s no-arguments constructor will automatically instantiate an instance of Date based on the current time (time now). RedundantSystemCurrentTimeMillis.java package dustin.examples;import static java.lang.System.out; import java.util.Date;/** * Simple class to demonstrate NetBeans custom hint. * * @author Dustin */ public class RedundantSystemCurrentTimeMillis { public static void main(final String[] arguments) { final Date date = new Date(System.currentTimeMillis()); out.println(date); } }The above code works properly, but could be more concise. When I tell NetBeans to associate my new inspection with this project in the “Inspect and Transform” dialog, NetBeans is able to flag this for me and recommend the fix. The next three screen snapshots demonstrate that NetBeans will flag the warning with the yellow light bulb icon and yellow underlining, will recommend the fix when I click on the light bulb, and implements the suggested fix when I select it.As the above has shown, a simple custom hint allows NetBeans to identify, flag, and fix at my request the unnecessary uses of System.curentTimeMillis(). 
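For reference, the declarative rule behind such a hint is just a short script entered via the "Edit Script" button. A rough sketch of what it might look like for this transformation (the syntax is recalled from the jackpot30 rules language referenced below, so treat it as an approximation rather than a copy of my actual script):

<!description="Current date does not need System.currentTimeMillis()">
new java.util.Date(java.lang.System.currentTimeMillis())
=> new java.util.Date()
;;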
I’ve written before that NetBeans’s hints are so handy because they do in fact do three things for the Java developer: automatically flag areas for code improvement for the developer, often automatically fix the issue if so desired, and communicate better ways of writing Java. For the last benefit in this case, the existence of this custom hint helps convey to other Java developers a little more knowledge about the Date class and a better way to instantiate it when current date/time is desired. The most difficult aspect of using NetBeans’s custom hints is finding documentation on how to use them. The best sources currently available appear to be the NetBeans 7.1 Release Notes, several Wielenga posts (Custom Declarative Hints in NetBeans IDE 7.1, Oh No Vector!, Oh No @Override! / Oh No Utilities.loadImage!), and Jan Lahoda‘s jackpot30 Rules Language (covers the rules language syntax used by the custom inspections/hints and shown in the simple example above). The Refactoring with Inspect and Transform in the NetBeans IDE Java Editor tutorial also includes a section on managing custom hints. Hopefully, the addressing of Bug 210023 will help out with this situation. My example custom NetBeans hint works specifically with the Date class. An interesting and somewhat related StackOverflow thread asks if a NetBeans custom hint could be created to recommend use of Joda Time instead of Date or Calendar. A response on that thread refers to the NetBeans Java Hint Module Tutorial. Looking over that tutorial reminds me that the approach outlined in this post and available in NetBeans 7.1 is certainly improved and easier to use.Incidentally, a hint like that asked for in the referenced StackOverflow thread is easy to write in NetBeans 7.1. There is no transform in this example because a change of the Date class to a Joda Time class would likely require more changes in the code than the simple transform could handle. This hint therefore becomes one that simply recommends changing to Joda Time. The next screen snapshots show the simple hint and how they appear in the NetBeans editor. Each release of NetBeans seems to add more useful hints to the already large number of helpful hints that NetBeans supports. However, it is impossible for the NetBeans developers to add every hint that every team or project might want. Furthermore, it is not desirable to have every possible hint that every community member might come up with added to the IDE. For this reason, the ability to specify custom hints in NetBeans and the ability to apply those hints selectively to projects and files are both highly desirable capabilities. Reference: Creating a NetBeans 7.1 Custom Hint from our JCG partner Dustin Marx at the Inspired by Actual Events blog....

What’s Cooking in Java 8 – Project Jigsaw

What is Project Jigsaw: Project Jigsaw is the project to make the Java compiler module aware. For years the Java API has been monolithic, i.e. the whole API was seen from any part of the code equally. There has also not been any way to declare a code’s dependency on other user libraries. Project Jigsaw attempts to solve these problems, along with others, in a very elegant way. In this article, I will highlight the basic concepts of the Jigsaw module system and also explain how it would work with the commands, so as to provide a real feel for it. Currently, Jigsaw is targeted to be included in the release of Java 8. In my opinion, this is a change bigger than the generics that came with version 5 of the Java platform. What is Achieved by Project Jigsaw: As I explained earlier, Project Jigsaw solves the problem of the whole Java API being used as a single monolithic codebase. The following points highlight the main advantages.

1. Dependency Graph: Jigsaw gives a way to uniquely identify a particular codebase, and also to declare a codebase’s dependencies on other codebases. This creates a complete dependency graph for a particular set of classes. Say, for example, you want to write a program that depends on the Apache BCEL library. Until now, there was no way for you to express this requirement in the code itself. Using Jigsaw, you can express this requirement in the code itself, allowing tools to resolve the dependency (a small sketch of such a declaration appears at the end of this article).

2. Multiple Versions of the Same Code: Suppose you write a program that depends on both library A and library B. Now suppose library A depends on version 1.0 of library C and library B depends on version 2.0 of library C. In the current Java runtime, you cannot use library A and library B at the same time without creating a complex hierarchy of custom classloaders, and even that would not work in all cases. After Jigsaw becomes part of Java, this is not a problem, as a class will be able to see only those versions of its dependent classes that are part of the module versions required by the class’s containing module. That is to say, since module A depends on version 1.0 of module C, and module B depends on version 2.0 of module C, the Java runtime can figure out which version of the classes in module C should be seen by module A and which by module B. This is similar to the OSGi project.

3. Modularization of the Java Platform Itself: The current Java platform API is huge and not all parts of it may be relevant in every case. For example, a Java platform intended to run a Java EE server does not have to implement the Swing API, as that would not make any sense. Similarly, embedded environments can strip down some less important APIs (for embedded use), like the compiler API, to make the platform smaller and faster. Under the current Java platform this is not possible, as any certified Java platform must implement all the APIs. Jigsaw will provide a way to implement only the part of the API set relevant to a particular platform. Since a module can explicitly declare its dependency on any particular Java API module, it will be run only when the platform has an implementation of the modules required by that module.

4. Integration with OS Native Installation: Since the module system is very similar to what is currently available for the installation of programs and libraries in modern operating systems, Java modules can be integrated with those systems. This is in fact out of the scope of the Jigsaw project itself, but OS vendors are encouraged to enable it and they would most likely do so.
For example, the rpm-based repository system available in Red Hat based Linux systems and the apt-based repository system available in Debian based Linux systems can easily be enhanced to support Java modules.

5. Module Entry Point: Java modules can specify an entry point class, just like jars can. When a module is run, the entry point’s main method is invoked. Since the OS can now install a Java module and the module can be executed, this is very similar to installing one of the OS’s native programs.

6. Efficiency: Currently, every time a JVM is run, it verifies the integrity of every single class that is loaded during the run of the program. This takes a considerable amount of time. Also, the classes are accessed individually from the OS file system. Since modules can be installed before running, the installation itself can now include the verification step, which eliminates the need to verify the classes at runtime. This will lead to a considerable performance improvement. Also, the module system can store the classes in its own optimized manner, leading to further performance improvements.

7. Module Abstraction: It is possible to provide an abstraction for a particular module. Say module A depends on module X. Now module D can provide for module X, thus supplying its implementation. For example, the Apache Xerces modules would want to provide for the jdk.jaxp module and would then be able to satisfy a dependency requirement on jdk.jaxp.

Basics of a Modular Codebase: All the above discussion is pretty vague without a real example of a modular codebase and its usage. A modular codebase can be either single-module or multi-module. In the single-module case, all we need to do to enable modules is to create a file named module-info.java at the base of the source path, outside any package. The module-info.java file is a special Java file written in a special syntax designed to declare module information. The following is an example of such a module-info.java.

module com.a @ 1.0 {
    requires com.b @ 1.0;
    class com.a.Hello;
}

In this case the module is named com.a and it has a dependency on com.b. It also declares an entry point, com.a.Hello. Note that it is not required that the package structure resemble the module name, although that would probably be a best practice. Now you might be thinking that if this is the single-module case, then why is there a dependency on a different module; does that not make it two modules? Notice that even if there is only one explicit declaration of a dependency module, there is an implicit dependency on all Java API modules. If none of the Java API modules are declared explicitly as dependencies, all of them are included. The only reason it is still single-module is that com.b must already be available in binary form in the module library. It is multi-module when more than one module is being compiled at the same time. Compiling a source in single-module mode is as simple as compiling a non-modular source; the only difference is that module-info.java will be present in the source root.

Multi-module Source: In case the source contains multiple modules, they must be given a directory structure. It is pretty simple though. The source under a particular module must be kept in a directory of the name of the module.
For example, the source for the class com.a.Hello in the module com.a must be kept in [source-root]/com.a/com/a/Hello.java and the module-info.java must be kept in the directory [source-root]/com.a.

Compiling Multi-module Source: For this, let us consider an example of compiling two modules, com.a and com.b. Let us first take a look at the directory structure, as below:

classes
src
|--com.a
|  |--module-info.java
|  |--com
|     |--a
|        |--Hello.java
|--com.b
   |--module-info.java
   |--com
      |--b
         |--Printer.java

The code for module-info.java in com.a would be like this:

module com.a @ 1.0 {
    requires com.b @ 1.0;
    class com.a.Hello;
}

The module-info.java in com.b:

module com.b @ 1.0 {
    exports com.b;
}

Printer.java in com.b/com/b:

package com.b;

public class Printer {
    public static void print(String toPrint) {
        System.out.println(toPrint);
    }
}

Hello.java in com.a/com/a:

package com.a;

import com.b.Printer;

public class Hello {
    public static void main(String[] args) {
        Printer.print("Hello World!");
    }
}

The code is pretty self-explanatory: we are trying to use the com.b.Printer class in module com.b from the com.a.Hello class in module com.a. For this, it is mandatory for com.a’s module-info.java to declare com.b as a dependency with the requires keyword. We are trying to create the output class files in the classes directory. The following javac command would do that:

javac -d classes -modulepath classes -sourcepath src `find src -name '*.java'`

Note that we have used the find command in backquotes (`) so that the command’s output will be included as the file list. This will work in Linux and Unix environments; elsewhere we might simply type in the list of files. After compilation, the classes directory will have a similar structure containing the class files. Now we can install the modules using the jmod command:

jmod create -L mlib
jmod install -L mlib classes com.b
jmod install -L mlib classes com.a

We first created a module library mlib and installed our modules in the library. We could also have used the default library by not specifying the -L option to the install command in jmod. Now we can simply run module com.a using:

java -L mlib -m com.a

Here too we could have used the default module library. It is also possible to create a distributable module package [equivalent to a jar in today's distribution mechanism] that can directly be installed. For example, the following will create com.a@1.0.jmod for com.a:

jpkg -m classes/com.a jmod com.a

I have tried to outline the module infrastructure in the upcoming Java release. However, project Jigsaw is being modified every day and can turn out to be a completely different being altogether at the end. But it is expected that the basic concepts will still remain the same. The complete module concepts are more complex, and I will cover the details in an upcoming article. Reference: What’s Cooking in Java 8 – Project Jigsaw from our JCG partner Debasish Ray Chawdhuri at the Geeky Articles blog....
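As a small addendum to point 1 (the dependency graph) above, and using the same prototype module-info.java syntax shown in this article, here is a sketch of how an application module might declare its dependency on a third-party library such as Apache BCEL. The module names com.example.bcelclient and org.apache.bcel, the version numbers, and the entry point class are all invented for illustration; the real names would depend on how the library is actually packaged as a module.

module com.example.bcelclient @ 1.0 {
    // hypothetical module name and version for the BCEL library
    requires org.apache.bcel @ 5.2;
    // hypothetical entry point of the application module
    class com.example.bcelclient.Main;
}

The dependency that today lives only in a build script or a document thus becomes part of the code itself and can be resolved by tools such as the jmod command shown above.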

Scala Basic XML processing

Introduction

Pretty much everybody knows what XML is: it is a structured, machine-readable text format for representing information that can be easily checked for the “grammaticality” of the tags, attributes, and their relationship to each other (e.g. using DTD’s). This contrasts with HTML, which can have elements that don’t close (e.g. <p>foo<p>bar rather than <p>foo</p><p>bar</p>) and still be processed. XML was only ever meant to be a format for machines, but it morphed into a data representation that many people ended up (unfortunately, for them) editing by hand. However, even as a machine readable format it has problems, such as being far more verbose than is really required, which matters quite a bit when you need to transfer lots of data from machine to machine — in the next post, I’ll discuss JSON and Avro, which can be viewed as evolutions of what XML was intended for and which work much better for lots of the applications that matter in the “big data” context. Regardless, there is plenty of legacy data that was produced as XML, and there are many communities (e.g. the digital humanities community) who still seem to adore XML, so people doing any reasonable amount of text analysis work will likely find themselves eventually needing to work with XML-encoded data. There are a lot of tutorials on XML and Scala — just do a web search for “Scala XML” and you’ll get them. As with other blog posts, this one is aimed at being very explicit so that beginners can see examples with all the steps in them, and I’ll use it to set up a JSON processing post.

A simple example of XML

To start things off, let’s consider a very basic example of creating and processing a bit of XML. The first thing to know about XML in Scala is that Scala can process XML literals. That is, you don’t need to put quotes around XML strings — instead, you can just write them directly, and Scala will automatically interpret them as XML elements (of type scala.xml.Elem).

scala> val foo = <foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>
foo: scala.xml.Elem = <foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>

Now let’s do a little bit of processing on this. You can get all the text by using the text method.

scala> foo.text
res0: String = hi1yellow

So, that munged all the text together. To get them printed out with spaces between, let’s first get all the bar nodes and then get their texts and use mkString on that sequence. To get the bar nodes, we can use the \ selector.

scala> foo \ "bar"
res1: scala.xml.NodeSeq = NodeSeq(<bar type="greet">hi</bar>, <bar type="count">1</bar>, <bar type="color">yellow</bar>)

This gives us back a sequence of the bar nodes that occur directly under the foo node. Note that the \ operator (selector) is just a mirror image of the / selector used in XPath. Of course, now that we have such a sequence, we can map over it to get what we want. Since the text method returns the text under a node, we can do the following.

scala> (foo \ "bar").map(_.text).mkString(" ")
res2: String = hi 1 yellow

To grab the value of the type attribute on each node, we can use the \ selector followed by “@type”.
scala> (foo \ "bar").map(_ \ "@type")
res3: scala.collection.immutable.Seq = List(greet, count, color)

scala> (foo \ "bar").map(barNode => (barNode \ "@type", barNode.text))
res4: scala.collection.immutable.Seq[(scala.xml.NodeSeq, String)] = List((greet,hi), (count,1), (color,yellow))

Note that the \ selector can only retrieve children of the node you are selecting from. To dig arbitrarily deep to pull out all nodes of a given type no matter where they are, use the \\ selector. Consider the following (bizarre) XML snippet with ‘z’ nodes at different levels of embedding.

<a>
  <z x="1"/>
  <b>
    <z x="2"/>
    <c>
      <z x="3"/>
    </c>
    <z x="4"/>
  </b>
</a>

Let’s first put it into the REPL.

scala> val baz = <a><z x="1"/><b><z x="2"/><c><z x="3"/></c><z x="4"/></b></a>
baz: scala.xml.Elem = <a><z x="1"></z><b><z x="2"></z><c><z x="3"></z></c><z x="4"></z></b></a>

If we want to get all of the ‘z’ nodes, we do the following.

scala> baz \\ "z"
res5: scala.xml.NodeSeq = NodeSeq(<z x="1"></z>, <z x="2"></z>, <z x="3"></z>, <z x="4"></z>)

And we can of course easily dig out the values of the x attributes on each of the z’s.

scala> (baz \\ "z").map(_ \ "@x")
res6: scala.collection.immutable.Seq = List(1, 2, 3, 4)

Throughout all of the above, we have used XML literals — that is, expressions typed directly into Scala, which interprets them as XML types. However, we usually need to process XML that is saved in a file, or a string, so the scala.xml.XML object has several methods for creating scala.xml.Elem objects from other sources. For example, the following allows us to create XML from a string.

scala> val fooString = """<foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>"""
fooString: java.lang.String = <foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>

scala> val fooElemFromString = scala.xml.XML.loadString(fooString)
fooElemFromString: scala.xml.Elem = <foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>

This Elem is the same as the one created using the XML literal, as shown by the following test.

scala> foo == fooElemFromString
res7: Boolean = true

See the Scala XML object for other ways to create XML elements, e.g. from InputStreams, Files, etc.

A richer XML example

As a more interesting example of some XML to process, I’ve created the following short XML string describing artists, albums, and songs, which you can see in the github gist music.xml. https://gist.github.com/2597611 I haven’t put any special care into this, other than to make sure it has embedded tags, some of which have attributes, and some reasonably interesting content (and some great songs). You should save this in a file called /tmp/music.xml. Once you’ve done that, you can run the following code, which just prints out each artist, album and song, with an indent for each level.

val musicElem = scala.xml.XML.loadFile("/tmp/music.xml")

(musicElem \ "artist").foreach { artist =>
  println((artist \ "@name").text + "\n")
  val albums = (artist \ "album").foreach { album =>
    println("  " + (album \ "@title").text + "\n")
    val songs = (album \ "song").foreach { song =>
      println("    " + (song \ "@title").text)
    }
    println
  }
}

Converting objects to and from XML

One of the use cases for XML is to provide a machine-readable serialization format for objects that can still be easily read, and at times edited, by humans. The process of shuffling objects from memory into a disk-format like XML is called marshalling.
We’ve started with some XML, so what we’ll do is define some classes and “unmarshall” the XML into objects of those classes. Put the following into the REPL. (Tip: You can use “:paste” to enter multi-line statements like those below. These will work without paste, but it is necessary to use it in some contexts, e.g. if you define Artist before Song.)

case class Song(val title: String, val length: String) {
  lazy val time = {
    val Array(minutes, seconds) = length.split(":")
    minutes.toInt*60 + seconds.toInt
  }
}

case class Album(val title: String, val songs: Seq[Song], val description: String) {
  lazy val time = songs.map(_.time).sum
  lazy val length = (time / 60)+":"+(time % 60)
}

case class Artist(val name: String, val albums: Seq[Album])

Pretty simple and straightforward. Note the use of lazy vals for defining things like the time (length in seconds) of a song. The reason for this is that if we create a Song object but never ask for its time, then the code needed to compute it from a string like “4:38” is never run; however, if we had left lazy off, then it would be computed when the Song object is created. Also, we don’t want to use a def here (i.e. make time a method) because its value is fixed based on the length string; using a method would mean recomputing time every time it is asked for of a particular object. Given the classes above, we can create and use objects from them by hand.

scala> val foobar = Song("Foo Bar", "3:29")
foobar: Song = Song(Foo Bar,3:29)

scala> foobar.time
res0: Int = 209

Using the native Scala XML API

Of course, we’re more interested in constructing Artist, Album, and Song objects from information specified in files like the music example. Though I don’t show the REPL output here, you should enter all of the commands below into it to see what happens. To start off, make sure you have loaded the file.

val musicElem = scala.xml.XML.loadFile("/tmp/music.xml")

Now we can work with the file to select various elements, or create objects of the classes defined above. Let’s start with just Songs. We can ignore all the artists and albums and dig straight in with the \\ operator.

val songs = (musicElem \\ "song").map { song =>
  Song((song \ "@title").text, (song \ "@length").text)
}

scala> songs.map(_.time).sum
res1: Int = 11311

And, we can go all the way and construct Artist, Album and Song objects that directly mirror the data stored in the XML file.

val artists = (musicElem \ "artist").map { artist =>
  val name = (artist \ "@name").text
  val albums = (artist \ "album").map { album =>
    val title = (album \ "@title").text
    val description = (album \ "description").text
    val songList = (album \ "song").map { song =>
      Song((song \ "@title").text, (song \ "@length").text)
    }
    Album(title, songList, description)
  }
  Artist(name, albums)
}

With the artists sequence in hand, we can do things like showing the length of each album.

val albumLengths = artists.flatMap { artist =>
  artist.albums.map(album => (artist.name, album.title, album.length))
}
albumLengths.foreach(println)

Which gives the following output.

(Radiohead,The King of Limbs,37:34)
(Radiohead,OK Computer,53:21)
(Portished,Dummy,48:46)
(Portished,Third,48:50)

Marshalling objects to XML

In addition to constructing objects from XML specifications (also referred to as de-serializing and un-marshalling), it is often necessary to marshal objects one has constructed in code to XML (or other formats). The use of XML literals is actually quite handy in this regard.
To see this, let’s start with the first song of the first album of the first artist (Bloom, by Radiohead).

scala> val bloom = artists(0).albums(0).songs(0)
bloom: Song = Song(Bloom,5:15)

We can construct an Elem from this as follows.

scala> val bloomXml = <song title={bloom.title} length={bloom.length}/>
bloomXml: scala.xml.Elem = <song length="5:15" title="Bloom"></song>

The thing to note here is that an XML literal is used, but when we want to use values from variables, we can escape from literal-mode with curly brackets. So, {bloom.title} becomes “Bloom”, and so on. In contrast, one could do it via a String as follows.

scala> val bloomXmlString = "<song title=\""+bloom.title+"\" length=\""+bloom.length+"\"/>"
bloomXmlString: java.lang.String = <song title="Bloom" length="5:15"/>

scala> val bloomXmlFromString = scala.xml.XML.loadString(bloomXmlString)
bloomXmlFromString: scala.xml.Elem = <song length="5:15" title="Bloom"></song>

So, the use of literals is a bit more readable (though it comes at the cost of making it hard in Scala to use “<” as an operator for many use cases, which is one of the reasons XML literals are considered by many to be not a great idea). We can create the whole XML for all of the artists and albums in one fell swoop. Note that one can have XML literals in the escaped bracketed portions of an XML literal, which allows the following to work. Note: you need to use the :paste mode in the REPL in order for this to work.

val marshalled =
<music>
{ artists.map { artist =>
  <artist name={artist.name}>
  { artist.albums.map { album =>
    <album title={album.title}>
    { album.songs.map(song => <song title={song.title} length={song.length}/>) }
    <description>{album.description}</description>
    </album>
  }}
  </artist>
}}
</music>

Note that in this case, the for-yield syntax is perhaps a bit more readable since it doesn’t require the extra curly braces.

val marshalledYield =
<music>
{ for (artist <- artists) yield
  <artist name={artist.name}>
  { for (album <- artist.albums) yield
    <album title={album.title}>
    { for (song <- album.songs) yield <song title={song.title} length={song.length}/> }
    <description>{album.description}</description>
    </album>
  }
  </artist>
}
</music>

One could of course instead add a toXml method to each of the Song, Album, and Artist classes such that at the top level you’d have something like the following.

val marshalledWithToXml = <music> { artists.map(_.toXml) } </music>

This is a fairly common strategy. However, note that the problem with this solution is that it produces a very tight coupling of the program logic (e.g. of what things like Songs, Albums and Artists can do) with other, orthogonal logic, like serializing them. To see a way of decoupling such different needs, check out Dan Rosen’s excellent tutorial on type classes.

Conclusion

The standard Scala XML API comes packaged with Scala, and it is actually quite nice for some basic XML processing. However, it caused some “controversy” in that it was felt by many that the core language has no business providing specialized processing for a format like XML. Also, there are some efficiency issues. Anti-XML is a library that seeks to do a better job of processing XML (especially in being more scalable and more flexible in allowing programmatic editing of XML). As I understand things, Anti-XML may become a sort of official XML processing library in the future, with the current standard XML library being phased out.
Nonetheless, many of the ways of interacting with an XML document shown above are similar, so being familiar with the standard Scala XML API provides the core concepts you’ll need for other such libraries. Reference: Basic XML processing with Scala from our JCG partner Jason Baldridge at the Bcomposes blog....