Implementing the State Machine Pattern as a Stream Processor

In my last blog, I said that I really thought that some of the Gang of Four (GOF) patterns were becoming somewhat obsolete, and if not obsolete then certainly unpopular. In particular I said that State Machine wasn’t that useful, as you can usually think of another, simpler way of doing whatever it is you’re doing rather than using it. In order to make amends, both for preaching obsolescence and for the hideous ‘C’ code that I attached to the end of my last blog, I thought that I’d demonstrate the use of State Machine in converting Twitter tweets into HTML. The scenario, just for once, isn’t contrived or far-fetched, but something that I had to do the other day. In this scenario I have an app that’s just downloaded a bunch of timeline tweets for an authenticated Twitter user. Having parsed the XML (or JSON) and got hold of the tweets, I needed to format them for display. The problem was that they’re in plain text and I needed to convert them into HTML, adding anchor tags along the way to produce something similar to what Twitter does when it formats the same thing on your Twitter home page.

Just for reference, a user’s tweets can be retrieved using the Twitter API via the following URL: https://api.twitter.com/1/statuses/user_timeline.xml?include_entities=true&include_rts=true&screen_name=BentleyMotors&count=2 …where the username in this case is “BentleyMotors”. If you specify XML formatting in the URL, then a tweet is returned in the text tag and looks something like this:

Deputy PM Nick Clegg visits #Bentley today to tour Manufacturing facilities. #RegionalGrowthFund http://t.co/kX81aZmY http://t.co/Eet31cCA

…and this needed converting into something like this:

Deputy PM Nick Clegg visits <a href=\"https://twitter.com/#!/search/%23Bentley\">#Bentley</a> today to tour Manufacturing facilities. <a href=\"https://twitter.com/#!/search/%23RegionalGrowthFund\">#RegionalGrowthFund</a> <a href=\"http://t.co/kX81aZmY\">t.co/kX81aZmY</a> <a href=\"http://t.co/Eet31cCA\">t.co/Eet31cCA</a>

The big idea in solving this problem1 is to use a state machine that reads an input stream a byte at a time to find the hashtags, user names and URLs and convert them into HTML anchor tags. For example, from the complete tweet above #Bentley becomes <a href=\"https://twitter.com/#!/search/%23Bentley\">#Bentley</a> and http://t.co/Eet31cCA becomes <a href=\"http://t.co/Eet31cCA\">t.co/Eet31cCA</a>. This means that the code has to find every word that begins with either ‘#’ or ‘@’, or a URL that begins with ‘http://’. The UML diagram for this state machine looks something like this: [state diagram not reproduced in this extract]. This implementation does differ from the GOF diagram in that for this application I’ve separated the state from the event/action. This has the benefits of improved decoupling and of allowing actions to be associated with multiple states.

Gathering Up Your States

The first thing to do when building any state machine is to gather together your states. In the original GOF pattern states were abstract classes; however, I prefer to use more modern enums for simplicity.
The states for this state machine are: public enum TweetState {OFF("Off - not yet running"), //RUNNING("Running - happily processing any old byte bytes"), //READY("Ready - found a space, so there's maybe soemthing to do, but that depends upon the next byte"), //HASHTAG("#HashTag has been found - process it"), //NAMETAG("@Name has been found - process it"), //HTTPCHECK("Checking for a URL starting with http://"), //URL("http:// has been found so capture the rest of the URL");private final String description;TweetState(String description) { this.description = description; }@Override public String toString() { return "TweetState: " + description; }} Reading the Bytes The next thing that’s needed is a class that reads an input stream a byte at a time, gets hold of the action class that’s associated with the machine’s current state and the process the byte using the action. This is done by the StateMachine class shown below: public class StateMachine<T extends Enum<?>> {private final byte[] inputBuffer = new byte[32768]; private T currentState; private final Map<T, AbstractAction<T>> stateActionMap = new HashMap<T, AbstractAction<T>>();public StateMachine(T startState) {this.currentState = startState;} /*** Main method that loops around and processes the input stream*/public void processStream(InputStream in) {// Outer loop - continually refill the buffer until there's nothing // left to readtry { processBuffers(in); terminate(); } catch (Exception ioe) { throw new StateMachineException("Error processing input stream: " + ioe.getMessage(), ioe); }}private void processBuffers(InputStream in) throws Exception {for (int len = in.read(inputBuffer); (len != -1); len = in .read(inputBuffer)) {// Inner loop - process the contents of the Bufferfor (int i = 0; i < len; i++) {processByte(inputBuffer[i]);}}}/*** Deal with each individual byte in the buffer*/private void processByte(byte b) throws Exception { // Get the set of actions associated with this stateAbstractAction<T> action = stateActionMap.get(currentState);// do the action, get the next statecurrentState = action.processByte(b, currentState);}/*** The buffer is empty. Make sue that we tidy up*/private void terminate() throws Exception {AbstractAction<T> action = stateActionMap.get(currentState);action.terminate(currentState);}/*** Add an action to the machine and associated state to the machine. A state* can have more than one action associated with it*/public void addAction(T state, AbstractAction<T> action) {stateActionMap.put(state, action);}/*** Remove an action from the state machine*/public void removeAction(AbstractAction<T> action) {stateActionMap.remove(action); // Remove the action - if it's there}}The key method here is processByte(...) /*** Deal with each individual byte in the buffer*/private void processByte(byte b) throws Exception {// Get the set of actions associated with this stateAbstractAction<T> action = stateActionMap.get(currentState);// do the action, get the next statecurrentState = action.processByte(b, currentState);} For every byte this method gets hold of the an action that’s associated with the current state from the stateActionMap. The action is then called and performed updating the current state ready for the next byte. Having sorted out the states and the state machine the next step is to write the actions. 
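Before moving on to the actions, note one small thing: the StateMachine listing above throws a StateMachineException that isn’t shown anywhere in the post. Judging from the (String, Throwable) constructor call used, it is presumably nothing more than a thin unchecked wrapper along these lines (my sketch, not the author’s actual class):

public class StateMachineException extends RuntimeException {

    public StateMachineException(String message, Throwable cause) {
        super(message, cause);
    }
}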
At this point I follow the GOF pattern more closely by creating an AbstractAction class that processes each event with… public abstract T processByte(byte b, T currentState) throws Exception; This method, given the current state, processes a byte of information and uses that byte to return the next state. The full implementation of the AbstractAction is: public abstract class AbstractAction<T extends Enum<?>> { /*** This is the next action to take - See the Chain of Responsibility Pattern*/protected final AbstractAction<T> nextAction;/** Output Stream we're using */protected final OutputStream os;/** The output buffer */protected final byte[] buff = new byte[1];public AbstractAction(OutputStream os) {this(null, os);}public AbstractAction(AbstractAction<T> nextAction, OutputStream os) {this.os = os;this.nextAction = nextAction;}/*** Call the next action in the chain of responsibility** @param b* The byte to process* @param state* The current state of the machine.*/protected void callNext(byte b, T state) throws Exception {if (nextAction != null) { nextAction.processByte(b, state); }}/*** Process a byte using this action** @param b* The byte to process* @param currentState* The current state of the state machine** @return The next state*/public abstract T processByte(byte b, T currentState) throws Exception;/*** Override this to ensure an action tides up after itself and returns to a* default state. This may involve processing any data that's been captured** This method is called when the input stream terminates*/public void terminate(T currentState) throws Exception {// blank}protected void writeByte(byte b) throws IOException {buff[0] = b; // Write the data to the output directoryos.write(buff); }protected void writeByte(char b) throws IOException {writeByte((byte) b); }}Building the State Machine So far all the code that I’ve written has been generic and can be reused time and time again 2, all of which means that the next step is to write some domain specific code. From the UML diagram above, you can see that the domain specific actions are: DefaultAction, ReadyAction and CaptureTags. Before I go on to describe what they do, you may have guessed that some I need to inject the actions in to the StateMachine and associate them with a TweetState. The JUnit code below shows how this is done… StateMachine<TweetState> machine = new StateMachine<TweetState>(TweetState.OFF); // Add some actions to the statemachine// Add the default actionmachine.addAction(TweetState.OFF, new DefaultAction(bos));machine.addAction(TweetState.RUNNING, new DefaultAction(bos));machine.addAction(TweetState.READY, new ReadyAction(bos));machine.addAction(TweetState.HASHTAG, new CaptureTag(bos, new HashTagStrategy()));machine.addAction(TweetState.NAMETAG, new CaptureTag(bos, new UserNameStrategy()));machine.addAction(TweetState.HTTPCHECK, new CheckHttpAction(bos));machine.addAction(TweetState.URL, new CaptureTag(bos, new UrlStrategy()));From the code above you can see that DefaultAction is linked the OFF and RUNNING states, the ReadyAction is linked to the READY state, the CaptureTag action is linked to the HASHTAG, NAMETAG and URL states and the HttpCheckAction is linked to the HTTPCHECK state. You may have noticed that the CaptureTag action is linked to more that one state. This is fine because the CaptureTag employs the Strategy pattern to change its behaviour on the fly; hence I have one action with some common code that, after injecting a strategy object, can do three things. 
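Two pieces of supporting code are referenced here but not reproduced in the post. The first is the OutputStrategy interface that HashTagStrategy, UserNameStrategy and UrlStrategy implement; judging from the build(String, OutputStream) call made by CaptureTag later on, it is presumably just:

import java.io.IOException;
import java.io.OutputStream;

public interface OutputStrategy {

    /** Turn the captured tag text into HTML and write it to the output stream. */
    void build(String tag, OutputStream os) throws IOException;
}

The second is how the wired-up machine is actually driven. Assuming tweet holds the raw tweet text and bos is the ByteArrayOutputStream the actions write to, the rest of the test presumably streams a tweet through the machine and reads the HTML back out, roughly like this (a sketch, not the original test code):

machine.processStream(new ByteArrayInputStream(tweet.getBytes()));
String html = bos.toString();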
Writing Actions Getting back to writing actions, the first action to write is usually the DefaultAction, which is the action that’s called when nothing interesting is happening. This action happily takes input characters and puts them into the output stream, whilst looking out for certain characters or character/state combinations. The heart of the DefaultAction is the switch statement in the processByte(...) method. public class DefaultAction extends AbstractAction<TweetState> { public DefaultAction(OutputStream os) {super(os);}/*** Process a byte using this action** @param b* The byte to process* @param currentState* The current state of the state machine*/@Override public TweetState processByte(byte b, TweetState currentState) throws Exception {TweetState retVal = TweetState.RUNNING;// Switch state if a ' ' charif (isSpace(b)) {retVal = TweetState.READY; writeByte(b); } else if (isHashAtStart(b, currentState)) { retVal = TweetState.HASHTAG; } else if (isNameAtStart(b, currentState)) { retVal = TweetState.NAMETAG; } else if (isUrlAtStart(b, currentState)) { retVal = TweetState.HTTPCHECK;} else { writeByte(b);}return retVal;}private boolean isSpace(byte b) {return b == ' ';}private boolean isHashAtStart(byte b, TweetState currentState) {return (currentState == TweetState.OFF) && (b == '#');}private boolean isNameAtStart(byte b, TweetState currentState) {return (currentState == TweetState.OFF) && (b == '@');}private boolean isUrlAtStart(byte b, TweetState currentState) {return (currentState == TweetState.OFF) && (b == 'h');}} From the code above you can see that the central switch statement is checking each byte. If the byte is a space, then the next byte maybe a special character: ‘#’ for the start of a hashtag, ‘@’ for the start of a name tag and ‘h’ for the start of a URL; hence, if a space is found then the DefaultAction returns the READY state as there may be more work to do. If a space isn’t found then it returns a RUNNING state which tells StateMachine to call the DefaultAction when the next byte is read. The DefaultAction also checks for special characters at the start of a line as the first character of a tweet maybe a ‘#’, ‘@’ or ‘h’. Control has now been passed back to the StateMachine object, which reads the next byte from the input stream. As the state is now READY, the next call to processByte(...) retrieves the ReadyAction. @Override public TweetState processByte(byte b, TweetState currentState) throws Exception {TweetState retVal = TweetState.RUNNING;switch (b) {case '#': retVal = TweetState.HASHTAG; break;case '@': retVal = TweetState.NAMETAG; break;case 'h': retVal = TweetState.HTTPCHECK; break;default: super.writeByte(b); break; }return retVal;} From ReadyAction’s switch statement you can see that its responsibility is to confirm that the code has found a hashtag, name or URL by checking for a ‘#’, ‘@’ and ‘h’ respectively. 
If it finds one then it returns one of the following states: HASHTAG, NAMETAG or HTTPCHECK to the StateMachine Assuming that the ReadyAction found a ‘#’ character and returned a HASHTAG state, then StateMachine, when it reads the next byte, will pull the CaptureTag class with the injected HashTagStrategy class from the stateActionMap public class CaptureTag extends AbstractAction<TweetState> {private final ByteArrayOutputStream tagStream; private final byte[] buf; private final OutputStrategy output; private boolean terminating;public CaptureTag(OutputStream os, OutputStrategy output) {super(os); tagStream = new ByteArrayOutputStream(); buf = new byte[1]; this.output = output; }/*** Process a byte using this action * @param b * The byte to process * @param currentState * The current state of the state machine */@Override public TweetState processByte(byte b, TweetState currentState) throws Exception { TweetState retVal = currentState;if (b == ' ') {retVal = TweetState.READY; // fix 1 output.build(tagStream.toString(), os); if (!terminating) { super.writeByte(' '); }reset();} else { buf[0] = b; tagStream.write(buf); }return retVal;}/*** Reset the object ready for processing*/public void reset() { terminating = false; tagStream.reset();}@Override public void terminate(TweetState state) throws Exception {terminating = true; processByte((byte) ' ', state);}} The idea behind the CaptureTag code is that it captures characters adding them to a ByteArrayOutputStream until it detects a space or the input buffer is empty. When a space is detected, the CaptureTag call its OutputStrategy interface, which in this case is implemented by HashTagStrategy. public class HashTagStrategy implements OutputStrategy { /** * @see state_machine.tweettohtml.OutputStrategy#build(java.lang.String, * java.io.OutputStream) */@Override public void build(String tag, OutputStream os) throws IOException {String url = "<a href=\"https://twitter.com/#!/search/%23" + tag + "\">#" + tag + "</a>"; os.write(url.getBytes());} }The HashTagStrategy builds a hashtag search URL and writes it to the output stream. Once the URL has been written to the stream, the CaptureTag returns a state of READY – as a space has been detected and returns control to the StateMachine. The StateMachine reads the next byte and so the process continues. Processing a hashtag is only one of several possible scenarios that this code can handle and in demonstrating this scenario I’ve tried to demonstrate how a state machine can be used to process an input stream a byte at a time in order to realize some predefined solution. If you’re interested in how the other scenarios are handled take a look at the source code on github In Summary In summary, this isn’t a technique that you’d want to use on a regular basis; it’s complex, pretty hard to implement and prone to error, plus there’s usually a simpler way of parsing incoming data. However, there are those odd few times when it is useful, when, despite its complexity, it is a good solution, so I’d recommend keeping it in your metaphorical toolbox and saving it for a rainy day. 1There are several ways of solving this puzzle some of which may be simpler and less complex than State Machine 2This version of StateMachine was written in 2006 to process XML. In this scenario the code had to unzip some Base 64 XML fields and as the pattern was re-usable I just dug it out of my toolbox of code samples for the Tweet to HTML case. 
3The complete project is available on github … Reference: Implementing the State Machine Pattern as a Stream Processor from our JCG partner Roger Hughes at the Captain Debug’s Blog blog....

The Future of NoSQL with Java EE

I’ve been following the recent NoSQL momentum for some time now and it seems as if this buzzword is also drawing some kind of attention in the enterprise Java world. Namely, EclipseLink 2.4 started supporting MongoDB and Oracle NoSQL. Having EclipseLink as the JPA reference implementation, you might wonder what this means for Java EE 7. A short side note here: even though I am part of the JSR-342 EG, this isn’t meant to be an official statement. In the following I simply try to summarize my own personal experiences and feelings towards NoSQL support in future Java EE versions. A big thank you goes out to Emmanuel Bernard for providing early feedback! Happy to discuss what follows.

What is NoSQL? NoSQL is a classification of database systems that do not conform to the relational database or SQL standard. Most often they are categorized according to the way they store the data and fall under categories such as key-value stores, BigTable implementations, document store databases, and graph databases. In general the term isn’t well enough defined to reduce it to a single supporting JSR or technology. So the only way to find suitable integration technologies is to dig through every single category.

Key/Value Stores

Key/value stores allow data storage in a schema-less way. The data could be stored in a datatype of a programming language or an object. Because of this, there is no need for a fixed data model. This is obviously comparable to parts of JSR 338 (Java Persistence 2.1) and JSR 347 (Data Grids for the Java Platform) and also to what is done with JSR 107 (JCACHE – Java Temporary Caching API).

With native JPA2

Also primarily aimed at caching is the JPA L2 cache. The JPA Cache API is good for basic cache operations, while the L2 cache shares the state of an entity — which is accessed with the help of the entity manager factory — across various persistence contexts. The Level 2 cache underlies the persistence context and is highly transparent to the application. When the Level 2 cache is enabled, the persistence provider will look for the entities in the persistence context first. If it does not find them there, the persistence provider will look in the Level 2 cache next instead of sending a query to the database. The drawback here obviously is that, as of today, this only works with NoSQL as some kind of “cache”, and not as a replacement for the RDBMS data store. Given the scope of this spec it would be a good fit, but I strongly believe that JPA is designed to be an abstraction over relational databases and nothing else. If there has to be some kind of support for non-relational databases we might end up having a more high-level abstraction layer in place with tons of different persistence modes and features (maybe something like Spring Data). Generally, mapping at the object level has many advantages, including the ability to think in objects and let the underlying engine drive the de-normalization if needed. So reducing JPA to the caching features is probably the wrong decision.

With JCache

JCache has a CacheManager that holds and controls a collection of Caches, and every single Cache has its entries. The basic API can be thought of as map-like with additional features (compare Greg’s blog). With JCache being designed as a “cache”, using it as a standardised interface against NoSQL data stores isn’t a good fit at first look. But given the nature of the use cases for unstructured key/value-based data with enterprise Java, this might be the right kind of integration.
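To make the “map-like” point concrete, here is a minimal sketch of what code against JCache looks like. Bear in mind that JSR 107 was still in flux at the time of writing; the snippet uses the javax.cache names as they were eventually standardised, so treat it as an illustration rather than a reference:

import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.MutableConfiguration;

public class JCacheSketch {

    public static void main(String[] args) {
        // The CacheManager holds and controls a collection of named caches
        CacheManager manager = Caching.getCachingProvider().getCacheManager();

        // A Cache is essentially a typed map with extra lifecycle/expiry features
        Cache<String, String> tweets = manager.createCache("tweets",
                new MutableConfiguration<String, String>().setTypes(String.class, String.class));

        tweets.put("BentleyMotors", "Deputy PM Nick Clegg visits #Bentley today ...");
        System.out.println(tweets.get("BentleyMotors"));

        manager.close();
    }
}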
And the NoSQL concept also allows for the “key-value cache in RAM” category, which is an exact fit for both JCache and Data Grids.

With DataGrids

This JSR proposes an API for interacting with in-memory and disk-based distributed data grids. The API aims to allow users to perform operations on the data grid (PUT, GET, REMOVE) in an asynchronous and non-blocking manner, returning java.util.concurrent.Future objects rather than the actual return values. The process here is not really visible at the moment (at least to me), so there aren’t any examples or concepts for the integration of a NoSQL key/value store available as of today. Besides this, the same reservations as for the JCache API are in place.

With EclipseLink

EclipseLink’s NoSQL support is based on the EIS support offered since EclipseLink 1.0. EclipseLink’s EIS support allowed persisting objects to legacy and non-relational databases. EclipseLink’s EIS and NoSQL support uses the Java Connector Architecture (JCA) to access the data source, similar to how EclipseLink’s relational support uses JDBC. EclipseLink’s NoSQL support is extendable to other NoSQL databases through the creation of an EclipseLink EISPlatform class and a JCA adapter. At the moment it supports MongoDB (document-oriented) and Oracle NoSQL (BigData). It’s interesting to see that Oracle doesn’t address the key/value DBs first. That might be because of the possible confusion with the cache features (e.g. Coherence).

Column-based DBs

Reads and writes are done using columns rather than rows. The best known examples are Google’s BigTable and the likes of HBase and Cassandra that were inspired by BigTable. The BigTable paper says that BigTable is a sparse, distributed, persistent, multidimensional sorted map. GAE, for example, works only with BigTable. It offers a variety of APIs: from a “native” low-level API to “native” high-level ones (JDO and JPA). With the older DataNucleus version used by Google there seem to be a lot of limitations in place which could be removed (see comments) but are still in place.

Document-oriented DBs

The document-oriented DBs are most obviously best addressed by JSR 170 (Content Repository for Java) and JSR 283 (Content Repository for Java Technology API Version 2.0). With JackRabbit as a reference implementation, that’s a strong sign for it :) The support for other NoSQL document stores is non-existent as of today. Even Apache’s CouchDB doesn’t provide a JSR 170/283 compliant way of accessing the documents. The only drawback is that both JSRs aren’t sexy or bleeding edge. But for me this would be the right bucket in which to put support for document-oriented DBs. The flip side of the coin? The content repository API isn’t exactly a natural model for an application. Does an app really want to deal with Nodes and attributes in Java? The notion of a domain model works nicely for many apps, and if there is no chance to use it, you would probably be better off going native and using the MongoDB driver directly.

Graph-oriented DBs

This kind of database is intended for data whose relations are well represented in a graph style (elements interconnected with an undetermined number of relations between them). Aiming primarily at any kind of network topology, the recently rejected JSR 357 (Social Media API) would have been a good place to put support, at least from a use-case point of view. If those graph-oriented DBs are considered as a data store there are a couple of options.
If Java EE persistence is steering in the direction of a more general data abstraction layer, JSR 338 or its successors would be the right place to put support. If you know a little bit about how Coherence works internally and what had to be done to put JPA on top of it, you could also consider JSR 347 a good fit for it, with all the drawbacks already mentioned. Another alternative would be to have a separate JSR for it. The most prominent representative of this category is Neo4j, which itself has an easy API available to simply include everything you need directly in your project. There is additional stuff to consider if you need to control the Neo4j instance via the application server.

Conclusion

To sum it up: we already have a lot in place for the so-called “NoSQL” DBs, and the groundwork for integrating this into new Java EE standards is promising. Control of embedded NoSQL instances should be done via JSR 322 (Java EE Connector Architecture), with this being the only place allowed to spawn threads and open files directly from a filesystem. I’m not a big supporter of having a more general data abstraction JSR for the platform, comparable to what Spring is doing with Spring Data. To me the concepts of the different NoSQL categories are too different to allow a one-size-fits-all approach. The main pain point of NoSQL, besides the lack of a standard API, is that users are forced to denormalize and maintain de-normalization by hand. What I would like to see are some smaller changes, both to the products to make them more Java EE ready and also to the way the integration into the specs is done. It might be a good idea to simply define the different persistence types, identify the JSRs which could be influenced by this, and make those NoSQL-aware accordingly. For users willing to use a domain model (i.e. a higher level of abstraction compared to the raw NoSQL API), JPA might be the best vehicle for that at the moment. The feedback from both EclipseLink and Hibernate OGM users is needed to evaluate what is working and what isn’t. From a political point of view it might also make sense to pursue JSR 347, especially since the main big players are present here already. The really hard part is querying. Should there be standardised query APIs for each family? With Java EE? Or would that be better placed within the NoSQL space? Would love to read your feedback on this! Reference: The Future of NoSQL with Java EE from our JCG partner Markus Eisele at the Enterprise Software Development with Java blog....

Java 7: Closing NIO.2 file channels without losing data

Closing an asynchronous file channel can be very difficult. If you submitted I/O tasks to the asynchronous channel you want to be sure that the tasks are executed properly. This can actually be a tricky requirement on asynchronous channels for several reasons. The default channel group uses deamon threads as worker threads, which isn’t a good choice, cause these threads just abandon if the JVM exits. If you use a custom thread pool executor with non-deamon threads you need to manage the lifecycle of your thread pool yourself. If you don’t the threads just stay alive when the main thread exits. Hence, the JVM actually does not exit at all, what you can do is kill the JVM. Another issue when closing asynchronous channels is mentioned in the javadoc of AsynchronousFileChannel: “Shutting down the executor service while the channel is open results in unspecified behavior.” This is because the close() operation on AsynchronousFileChannel issues tasks to the associated executor service that simulate the failure of pending I/O operations (in that same thread pool) with an AsynchronousCloseException. Hence, you’ll get RejectedExecutionException if you perform close() on an asynchronous file channel instance when you previously closed the associated executor service. That all being said, the proposed way to safely configure the file channel and shutdown that channel goes like this: public class SimpleChannelClose_AsynchronousCloseException {private static final String FILE_NAME = "E:/temp/afile.out"; private static AsynchronousFileChannel outputfile; private static AtomicInteger fileindex = new AtomicInteger(0); private static ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());public static void main(String[] args) throws InterruptedException, IOException, ExecutionException { outputfile = AsynchronousFileChannel.open( Paths.get(FILE_NAME), new HashSet<StandardOpenOption>(Arrays.asList(StandardOpenOption.WRITE, StandardOpenOption.CREATE,StandardOpenOption.DELETE_ON_CLOSE)), pool); List<Future<Integer>> futures = new ArrayList<>(); for (int i = 0; i < 10000; i++) { futures.add(outputfile.write(ByteBuffer.wrap("Hello".getBytes()), fileindex.getAndIncrement() * 5)); } outputfile.close(); pool.shutdown(); pool.awaitTermination(60, TimeUnit.SECONDS); for (Future<Integer> future : futures) { try { future.get(); } catch (ExecutionException e) { System.out.println("Task wasn't executed!"); } } } }The custom thread pool executor service is defined in lines 6 and 7. The file channel is defined in lines 10 to 13. In the lines 18 to 20 the asynchronous channel is closed in an orderly manner. First the channel itself is closed, then the executor service is shutdown and last not least the thread awaits termination of the thread pool executor. Although this is a safe way to close a channel with a custom executor service, there’s a new issue introduced. The clients submitted asynchronous write tasks (line 16) and may want be sure that, once they’ve been submitted successfully, those tasks will definitely be executed. Always waiting for Future.get() to return (line 23), isn’t an option, cause in many cases this would lead *asynchronous* file channels ad adsurdum. The snippet above will return lot’s of “Task wasn’t executed!” messages cause the channel is closed immediately after the write operations were submitted to the channel (line 18). 
To avoid such ‘data loss’ you can implement your own CompletionHandler and pass that to the requested write operation. public class SimpleChannelClose_CompletionHandler { ... public static void main(String[] args) throws InterruptedException, IOException, ExecutionException { ... outputfile.write(ByteBuffer.wrap("Hello".getBytes()), fileindex.getAndIncrement() * 5, "", defaultCompletionHandler); ... }private static CompletionHandler<integer, string=""> defaultCompletionHandler = new CompletionHandler<Integer, String>() { @Override public void completed(Integer result, String attachment) { // NOP }@Override public void failed(Throwable exc, String attachment) { System.out.println("Do something to avoid data loss ..."); } }; }The CompletionHandler.failed() method (line 16) catches any runtime exception during task processing. You can implement any compensation code here to avoid data loss. When you work on mission critical data, then it may be a good idea to use CompletionHandlers. But *still* there’s another issue. The clients can submit tasks but they don’t know if the pool will successfully process these tasks. Successful in this context means that the bytes submitted actually reach their destination (the file on the hard disk). If you want to be sure that all submitted tasks are actually processed before closing, it gets a little trickier. You need a ‘graceful’ closing mechanism, that waits until the work queue is empty *before* it actually closes the channel and the associated executor service (this isn’t possible using standard lifecycle methods).Introducing GracefulAsynchronousChannel My last snippets introduce the GracefulAsynchronousFileChannel. You can get the complete code here in my Git repository. The behaviour of that channel is like this: guarantee to process all successfully submitted write operations and throw an NonWritableChannelException if the channel prepares shutdown. It takes two things to implement that behaviour. Firstly, you’ll need to implement the afterExecute() in an extension of ThreadPoolExecutor that sends a signal when the queue is empty. This is what DefensiveThreadPoolExecutor does. private class DefensiveThreadPoolExecutor extends ThreadPoolExecutor {public DefensiveThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit, LinkedBlockingQueue<Runnable> workQueue, ThreadFactory factory, RejectedExecutionHandler handler) { super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, factory, handler); }/** * "Last" task issues a signal that queue is empty after task processing was completed. */ @Override protected void afterExecute(Runnable r, Throwable t) { if (state == PREPARE) { closeLock.lock(); // only one thread will pass when closer thread is awaiting signal try { if (getQueue().isEmpty() && state < SHUTDOWN) { System.out.println("Issueing signal that queue is empty ..."); isEmpty.signal(); state = SHUTDOWN; // -> no other thread can issue empty-signal } } finally { closeLock.unlock(); } } super.afterExecute(r, t); } }The afterExecute() method (line 12) is executed after each processed task by the thread that processed that given task. The implementation sends the isEmpty signal in line 18. The second part you need two gracefully close a channel is a custom implementation of the close() method of AsynchronousFileChannel. /** * Method that closes this file channel gracefully without loosing any data. 
*/ @Override public void close() throws IOException { AsynchronousFileChannel writeableChannel = innerChannel; System.out.println("Starting graceful shutdown ..."); closeLock.lock(); try { state = PREPARE; innerChannel = AsynchronousFileChannel.open(Paths.get(uri), new HashSet<StandardOpenOption>(Arrays.asList(StandardOpenOption.READ)), pool); System.out.println("Channel blocked for write access ..."); if (!pool.getQueue().isEmpty()) { System.out.println("Waiting for signal that queue is empty ..."); isEmpty.await(); System.out.println("Received signal that queue is empty ... closing"); } else { System.out.println("Don't have to wait, queue is empty ..."); } } catch (InterruptedException e) { Thread.interrupted(); throw new RuntimeException("Interrupted on awaiting Empty-Signal!", e); } catch (Exception e) { throw new RuntimeException("Unexpected error" + e); } finally { closeLock.unlock(); writeableChannel.force(false); writeableChannel.close(); // close the writable channel innerChannel.close(); // close the read-only channel System.out.println("File closed ..."); pool.shutdown(); // allow clean up tasks from previous close() operation to finish safely try { pool.awaitTermination(1, TimeUnit.MINUTES); } catch (InterruptedException e) { Thread.interrupted(); throw new RuntimeException("Could not terminate thread pool!", e); } System.out.println("Pool closed ..."); } }Study that code for a while. The interesting bits are in line 11 where the innerChannel gets replaced by a read-only channel. That causes any subsequent asynchronous write requests to fail with an NonWritableChannelException. In line 16 the close() method waits for the isEmpty signal to happen. When this signal is send after the last write task the close() method continues with an orderly shutdown procedure (line 27 ff.). Basically, the code adds a shared lifecycle state across the file channel and the associated thread pool. That way both objects can communicate during the shutdown procedure and avoid data loss. Here is a logging client that uses the GracefulAsynchronousFileChannel. public class MyLoggingClient { private static AtomicInteger fileindex = new AtomicInteger(0); private static final String FILE_URI = "file:/E:/temp/afile.out";public static void main(String[] args) throws IOException { new Thread(new Runnable() { // arbitrary thread that writes stuff into an asynchronous I/O data sink@Override public void run() { try { for (;;) { GracefulAsynchronousFileChannel.get(FILE_URI).write(ByteBuffer.wrap("Hello".getBytes()), fileindex.getAndIncrement() * 5); } } catch (NonWritableChannelException e) { System.out.println("Deal with the fact that the channel was closed asynchronously ... " + e.toString()); } catch (Exception e) { e.printStackTrace(); } } }).start();Timer timer = new Timer(); // asynchronous channel closer timer.schedule(new TimerTask() { public void run() { try { GracefulAsynchronousFileChannel.get(FILE_URI).close(); long size = Files.size(Paths.get("E:/temp/afile.out")); System.out.println("Expected file size (bytes): " + (fileindex.get() - 1) * 5); System.out.println("Actual file size (bytes): " + size); if (size == (fileindex.get() - 1) * 5) System.out.println("No write operation was lost!"); Files.delete(Paths.get("E:/temp/afile.out")); } catch (IOException e) { e.printStackTrace(); } } }, 1000);} }The client starts two threads, one thread issues write operations in an infinite loop (line 6 ff.). The other thread closes the file channel asynchronously after one second of processing (line 25 ff.). 
If you run that client, then the following output is produced: Starting graceful shutdown ... Deal with the fact that the channel was closed asynchronously ... java.nio.channels.NonWritableChannelException Channel blocked for write access ... Waiting for signal that queue is empty ... Issueing signal that queue is empty ... Received signal that queue is empty ... closing File closed ... Pool closed ... Expected file size (bytes): 400020 Actual file size (bytes): 400020 No write operation was lost!The output shows the orderly shutdown procedure of participating threads. The logging thread needs to deal with the fact that the channel was closed asynchronously. After the queued tasks are processed the channel resources are closed. No data was lost, everything that the client issued was really written to the file destination. No AsynchronousClosedExceptions or RejectedExecutionExceptions in such a graceful closing procedure. That’s all in terms of safely closing asynchronous file channels. The complete code is here in my Git repository. I hope you’ve enjoyed it a little. Looking forward to your comments. Reference: “Java 7: Closing NIO.2 file channels without loosing data” from our JCG partner Niklas....

NetBeans 7.1: Create a Custom Hint

I have talked about some of my favorite NetBeans hints in the posts Seven NetBeans Hints for Modernizing Java Code and Seven Indispensable NetBeans Java Hints. The fourteen hints covered in those two posts are a small fraction of the total number of hints that NetBeans supports “out of the box.” However, even greater flexibility is available to the NetBeans user because NetBeans 7.1 makes it possible to write custom hints. I look at a simple example of this in this post. Geertjan Wielenga‘s post Custom Declarative Hints in NetBeans IDE 7.1 begins with coverage of NetBeans’s “Inspect and Transform” (AKA “Inspect and Refactor“) dialog, which is available from the “Refactor” menu (which in turn is available via the dropdown “Refactor” menu along the menu bar or via right-click in the NetBeans editor). The following screen snapshot shows how this looks.The “Inspect” field of the “Inspect and Transform” dialog allows the NetBeans user to tailor which project or file should be inspected. The “Use” portion of the “Inspect and Transform” dialog allows that NetBeans user to specify which hints to inspect for. In this case, I am inspecting using custom hints and I can see that by clicking on the “Manage” button and selecting the “Custom” checkbox. Note that if “Custom” is not an option when you first bring this up, you probably need to click the “New” button in the bottom left corner. When I click on “Manage” and check the “Custom” box, it expands and I can see the newly created “Inspection” hint. If I click on this name, I can rename it and do so in this case. The renamed inspection (“CurrentDateDoesNotNeedSystemCurrentMillis”) is shown in the next screen snapshot.To create the hint and provide the description seen in the box, I can click on the “Edit Script” button. Doing so leads to the small editor window shown in the next screen snapshot.If more space is desired for editing the custom inspection/hint, the “Open in Editor” button will lead to the text being opened in the NetBeans text editor in which normal Java code and XML code is edited.With the custom inspection/hint in place, it’s time to try it out on some Java code. The following code listing uses an extraneous call to System.currentTimeMillis() and passes its result to the java.util.Date single long argument constructor. This is unnecessary because Date’s no-arguments constructor will automatically instantiate an instance of Date based on the current time (time now). RedundantSystemCurrentTimeMillis.java package dustin.examples;import static java.lang.System.out; import java.util.Date;/** * Simple class to demonstrate NetBeans custom hint. * * @author Dustin */ public class RedundantSystemCurrentTimeMillis { public static void main(final String[] arguments) { final Date date = new Date(System.currentTimeMillis()); out.println(date); } }The above code works properly, but could be more concise. When I tell NetBeans to associate my new inspection with this project in the “Inspect and Transform” dialog, NetBeans is able to flag this for me and recommend the fix. The next three screen snapshots demonstrate that NetBeans will flag the warning with the yellow light bulb icon and yellow underlining, will recommend the fix when I click on the light bulb, and implements the suggested fix when I select it.As the above has shown, a simple custom hint allows NetBeans to identify, flag, and fix at my request the unnecessary uses of System.curentTimeMillis(). 
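The text of the hint script itself only appears in the screenshots, which are not reproduced here. In the declarative hint (Jackpot) rules language used by NetBeans 7.1, a rule with this effect would look roughly like the following — a reconstruction on my part, not necessarily the author’s exact script:

<!description="new Date(System.currentTimeMillis()) can be replaced with new Date()">
new java.util.Date(java.lang.System.currentTimeMillis())
=> new java.util.Date()
;;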
I’ve written before that NetBeans’s hints are so handy because they do in fact do three things for the Java developer: automatically flag areas for code improvement for the developer, often automatically fix the issue if so desired, and communicate better ways of writing Java. For the last benefit in this case, the existence of this custom hint helps convey to other Java developers a little more knowledge about the Date class and a better way to instantiate it when current date/time is desired. The most difficult aspect of using NetBeans’s custom hints is finding documentation on how to use them. The best sources currently available appear to be the NetBeans 7.1 Release Notes, several Wielenga posts (Custom Declarative Hints in NetBeans IDE 7.1, Oh No Vector!, Oh No @Override! / Oh No Utilities.loadImage!), and Jan Lahoda‘s jackpot30 Rules Language (covers the rules language syntax used by the custom inspections/hints and shown in the simple example above). The Refactoring with Inspect and Transform in the NetBeans IDE Java Editor tutorial also includes a section on managing custom hints. Hopefully, the addressing of Bug 210023 will help out with this situation. My example custom NetBeans hint works specifically with the Date class. An interesting and somewhat related StackOverflow thread asks if a NetBeans custom hint could be created to recommend use of Joda Time instead of Date or Calendar. A response on that thread refers to the NetBeans Java Hint Module Tutorial. Looking over that tutorial reminds me that the approach outlined in this post and available in NetBeans 7.1 is certainly improved and easier to use.Incidentally, a hint like that asked for in the referenced StackOverflow thread is easy to write in NetBeans 7.1. There is no transform in this example because a change of the Date class to a Joda Time class would likely require more changes in the code than the simple transform could handle. This hint therefore becomes one that simply recommends changing to Joda Time. The next screen snapshots show the simple hint and how they appear in the NetBeans editor. Each release of NetBeans seems to add more useful hints to the already large number of helpful hints that NetBeans supports. However, it is impossible for the NetBeans developers to add every hint that every team or project might want. Furthermore, it is not desirable to have every possible hint that every community member might come up with added to the IDE. For this reason, the ability to specify custom hints in NetBeans and the ability to apply those hints selectively to projects and files are both highly desirable capabilities. Reference: Creating a NetBeans 7.1 Custom Hint from our JCG partner Dustin Marx at the Inspired by Actual Events blog....

What’s Cooking in Java 8 – Project Jigsaw

What is Project Jigsaw: Project Jigsaw is the project to make the Java compiler module-aware. For years the Java API has been monolithic, i.e. the whole API was seen from any part of the code equally. There has also not been any way to declare a piece of code’s dependency on other user libraries. Project Jigsaw attempts to solve these problems, along with others, in a very elegant way. In this article, I will highlight the basic concepts of the Jigsaw module system and also explain how it would work with the commands, so as to provide a real feel of it. Currently, Jigsaw is targeted to be included in the release of Java 8. In my opinion, this is a change bigger than generics, which came with version 5 of the Java platform.

What is Achieved by Project Jigsaw: As I explained earlier, Project Jigsaw solves the problem of the whole Java API being used as a single monolithic codebase. The following points highlight the main advantages.

1. Dependency Graph: Jigsaw gives a way to uniquely identify a particular codebase, and also to declare a codebase’s dependencies on other codebases. This creates a complete dependency graph for a particular set of classes. Say, for example, you want to write a program that depends on the Apache BCEL library. Until now, there was no way for you to express this requirement in the code itself. Using Jigsaw, you can express this requirement in the code itself, allowing tools to resolve this dependency.

2. Multiple Versions of the Same Code: Suppose you write a program that depends on both library A and library B. Now suppose library A depends on version 1.0 of library C and library B depends on version 2.0 of library C. In the current Java runtime, you cannot use library A and B at the same time without creating a complex hierarchy of custom classloaders, and even that would not work in all cases. After Jigsaw becomes part of Java, this is not a problem, as a class will only be able to see the versions of its dependent classes that are part of the module versions required by the class’s containing module. That is to say, since module A depends on version 1.0 of module C, and module B depends on version 2.0 of module C, the Java runtime can figure out which version of the classes in module C should be seen by either module A or module B. This is similar to the OSGi project.

3. Modularization of the Java Platform Itself: The current Java platform API is huge and not all parts of it may be relevant in every case. For example, a Java platform intended to run a Java EE server does not have to implement the Swing API, as that would not make any sense. Similarly, embedded environments can strip down some less important APIs (for embedded use), like the compiler API, to make the platform smaller and faster. Under the current Java platform this is not possible, as any certified Java platform must implement all the APIs. Jigsaw will provide a way to implement only the part of the API set relevant to the particular platform. Since a module can explicitly declare its dependency on any particular Java API module, it will be run only when the platform has an implementation of the modules required by the module.

4. Integration with OS Native Installation: Since the module system is very similar to what is currently available for the installation of programs and libraries in modern operating systems, Java modules can be integrated with those systems. This is in fact out of the scope of the Jigsaw project itself, but the OS vendors are encouraged to enable it and they would most likely do so.
For example, the rpm-based repository system available in Red Hat based Linux systems and the apt-based repository systems available in Debian based Linux systems can easily be enhanced to support Java module systems.

5. Module Entry Point: Java modules can specify an entry point class just like jars can. When a module is run, the entry point’s main method is invoked. Since the OS can now install a Java module and the Java module can be executed, this is very similar to installing an OS’s native program.

6. Efficiency: Currently, every time a JVM is run, it verifies the integrity of every single class that is loaded during the run of the program. This takes a considerable amount of time. Also, the classes are accessed individually from the OS file system. Since modules can be installed before running, the installation itself can now include the verification step, which will eliminate the need to verify the classes at runtime. This will lead to considerable performance improvement. Also, the module system can store the classes in its own optimized manner, leading to further improvement in performance.

7. Module Abstraction: It is possible to provide an abstraction for a particular module. Say module A depends on module X. Now module D can provide for module X, thus providing its implementation. For example, the Apache Xerces modules would want to provide for the jdk.jaxp module and would be able to satisfy a dependency requirement for jdk.jaxp.

Basics of a Modular Codebase: All the above discussion is pretty vague without a real example of a modular codebase and its usage. A modular codebase can either be single-module or multi-module. In the case of a single module, all we need to do to enable modules is to create a file named module-info.java at the base of the source path, outside any package. The module-info.java file is a special Java file written in a special syntax designed to declare module information. The following is an example of such a module-info.java.

module com.a @ 1.0 {
    requires com.b @ 1.0;
    class com.a.Hello;
}

In this case the module is named com.a and it has a dependency on com.b. It also declares an entry point, com.a.Hello. Note that it is not required that the package structure resembles the module name, although that would probably be a best practice. Now you might be thinking that if this is a single-module setup, then why is there a dependency on a different module — does that not make it two modules? Notice that even if there is only one explicit declaration of a dependency module, there is an implicit dependency on all Java API modules. If none of the Java API modules are declared explicitly as dependencies, all of them are included. The only reason it is still single-module is that com.b must already be available in binary form in the module library. It’s multi-module when more than one module is being compiled at the same time. Compiling a source in single-module mode is as simple as compiling a non-modular source; the only difference is that module-info.java will be present in the source root. Multi-module Source: In case the source contains multiple modules, they must be given a directory structure. It’s pretty simple though. The source under a particular module must be kept in a directory with the name of the module.
For example, the source for the class com.a.Hello in the module com.a must be kept in [source-root]/com.a/com/a/Hello.java and the module-info.java must be kept in the directory [source-root]/com.a Compiling Multi-module Source: For this let us consider an example of compiling two modules com.a and com.b. Let us first take a look at the directory structure. as below: classes src |--com.a | |--module-info.java | |--com | |--a | |--Hello.java |--com.b |--module-info.java |--com |--b |--Printer.javaThe code for module-info.java in com.a would be like this. module com.a @ 1.0{requires com.b @ 1.0; class com.a.Hello; }The module-info.java in com.b module com.b @ 1.0{ exports com.b; }Printer.java in com.b/com/b package com.b;public class Printer{ public static void print(String toPrint){ System.out.println(toPrint); } }Hello.java in com.a/com/a package com.a; import com.b.Printer;public class Hello{ public static void main(String [] args){ Printer.print("Hello World!"); } }The codes are pretty self explanatory, we are trying to use com.b.Printer class in module com.b from com.a.Hello class in module com.a. For this, its mandatory for com.a module-info.java to declare com.b as a dependency with the requires keyword. We are trying to create the output class files in the classes directory. The following javac command would do that. javac -d classes -modulepath classes -sourcepath src `find src -name '*.java'` Note that we have used find command in backquotes(`) so that the command’s output will be included as the file list. This will work in linux and unix environments. In case of others we might simply type in the list of files. After compilation, classes directory will have a similar structure of classes. Now we can install the modules using jmod command. jmod create -L mlib jmod install -L mlib classes com.b jmod install -L mlib classes com.aWe first created a module library mlib and installed our modules in the library. We could also have used the default library by not specifying the -L option to the install command in jmod. Now we can simply run module com.a using java -L mlib -m com.aHere too we could have used the default module. It is also possible to create a distributable module package [equivalent to a jar in today's distribution mechanism] that can directly be installed. For example, the following will create com.a@1.0.jmod for com.a jpkg -m classes/com.a jmod com.aI have tried to outline the module infrastructure in the upcoming java release. However project Jigsaw is being modified everyday and can turn up to be a completely differnt being altogether at the end. But it is expected that the basic concepts would still remain the same. The total module concepts are more complex and I will cover the details in an upcoming article. Reference: What’s Cooking in Java 8 – Project Jigsaw from our JCG partner Debasish Ray Chawdhuri at the Geeky Articles blog....

Scala Basic XML processing

Introduction Pretty much everybody knows what XML is: it is a structured, machine-readable text format for representing information that can be easily checked for the “grammaticality” of the tags, attributes, and their relationship to each other (e.g. using DTD’s). This contrasts with HTML, which can have elements that don’t close (e.g. <p>foo<p>bar rather than <p>foo</p><p>bar</p>) and still be processed. XML was only ever meant to be a format for machines, but it morphed into a data representation that many people ended up (unfortunately, for them) editing by hand. However, even as a machine readable format it has problems, such as being far more verbose than is really required, which matters quite a bit when you need to transfer lots of data from machine to machine — in the next post, I’ll discuss JSON and Avro, which can be viewed as evolutions of what XML was intended for and which work much better for lots of the applications that matter in the “big data” context. Regardless, there is plenty of legacy data that was produced as XML, and there are many communities (e.g. the digital humanities community) who still seem to adore XML, so people doing any reasonable amount of text analysis work will likely find themselves eventually needing to work with XML-encoded data. There are a lot of tutorials on XML and Scala — just do a web search for “Scala XML” and you’ll get them. As with other blog posts, this one is aimed at being very explicit so that beginners can see examples with all the steps in them, and I’ll use it to set up a JSON processing post. A simple example of XML To start things off, let’s consider a very basic example of creating and processing a bit of XML. The first thing to know about XML in Scala is that Scala can process XML literals. That is, you don’t need to put quotes around XML strings — instead, you can just write them directly, and Scala will automatically interpret them as XML elements (of type scala.xml.Element). scala> val foo = <foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo> foo: scala.xml.Elem = <foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>Now let’s do a little bit of processing on this. You can get all the text by using the text method. scala> foo.text res0: String = hi1yellowSo, that munged all the text together. To get them printed out with spaces between, let’s first get all the bar nodes and then get their texts and use mkString on that sequence. To get the bar nodes, we can use the \ selector. scala> foo \ "bar" res1: scala.xml.NodeSeq = NodeSeq(<bar type="greet">hi</bar>, <bar type="count">1</bar>, <bar type="color">yellow</bar>)This gives us back a sequence of the bar nodes that occur directly under the foo node. Note that the \ operator (selector) is just a mirror image of the / selector used in XPath. Of course, now that we have such a sequence, we can map over it to get what we want. Since the text method returns the text under a node, we can do the following. scala> (foo \ "bar").map(_.text).mkString(" ") res2: String = hi 1 yellowTo grab the value of the type attribute on each node, we can use the \ selector followed by “@type”. 
scala> (foo \ "bar").map(_ \ "@type") res3: scala.collection.immutable.Seq = List(greet, count, color)(foo \ "bar").map(barNode => (barNode \ "@type", barNode.text)) res4: scala.collection.immutable.Seq[(scala.xml.NodeSeq, String)] = List((greet,hi), (count,1), (color,yellow))Note that the \ selector can only retrieve children of the node you are selecting from. To dig arbitrarily deep to pull out all nodes of a given type no matter where they are, use the \\ selector. Consider the following (bizarre) XML snippet with ‘z’ nodes at different levels of embedding. <a> <z x="1"/> <b> <z x="2"/> <c> <z x="3"/> </c> <z x="4"/> </b> </a>Let’s first put it into the REPL. scala> val baz = <a><z x="1"/><b><z x="2"/><c><z x="3"/></c><z x="4"/></b></a> baz: scala.xml.Elem = <a><z x="1"></z><b><z x="2"></z><c><z x="3"></z></c><z x="4"></z></b></a>If we want to get all of the ‘z’ nodes, we do the following. scala> baz \\ "z" res5: scala.xml.NodeSeq = NodeSeq(<z x="1"></z>, <z x="2"></z>, <z x="3"></z>, <z x="4"></z>)And we can of course easily dig out the values of the x attributes on each of the z’s. scala> (baz \\ "z").map(_ \ "@x") res6: scala.collection.immutable.Seq = List(1, 2, 3, 4)Throughout all of the above, we have used XML literals — that is, expressions typed directly into Scala, which interprets them as XML types. However, we usually need to process XML that is saved in a file, or a string, so the scala.xml.XML object has several methods for creating scala.xml.Elem objects from other sources. For example, the following allows us to create XML from a string. scala> val fooString = """<foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>""" fooString: java.lang.String = <foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>scala> val fooElemFromString = scala.xml.XML.loadString(fooString) fooElemFromString: scala.xml.Elem = <foo><bar type="greet">hi</bar><bar type="count">1</bar><bar type="color">yellow</bar></foo>This Elem is the same as the one created using the XML literal, as shown by the following test. scala> foo == fooElemFromString res7: Boolean = trueSee the Scala XML object for other ways to create XML elements, e.g. from InputStreams, Files, etc. A richer XML example As a more interesting example of some XML to process, I’ve created the following short XML string describing artist, albums, and songs, which you can see in the github gist music.xml. https://gist.github.com/2597611 I haven’t put any special care into this, other than to make sure it has embedded tags, some of which have attributes, and some reasonably interesting content (and some great songs). You should save this in a file called /tmp/music.xml. Once you’ve done that, you can run the following code, which just prints out each artist, album and song, with an indent for each level. val musicElem = scala.xml.XML.loadFile("/tmp/music.xml")(musicElem \ "artist").foreach { artist => println((artist \ "@name").text + "\n") val albums = (artist \ "album").foreach { album => println(" " + (album \ "@title").text + "\n") val songs = (album \ "song").foreach { song => println(" " + (song \ "@title").text) } println } }Converting objects to and from XML One of the use cases for XML is to provide a machine-readable serialization format for objects that can still be easily read, and at times edited, by humans. The process of shuffling objects from memory into a disk-format like XML is called marshalling. 
We’ve started with some XML, so what we’ll do is define some classes and “unmarshall” the XML into objects of those classes. Put the following into the REPL. (Tip: You can use “:paste” to enter multi-line statements like those below. These will work without paste, but it is necessary to use it in some contexts, e.g. if you define Artist before Song.) case class Song(val title: String, val length: String) { lazy val time = { val Array(minutes, seconds) = length.split(":") minutes.toInt*60 + seconds.toInt } }case class Album(val title: String, val songs: Seq[Song], val description: String) { lazy val time = songs.map(_.time).sum lazy val length = (time / 60)+":"+(time % 60) }case class Artist(val name: String, val albums: Seq[Album])Pretty simple and straightforward. Note the use of lazy vals for defining things like the time (length in seconds) of a song. The reason for this is that if we create a Song object but never ask for its time, then the code needed to compute it from a string like “4:38? is never run; however, if we had left lazy off, then it would be computed when the Song object is created. Also, we don’t want to use a def here (i.e. make time a method) because its value is fixed based on the length string; using a method would mean recomputing time every time it is asked for of a particular object. Given the classes above, we can create and use objects from them by hand. scala> val foobar = Song("Foo Bar", "3:29") foobar: Song = Song(Foo Bar,3:29)scala> foobar.time res0: Int = 209Using the native Scala XML API Of course, we’re more interested in constructing Artist, Album, and Song objects from information specified in files like the music example. Though I don’t show the REPL output here, you should enter all of the commands below into it to see what happens. To start off, make sure you have loaded the file. val musicElem = scala.xml.XML.loadFile("/tmp/music.xml")Now we can work with the file to select various elements, or create objects of the classes defined above. Let’s start with just Songs. We can ignore all the artists and albums and dig straight in with the \\ operator. val songs = (musicElem \\ "song").map { song => Song((song \ "@title").text, (song \ "@length").text) }scala> songs.map(_.time).sum res1: Int = 11311And, we can go all the way and construct Artist, Album and Song objects that directly mirror the data stored in the XML file. val artists = (musicElem \ "artist").map { artist => val name = (artist \ "@name").text val albums = (artist \ "album").map { album => val title = (album \ "@title").text val description = (album \ "description").text val songList = (album \ "song").map { song => Song((song \ "@title").text, (song \ "@length").text) } Album(title, songList, description) } Artist(name, albums) }With the artists sequence in hand, we can do things like showing the length of each album. val albumLengths = artists.flatMap { artist => artist.albums.map(album => (artist.name, album.title, album.length)) } albumLengths.foreach(println)Which gives the following output. (Radiohead,The King of Limbs,37:34) (Radiohead,OK Computer,53:21) (Portished,Dummy,48:46) (Portished,Third,48:50)Marshalling objects to XML In addition to constructing objects from XML specifications (also referred to as de-serializing and un-marshalling), it is often necessary to marshal objects one has constructed in code to XML (or other formats). The use of XML literals is actually quite handy in this regard. 
To see this, let’s start with the first song of the first album of the first album (Bloom, by Radiohead). scala> val bloom = artists(0).albums(0).songs(0) bloom: Song = Song(Bloom,5:15)We can construct an Elem from this as follows. scala> val bloomXml = <song title={bloom.title} length={bloom.length}/> bloomXml: scala.xml.Elem = <song length="5:15" title="Bloom"></song>The thing to note here is that an XML literal is used, but when we want to use values from variables, we can escape from literal-mode with curly brackets. So, {bloom.title} becomes “Bloom”, and so on. In contrast, one could do it via a String as follows. scala> val bloomXmlString = "<song title=\""+bloom.title+"\" length=\""+bloom.length+"\"/>" bloomXmlString: java.lang.String = <song title="Bloom" length="5:15"/>scala> val bloomXmlFromString = scala.xml.XML.loadString(bloomXmlString) bloomXmlFromString: scala.xml.Elem = <song length="5:15" title="Bloom"></song>So, the use of literals is a bit more readable (though it comes at the cost of making it hard in Scala to use “<” as an operator for many use cases, which is one of the reasons XML literals are considered by many to be not a great idea). We can create the whole XML for all of the artists and albums in one fell swoop. Note that one can have XML literals in the escaped bracketed portions of an XML literal, which allows the following to work. Note: you need to use the :paste mode in the REPL in order for this to work. val marshalled = <music> { artists.map { artist => <artist name={artist.name}> { artist.albums.map { album => <album title={album.title}> { album.songs.map(song => <song title={song.title} length={song.length}/>) } <description>{album.description}</description> </album> }} </artist> }} </music>Note that in this case, the for-yield syntax is perhaps a bit more readable since it doesn’t require the extra curly braces. val marshalledYield = <music> { for (artist <- artists) yield <artist name={artist.name}> { for (album <- artist.albums) yield <album title={album.title}> { for (song <- album.songs) yield <song title={song.title} length={song.length}/> } <description>{album.description}</description> </album> } </artist> } </music>One could of course instead add a toXml method to each of the Song, Album, and Artist classes such that at the top level you’d have something like the following. val marshalledWithToXml = <music> { artists.map(_.toXml) } </music>This is a fairly common strategy. However, note that the problem with this solution is that it produces a very tight coupling between the program logic (e.g. of what things like Songs, Albums and Artists can do) with other, orthogonal logic, like serializing them. To see a way of decoupling such different needs, check out Dan Rosen’s excellent tutorial on type classes. Conclusion The standard Scala XML API comes packaged with Scala, and it is actually quite nice for some basic XML processing. However, it caused some “controversy” in that it was felt by many that the core language has no business providing specialized processing for a format like XML. Also, there are some efficiency issues. Anti-XML is a library that seeks to do a better job of processing XML (especially in being more scalable and more flexible in allowing programmatic editing of XML). As I understand things, Anti-XML may become a sort of official XML processing library in the future, with the current standard XML library being phased out. 
Nonetheless, many of the ways of interacting with an XML document shown above are similar, so being familiar with the standard Scala XML API provides the core concepts you’ll need for other such libraries. Reference: Basic XML processing with Scala from our JCG partner Jason Baldridge at the Bcomposes blog....

Dynamic ADF Train: Adding train stops programmatically

I’m going to show how to add train stops to ADF train programmatically “on-the-fly”. In my use-case I have some ticket-booking application. It has a bounded task flow with train model. At the first stop of the train users input number of passengers and at the following stops they input some passengers’ info. The number of stops with passengers’ info has to be changed dynamically depending on the value submitted at the first train stop. So, the result of described behaviour should look like this:The bounded task flow has the following structure:StartView activity is a page fragment where we input number of passengers and DynamicView activity provides a page fragment to input passenger’s info. At the moment we have only one activity for passenger’s info and I will add extra activities if the number of passengers is greater than one. The inputNumberSpinbox in StartView page fragment submits its value to passengersNumber property of some PageFlowScope backing bean and action for the Submit button is a method of the same bean: public class MainTrain { //Extra added train stops private List<ActivityId> dynamicStops = new ArrayList<ActivityId>(); //Value of inputNumberSpinbox private int passengersNumber = 1; public String buttonPress(){ //The number of extra added train stops is greater than needed if (passengersNumber <= dynamicStops.size()) clearExtraStops(); else //The number of extra added train stops is less than needed if (passengersNumber-1 > dynamicStops.size()) addDynamicStops(); return null; }So, by pressing on Submit button we either add some train stops or clear extra stops depending on the value of inputNumberSpinbox. We save all added dynamic stops in dynamicStops list. Let’s have a look at the clearExtraStops() method: private void clearExtraStops() { for (int i = dynamicStops.size(); i >= passengersNumber; i--) { //Get ActivityId to be removed ActivityId removeActivityId = dynamicStops.get(i-1);//Get current train model and remove train stop TrainModel trainModel = TrainUtils.findCurrentTrainModel(); trainModel.getTrainStops().remove(removeActivityId); //Remove activity from task flow definition getTaskFlowDefinition().getActivities().remove(removeActivityId); dynamicStops.remove(i-1); } }The method removes two things: the train stop from the train model and the activity from the task flow definition. The addDynamicStops() method is going to be much more interesting: private void addDynamicStops() { for (int i = dynamicStops.size(); i < passengersNumber - 1; i++) { //Creating new ActivityId ActivityId activityId = new ActivityId(getTaskFlowId(), new StringBuilder("DynamicView").append(i).toString());//The main trick of the post. 
//We consider DynamicView activity as a base for new train stop and new activity //Get base activity (DynamicView) and its train stop Activity baseActivity = getBaseDynamicActivity(); TrainStopContainer stopContainer = (TrainStopContainer)baseActivity.getMetadataObject(); TrainStop baseTrainStop = stopContainer.getTrainStop();//Create new Activity based on DynamicView but with new ActivityId ActivityImpl activityImpl = new ActivityImpl(baseActivity, activityId); //Add created activity to the task flow definition getTaskFlowDefinition().getActivities().put(activityId, activityImpl);//Create new train stop based on the DynamicView's train stop TrainStopModel trainStopModel = new TrainStopModel( new TrainStopImpl(baseTrainStop, i+2), activityId); //Add created train stop to the train stop model TrainModel trainModel = TrainUtils.findCurrentTrainModel(); trainModel.getTrainStops().put(activityId, trainStopModel); //Add created activity to our list dynamicStops.add(activityId); } } private Activity getBaseDynamicActivity() { ActivityId baseActivityId = new ActivityId(getTaskFlowId(), "DynamicView"); MetadataService metadataService = MetadataService.getInstance(); return metadataService.getActivity(baseActivityId); }private TaskFlowDefinition getTaskFlowDefinition() { MetadataService metadataService = MetadataService.getInstance(); return metadataService.getTaskFlowDefinition(getTaskFlowId()); }private TaskFlowId getTaskFlowId() { ControllerContext controllerContext = ControllerContext.getInstance(); ViewPortContext currentViewPortCtx = controllerContext.getCurrentViewPort(); TaskFlowContext taskFlowCtx = currentViewPortCtx.getTaskFlowContext(); return taskFlowCtx.getTaskFlowId(); }So, the principal trick of this post is to create new activity and train stops basing on existing ones for DynamicView. In order to implement the idea I created two classes: ActivityImpl and TrainStopImpl. The classes are nothing else than just proxy classes implementing Activity and TrainStop interfaces correspondently. They delegates interface implementation to the base instances except some specific methods like getters for Id and DisplayName: public class TrainStopImpl implements TrainStop { //Base instance private TrainStop baseTrainStop; private int mpassNo; private static final String PASSANGER_FORM = "Passenger's data: "; public TrainStopImpl(TrainStop trainStop, int passNo) { baseTrainStop = trainStop; mpassNo = passNo; }//Specific implementation public String getDisplayName() { return new StringBuilder(PASSANGER_FORM).append(mpassNo).toString(); }public String getOutcome() { return baseTrainStop.getOutcome(); }public String getSequential() { return baseTrainStop.getSequential(); }...public class ActivityImpl implements Activity { private Activity baseActivity; private ActivityId mid; public ActivityImpl(Activity activity, ActivityId id) { baseActivity = activity; mid = id; }//Specific implementation public ActivityId getId() { return mid; }public String getType() { return baseActivity.getType(); }public Object getMetadataObject() { return baseActivity.getMetadataObject(); } ...And one more picture for this post, just to show it’s working:That’s all! You can download sample application for JDeveloper 11.1.1.2.0. Reference: Dynamic ADF Train. Adding train stops programmatically from our JCG partner Eugene Fedorenko at the ADF Practice blog....

AppSensor – Intrusion Detection

Imagine that you have created a nice web application and secured it as best you could. Users came, used it and everything was OK until someone stumbled upon a vulnerability in your application and exploited it. Of course, you analyzed the logs and found that the bad guy had been looking for the vulnerability for weeks until he found one. The creators of the AppSensor intrusion detection framework believe that the above situation should not happen. The application should not just lie there and let itself be beaten with SQL injections, XSS attacks and whatever else. It should take active measures to protect itself. As the average attacker has to make several attempts to find a vulnerability in the application, it should be possible to detect hacking attempts. The idea of an in-application intrusion detection framework seems to be quite original. While intrusion detection frameworks are common in the network security world, we found no alternative to AppSensor. If you know about one, please let us know.

AppSensor Project
AppSensor is part of the Open Web Application Security Project (OWASP). It started as a conceptual framework and evolved into a real library. The project is currently in beta. AppSensor is located in two places: a page on the OWASP wiki, and a Google Code page. The creator wrote a short and easily readable book with a project overview, recommendations and best practices. The book is worth reading even if you decide to implement your own solution. If you decide to use AppSensor, a getting started guide for developers is available on Google Code.

Overview
The high-level design is nice and simple. AppSensor has detection points, response actions and an intrusion detector. Detection points are able to detect ongoing or probable attacks. They are spread through the application and raise AppSensor exceptions whenever an intrusion condition is met. For example, the "GET When Expecting POST" detection point triggers a warning whenever a page expecting only POST requests is requested with the HTTP method GET. Or the "Cross Site Scripting Attempt" detection point raises an exception if the request contains a common XSS attack pattern. The intrusion detector analyses all raised exceptions. If it detects an ongoing attack, it decides which response action is needed. A response action may do anything from raising the log level to shutting down the whole application. Which action is selected depends on both the suspicious activity and the nature of the application. Where a trading application would immediately disable the user account, a discussion forum would only raise the log level and warn the user to stop the activity. AppSensor is part of the Enterprise Security API (ESAPI) project. The out-of-the-box version supports only ESAPI-secured projects. However, our example application is secured with the Shiro framework. The integration between the two frameworks is described in another blog post.

Do no Harm
While the whole point of an intrusion detection system is to detect and stop an ongoing hack attack, it is equally important not to harm innocent users. The AppSensor book suggests dividing detection points into two categories: attack events and suspicious events. If a hidden form variable suddenly contains "' OR 1=1", there is no doubt that someone is hacking the application. The event is considered an attack event and the system may take strong action (log out, disable account) against the user. On the other hand, an unexpected ";'" in a request parameter is considered only suspicious. Maybe it was a typo. Maybe it was a malicious hacker trying to keep a low profile.
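To make the attack-versus-suspicious distinction concrete, here is a minimal sketch of how the two kinds of events might be raised. Everything except the three-argument AppSensorException constructor (which is shown later in this article) is made up for illustration: the class, its methods, the event codes and the import's package name are assumptions, not part of the AppSensor API.

import org.owasp.appsensor.AppSensorException; // package name assumed

public class CheckoutFormValidator {

    // A hidden field is never edited by honest users, so tampering with it is a clear attack event.
    void checkHiddenDiscountField(String value) {
        if (value.contains("' OR 1=1")) {
            // The constructor itself reports the event; no 'throw' is needed.
            new AppSensorException("ILLUSTRATIVE_ATTACK_CODE",
                    "Invalid request.",
                    "SQL injection marker found in hidden discount field: " + value);
        }
    }

    // A single odd character in a free-text field is only suspicious - it may be a typo.
    void checkLastNameField(String value) {
        if (value.contains(";'")) {
            new AppSensorException("ILLUSTRATIVE_SUSPECT_CODE",
                    "Please re-check your input.",
                    "Unexpected characters in the last name field: " + value);
        }
    }
}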
Because a single suspicious event like this tells us very little, we can conclude that the system is under attack only if the event occurs multiple times on multiple fields. In this case, the system should record the event and raise the log level for the current user. Stronger action will be taken only after repeated events. Of course, the application context does matter too. An unexpected <img> html tag submitted into a trading application's 'last name' field must be taken seriously. The same prohibited <img> tag in a discussion forum can be taken lightly. The user may be trying to add an animation or whatever to his username. It is annoying, but hardly an attack.

Example Application
We have been testing AppSensor on the SimpleShiroSecuredApplication created as part of the Apache Shiro series. You can download the example from the intrusion_detection branch on GitHub. Run the RunTestWait test case to start a web server with the application deployed at the https://localhost:8443/simpleshirosecuredapplication/ URL. The application has seven users: administrator, friendlyrepairman, unfriendlyrepairman, mathematician, physicien, productsales and servicessales. They all share the same password: 'heslo'.

Add AppSensor
First, we have to add the AppSensor dependency to the application and enable the framework as the intrusion detection provider in the ESAPI configuration. We have to do this even if we are not using ESAPI in the project. Add the AppSensor dependency to pom.xml:

<dependency>
    <groupId>org.owasp.appsensor</groupId>
    <artifactId>AppSensor</artifactId>
    <version>0.1.3.2</version>
    <type>jar</type>
    <scope>compile</scope>
</dependency>

Create a '.esapi' folder inside the src/main/resources folder. Hint for Eclipse users: the Eclipse package explorer does not show files and folders whose names begin with a . (period). To show them, go to the package explorer view and press the little down arrow in the upper right corner of the view. Click Filters and uncheck the '.* resources' entry. Unpack the AppSensor jar file and copy three configuration files
- .esapi/appsensor.properties – AppSensor configuration,
- .esapi/validation.properties – ESAPI input validation configuration,
- .esapi/ESAPI.properties – ESAPI configuration
into the resources/.esapi folder. Alternatively, you can download them from the Google Code repository. Change the ESAPI.IntrusionDetector entry in the ESAPI.properties file to use AppSensor as the intrusion detector:

ESAPI.IntrusionDetector = org.owasp.appsensor.intrusiondetection.AppSensorIntrusionDetector

Integration
Some user actions are considered to be a threat only if they are repeated by the same user too many times. Moreover, the intrusion detector may decide that the 'log out user' or 'disable account' response actions are needed. All that requires access to the underlying application data and APIs. The default AppSensor configuration assumes that the application supports ESAPI interfaces. However, SimpleShiroSecuredApplication is secured by the Shiro framework, which has a different set of features. As the integration between the two frameworks is more about Shiro than about AppSensor, we moved it to a separate post.

Detection Points
The OWASP page hosts an extensive list of suggested detection points. It contains almost everything possible.
Detection points are divided into several categories:
- request and session – manipulation with the URL, request, cookies, …
- input – user input contains a common XSS attack, an unusual character, entered text is longer than the input field size, …
- honey trap – an otherwise unused hidden field or request variable with a tempting name such as is_admin is modified, a url available only in an html comment is requested, …
- user or system trend – too many log outs across the site, a user clicks faster than is humanly possible, a user changes his first name field too often, …
- …

A lot of effort went into identifying detection points that could be triggered by innocent activity. Most of these detection points are not implemented yet. The class AttackDetectorUtils contains the majority of the implemented detection points:
- unexpected request method,
- XSS attack,
- SQL injection,
- null byte in request parameter,
- cookies presence,
- carriage return or line feed presence.

Additionally, two servlet filters are available:
- validate whether the user agent changed mid session (UserAgentChangeDetectionFilter),
- validate whether the IP address changed mid session (IpAddressChangeDetectionFilter).

Beware: the XSS and SQL injection detection points need to be reconfigured before use (more about it later).

Adding Detection Points
Each detection point has a unique code. If the application detects an intrusion or a suspicious event, it can use the code to raise an AppSensorException:

if (intrusionDetected()) {
    new AppSensorException("CODE", "Message to user.", "Log message.");
}

Note: there is no 'throw' clause. The exception constructor triggers it automatically. Alternatively, prepared detection points are available in the AttackDetectorUtils class:

AttackDetectorUtils.verifyXSSAttack(inputValue)

or you can configure servlet filters in web.xml:

<filter>
    <filter-name>IpChangedDetectionPoint</filter-name>
    <filter-class>org.owasp.appsensor.filters.IpAddressChangeDetectionFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>IpChangedDetectionPoint</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Each detection point has an intrusion threshold and associated response actions. The threshold is the number of events within a time period needed before an associated action is taken. For example, the intrusion exception CODE is considered an attack if it is raised four times within ten minutes. If the threshold is reached the first time, the event is only logged. The user is signed out after the second offense. The third offense (e.g. the exception was raised twelve times within ten minutes) will completely disable the user account:

# number of intrusions in a specified segment of time that constitutes the upper threshold - once crossed, it's considered an "attack"
IntrusionDetector.CODE.count=4
# time period (in seconds)
IntrusionDetector.CODE.interval=600
# list of actions you want executed in the specified order
IntrusionDetector.CODE.actions=log, logout, disable

XSS Attack and SQL Injection Detection
We start with the detection points detecting two popular attacks on web applications: SQL injection and XSS. Whenever the application reads a value from a request parameter, the value is passed to the detection points:

/**
 * Reads sanitized request parameter value from request.
 */
protected String getField(HttpServletRequest request, String parameter) {
    String dirtyValue = request.getParameter(parameter);
    // detection point: XSS, SQL injection
    AttackDetectorUtils.verifySQLInjectionAttack(dirtyValue);
    AttackDetectorUtils.verifyXSSAttack(dirtyValue);
    return sanitizer.sanitize(dirtyValue);
}

According to the AppSensor book, these detection points detect an intrusion whenever an attacker tries to put in a SQL injection or XSS attack.
If that were the case, the system should take strong action against the user. We configured it in the appsensor.properties file:

# XSS attack -- be careful about these settings
# clear attack event, taking action immediately
IntrusionDetector.IE1.count=1
# clear log after 20 minutes
IntrusionDetector.IE1.interval=1200
# first offense log user out, second disable account
IntrusionDetector.IE1.actions=logout, disable

# SQL injection -- be careful about these settings
# clear attack event, taking action immediately
IntrusionDetector.CIE1.count=1
# clear log after 20 minutes
IntrusionDetector.CIE1.interval=1200
# first offense log user out, second disable account
IntrusionDetector.CIE1.actions=logout, disable

However, we suggest being very careful about it. Open the 'personal account page' of SimpleShiroSecuredApplication and put one of these strings into any field:
- The dog should fetch the stick as soon as possible.
- please delete all unused configuration files
- Any text with ; in it.

The SQL injection detection point triggers an exception after any of them. The problem is caused by the word fetch in the first message, by the word delete in the second message and by the symbol ';' in the last message. They are quite common in standard English texts, but all of them raise the SQL injection exception. The XSS attack detection point may cause false positives too (although less likely):

Hi, thank you for installation scripts and requirements document.cookies has been delicious, please bring them again :)). See you soon, Andy

The default attack patterns are configured in the appsensor.properties file. Use standard Java regular expressions:

# This collection of strings is the XSS attack pattern list
xss.attack.patterns=\"><script>,script.*document\\.cookie,<script>,<IMG.*SRC.*=.*script,<iframe>.*</iframe>
# This collection of strings is the SQL Injection attack pattern list
sql.injection.attack.patterns=\\-\\-,\\;,\\/\\*,\\*\\/,\\@\\@,\\@,nchar ,varchar,nvarchar, alter,cursor,delete,drop,exec,fetch,insert,kill,sysobjects,syscolumns

We strongly suggest changing these lists to something less strict.

Response Actions
The project also contains a list of possible response actions. Suggested actions are divided into the following categories:
- silent – logging change, administrator notification, …
- passive – show warning, slow down application for the user, …
- active – log out, disable module, require additional verification (captcha), …
- intrusive – be careful to stay on the legal side

AppSensor implements only seven response actions. Five are available in our application: log event, logout user, disable user account, send sms to administrator, email admin. The remaining two are disable component and disable component for user. Both disable component response actions lock access to the last called URL. They both require additional configuration. Keep in mind that a response action may significantly change code behavior. For example, it may cause seemingly random unexpected user log outs, which may cause unexpected null pointer exceptions or other similar problems. Your design has to be robust enough to handle such situations correctly.

Disable Component
If you wish to use the 'disable component' or 'disable component for user' response actions, you have to create a page that will be shown instead of the blocked page:

<html>
<head>
<title>Access Denied</title>
</head>
<body>
Sorry, you do not have access rights to that area.
</body>
</html>

Only URLs that end with suffixes configured in appsensor.properties are blockable. We will use only jsp and html pages.
All our servlets will have the suffix servlet:

# This is the list of extensions to check for disabling, ie. jsp (for jsp's), do (for struts 1), UpdateProfile (for the UpdateProfile servlet)
disable.component.extensionsToCheck=jsp,html,servlet

It is not possible to configure 'all pages, suffix does not matter'. You have to list all possible suffixes. However, you can configure exceptions that will never be blocked. As we do not use it, we will leave the default configuration there:

# This is the list of exceptions to disabling components, ie this list should never be disabled
disable.component.exceptions=/AppSensorDemo/appsensor_locked.jsp, /AppSensorDemo/login.jsp,/AppSensorDemo/updateProfile.jsp

Enable the disable component actions in web.xml:

<filter>
    <filter-name>AppSensorBlockComponent</filter-name>
    <filter-class>org.owasp.appsensor.filters.AppSensorRequestBlockingFilter</filter-class>
    <init-param>
        <param-name>redirectURL</param-name>
        <param-value>/simpleshirosecuredapplication/account/accessdenied.jsp</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>AppSensorBlockComponent</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

AppSensor redirects blocked requests to the redirectURL. Finally, disable component response actions are configured inside the detection point configuration in the appsensor.properties file. The configuration of 'disable component for user' looks like this:

# some integer - duration of time to disable
IntrusionDetector.ACTION_DOES_NOT_EXISTS.disableComponentForUser.duration=30
# some measure of time, currently supported are s,m,h,d (second, minute, hour, day)
IntrusionDetector.ACTION_DOES_NOT_EXISTS.disableComponentForUser.timeScale=m

The configuration of 'disable component' is similar. The default duration is 0, so the action does nothing if you do not configure it. Use the constant -1 to disable a component forever.

Custom Response Action
If you plan to use AppSensor in your application, you will probably want to add a new response action or modify a default one. Default response actions are handled by the DefaultResponseAction class. If you need a different set of response actions, you have to implement the ResponseAction interface. The following class adds a 'warn user' action:

public class CustomResponseAction implements ResponseAction {
    private final ResponseAction delegee = DefaultResponseAction.getInstance();

    @Override
    public boolean handleResponse(..., AppSensorIntrusion currentIntrusion) {
        if ("warn".equals(action)) {
            Exception securityException = currentIntrusion.getSecurityException();
            String localizedMessage = securityException.getLocalizedMessage();
            ASUtilities asUtilities = APPSENSOR.asUtilities();
            HttpServletRequest request = asUtilities.getCurrentRequest();
            request.setAttribute("securityWarning", localizedMessage);
            return true;
        }
        return delegee.handleResponse(action, currentIntrusion);
    }
}

Enable the new class in the appsensor.properties file:

# This is the class that handles the response actions
AppSensor.responseAction = org.meri.simpleshirosecuredapplication.intrusiondetection.responseaction.CustomResponseAction

Note: If you want to extend DefaultResponseAction, you have to override the static method getInstance(). The class responsible for initialization would otherwise create an instance of DefaultResponseAction.

Intrusion Store
All detected intrusions are stored in the intrusion store object. The default intrusion store provides a very simple implementation: it stores all intrusions in a static in-memory map. It is not suitable for a clustered environment and all stored data are lost after a system restart. If you have special needs, you can replace it with your own solution.
Implement the IntrusionStore interface and configure it in the appsensor.properties file:

# This is the class that handles the intrusion store
AppSensor.intrusionStore = org.owasp.appsensor.intrusiondetection.reference.DefaultIntrusionStore

Overall Impressions
AppSensor is a great idea and the only in-application intrusion detection framework I know about. The project produced an easy-to-read free e-book and is worth spending some time playing with. The framework design, e.g. the division into detection points, intrusion detector and response actions, is solid and practical. The project is still in beta and it occasionally shows. The default configuration of the XSS attack and SQL injection detection points is too strict, and we found some small bugs while writing this article. However, the project committers' response to submitted issues was always fast. Multiple expected small features are missing. It is not possible to say 'all URLs are blockable'; the user has to add all suffixes into extensionsToCheck manually. As default response actions are configured in detection point blocks in the appsensor.properties file, we would also expect support for the same configuration for custom actions. Also, some classes (DefaultResponseAction) are singletons for no good reason. If you decide to use intrusion detection in your project, your design has to be robust enough to handle all those random log outs and account disables. Reference: AppSensor – Intrusion Detection from our JCG partner Maria Jurcovicova at the This is Stuff blog....

The Perils of Not Unit Testing

Overview
Unit testing is a widely accepted practice in most development shops in this day and age, especially with the advent of the tool JUnit. JUnit was so widely effective and used early on that it has been included in the default distribution of Eclipse for as long as I can remember, and I have been programming professionally in Java for about 8 years. However, the drawbacks of not unit testing are concrete and arise acutely from time to time. This article aims to give a few specific examples of the perils of not unit testing.

Unit Testing Benefits
Unit testing has several basic tangible benefits that have reduced the painstaking troubles of the days when it was not widely used. Without getting into the specifics of the needs and arguments for unit testing, let's simply highlight the benefits as they are universally accepted by Java development professionals, especially within the Agile community.
- an automated regression unit test suite has the ability to isolate bugs by unit, as tests focus on a unit and mock out all other dependencies
- unit tests give feedback to the developer immediately during the test, code, test, code rhythm of development
- unit tests find defects early in the life cycle
- unit tests provide a safety net that facilitates necessary refactoring to improve the design of code without breaking existing functionality
- unit tests, along with a code coverage tool, can produce tangible metrics such as code coverage, which is valuable given good quality of tests
- unit tests provide an executable example of how client code can use the various interfaces of the code base
- the code resulting from unit testing is typically more readable and concise, as code which is not so is difficult to unit test; thus it follows that code which is written in tandem with unit tests tends to be more modular and of higher quality

Perils of Not Unit Testing
Let's explore by example how not unit testing can adversely affect code and allow bugs to easily enter a code base. The focus will be on the method level, where methods are simple and straightforward, yet there still can be problems when code is not unit tested.

Example 1: Reuse some code, but you introduce a bug
This example illustrates a situation where a developer has good intentions of reusing some code, but due to a lack of unit testing, the developer unintentionally introduces a bug. If unit tests existed, the developer could have safely refactored and could have relied on the unit tests to inform him that some requirement had not been covered. Let's introduce a simple scenario where a clothing store has a system that has users input sales of its clothes. Two objects in the system are Shirt and ShirtSaleValidator. The ShirtSaleValidator checks the Shirt to see if the sale prices inputted are correct. In this case, a shirt sale price has to be between $0.01 and $15. (Note this example is overly simplified, but still illustrates the benefits of unit testing.) Coder Joe implements the isShirtSalePriceValid method but writes no unit tests. He follows the requirements correctly. The code is correct.
package com.assarconsulting.store.model;public class Shirt {private Double salePrice; private String type; public Shirt() { }public Double getSalePrice() { return salePrice; }public void setSalePrice(Double salePrice) { this.salePrice = salePrice; }public String getType() { return type; }public void setType(String type) { this.type = type; } }package com.assarconsulting.store.validator;import com.assarconsulting.store.model.Shirt; import com.assarconsulting.store.utils.PriceUtility;public class ShirtSaleValidator {public ShirtSaleValidator() { } public boolean isShirtSalePriceValid(Shirt shirt) { if (shirt.getSalePrice() > 0 && shirt.getSalePrice() <= 15.00) { return true; } return false; } }Coder Bob comes along and he is “refactor” minded, he loves the DRY principle and wants to reuse code. During some other requirement he implemented a Range object. He sees its usage in the shirt pricing requirement as well. Note that Bob is not extensively familiar with Joe’s requirement, but familiar enough to feel competent enough to make a change. In addition, their group abides by the Extreme Programming principle of collective ownership. Thus, Bob nobly makes the change to reuse some code. He quickly translates the existing code to use the utility method, and moves on satisfied. package com.assarconsulting.store.validator;import com.assarconsulting.store.model.Shirt; import com.assarconsulting.store.utils.Range;public class ShirtSaleValidator {public ShirtSaleValidator() { } public boolean isShirtSalePriceValid(Shirt shirt) { Range< Double > range = new Range< Double >(new Double(0), new Double(15));if (range.isValueWithinRange(shirt.getSalePrice())) { return true; } return false; } } package com.assarconsulting.store.utils;import java.io.Serializable;public class Range< T extends Comparable> implements Serializable { private T lower; private T upper; public Range(T lower, T upper) { this.lower = lower; this.upper = upper; } public boolean isValueWithinRange(T value) { return lower.compareTo(value) <= 0 && upper.compareTo(value) >= 0; }public T getLower() { return lower; } public T getUpper() { return upper; } }Since there were no unit tests, a bug was created and never caught at time of implementation. This bug will go unnoticed until a developer or user specifically runs manual tests through the UI or some other client. What is the bug? The new code allows 0 to be a price of the Shirt, which is not specified by requirements. This could have been easily caught if there was an existing set of unit tests to regression test this requirement. We could have a minimum set of simple tests that checked the range of prices for a shirt. The set of unit tests could run on each check in of code or each build. For example, the test suit could have asserted the following.$0 = price executes isShirtSalePriceValid to false $0.01 = price executes isShirtSalePriceValid to true $5 = price executes isShirtSalePriceValid to true $15 = price executes isShirtSalePriceValid to true $16 = price executes isShirtSalePriceValid to false $100 = price executes isShirtSalePriceValid to falseIf Bob has these tests to rely on, the first bullet point test would have failed, and he would have caught his bug immediately. Peril – Imagine hundreds of business requirements that are more complicated than this without unit testing. The compounding effect of not unit testing resulting in bugs, repeated code and difficult maintenance could be exponential compared to the safety net and reduced cost unit testing provides. 
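Before moving on to Example 2, here is what such a regression suite might look like. This is a minimal sketch only: it assumes JUnit 4 on the classpath, and the test class name and method names are my own; the Shirt and ShirtSaleValidator classes are the ones listed above. Run against Joe's original validator, all three tests pass; run against Bob's Range-based version, the first test fails and exposes the bug immediately.

package com.assarconsulting.store.validator;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

import com.assarconsulting.store.model.Shirt;

public class ShirtSaleValidatorTest {

    private final ShirtSaleValidator validator = new ShirtSaleValidator();

    // Helper that builds a Shirt with the given price and runs it through the validator.
    private boolean validFor(double price) {
        Shirt shirt = new Shirt();
        shirt.setSalePrice(price);
        return validator.isShirtSalePriceValid(shirt);
    }

    @Test
    public void zeroIsNotAValidSalePrice() {
        assertFalse(validFor(0.00));
    }

    @Test
    public void pricesWithinTheAllowedRangeAreValid() {
        assertTrue(validFor(0.01));
        assertTrue(validFor(5.00));
        assertTrue(validFor(15.00));
    }

    @Test
    public void pricesAboveFifteenAreInvalid() {
        assertFalse(validFor(16.00));
        assertFalse(validFor(100.00));
    }
}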
Example 2: Code not unit tested yields untestable code, which leads to unclean, hard to understand code. Let’s continue the clothing store system example, which involves pricing of a shirt object. The business would like to introduce Fall Shirt Sale, which can be described as: For the Fall, a shirt is eligible to be discounted by 20% if it is priced less than $10 and is a Polo brand. The Fall sales last from Sept 1, 2009 till Nov 15th 2009. This functionality will be implemented in the ShirtSaleValidator class by Coder Joe who plans not to write unit tests. Since testing methods is not on his radar, he is not concerned with making the method testable, ie, making short and concise methods to not introduce too much McCabe’s cyclomatic complexity. Increased complexity is difficult to unit test as many test cases are necessary to achieve code coverage. His code is correct, but may turn out something like below. package com.assarconsulting.store.validator;import java.util.Calendar; import java.util.Date; import java.util.GregorianCalendar;import com.assarconsulting.store.model.Shirt; import com.assarconsulting.store.utils.PriceUtility;public class ShirtSaleValidator {private Calendar START_FALL_SALE_AFTER = new GregorianCalendar(2009, Calendar.AUGUST, 31); private Calendar END_FALL_SALE_BEFORE = new GregorianCalendar(2009, Calendar.NOVEMBER, 16); public ShirtSaleValidator() { } public boolean isShirtEligibleForFallSaleNotTestable(Shirt shirt) { Date today = new Date(); if (today.after(START_FALL_SALE_AFTER.getTime()) && today.before(END_FALL_SALE_BEFORE.getTime())) { if (shirt.getSalePrice() > 0 && shirt.getSalePrice() <= 10 ) { if (shirt.getType().equals("Polo")) { return true; } } } return false; } }The problems with this code are numerous, including misplacement of logic according to OO principles and lack of Enums. However, putting these other concerns aside, let’s focus on the readability of this this method. It is hard to ascertain the meaning of this code by just looking at it in a short amount of time. A developer has to study the code to figure out what requirements it is addressed. This is not optimal. Now’s lets think about the testability of this method. If anyone was to test Joe’s code, after he decided to leave it this way due to his NOT unit testing, it would be very difficult to test. The code contains 3 nested if statements where 2 of them have ‘ands’ and they all net result in many paths through the code. The inputs to this test would be a nightmare. I view this type of code as a consequence of not following TDD, i.e. writing code without the intention of testing it. A more TDD oriented way of writing this code would be as follows. 
package com.assarconsulting.store.validator;

import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

import com.assarconsulting.store.model.Shirt;
import com.assarconsulting.store.utils.PriceUtility;

public class ShirtSaleValidator {

    private Calendar START_FALL_SALE_AFTER = new GregorianCalendar(2009, Calendar.AUGUST, 31);
    private Calendar END_FALL_SALE_BEFORE = new GregorianCalendar(2009, Calendar.NOVEMBER, 16);

    public ShirtSaleValidator() {
    }

    public boolean isShirtEligibleForFallSale(Shirt shirt) {
        return isFallSaleInSession() && isShirtLessThanTen(shirt) && isShirtPolo(shirt);
    }

    protected boolean isFallSaleInSession() {
        Date today = new Date();
        return today.after(START_FALL_SALE_AFTER.getTime()) && today.before(END_FALL_SALE_BEFORE.getTime());
    }

    protected boolean isShirtLessThanTen(Shirt shirt) {
        return shirt.getSalePrice() > 0 && shirt.getSalePrice() <= 10;
    }

    protected boolean isShirtPolo(Shirt shirt) {
        return shirt.getType().equals("Polo");
    }
}

From this code we can see that the method isShirtEligibleForFallSale() reads much like the requirement. The methods that compose it are readable. The requirements are broken up amongst the methods. We can test each component of the requirement separately with 2-3 test methods each. The code is clean and, with a set of unit tests, there is proof of its correctness and a safety net for refactoring.

Peril – Writing code without the intention of testing it can result in badly structured code as well as difficult-to-maintain code.

Conclusion
The above examples are only simple illustrations of the drawbacks of foregoing unit testing. The summation and compounding effect of the perils of not unit testing can make development difficult and costly for a system. I hope the illustrations above communicate the importance of unit testing code. Source Code: peril-not-unit-testing.zip Reference: The Perils of Not Unit Testing from our JCG partner Nirav Assar at the Assar Java Consulting blog....

Application Performance and Antipatterns

Any application you pick up has some issues – big or small. There will be copy-paste code, mistakes, and algorithms which could have been better thought through. But what distinguishes an antipattern from these normal errors is that, like patterns, these antipatterns recur throughout the code base. In my recent experience in dealing with performance issues, I have observed certain recurrent themes that undermine the overall application performance. Most of these antipatterns are well documented, but it seems we do not learn from others' mistakes. We need to make our own mistakes. I am recounting some of the common patterns that I have observed in recent months.

Excessive Layering – Most of the underlying performance issues start with the excessive layering antipattern. The application design has grown around the usage of controllers, commands and facades. In order to decouple each layer, the designers add facades at each of the tiers. Now, for every request at the web tier, the call goes through multiple layers just to fetch the results. Imagine doing this for thousands of incoming requests and the load the JVM needs to handle to process them. The number of objects that get created and destroyed when making these calls adds to the memory overhead. This further limits the number of requests that can be handled by each server node. Based on the size of the application, the deployment model and the number of users, an appropriate decision needs to be taken to reduce the number of layers. E.g. if the entire application gets deployed in the same container, there is no need to create multiple layers of process beans, service beans (business beans), data access objects etc. Similarly, when developing an internet-scale application, a large number of layers starts adding overhead to the request processing. Remember, a large number of layers means a large number of classes, which effectively starts impacting the overall application maintainability.

Round Tripping – With the advent of ORM mappings and Session/DAO objects, the programmer starts making calls to beans for every piece of data. This leads to excessive calls between the layers. Another side issue is the number of method calls each layer starts having to support this model. The worst case is when the beans are web service based. A client tier making multiple web service calls within a single user request has a direct impact on the application performance. To reduce the round tripping, the application needs to handle or combine multiple requests at the business tier.

Overstuffed Session – The session object is a feature provided by the JEE container to track the user session during the web site visit. The application starts with the promise of putting very minimal information in the session, but over a period of time the session object keeps on growing. Too much data or the wrong kind of data is stuffed into the session object. Large data objects will mean that the objects placed in the session linger on till the session object is destroyed. This impacts the number of users that can be served by the application server node. Further, I have seen applications using session clustering to support availability requirements, adding significant overhead to the network traffic and limiting the ability of the application to handle a higher number of users. To unstuff the session object, take an inventory of what goes in there, see what is necessary, and decide what objects can be defaulted to request scope. For others, remove the objects from the session when their usage is over.
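To make that session clean-up advice concrete, here is a small hypothetical servlet fragment; the servlet, the attribute names and the ReportData type are all invented for illustration. It shows the habits described above: keep only a small key in the session rather than a large object, default per-request data to request scope, and remove session attributes once they are no longer needed.

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class ReportServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        HttpSession session = request.getSession();

        // Keep only a small key in the session, not the whole report object.
        session.setAttribute("selectedReportId", request.getParameter("reportId"));

        // Bulky, per-request data belongs in request scope; it is gone after the response.
        ReportData report = loadReport(request.getParameter("reportId"));
        request.setAttribute("reportData", report);

        // Once a conversation is finished, explicitly evict what is no longer needed.
        session.removeAttribute("wizardDraft");

        request.getRequestDispatcher("/report.jsp").forward(request, response);
    }

    private ReportData loadReport(String id) {
        return new ReportData(id); // placeholder for a real lookup
    }

    /** Stand-in for some large, report-sized object. */
    static class ReportData {
        private final String id;
        ReportData(String id) { this.id = id; }
        public String getId() { return id; }
    }
}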
Golden Hammer (Everything is a Service) – With the advent of SOA, there is a tendency to expose the business services, which can be orchestrated into process services. In older applications, one can observe a similar pattern being implemented with EJBs. This pattern, coupled with a bottom-up design approach, at times means exposing each and every data entity as a business service. This kind of design might work correctly functionally, but from the performance and maintenance point of view it soon becomes a nightmare. Every web service call adds overhead in terms of data serialization and deserialization. At times, the data (XML) being passed with web service calls is also huge, leading to performance issues. The usage of services or EJBs should be evaluated from the application usage perspective. Attention needs to be paid to the contract design.

Chatty Services – Another pattern observed is the way a service is implemented via multiple web service calls, each of which communicates a small piece of data. This results in an explosion of web services, which leads to degradation of performance and unmaintainable code. Also, from the deployment perspective, the application starts running into problems. I have come across projects which have a hundred plus services all getting crammed into a single deployment unit. When the application comes up, the base heap requirement is already in the 2 GB range, leaving not much space for the application to run. If the application has too many fine-grained services, then it is an indication of this antipattern.

The above mentioned antipatterns are frequent causes of application performance issues. The teams usually start with the right intentions but, over a period of time, things start slipping. Some of the common reasons:
- Lack of design standards and review processes – even if these exist, the delivery pressure leads to skipping them
- Team members' inexperience or narrow view leads to every programmer only looking at their own module, while nobody looks at the overall application performance
- Continuous Integration (CI) tools not integrated with compliance check tools like PMD, Checkstyle, FindBugs etc.
- No focus on profiling the application on a regular basis during the code construction phase
- Not evaluating the results from the load tests to decipher and fix the underlying issue (blaming the poor infrastructure setup)

What are the other antipatterns you have observed that have contributed to the degradation of application performance? Do share! Reference: Application Performance and Antipatterns from our JCG partner Munish K Gupta at the Tech Spot blog....