


Escaping the JVM heap for memory intensive applications

If you’ve ever allocated large Java heaps, you know that at some point – typically starting at around 4 GiB – you will start having issues with your garbage collection pauses. I won’t go into detail about why pauses happen in the JVM, but in short it happens when the JVM does full collections and you have a large heap. As the heap increases, those collections might become longer.

The simplest way to overcome this is to tune your JVM garbage collection parameters to match the memory allocation and deallocation behaviour of your particular application. It is a bit of a dark art and requires careful measurements, but it’s possible to have very large heaps while mostly avoiding old generation garbage collections. If you want to learn more about garbage collection tuning, check out this JVM GC tuning guide. If you get really interested in GC in general, this is an excellent book: The Garbage Collection Handbook. There are JVM implementations that guarantee much lower pause times than the Sun VM, such as the Zing JVM – but normally at other costs in your system, such as increased memory usage and reduced single threaded performance. Still, the ease of configuration and low GC guarantees are very appealing.

For the purpose of this article, I will use the example of an in-memory cache or store in Java, mainly because I’ve built a couple in the past using some of these techniques. We’ll assume we have a basic cache interface definition like so:

import java.io.Externalizable;

public interface Cache<K extends Externalizable, V extends Externalizable> {
    public void put(K key, V value);
    public V get(K key);
}

We’re requiring that keys and values are Externalizable just for this simple example; it wouldn’t be like this in real life. We will show how to have different implementations of this cache that store data in memory in different ways. The simplest way to implement this cache would be using Java collections:

import java.io.Externalizable;
import java.util.HashMap;
import java.util.Map;

public class CollectionCache<K extends Externalizable, V extends Externalizable> implements Cache<K, V> {
    private final Map<K, V> backingMap = new HashMap<K, V>();

    public void put(K key, V value) {
        backingMap.put(key, value);
    }

    public V get(K key) {
        return backingMap.get(key);
    }
}

This implementation is straightforward. However, as the map size increases, we will be allocating (and deallocating) a large number of objects, we are using boxed primitives which take more space in memory than primitives, and the map needs to be resized from time to time. We could certainly improve this implementation simply by using a primitive-based map. It would use less memory and fewer objects, but the data would still take space in the heap and possibly fragment it, leading to longer pauses if for other reasons we do full GCs. Let’s look at other ways to store similar data without using the heap:

- Use a separate process to store the data. It could be something like a Redis or Memcached instance that you connect to through sockets or unix sockets. It’s fairly straightforward to implement.
- Offload data to disk, using memory mapped files (see the quick sketch after this list). The OS is your friend and will do a lot of heavy work predicting what you’ll read next from the file, and your interface to it is just like a big blob of data.
- Use native code and access it through JNI or JNA. You’ll get better performance with JNI and ease of use with JNA. Requires you to write native code.
- Use direct allocated buffers from the NIO package.
- Use the Sun-specific Unsafe class to access memory directly from your Java code.
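The memory-mapped file option deserves a quick illustration before we move on. This is a minimal sketch, not the article's own code – the file path and mapping size are made up for the example:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedStore {
    public static void main(String[] args) throws Exception {
        // hypothetical scratch file, for illustration only
        RandomAccessFile file = new RandomAccessFile("/tmp/cache.dat", "rw");
        try {
            // map 64 MiB of the file into memory; the OS pages it in and out for us
            MappedByteBuffer buffer = file.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, 64 * 1024 * 1024);
            buffer.putLong(0, 42L);          // write without touching the Java heap
            long value = buffer.getLong(0);  // read back through the same mapping
            System.out.println(value);
        } finally {
            file.close();
        }
    }
}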
For this article I will focus on the solutions that use exclusively Java: direct allocated buffers and the Unsafe class.

Direct Allocated Buffers

Direct allocated buffers are extremely useful and used extensively when developing high-performance network applications in Java NIO. By allocating data directly outside the heap, in a number of cases you can write software where that data actually never touches the heap. Creating a new direct allocated buffer is as simple as it gets:

int numBytes = 1000;
ByteBuffer buffer = ByteBuffer.allocateDirect(numBytes);

After creating a new buffer, you can manipulate it in a few different ways. If you’ve never used Java NIO buffers you should definitely take a look, as they are really cool. Besides ways to fill, drain and mark different points in the buffer, you can opt to have a different view on the buffer instead of a ByteBuffer – e.g. buffer.asLongBuffer() gives you a view on the ByteBuffer where you manipulate elements as longs.

So how could these be used in our Cache example? There are a number of ways; the most straightforward would be to store the serialized/externalized form of the value record in a big array, along with a map of keys to offsets and sizes of the records in that array. It could look like this (a very liberal approach, with missing implementations and assuming fixed size records):

import java.io.Externalizable;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class DirectAllocatedCache<K extends Externalizable, V extends Externalizable> implements Cache<K, V> {
    private final ByteBuffer backingMap;
    private final Map<K, Integer> keyToOffset;
    private final int recordSize;

    public DirectAllocatedCache(int recordSize, int maxRecords) {
        this.recordSize = recordSize;
        this.backingMap = ByteBuffer.allocateDirect(recordSize * maxRecords);
        this.keyToOffset = new HashMap<K, Integer>();
    }

    public void put(K key, V value) {
        if (backingMap.position() + recordSize < backingMap.capacity()) {
            keyToOffset.put(key, backingMap.position());
            store(value);
        }
    }

    public V get(K key) {
        Integer offset = keyToOffset.get(key);
        if (offset == null)
            throw new KeyNotFoundException();
        return retrieve(offset);
    }

    public V retrieve(int offset) {
        byte[] record = new byte[recordSize];
        int oldPosition = backingMap.position();
        backingMap.position(offset);
        backingMap.get(record);
        backingMap.position(oldPosition);
        // implementation left as an exercise
        return internalize(record);
    }

    public void store(V value) {
        byte[] record = externalize(value);
        backingMap.put(record);
    }
}

As you can see, this code has a number of limitations: fixed record size, fixed backing map size, a limited way in which externalization is done, difficulty deleting and reusing space, etc. While some of these can be overcome with clever ways to represent the record in byte arrays (and by representing the keyToOffset map in direct allocated buffers as well) or by dealing with deletions (we could implement our own SLAB allocator), others such as resizing the backing map are difficult to overcome. An interesting improvement is to implement records as offsets to records and fields, thus reducing the amount of data we copy and copying only on demand. Be aware that the JVM imposes a limit on the amount of memory used by direct allocated buffers. You can tune this with the -XX:MaxDirectMemorySize option. Check out the ByteBuffer javadocs.
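To make the buffer views mentioned above concrete, here is a minimal sketch (the sizes are arbitrary) showing that a LongBuffer view and its backing direct ByteBuffer address the same off-heap memory:

import java.nio.ByteBuffer;
import java.nio.LongBuffer;

public class BufferViews {
    public static void main(String[] args) {
        // room for 128 longs, allocated outside the heap
        ByteBuffer bytes = ByteBuffer.allocateDirect(128 * 8);
        LongBuffer longs = bytes.asLongBuffer(); // same memory, typed access
        longs.put(0, 42L);
        // both views see the same underlying off-heap bytes
        System.out.println(longs.get(0));     // 42
        System.out.println(bytes.getLong(0)); // 42
    }
}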
Unsafe

Another way to manage memory directly from Java is using the hidden Unsafe class. Technically we’re not supposed to use it, and it is implementation specific since it lives in a sun package, but the possibilities it offers are endless. What Unsafe gives us is the ability to allocate, deallocate and manage memory directly from Java code. We can also get the actual pointers and pass them between native and Java code interchangeably. In order to get an Unsafe instance, we need to cut a few corners:

private Unsafe getUnsafeBackingMap() {
    try {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    } catch (Exception e) {
    }
    return null;
}

Once we have the Unsafe instance, we can apply it to our previous Cache example:

import java.io.Externalizable;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

import sun.misc.Unsafe;

public class UnsafeCache<K extends Externalizable, V extends Externalizable> implements Cache<K, V> {
    private final int recordSize;
    private final Unsafe backingMap;
    private final Map<K, Integer> keyToOffset;
    private long address;
    private int capacity;
    private int currentOffset;

    public UnsafeCache(int recordSize, int maxRecords) {
        this.recordSize = recordSize;
        this.backingMap = getUnsafeBackingMap();
        this.capacity = recordSize * maxRecords;
        this.address = backingMap.allocateMemory(capacity);
        this.keyToOffset = new HashMap<K, Integer>();
    }

    public void put(K key, V value) {
        if (currentOffset + recordSize < capacity) {
            store(currentOffset, value);
            keyToOffset.put(key, currentOffset);
            currentOffset += recordSize;
        }
    }

    public V get(K key) {
        Integer offset = keyToOffset.get(key);
        if (offset == null)
            throw new KeyNotFoundException();
        return retrieve(offset);
    }

    public V retrieve(int offset) {
        byte[] record = new byte[recordSize];
        // Inefficient: copies one byte at a time
        for (int i = 0; i < record.length; i++) {
            record[i] = backingMap.getByte(address + offset + i);
        }
        // implementation left as an exercise
        return internalize(record);
    }

    public void store(int offset, V value) {
        byte[] record = externalize(value);
        // Inefficient: copies one byte at a time
        for (int i = 0; i < record.length; i++) {
            backingMap.putByte(address + offset + i, record[i]);
        }
    }

    private Unsafe getUnsafeBackingMap() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (Exception e) {
        }
        return null;
    }
}

There’s a lot of room for improvement and you need to do a number of things manually, but it’s very powerful. You can also explicitly free and reallocate memory that you’ve allocated in this way, which allows you to write some code the same way you would with C. Check out the javadocs for Unsafe.

Conclusion

There are a number of ways to avoid using the heap in Java and, in this way, use a lot more memory. You don’t need to do this – I’ve personally seen properly tuned JVMs with 20 GiB-30 GiB heaps running with no long garbage collection pauses – but it is fairly interesting. If you want to check out how some projects use this for the basic (and honestly untested, almost written on a napkin) cache code I wrote here, have a look at EHCache’s BigMemory or Apache Cassandra, which also uses Unsafe for this type of approach.

Reference: Escaping from the JVM heap for memory intensive applications from our JCG partner Ruben Badaro at the Java Advent Calendar blog.

Groovy: Multiple Values for a Single Command-line Option

One of the many features that makes Groovy an attractive scripting language is its built-in command-line argument support via CliBuilder. I have written about CliBuilder before in the posts Customizing Groovy's CliBuilder Usage Statements and Explicitly Specifying 'args' Property with Groovy CliBuilder. In this post, I look at Groovy's CliBuilder's support for multiple arguments passed via a single command-line flag.

The Groovy API documentation includes this sentence about CliBuilder:

"Note the use of some special notation. By adding 's' onto an option that may appear multiple times and has an argument or as in this case uses a valueSeparator to separate multiple argument values causes the list of associated argument values to be returned."

As this documentation states, Groovy's built-in CliBuilder support allows a parsed command-line flag to be treated as having multiple values, and the convention for referencing this argument is to add an 's' after the 'short' name of the command-line option. Doing so makes the multiple values associated with a single flag available as a collection of Strings that can be easily iterated to access the multiple values.

In the post Customizing Groovy's CliBuilder Usage Statements, I briefly looked at the feature supporting multiple values passed to the script via a single command-line argument. I described the feature in that post as follows:

"The use of multiple values for a single argument can also be highly useful. The direct use of Apache Commons CLI's Option class (and specifically its UNLIMITED_VALUES constant field) allows the developer to communicate to CliBuilder that there is a variable number of values that need to be parsed for this option. The character that separates these multiple values (a comma in this example) must also be specified via 'valueSeparator.'"
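Before adapting a full script, here is a minimal, self-contained sketch of the feature as described above (the flag name and the sample values are made up for illustration):

import org.apache.commons.cli.Option

def cli = new CliBuilder(usage: 'demo.groovy -n <names>')
cli.with {
    n(longOpt: 'names', args: Option.UNLIMITED_VALUES, valueSeparator: ',',
      'Comma-separated list of names')
}

// simulate a command line: -n alpha,beta,gamma
def opt = cli.parse(['-n', 'alpha,beta,gamma'] as String[])

// the plural property ('n' + 's') returns all values for the flag
assert opt.ns == ['alpha', 'beta', 'gamma']
opt.ns.each { println it }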
The usefulness of this Apache CLI-powered Groovy feature can be demonstrated by adapting a script for finding class files contained in JAR files that I talked about in the post Searching JAR Files with Groovy. The script in that post searched one directory recursively for a single specified String contained as an entry in the searched JARs. A few minor tweaks to this script change it so that it can support multiple specified directories to recursively search for multiple expressions. The revised script is shown next.

#!/usr/bin/env groovy
/**
 * findClassesInJars.groovy
 *
 * findClassesInJars.groovy -d <<root_directories>> -s <<strings_to_search_for>>
 *
 * Script that looks for provided Strings in JAR files (assumed to have .jar
 * extensions) in the provided directories and all of their subdirectories.
 */

def cli = new CliBuilder(
   usage: 'findClassesInJars.groovy -d <root_directories> -s <strings_to_search_for>',
   header: '\nAvailable options (use -h for help):\n',
   footer: '\nInformation provided via above options is used to generate printed string.\n')
import org.apache.commons.cli.Option
cli.with {
   h(longOpt: 'help', 'Help', args: 0, required: false)
   d(longOpt: 'directories', 'Two arguments, separated by a comma',
     args: Option.UNLIMITED_VALUES, valueSeparator: ',', required: true)
   s(longOpt: 'strings', 'Strings (class names) to search for in JARs',
     args: Option.UNLIMITED_VALUES, valueSeparator: ',', required: true)
}
def opt = cli.parse(args)
if (!opt) return
if (opt.h) cli.usage()

def directories = opt.ds
def stringsToSearchFor = opt.ss

import java.util.zip.ZipFile
import java.util.zip.ZipException

def matches = new TreeMap<String, Set<String>>()
directories.each { directory ->
   def dir = new File(directory)
   stringsToSearchFor.each { stringToFind ->
      dir.eachFileRecurse { file ->
         if (file.isFile() && file.name.endsWith('jar')) {
            try {
               zip = new ZipFile(file)
               entries = zip.entries()
               entries.each { entry ->
                  if (entry.name.contains(stringToFind)) {
                     def pathPlusMatch = "${file.canonicalPath} [${entry.name}]"
                     if (matches.get(stringToFind)) {
                        matches.get(stringToFind).add(pathPlusMatch)
                     }
                     else {
                        def containingJars = new TreeSet<String>()
                        containingJars.add(pathPlusMatch)
                        matches.put(stringToFind, containingJars)
                     }
                  }
               }
            }
            catch (ZipException zipEx) {
               println "Unable to open file ${file.name}"
            }
         }
      }
   }
}

matches.each { searchString, containingJarNames ->
   println "String '${searchString}' Found:"
   containingJarNames.each { containingJarName ->
      println "\t${containingJarName}"
   }
}

The opening section of the script is where Groovy's internal CliBuilder is applied. The 'directories' (short name 'd') and 'strings' (short name 's') command-line flags both use Option.UNLIMITED_VALUES to specify that multiple values are applicable for each argument, and they use valueSeparator to specify the token separating the multiple values for each flag (a comma in both cases). The lines assigning directories and stringsToSearchFor obtain the multiple values for each argument: although the options have short names of 'd' and 's', appending 's' to each of them (giving 'ds' and 'ss') allows their multiple values to be accessed. The rest of the script takes advantage of this and iterates over the multiple strings associated with each flag.

Running the script (the original post includes a screen snapshot of a sample execution) demonstrates the utility of being able to provide multiple values for a single command-line flag. Groovy's built-in support for Apache CLI makes it easy to employ customizable command-line parsing.

Reference: Groovy: Multiple Values for a Single Command-line Option from our JCG partner Dustin Marx at the Inspired by Actual Events blog.

ANTLR – Semantic Predicates

Parsing a simple grammar with antlr is simple. All you have to do is use regular expressions to describe your language and let antlr generate the lexer and parser. Parsing big or complicated languages occasionally requires more, because describing them in regular expressions only can be hard or even impossible. Semantic predicates are Java (or C++, or JavaScript, or …) conditions written inside the grammar. Antlr uses them either to choose between multiple alternatives or as additional assertions to be checked. They can be placed inside both the lexer and the parser, but this post focuses only on their usage within the parser. They add a lot of power to antlr.

This post assumes that you have a general idea of what antlr is and how a generated parser works. If you do not, please read the linked posts first; they contain everything needed. The first chapter contains two motivational use cases. The second chapter describes syntax and terminology, and shows a simple failed semantic predicate. The post then explains how semantic predicates influence prediction and generated code. It also shows how to write useful conditions and how to solve the initial use cases. The final chapter wraps it all up in a short conclusion. All examples and grammars used in this post are available on Github.

Table of Contents

- Use Cases
  - Keywords – nth
  - Significant Whitespaces
- Basics
  - Syntax
  - Terminology
    - Disambiguating Semantic Predicate
    - Validating Semantic Predicate
    - Gated Semantic Predicate
  - Failed Predicates
- Hoisting and Prediction
  - What It Is
  - Consequences
  - When It Is Used
    - If Needed Hoisting – Disambiguating Predicate
    - Always Hoisting – Gated Rules
    - Never Hoisting – Middle of a Rule
  - Nuances
    - Disambiguating Predicates – Advanced
    - Loops
    - Uncovered Alternatives
    - Combination of Gated and Disambiguated Predicates
    - Additional Parenthesis
  - Backtracking
- Writing Conditions
  - Input Token Stream
    - Examples
  - Labels and Lists
    - Label Example
    - Label List Example
    - Undefined Labels
    - Labels and Hoisting
  - Access to Local Variables
- Solving Initial Use Cases
  - Keywords – nth
  - Significant Whitespaces
- Wrapping It Up
  - Validating Semantic Predicates
  - Disambiguating Semantic Predicates
  - Gated Semantic Predicates
- Resources

Use Cases
As we spent some time parsing a css-like language, both our use cases describe problems we had to solve while writing the css part of that parser. The first one is about an issue encountered while working on pseudo classes, and the second deals with tricky whitespaces in selectors. If you have never worked with css or are not interested in use cases, skip this chapter.

Keywords – nth
Some css pseudo classes require a parameter, which can be a number, an identifier, a selector or something called an nth expression. Nth expressions are allowed only inside some pseudo classes, and the names of these pseudo classes are not reserved keywords in css. An nth expression is an expression of the form an+b where a and b are optional integers. Examples of valid nth expressions: 5n+2, -5n-2, -5n, -2, -n, n.

We wanted our grammar to accept nth expressions, but only as parameters of the pseudo classes where they are actually allowed. We wanted it to reject nth expressions as parameters of all remaining pseudo classes. All names normally correspond to IDENT tokens. Creating a special token corresponding to nth pseudo class names is impractical, because they are not reserved keywords. For example, they are also perfectly valid class names or element names. Having a special token would force us to replace almost all IDENT occurrences by IDENT | NTH. Therefore, we are left with the general identifier IDENT, which can be either a normal or an nth pseudo class name. Standard syntactical regular expressions are unable to distinguish between them, but semantic predicates can (solution below).
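To make the ambiguity concrete, here is a small illustrative css fragment (the selectors are made up): nth expressions are valid only in the nth-* family of pseudo classes, yet nothing in the token stream marks those names as special.

/* an nth expression as a parameter - valid only here */
li:nth-child(2n+1) { color: red; }

/* an identifier as a parameter of another pseudo class */
p:lang(en) { color: blue; }

/* 'nth-child' is not reserved: it is also a legal class name */
.nth-child { color: green; }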
Significant Whitespaces
Css has semi-important whitespaces. Semi-important means that most of them represent only the ends of tokens, and that is where their usefulness ends. For example, whitespaces in a declaration are irrelevant. The following declarations are equal:

padding : 2;
padding:2;

Most of the CSS grammar behaves the above way, so there is a strong temptation to throw all whitespaces away. However, if we do that, then the next two selectors end up as the same token stream IDENT COLON IDENT LPAREN COLON IDENT RPAREN COLON IDENT LPAREN COLON IDENT RPAREN LBRACE:

div :not(:enabled) :not(:disabled) {}
div:not(:enabled):not(:disabled) {}

Whitespaces in selectors are significant. The first selector is equivalent to div *:not(:enabled) *:not(:disabled) while the second is not. Note: the CSS 2.1 grammar available from the antlr site ignores this issue. If you want to use it, you have to fix it first.

One solution would be to stop hiding whitespaces. This would force us to add explicit whitespace handling WS* into all possible places of all parser rules. That would be a lot of work and the resulting grammar would be less readable. It is also possible to give up on selector tree building in the antlr parser and write a custom hand-made tree builder for it. This is how we did it originally, and we can safely say that it works, but it requires more time and debugging than the final semantic-predicates-based solution (shown below).

Basics
We start with the semantic predicate syntax and some needed terminology. This chapter also outlines the basics of what happens if a predicate fails. We will not go into details; those are described in the next chapter.

Syntax
A semantic predicate is always enclosed inside curly braces, followed either by a question mark or by a question mark and a double arrow:

{ condition }?
{ condition }?=>

The first example uses the simple conditions 1+2==3 and 2+3==5. The grammar is stored inside the Basics.g file:

LETTER : 'a'..'z' | 'A'..'Z';
word: LETTER {1+2==3}? LETTER;

NUMERAL: '0'..'9';
number: {2+3==5}?=> NUMERAL NUMERAL;

Terminology
Depending on which syntax is used and where it is placed, a semantic predicate is called by one of three different names:

- disambiguating semantic predicate,
- validating semantic predicate,
- gated semantic predicate.

Disambiguating Semantic Predicate
Disambiguating predicates use the shorter {...}? syntax. However, a predicate is called disambiguating only if it is placed in the beginning of a rule or in the beginning of an alternative. Disambiguating semantic predicates:

LETTER : 'a'..'z' | 'A'..'Z';
// beginning of a rule
rule: {1+2==3}? LETTER LETTER;

// beginning of an alternative
alternatives: LETTER (
    {2+3==5}? LETTER*
  | {2+3==5}? LETTER+
);

Validating Semantic Predicate
Validating predicates also use the shorter {...}? syntax. The difference from disambiguating predicates is only in placement. Validating predicates are placed in the middle of a rule or in the middle of an alternative:

LETTER : 'a'..'z' | 'A'..'Z';
word: LETTER {1+2==3}? LETTER;

Gated Semantic Predicate
Gated semantic predicates use the longer {...}?=> syntax. The condition can be placed anywhere. Gated semantic predicate:

NUMERAL: '0'..'9';
number: {2+3==5}?=> NUMERAL NUMERAL;

Failed Predicates
As we explained in the expression language tutorial post, the parser starts knowing which rule should correspond to the input and then tries to match that rule to the input.
Matching always starts from the left-most element of the rule and continues to the right. If matching encounters a semantic predicate, it tests whether the condition is satisfied. If it is not satisfied, a FailedPredicateException is thrown. Consider the Basics.g grammar shown at the beginning of this chapter:

LETTER : 'a'..'z' | 'A'..'Z';
word: LETTER {1+2==3}? LETTER;

NUMERAL: '0'..'9';
number: {2+3==5}?=> NUMERAL NUMERAL;

If you open the generated BasicsParser class, you will find that each rule has a corresponding method with the same name. Both of them contain a copy of the predicate and both of them throw an exception if the condition is not satisfied:

// inside the generated word() method
if ( !((1+2==3)) ) {
    throw new FailedPredicateException(input, "word", "1+2==3");
}
// inside the generated number() method
if ( !((2+3==5)) ) {
    throw new FailedPredicateException(input, "number", "2+3==5");
}

Prediction – that is, what happens if the parser encounters a rule with both multiple alternatives and predicates, for example a start: word | number rule – is described in the next chapter.

Hoisting and Prediction
Depending on where and how you use semantic predicates, the parser may try to avoid failed predicate exceptions. The strategy used is called 'hoisting', and it is what makes predicates useful. This chapter explains what hoisting is and what consequences it has. Then we explain when it is used and when it is not.

What It Is
A parser that encounters a rule with multiple alternatives has to decide which of those alternatives should be used. If some of them start with a predicate, the parser may use that predicate to help with the decision. Consider the grammar stored in the DisambiguatingHoistingNeeded.g file:

LETTER : 'a'..'z' | 'A'..'Z';
word: {1+2==3}? LETTER LETTER;
sameWord: {1+2!=3}? LETTER LETTER;

start: word | sameWord;

Both the word() and sameWord() methods of the generated parser contain the usual failed predicate check. DisambiguatingHoistingNeededParser class extract:

// inside the word() method
if ( !((1+2==3)) ) {
    throw new FailedPredicateException(input, "word", "1+2==3");
}
// inside the sameWord() method
if ( !((1+2!=3)) ) {
    throw new FailedPredicateException(input, "sameWord", "1+2!=3");
}

In addition, the code corresponding to the start rule contains copies of both the word and sameWord semantic predicates. The part that chooses which rule to use next contains the following code (comments are mine):

int LA1_2 = input.LA(3);
// predicate copied from the word rule
if ( ((1+2==3)) ) {
    alt1=1;
}
// predicate copied from the sameWord rule
else if ( ((1+2!=3)) ) {
    alt1=2;
}
else {
    NoViableAltException nvae = new NoViableAltException("", 1, 2, input);
    throw nvae;
}

The act of copying the predicate into the prediction part of the generated parser is called hoisting.

Consequences
If there were no hoisting, predicates would act as assertions only. We could use them to validate some conditions and that would be it. The above grammar would be illegal – ignoring the predicates, it has two syntactically equivalent alternatives. As hoisting copies predicates all over the grammar, it also has several limiting consequences for them. It is not just something happening in the background that you can safely ignore:

- each predicate can run several times,
- the order in which predicates run may be hard to predict,
- local variables or parameters may not be available in hoisted copies.

Conditions must be without side effects, repeatable, and their evaluation order should not matter.
If they are hoisted into other rules, they cannot reference local variables or parameters. Only predicates placed in the beginning of a rule are hoisted into other rules. Hoisting in the case of alternatives happens only within the rule. Therefore, you can break the third rule if the predicate is not placed in the beginning of the rule.

When It Is Used
Hoisting is used only when the parser has to decide between multiple rules or alternatives and some of them begin with a predicate. If it is a gated predicate, i.e. a condition inside the {...}?=> syntax, then the predicate is hoisted no matter what. If it is a disambiguating predicate, i.e. a condition inside the {...}? syntax, then the predicate is hoisted only if it is actually needed. The term 'actually needed' means that multiple alternatives could match the same input. Otherwise said, it is used only if multiple alternatives are ambiguous for some input. Predicates placed in the middle of rules or in the middle of alternatives are never hoisted.

If Needed Hoisting – Disambiguating Predicate
Consider the rule start in the DisambiguatingHoistingNotNeeded.g grammar:

LETTER : 'a'..'z' | 'A'..'Z';
NUMBER : '0'..'9';
word: {1+2==3}? LETTER LETTER;
differentWord: {1+2!=3}? LETTER NUMBER;

start: word | differentWord;

The rule start has to choose between the word and differentWord rules. Both of them start with a predicate, but the predicate is not needed in order to differentiate between them: the second token of word is LETTER while the second token of differentWord is NUMBER. Hoisting will not be used. Instead, the parser will look at the upcoming two tokens to distinguish between these rules. To verify, open the start() method of the generated DisambiguatingHoistingNotNeededParser class in our sample project: neither the 1+2==3 nor the 1+2!=3 condition was copied into it.

int alt1=2;
switch ( input.LA(1) ) {
    case LETTER: {
        switch ( input.LA(2) ) {
            case LETTER: {
                alt1=1;
            }
            break;
            case NUMBER: {
                alt1=2;
            }
            break;
            default:
                NoViableAltException nvae = new NoViableAltException("", 1, 1, input);
                throw nvae;
        }
    }

On the other hand, consider the rule start in the DisambiguatingHoistingNeeded.g grammar:

LETTER : 'a'..'z' | 'A'..'Z';
word: {1+2==3}? LETTER LETTER;
sameWord: {1+2!=3}? LETTER LETTER;

start: word | sameWord;

The rule start has to choose between the word and sameWord rules. These two rules match exactly the same sequence of tokens and differ only by the predicate. Hoisting will be used. To verify, open the start() method of the generated DisambiguatingHoistingNeededParser class in our sample project. It contains copies of both the 1+2==3 and 1+2!=3 conditions:

int alt1=2;
switch ( input.LA(1) ) {
    case LETTER: {
        switch ( input.LA(2) ) {
            case LETTER: {
                int LA1_2 = input.LA(3);

                if ( ((1+2==3)) ) {
                    alt1=1;
                }
                else if ( ((1+2!=3)) ) {
                    alt1=2;
                }
                else {
                    /* ... */
                }
            }
            break;
            default:
                // ...
        }
    }
    break;
    default:
        // ...
}

Exactly the same thing happens with disambiguating predicates in alternatives. This will not be hoisted (DisambiguatingHoistingNotNeeded.g grammar):

LETTER : 'a'..'z' | 'A'..'Z';
alternatives: LETTER (
    {2+3==5}? LETTER
  | {2+3==5}? NUMBER
);

This will be hoisted (DisambiguatingHoistingNeeded.g grammar):

LETTER : 'a'..'z' | 'A'..'Z';
alternatives: LETTER (
    {2+3==5}? LETTER
  | {2+3==5}? LETTER
);
Always Hoisting – Gated Rules
Look at the start rule in the GatedHoisting.g grammar:

LETTER : 'a'..'z' | 'A'..'Z';
NUMBER: '0'..'9';

word: {1+2==3}?=> LETTER LETTER;
differentWord: {1+2!=3}?=> LETTER NUMBER;

start: word | differentWord;

The rule start has to choose between the word and differentWord rules. Both of them start with a predicate and that predicate is not needed in order to differentiate between them. However, hoisting will be used because we used gated semantic predicates. To verify, open the start() method of the generated GatedHoisting class in our sample project. It contains copies of both the 1+2==3 and 1+2!=3 conditions:

int LA1_0 = input.LA(1);

if ( (LA1_0==LETTER) && (((1+2==3)||(1+2!=3)))) {
    int LA1_1 = input.LA(2);

    if ( (LA1_1==LETTER) && ((1+2==3))) {
        alt1=1;
    }
    else if ( (LA1_1==NUMBER) && ((1+2!=3))) {
        alt1=2;
    }
    else {
        NoViableAltException nvae = new NoViableAltException("", 1, 1, input);
        throw nvae;
    }
}
else {
    NoViableAltException nvae = new NoViableAltException("", 1, 0, input);
    throw nvae;
}

Exactly the same thing happens with gated predicates in alternatives. This will be hoisted (GatedHoisting.g grammar):

LETTER : 'a'..'z' | 'A'..'Z';
NUMBER: '0'..'9';

alternatives: LETTER (
    {2+3==5}?=> LETTER
  | {2+3==5}?=> NUMBER
);

Never Hoisting – Middle of a Rule
Hoisting is never used if the predicate is located in the middle of a rule or an alternative. It does not matter which predicate type is used. Therefore, if your rules differ only by the predicate, that predicate must be placed in the beginning of a rule or an alternative. Non-hoisted gated predicate (GatedNoHoisting.g):

LETTER: 'a'..'z' | 'A'..'Z';
NUMBER: '0'..'9';

// gated predicate in the middle of a rule
word: LETTER {1+2==3}?=> LETTER;
differentWord: LETTER {1+2!=3}?=> NUMBER;

start: word | differentWord;

Another non-hoisted gated predicate (GatedNoHoisting.g):

LETTER: 'a'..'z' | 'A'..'Z';
NUMBER: '0'..'9';

// gated predicate in the middle of an alternative
alternatives: LETTER (
    LETTER {2+3==5}?=> LETTER
  | LETTER {2+3==5}?=> NUMBER
);

The generated parser is in the GatedNoHoistingParser class. The most important point is that if your rules differ only by the predicate and that predicate is placed in the middle of a rule, antlr will refuse to generate the corresponding parser. The following examples show several syntactically incorrect grammars along with the antlr errors they cause.

Incorrect grammar (SyntacticallyIncorrect.g):

LETTER : 'a'..'z' | 'A'..'Z';
word: LETTER {1+2==3}? LETTER;
sameWord: LETTER {1+2!=3}? LETTER;

start: word | sameWord;

Error in console:

warning(200): org\meri\antlr\predicates\SyntacticallyIncorrect.g:28:6: Decision can match input such as "LETTER LETTER" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): org\meri\antlr\predicates\SyntacticallyIncorrect.g:28:6: The following alternatives can never be matched: 2

Another incorrect grammar (SyntacticallyIncorrect.g):

alternativesStart: LETTER (
    LETTER {1+2==3}?
  | LETTER {1+2!=3}?
);
Error in console:

warning(200): org\meri\antlr\predicates\SyntacticallyIncorrect.g:31:27: Decision can match input such as "LETTER" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): org\meri\antlr\predicates\SyntacticallyIncorrect.g:31:27: The following alternatives can never be matched: 2

Yet another incorrect grammar (SyntacticallyIncorrect.g):

LETTER : 'a'..'z' | 'A'..'Z';
gatedWord: LETTER {1+2==3}?=> LETTER;
gatedSameWord: LETTER {1+2!=3}?=> LETTER;

gatedStart: gatedWord | gatedSameWord;

Error in console:

warning(200): org\meri\antlr\predicates\SyntacticallyIncorrect.g:40:11: Decision can match input such as "LETTER LETTER" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): org\meri\antlr\predicates\SyntacticallyIncorrect.g:40:11: The following alternatives can never be matched: 2

Last incorrect grammar (SyntacticallyIncorrect.g):

LETTER : 'a'..'z' | 'A'..'Z';
gatedAlternativesStart: LETTER (
    LETTER {1+2==3}?=> LETTER
  | LETTER {1+2!=3}?=> LETTER
);

Error in console:

warning(200): org\meri\antlr\predicates\SyntacticallyIncorrect.g:43:32: Decision can match input such as "LETTER LETTER" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): org\meri\antlr\predicates\SyntacticallyIncorrect.g:43:32: The following alternatives can never be matched: 2

Nuances
The previous 'When It Is Used' sub-chapter showed how predicates behave in clearly hoisted and clearly non-hoisted situations. We selected those examples to show situations as clear and simple as possible. This sub-chapter contains a different set of examples; we picked the most potentially confusing ones we are aware of. All examples used here are located in the Nuances.g file.

Disambiguating Predicates – Advanced
Hoisted disambiguating predicates are used only if multiple alternatives are ambiguous for the current input. Otherwise said, the hoisted copy of the predicate runs only if the actual input could be parsed by multiple alternatives. Example: the alternatives in the following rule are not syntactically equivalent, because they do not match the same set of inputs. The first alternative matches exactly two letters while the second matches any number of letters:

advancedDisambiguating: LETTER (
    {1+2==3}? LETTER LETTER
  | {1+2!=3}? LETTER*
);

If the input starts with exactly one LETTER, it cannot be parsed by the first alternative. As only the second alternative matches it, the predicate will not be used. The parser will use the second alternative, and if the 1+2!=3 condition happens to be false, the parser will throw a failed predicate exception. However, if the input starts with two letters, it could be matched by both alternatives and the predicate will be used. This is how the generated code looks:

int alt4=2;
switch ( input.LA(1) ) {
    case LETTER: {
        switch ( input.LA(2) ) {
            case LETTER: {
                int LA4_3 = input.LA(3);
                // the predicate is used only if the first two tokens are LETTER
                if ( ((1+2==3)) ) {
                    alt4=1;
                }
                else if ( ((1+2!=3)) ) {
                    alt4=2;
                }
                else {
                    // ... irrelevant code ...
                }
            }
            break;
            // if the second token is not LETTER, the predicate is not used
            case EOF: {
                alt4=2;
            }
            break;
            default:
                // ...
        }
    }
    break;
    // if the first token is not LETTER, the predicate is not used
    case EOF:
        // ...
    default:
        // ...
}

Compare it to a very similar gated rule:

compareGated: LETTER (
    {1+2==3}?=> LETTER LETTER
  | {1+2!=3}?=> LETTER*
);

The parser will use the predicate no matter what.
The second alternative will never be entered, because the predicate 1+2!=3 is never satisfied:

int alt6=2;
int LA6_0 = input.LA(1);

if ( (LA6_0==LETTER) && (((1+2==3)||(1+2!=3)))) {
    int LA6_1 = input.LA(2);

    if ( (LA6_1==LETTER) && (((1+2==3)||(1+2!=3)))) {
        int LA6_3 = input.LA(3);

The gated predicate causes antlr to throw a different kind of exception in this case. As we will show later in this post, the different hoisting of gated and disambiguating predicates can make a much bigger difference with more complicated predicates. Namely, it can make the difference between accepting and rejecting the input.

Loops
Although it does not look like it at first sight, loops are alternatives too. They use prediction to guess whether they should do one more round or end. Use a predicate to stay in the loop only while it returns true:

LETTER : 'a'..'z' | 'A'..'Z';
loop: ( {somePredicate()}?=> LETTER )*;

The loop rule will match letters until the function somePredicate() returns false or until the rule runs out of LETTER tokens:

loop1:
do {
    int alt1=2;
    int LA1_0 = input.LA(1);
    // the predicate is used during the prediction
    if ( (LA1_0==LETTER) && ((somePredicate()))) {
        alt1=1;
    }
    // matching: either jump out or match another LETTER
    switch (alt1) {
        case 1: {
            if ( !((somePredicate())) ) {
                throw new FailedPredicateException(...);
            }
            // ... match LETTER ...
        }
        break;

        default:
            break loop1;
    }
} while (true);

A disambiguating predicate cannot be used for this purpose. The next predicate will not be used to decide whether the parser should stay in the loop or not:

LETTER : 'a'..'z' | 'A'..'Z';
loopDisambiguating: ( {somePredicate()}? LETTER )*;

Technically, the loop is deciding between the LETTER and <nothing> alternatives. Those are syntactically different, and prediction uses disambiguating predicates only if it has to decide between syntactically ambiguous alternatives. The loopDisambiguating rule will match letters until it runs out of LETTER tokens. If the function somePredicate() returns false during that time, the rule will throw a FailedPredicateException. The generated code is very similar to the previous one; only the prediction part changes. The predicate is not used:

loop2:
do {
    int alt2=2;
    // prediction ignores the predicate
    switch ( input.LA(1) ) {
        case LETTER: {
            alt2=1;
        }
        break;
    }
    // matching: either jump out or match another LETTER
    switch (alt2) {
        case 1: {
            if ( !((somePredicate())) ) {
                throw new FailedPredicateException(...);
            }
            // ... match LETTER ...
        }
        break;

        default:
            break loop2;
    }
} while (true);

Uncovered Alternatives
It is perfectly OK to leave some alternatives uncovered. Alternatives with predicates will work as expected: a gated predicate is always hoisted and a disambiguating predicate is hoisted only if there are multiple ambiguous alternatives.

A gated predicate, always hoisted:

uncoveredGated: LETTER (
    {3+4==7}?=> LETTER
  | NUMBER
);

A hoisted disambiguating predicate:

uncoveredDisambiguatingHoisted: LETTER (
    {2+5==7}? LETTER*
  | LETTER+
);

A non-hoisted disambiguating predicate:

uncoveredDisambiguatingNotHoisted: LETTER (
    {2+4==6}? LETTER
  | NUMBER
);

Combination of Gated and Disambiguated Predicates
If one alternative is gated and the other is disambiguating, then the gated predicate is always hoisted and the disambiguating predicate is hoisted only if it is actually needed. Here the gated predicate is hoisted while the disambiguating predicate is not:

combinationDisambiguatingNotHoisted: LETTER (
    {1+4==5}?=> LETTER
  | {1+4!=5}? NUMBER
);
Both predicates are hoisted:

combinationDisambiguatingHoisted: LETTER (
    {1+5==6}?=> LETTER*
  | {1+5!=6}? LETTER+
);

Additional Parenthesis
If you enclose a disambiguating predicate in parenthesis, it is still treated as a disambiguating predicate. Another way to write a disambiguating predicate:

stillDisambiguating: ({2+2==4}?) LETTER;
testStillDisambiguating: stillDisambiguating | LETTER;

If you put additional parenthesis around a gated predicate, the predicate will be ignored:

ignored: ({3+3==6}?)=> LETTER;

Backtracking
Predicates run even if the parser is backtracking, i.e. if it is inside a syntactic predicate. If the parser is backtracking and the predicate fails, the backtracking fails too. A failed predicate exception is thrown only if the parser is not backtracking. The backtrack rule initiates backtracking (Backtracking.g):

LETTER : 'a'..'z' | 'A'..'Z';
word: LETTER {1+2==3}? LETTER;
number: LETTER {2+3!=5}? LETTER;

backtrack: (number)=> number | word;

Since backtracking is possible, the generated predicate check is different:

if ( !((2+3!=5)) ) {
    if (state.backtracking>0) {state.failed=true; return retval;}
    throw new FailedPredicateException(input, "number", "2+3!=5");
}

Backtracking is yet another reason why conditions must not have side effects, must be repeatable, and why their evaluation order must not matter.

Writing Conditions
This chapter shows how to write advanced conditions for semantic predicates. First, we show how to access and use the input token stream. We also explain how to reference labeled tokens. The last part is about local variables in non-hoisted conditions. Unless specified otherwise, all examples used here are located in the Environnement.g file.

Input Token Stream
Each generated parser has a public TokenStream input field. This field provides access to the whole input token stream and to the current position in that token stream. Its most important method is Token LT(int k). The parameter k contains the relative position of the token you are interested in: 1 means 'look ahead one token', 2 means 'second token ahead' and so on. Negative numbers reference past tokens: -1 returns the previous token, -2 the one before it and so on. Do not use 0; its meaning is undefined and the default parser returns null. Note: relative referencing works correctly even when the grammar is in backtracking state. -1 is always the previous token and 1 is always the next token.

Examples
Disambiguating: if the word starts with the letter a, then it must have at least two letters:

word: LETTER (
    { input.LT(-1).getText().equals("a")}? LETTER+
  | { !input.LT(-1).getText().equals("a")}? LETTER*
);

Gated: if the second numeral of the number is 9, then it must have exactly 3 numerals:

number: NUMERAL (
    {input.LT(1).getText().equals("9")}?=> NUMERAL NUMERAL
  | {!input.LT(1).getText().equals("9")}?=> NUMERAL*
);

Note: the choice of predicate matters slightly in this case. It influences what kind of error will be thrown if the input does not match the rule.

Labels and Lists
Predicates can reference and use any previously defined label or label list, the same way actions can.

Label Example
If the first letter of the word is a, then the word must have at least two letters:

labeledWord: a=LETTER (
    { $a.getText().equals("a")}? LETTER+
  | { !$a.getText().equals("a")}? LETTER*
);
Label List Example
If the word starts with less than 3 letters, then it must end with a number:

labeledListWord: a+=LETTER+ (
    { $a.size() < 3 }?=> NUMERAL
  | { $a.size() >= 3}?=>
);

Note: the choice of predicate does matter in this case. The above example works correctly only if it uses the gated {...}?=> predicate instead of the disambiguating {...}? one. NUMERAL and <nothing> are syntactically different, so a disambiguating predicate would not be used for prediction, i.e. it would not be hoisted. The parser would base its decision solely on the next token (is it NUMERAL?) and the condition would be used as an assertion afterwards, to check whether the number of letters was right. Such a grammar would throw an exception on the abcd9 input, while ours accepts it.

Undefined Labels
A predicate can NOT reference not-yet-defined labels. The parser is generated, but the first attempt to use the rule throws a null pointer exception at runtime:

// this would cause a null pointer exception
nullPointerAtPredicate: LETTER { $a.getText().equals("a") }? a=LETTER;

Labels and Hoisting
As a label has to be defined before being used in a predicate, and predicates are copied into other rules only if they are located in the very beginning of a rule, you do not have to worry about hoisting into other rules.

Access to Local Variables
Antlr allows you to define custom local variables and use them within one rule. If you are sure that a predicate will not be copied into other rules, it can use them. Of course, using local variables in a predicate that can be copied into other rules will result in a faulty parser. Create local variables and use them in the predicate – if the word starts with less than 10 letters, then it must end with a number:

localVariables
@init {int num=0;} // define local variable num
  : (LETTER { num++; })* // raise the num variable by 1 for each letter
    ( // what should follow depends on the variable value
      { num < 10 }?=> NUMERAL
    | { num >= 10}?=>
    );

Note: the same warning as before applies – we must use gated predicates. You must be especially careful not to use local variables in potentially hoisted predicates. For example, the Antlr Reference book recommends the following rule to match only numbers composed of less than four numerals (ANTLRReference3Error.g):

localVariablesWarning
@init {int n=1;} // n becomes a local variable
  : ( {n<=4}?=> NUMERAL {n++;} )+ // enter loop only if n<=4
  ;

The above rule works well in isolation, that is, when it is not used in other rules. Unfortunately, if you include it in other rules, the predicate may be hoisted into that other rule (ANTLRReference3Error.g):

// syntax error in generated parser
syntaxError: localVariablesWarning | LETTER;

The n<=4 predicate will be copied into the syntaxError rule. The variable n is not accessible inside that rule and the generated parser will be syntactically incorrect.

Solving Initial Use Cases
Finally, we are going to solve both use cases described in the motivational chapter.

Keywords – nth
Back to the original use case: we created a function isInsideNth that returns true only if the previous token matched the name of some nth pseudo class. The function is used as the condition inside a gated predicate. The generated parser will assume that the input contains an nth expression if and only if it is inside an nth pseudo class.
The UseCasesNth.g file:

@parser::members {
  private static Set<String> NTH_PSEUDOCLASSES = new HashSet<String>();
  static {
    NTH_PSEUDOCLASSES.add("nth-child");
    NTH_PSEUDOCLASSES.add("nth-last-child");
    NTH_PSEUDOCLASSES.add("nth-of-type");
    NTH_PSEUDOCLASSES.add("nth-last-of-type");
  }

  public boolean isInsideNth() {
    return isNthPseudoClass(input.LT(-1));
  }

  private boolean isNthPseudoClass(Token a) {
    if (a == null) return false;
    String text = a.getText();
    if (text == null) return false;
    return NTH_PSEUDOCLASSES.contains(text.toLowerCase());
  }
}

LPAREN: '(';
RPAREN: ')';
COLON: ':';
COMMA: ',';
IDENT : ('a'..'z' | 'A'..'Z')+;

// pseudoparameters and nth with dummy syntax
pseudoparameters: IDENT (COMMA IDENT)*;
nth: IDENT; // real nth syntax omitted for simplicity's sake

// pseudoclass
pseudo
  : COLON COLON? IDENT ((
      { isInsideNth()}?=> LPAREN nth RPAREN
    | LPAREN pseudoparameters RPAREN
    )?)
  ;

An alternative solution with labels and a rewrite rule:

// different solution - note that we need to use rewrite rules in this case
pseudoDifferentSolution
  : COLON COLON? name=IDENT ((
      { isNthPseudoClass($name)}?=> LPAREN nthParameters=nth RPAREN
    | LPAREN parameters=pseudoparameters RPAREN
    )?)
    -> $name $nthParameters? $parameters?
  ;

Significant Whitespaces
Back to the original use case: css selectors can be composed of multiple parts separated by the combinators >, +, ~ and <space>. Each part, called a simple selector, starts with an optional element name and may be followed by multiple pseudo classes, attributes and similar structures. Ignoring the space-as-combinator problem, a simplified simple selector grammar can look like this:

COLON: ':';
STAR: '*';
NUMBER: ('0'..'9')+;
IDENT : ('a'..'z' | 'A'..'Z')+;

// some options have been removed from the following rules for simplicity's sake
elementName: IDENT | STAR | NUMBER;
pseudoClass: COLON COLON? IDENT;
elementSubsequent: pseudoClass;

simpleSelectorWrong
  : (elementName elementSubsequent*)
  | elementSubsequent+
  ;

The above simpleSelectorWrong rule matches valid simple selectors: h1, h1:first-child:hover, :first-child:hover and :hover. Unfortunately, as whitespaces are thrown away, the rule matches more than that. For example, it would also match h1:first-child :hover, which should be interpreted exactly the same way as the h1:first-child *:hover selector, i.e. as two simple selectors joined by <space>.

We created a method that returns true only if there is no hidden token between the previous and next tokens. Unless configured otherwise, all tokens are instances of the CommonToken class. Since a common token knows its start and stop index, we can cast and compare the tokens to see whether there was something between them. New parser methods (UseCasesSelectors.g):

@parser::members {
  public boolean onEmptyCombinator(TokenStream input) {
    return !directlyFollows(input.LT(-1), input.LT(1));
  }

  private boolean directlyFollows(Token first, Token second) {
    CommonToken firstT = (CommonToken) first;
    CommonToken secondT = (CommonToken) second;

    if (firstT.getStopIndex() + 1 != secondT.getStartIndex())
      return false;

    return true;
  }
}

The fixed simple selector uses a gated predicate to check whether it should or should not continue adding subsequent elements (UseCasesSelectors.g):

simpleSelector
  : ( elementName ({!onEmptyCombinator(input)}?=> elementSubsequent)* )
  | ( elementSubsequent ({!onEmptyCombinator(input)}?=> elementSubsequent)* )
  ;

We have to use gated predicates in this case. If we used a disambiguating predicate, the generated parser would not use our predicate to decide whether to stay inside the loop or not.
This is because the loop is technically deciding between the elementSubsequent and <nothing> alternatives, and those are syntactically different. The {...}? predicate would not be used during the prediction; it would just occasionally throw exceptions.

Wrapping It Up
Semantic predicates are Java conditions written inside the grammar. They are copied into the generated parser as they are, without any changes. If the token matching algorithm reaches a semantic predicate and that predicate fails, a FailedPredicateException is thrown. If a rule or an alternative starts with a semantic predicate, that semantic predicate can be used during the prediction phase. Failed predicates during the prediction phase never throw exceptions, but they may disable some alternatives. This is called hoisting. Conditions must be without side effects, repeatable, and their evaluation order should not matter. If they are hoisted into other rules, they cannot reference local variables or parameters. Semantic predicates are used in three different ways: as validating semantic predicates, as disambiguating semantic predicates and as gated semantic predicates.

Validating Semantic Predicates
Validating semantic predicates act as assertions only. As a result, validating semantic predicates are never hoisted. The condition is enclosed inside curly braces followed by a question mark: { condition }?. It must be placed either in the middle of a rule or in the middle of an alternative:

LETTER : 'a'..'z' | 'A'..'Z';
word: LETTER {1+2==3}? LETTER;

Disambiguating Semantic Predicates
Disambiguating semantic predicates help to choose between syntactically equivalent alternatives. As a result, disambiguating semantic predicates are hoisted only if the parser has to choose between multiple ambiguous alternatives. They use exactly the same syntax as validating predicates – the condition is enclosed inside curly braces followed by a question mark { condition }? – but they must be placed either in the beginning of a rule or in the beginning of an alternative:

LETTER : 'a'..'z' | 'A'..'Z';
// beginning of an alternative
alternatives: LETTER (
    {2+3==5}? LETTER*
  | {2+3==5}? LETTER+
);

Gated Semantic Predicates
Gated semantic predicates are used to dynamically turn portions of the grammar on and off. As a result, all gated predicates placed in the beginning of a rule or an alternative are hoisted. Gated predicates placed in the middle of a rule or an alternative are never hoisted. The condition is enclosed inside curly braces followed by a question mark and a double arrow { condition }?=>:

NUMERAL: '0'..'9';
number: {2+3==5}?=> NUMERAL NUMERAL;

Resources
- Wincent Wiki
- Stack Overflow Question
- The Definitive ANTLR Reference

Reference: ANTLR – Semantic Predicates from our JCG partner Maria Jurcovicova at the This is Stuff blog.

Checking out what is new with Servlet 3.0

With the JEE6 specification hitting the market, some major changes have taken place with respect to how you would approach developing applications in the enterprise application world. In this article I will be touching upon a few changes that were made with respect to web application development.

First things first: say goodbye to the web.xml deployment descriptor (at least for parts of it). Well, it's not like it is deprecated, but with the rise of annotations, the new specification allows us to define our configuration using them, though some things such as welcome file lists, context params, etc. will still need to go inside your web.xml. The annotations available for use are:

- @WebServlet
- @WebFilter
- @WebInitParam
- @WebListener
- @MultipartConfig

In this article I will be checking out the @WebServlet and @WebFilter annotations. Let us see how we would usually map a servlet in the web.xml era:

<servlet>
  <servlet-name>myservlet</servlet-name>
  <servlet-class>com.example.MyServlet</servlet-class>
</servlet>

<servlet-mapping>
  <servlet-name>myservlet</servlet-name>
  <url-pattern>/hello</url-pattern>
</servlet-mapping>

With the Servlet 3.0 spec, configuring a servlet is now as easy as annotating a class that extends HttpServlet. Let's see how that looks:

@WebServlet("/student")
public class StudentServlet extends HttpServlet {

    private static final long serialVersionUID = 2276157893425171437L;

    @Override
    protected void doPost(HttpServletRequest arg0, HttpServletResponse arg1)
            throws ServletException, IOException {
        StringBuilder response = new StringBuilder(500);
        response.append("<html><body>")
                .append("Registered Student : ")
                .append(arg0.getParameter("txtName"))
                .append("</body></html>");
        arg1.getOutputStream().write(response.toString().getBytes());
        arg1.getOutputStream().flush();
        arg1.getOutputStream().close();
    }
}

All you need is the @WebServlet annotation. In order for this to work, the class should reside either in the WEB-INF/classes folder or within a jar residing in the WEB-INF/lib folder. Next up, let's see how we would configure a filter with annotations:

package com.blog.example.servlettest;

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;

@WebFilter("/student")
public class StudentFilter implements Filter {

    @Override
    public void destroy() {
    }

    @Override
    public void doFilter(ServletRequest arg0, ServletResponse arg1, FilterChain arg2)
            throws IOException, ServletException {
        if (arg0.getParameter("txtName") == null || arg0.getParameter("txtName").isEmpty()) {
            arg1.getWriter().append("Invalid name supplied");
            arg1.getWriter().flush();
            arg1.getWriter().close();
        } else {
            arg2.doFilter(arg0, arg1);
        }
    }

    @Override
    public void init(FilterConfig arg0) throws ServletException {
    }
}

Again, very easy – just a mere annotation to mark it as a filter. Note that here we implement the Filter interface. Either the value or the urlPatterns attribute should be provided; using both is illegal as per the specification.
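As a quick taste of one of the other annotations listed above, here is a minimal sketch of @WebInitParam – the servlet name, URL pattern and parameter are made up for illustration, not taken from the article:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebInitParam;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/report", initParams = {
        @WebInitParam(name = "format", value = "pdf") })
public class ReportServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // reads the init param that would previously have been declared in web.xml
        String format = getServletConfig().getInitParameter("format");
        resp.getWriter().write("Configured format: " + format);
    }
}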
In the coming weeks I will cover the other new annotations available with JEE6 and wrap up with a comprehensive example using them together. Whether JEE6 will replace the Spring framework is not really a question by itself, but I believe we will be seeing some fierce competition between the two. The annotations vs. xml debate is more or less resolved, with people with a preference for each holding their own ground. I believe a little bit from both worlds would be beneficial for an application. You can download and run a sample example which I have uploaded here. If you are using JBoss AS 7, all you need to do is run the application server in standalone mode, do a mvn package jboss-as:deploy, and point the browser to http://localhost:{port}/servlet3.0. That is it for today. Thank you for reading, and if you have any comments or suggestions for improvement, please do leave a comment. Have a good day all!

Reference: Checking out what is new with Servlet 3.0 from our JCG partner Dinuka Arseculeratne at the My Journey Through IT blog.

Functional Java collections

There is a lot of functional hype these days, so I will give a short overview of what is out there, at least when it comes to collections in Java. Personally I like the standard collections API, but in some cases it can be awkward and add extra verbosity. This should be less of a problem in Java 8+ – there we will probably worry about not creating callback hell instead, but hey, there is no silver bullet for most things, so why should there be one for programming?

The Guava Way

Guava is one of Google's core libraries, covering plenty of different core language aspects and problems. There are utilities and extensions for everyday usage such as collections, primitives, caching, common annotations, string processing, I/O, math, reflection and many others. We will only take a look at the collections goodies, so let's see some of them:

// list
ImmutableList<String> of = ImmutableList.of("a", "b", "c", "d");
// same for a map
ImmutableMap<String, String> map = ImmutableMap.of("key1", "value1", "key2", "value2");
// list of ints
List<Integer> theList = Ints.asList(1, 2, 3, 4, 522, 5, 6);

The Guava collections are compatible with the JDK collections, since they mostly extend or implement the standard classes. There are several cool utilities that are part of the API and have names similar to the ones from java.util.Collections; basically any programmer who knows the JDK collections should be able to transfer to Guava easily. The one for List is called Lists, the one for Set is Sets, for Map it is Maps, and so on for the rest. For example:

// create a new List
List<SomeLongName> list = Lists.newArrayList();
// create a new LinkedHashMap
Map<SomeKeyType, SomeValueType> map = Maps.newLinkedHashMap();

// initialize an ArrayList on the spot
List<String> someList = Lists.newArrayList("one", "two", "three");

// set the initial size for readability as well as performance
List<Type> exactly100 = Lists.newArrayListWithCapacity(100);
List<Type> approx100 = Lists.newArrayListWithExpectedSize(100);

Methods corresponding to a particular interface are grouped in a very intuitive manner. There are also some extremely good ways of building caches with a variety of features:

Cache<Integer, Customer> cache = CacheBuilder.newBuilder()
    .weakKeys()
    .maximumSize(10000)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .build(new CacheLoader<Integer, Customer>() {
      @Override
      public Customer load(Integer key) throws Exception {
        return retrieveCustomerForKey(key);
      }
    });

Since Guava is available in most of the Maven repositories, it is very easy to add it to your build.
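The functional flavour comes from com.google.common.base.Predicate and Function combined with the Iterables utility. Here is a minimal, pre-Java-8 sketch of filtering and transforming a list (class and variable names are mine, not from the Guava docs):

import com.google.common.base.Function;
import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import com.google.common.collect.Lists;
import java.util.List;

public class GuavaFunctionalExample {
  public static void main(String[] args) {
    List<Integer> numbers = Lists.newArrayList(1, 2, 3, 4, 5, 6);

    // keep only the even numbers (a lazy view, nothing is copied yet)
    Iterable<Integer> evens = Iterables.filter(numbers, new Predicate<Integer>() {
      @Override public boolean apply(Integer n) { return n % 2 == 0; }
    });

    // square each remaining element (again a lazy view)
    Iterable<Integer> squares = Iterables.transform(evens, new Function<Integer, Integer>() {
      @Override public Integer apply(Integer n) { return n * n; }
    });

    // materialize the result: prints [4, 16, 36]
    System.out.println(Lists.newArrayList(squares));
  }
}

Because filter and transform return lazy views, nothing is computed until the result is actually iterated, which keeps chained operations cheap.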
LAMBDAJ

The idea behind the lambdaj project is to manipulate collections in a functional and statically typed way, avoiding the repetition of the simple tasks we usually perform on collections. Repetition pushes programmers towards copy/paste, and creates bugs along the way. Accessing collections without explicit looping provides a way of filtering, sorting, extracting, grouping and transforming, of invoking a method on each item, or of summing up the elements or fields of the elements in a collection. On top of all these features, lambdaj is also something of a DSL, since it adds very readable 'sugar' to the syntax, making it read almost like pseudo-English. This is done with static methods, so in order to use them we import them directly:

import static ch.lambdaj.Lambda.*;

When it comes to checking and matching, lambdaj relies heavily on Hamcrest matchers. So, for example, to create a check for odd integers and then filter a list with that check:

Matcher<Integer> odd = new Predicate<Integer>() {
  public boolean apply(Integer item) {
    return item % 2 == 1;
  }
};
List<Integer> oddNumbers = filter(odd, asList(1, 2, 3, 4, 5));

As expected, the returned list is [1, 3, 5]. Lambdaj takes it a step further with its DSL, for example:

List<Beneficiary> beneficiaries = with(transactions)
    .retain(having(on(Transaction.class).getQuantity(), lessThan(100)))
    .extract(on(Transaction.class).getBeneficiary())
    .sort(on(Beneficiary.class).getName());

Performance costs

Although the best way to make your application fast is to have the cleanest code possible, there comes a time when you must optimize. To help with that, the creators provide some information on memory usage and execution time: lambdaj has a performance wiki page with code examples, and there are also tests done by other programmers comparing lambdaj with JDK 8, for example. There are some measurements of Guava's memory usage as well. As for the performance of Guava, most of its functionality consists of builders and utilities on top of standard JDK classes, so the overhead is minimal. At the end of the day it is up to you to decide how much effect each of these libraries will have on your project, and whether that effect is positive. I am of the opinion that almost every project should have Guava on its classpath.

Related links:

Guava http://code.google.com/p/guava-libraries/
lambdaj http://code.google.com/p/lambdaj/
Hamcrest http://hamcrest.org/
Guava links http://www.tfnico.com/presentations/google-guava
Guava examples https://github.com/mitemitreski/guava-examples
Guava presentation http://blog.mitemitreski.com/2012/07/google-guava-for-cleaner-code.html

Reference: Functional Java collections from our JCG partner Mite Mitreski at the Java Advent Calendar blog. ...

Getting started with Quartz Scheduler on MySQL database

Here are some simple steps to get you fully started with Quartz Scheduler on a MySQL database using Groovy. The script below will let you quickly experiment with different Quartz configuration settings using an external file.

The first step is to set up the database with tables. This assumes you have already installed MySQL and have access to create databases and tables.

bash> mysql -u root -p
sql> create database quartz2;
sql> create user 'quartz2'@'localhost' identified by 'quartz2123';
sql> grant all privileges on quartz2.* to 'quartz2'@'localhost';
sql> exit;

bash> mysql -u root -p quartz2 < /path/to/quartz-dist/docs/dbTables/tables_mysql.sql

The tables_mysql.sql file can be found in the Quartz distribution download, or directly from their source here. Once the database is up, you need to write some code to start up the Quartz scheduler. Here is a simple Groovy script, quartzServer.groovy, that will run as a tiny scheduler server:

// Run Quartz Scheduler as a server
// Author: Zemian Deng, Date: 2012-12-15_16:46:09
@GrabConfig(systemClassLoader=true)
@Grab('mysql:mysql-connector-java:5.1.22')
@Grab('org.slf4j:slf4j-simple:1.7.1')
@Grab('org.quartz-scheduler:quartz:2.1.6')
import org.quartz.*
import org.quartz.impl.*
import org.quartz.jobs.*

config = args.length > 0 ? args[0] : 'quartz.properties'
scheduler = new StdSchedulerFactory(config).getScheduler()
scheduler.start()

// Register shutdown hook
addShutdownHook { scheduler.shutdown() }

// Quartz has its own threads, so put this script's thread to sleep until
// the user hits CTRL+C
while (!scheduler.isShutdown()) {
  Thread.sleep(Long.MAX_VALUE)
}

And now you just need a config file, quartz-mysql.properties, that looks like this:

# Main Quartz configuration
org.quartz.scheduler.skipUpdateCheck = true
org.quartz.scheduler.instanceName = DatabaseScheduler
org.quartz.scheduler.instanceId = NON_CLUSTERED
org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = quartzDataSource
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 5

# JobStore: JDBC jobStoreTX
org.quartz.dataSource.quartzDataSource.driver = com.mysql.jdbc.Driver
org.quartz.dataSource.quartzDataSource.URL = jdbc:mysql://localhost:3306/quartz2
org.quartz.dataSource.quartzDataSource.user = quartz2
org.quartz.dataSource.quartzDataSource.password = quartz2123
org.quartz.dataSource.quartzDataSource.maxConnections = 8

You can run the Groovy script as usual:

bash> groovy quartzServer.groovy quartz-mysql.properties
Dec 15, 2012 6:20:26 PM com.mchange.v2.log.MLog
INFO: MLog clients using java 1.4+ standard logging.
Dec 15, 2012 6:20:27 PM com.mchange.v2.c3p0.C3P0Registry banner
INFO: Initializing c3p0-0.9.1.1 [built 15-March-2007 01:32:31; debug? true; trace:10]
[main] INFO org.quartz.impl.StdSchedulerFactory - Using default implementation for ThreadExecutor
[main] INFO org.quartz.core.SchedulerSignalerImpl - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
[main] INFO org.quartz.core.QuartzScheduler - Quartz Scheduler v.2.1.6 created.
[main] INFO org.quartz.core.QuartzScheduler - JobFactory set to: org.quartz.simpl.SimpleJobFactory@1a40247
[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Using thread monitor-based data access locking (synchronization).
[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - JobStoreTX initialized.
[main] INFO org.quartz.core.QuartzScheduler - Scheduler meta-data: Quartz Scheduler (v2.1.6) 'DatabaseScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 5 threads.
  Using job-store 'org.quartz.impl.jdbcjobstore.JobStoreTX' - which supports persistence. and is not clustered.

[main] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler 'DatabaseScheduler' initialized from the specified file : 'quartz-mysql.properties' from the class resource path.
[main] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler version: 2.1.6
Dec 15, 2012 6:20:27 PM com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource getPoolManager
INFO: Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, dataSourceName -> 1hge16k8r18mveoq1iqtotg|1486306, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> com.mysql.jdbc.Driver, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, identityToken -> 1hge16k8r18mveoq1iqtotg|1486306, idleConnectionTestPeriod -> 0, initialPoolSize -> 3, jdbcUrl -> jdbc:mysql://localhost:3306/quartz2, lastAcquisitionFailureDefaultUser -> null, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 8, maxStatements -> 0, maxStatementsPerConnection -> 120, minPoolSize -> 1, numHelperThreads -> 3, numThreadsAwaitingCheckoutDefaultUser -> 0, preferredTestQuery -> null, properties -> {user=******, password=******}, propertyCycle -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, usesTraditionalReflectiveProxies -> false ]
[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Freed 0 triggers from 'acquired' / 'blocked' state.
[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Recovering 0 jobs that were in-progress at the time of the last shut-down.
[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Recovery complete.
[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Removed 0 'complete' triggers.
[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Removed 0 stale fired job entries.
[main] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED started.
... CTRL+C
[Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED shutting down.
[Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED paused.
[Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED shutdown complete.

That's a full run of the setup above. Go ahead and play with different configs, and read http://quartz-scheduler.org/documentation/quartz-2.1.x/configuration for more details.
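The server above starts with an empty job store, so it has nothing to fire yet. To actually persist a job into the MySQL-backed store, you can run a one-off submitter like the following sketch against the same config file (the job class, group and identity names are made up for this sketch, not part of the original setup):

import static org.quartz.JobBuilder.newJob;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;

import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;

public class HelloJob implements Job {

  public void execute(JobExecutionContext ctx) {
    System.out.println("Hello from " + ctx.getJobDetail().getKey());
  }

  public static void main(String[] args) throws Exception {
    // Same properties file as the server, so the job lands in the QRTZ_ tables
    Scheduler scheduler = new StdSchedulerFactory("quartz-mysql.properties").getScheduler();

    JobDetail job = newJob(HelloJob.class)
        .withIdentity("helloJob", "demo")
        .build();

    Trigger trigger = newTrigger()
        .withIdentity("helloTrigger", "demo")
        .startNow()
        .withSchedule(simpleSchedule().withIntervalInSeconds(10).repeatForever())
        .build();

    // Store the job and trigger in the database; note that we never call
    // scheduler.start() here, so this process fires nothing itself -- the
    // running quartzServer instance picks the trigger up and fires it.
    scheduler.scheduleJob(job, trigger);
    scheduler.shutdown();
  }
}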
Here are a couple more easy configs that will get you started with commonly used setups. First, a MySQL cluster-enabled configuration. With this, you can open several shell terminals and run a separate instance of quartzServer.groovy in each, all with the same config: the Quartz scheduler instances will cluster themselves and distribute your jobs evenly.

# Main Quartz configuration
org.quartz.scheduler.skipUpdateCheck = true
org.quartz.scheduler.instanceName = DatabaseClusteredScheduler
org.quartz.scheduler.instanceId = AUTO
org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = quartzDataSource
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.jobStore.isClustered = true
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 5

# JobStore: JDBC jobStoreTX
org.quartz.dataSource.quartzDataSource.driver = com.mysql.jdbc.Driver
org.quartz.dataSource.quartzDataSource.URL = jdbc:mysql://localhost:3306/quartz2
org.quartz.dataSource.quartzDataSource.user = quartz2
org.quartz.dataSource.quartzDataSource.password = quartz2123
org.quartz.dataSource.quartzDataSource.maxConnections = 8

Here is another config set for a simple in-memory scheduler:

# Main Quartz configuration
org.quartz.scheduler.skipUpdateCheck = true
org.quartz.scheduler.instanceName = InMemoryScheduler
org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 5

Now, if you need a fancier UI for managing Quartz, give MySchedule a try. Happy scheduling!   Reference: Getting started with Quartz Scheduler on MySQL database from our JCG partner Zemian Deng at the A Programmer's Journal blog. ...

Why can’t I turn off the Garbage Collector?

Let's start with a quick rewind to the early days of my career as a Java developer. I wanted to eliminate Garbage Collection (GC) pauses from a test that I was conducting. Lo and behold, I was annoyed to discover that it couldn't be done. Back then I wrote the issue off as a design error and moved on with my life, angry at James Gosling or whoever was responsible for the decision. A couple of days ago I ran into a situation that reminded me of those old times. Luckily, the years gone by have left me with some insight into JVM internals, and I thought I'd share my current thoughts in the form of a blog post.

To start off, some of you might remember Java 1.1, which we used back in the late nineties. Back then you actually had the possibility to turn off the GC, at least on Solaris, where the Sun-provided JVM let you add a -noasyncgc option to your JVM startup parameters. The option was still accepted until the release of JDK 1.4 for backward compatibility, but starting from JDK 1.2 it didn't do anything besides adding complexity to your startup scripts. The option turned off the JVM-controlled garbage collector; you could still collect unused objects by explicitly invoking System.gc() from your code. Sounds like the level of flexibility an experienced engineer could put to good use. So why was this option removed? In fact, the motivation behind the move starts to make sense when you consider the following:

- By disabling the GC you essentially claim that you know how much memory your application will require at runtime. But what if you are wrong? Once the heap fills up, having the GC disabled means the death of your application.
- Invoking System.gc() might not execute a garbage collection at all. In modern JVMs it is nothing more than a recommendation to the JVM that "I think this is a good spot to run the GC". And your sysadmin might have disabled System.gc() calls altogether by specifying the -XX:+DisableExplicitGC startup parameter.
- If System.gc() is actually executed by the JVM, it results in a full garbage collection. This tends to be very expensive with large heaps and results in long pause times.
- You will still not achieve predictable GC timing by invoking System.gc(), especially in multithreaded applications.

Now, look at the points above and imagine an application running in a JVM without automatic GC. You would probably not want to bet your house on its behavior. The hair on my back immediately starts rising when I try to picture a debugging session tracing down a performance issue in that application. So maybe Mr. Gosling wasn't making a design error after all. But what if my stop-the-world pauses are intolerably long, and I really, really wish to turn the GC off? There actually are some possibilities:

- There is a part of the memory allocated by the JVM for which you can turn off the GC: you can prohibit GC on your class definitions by specifying -Xnoclassgc in your JVM options.
- Depending on your application, you might get rid of full GC pauses in compaction phases by tweaking your young and tenured generation sizes. In those cases you can configure the JVM to run only concurrent GC cycles, which in most cases do not affect performance.
- You can allocate memory outside the heap. Such allocations obviously aren't visible to the garbage collector and thus will not be collected; a minimal sketch follows below.
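As a minimal sketch of that third option (the buffer size and offsets here are arbitrary illustrations):

import java.nio.ByteBuffer;

public class OffHeapExample {
  public static void main(String[] args) {
    // 512 MiB allocated outside the Java heap -- the GC never scans or copies it;
    // the limit is governed by -XX:MaxDirectMemorySize, not by -Xmx
    ByteBuffer buffer = ByteBuffer.allocateDirect(512 * 1024 * 1024);

    buffer.putLong(0, 42L);               // write at an absolute offset
    System.out.println(buffer.getLong(0)); // read it back: 42
  }
}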
It might sound scary, but already since Java 1.4 we have had access to the java.nio.ByteBuffer class, which provides the allocateDirect() method for off-heap memory allocations. This allows us to create large data structures, which is especially useful when running on a 32-bit architecture. The solution is not too uncommon – many BigMemory implementations use ByteBuffers under the hood, Terracotta BigMemory and Apache DirectMemory for example. However, I would advise you to turn to those solutions only if you are sure about what you are doing. In 99.9% of the cases the garbage collector will be smarter than you.   Reference: Why can't I turn off the Garbage Collector? from our JCG partner Nikita Salnikov-Tarnovski at the Plumbr blog. ...

What Is Challenging For Developers?

In a previous post of mine I asked the question: do programmers get bored? And yes, sometimes they do, especially when there are no challenges. The usual software project out there is trivial – implementing business case after business case. But programmers can still find challenges even in the trivial applications and websites they are writing:

- Complex user interface – implementing a complex UI on the web is a good challenge, and good components often come out of it that are later open-sourced and have the potential to become popular. And if you are really good, you may solve a problem that many people have and make your component a standard solution.
- Architecture – if a system is big, the architecture matters. Separating components and deciding on the way they communicate is a challenge that requires both advanced skills in concrete technologies and the ability to get a good overview of a complex system.
- Big data – if you are lucky enough to work at a company that handles billions of records, you will be required to solve problems related to previously unusual amounts of data. That requires a deep understanding of databases and of strategies for distributing and synchronizing data. As a result, tools or even new databases can emerge – look at Cassandra, for example, which was initially developed inside Facebook.
- Algorithms – it is rare for a business application to require complex algorithms, but when it does, this is a great place to put your effort and get satisfaction from your work. For example, you can optimize the delivery routes of a transport company, which in turn saves thousands of dollars in fuel. If you can do that, not only will the clients be happy (which is rarely an incentive for developers), but you will be happy as well for having cracked a complex problem.
- Low-level implementations – normally you work with tools and frameworks that are already developed and stable. However, sometimes you need to implement some low-level detail yourself. My most recent experience is connection pooling and high availability for a RabbitMQ client. It is a gratifying experience to finally see that thing working properly in production, under heavy load, allowing any node to go down without affecting the system's health. Other examples include distributing scheduled jobs, augmenting a distributed cache solution with additional eviction policies, etc.
- Open-ended problems – there are problems that don't have a solution yet, and problems that are not yet known to be problems. Web search was such an unsolved (or poorly solved) problem back in the nineties, when Google solved it. You rarely find these problems in a regular company, so you usually have to take them up yourself at home. My most recent example is my algorithmic music generation service.

The list is not exhaustive, of course (feel free to add to it in the comments), but you get the idea – even in regular companies there can be tasks that go beyond the standard business-requirement box that makes you bored. If you are good at what you do, you have a chance of picking these tasks yourself. Note a common pitfall here – just because something sounds cool and challenging doesn't automatically make it something to go for. Writing an ORM is challenging, but you should reuse an existing one instead. Writing a web framework is also cool, but it is most likely to take a lot of time without providing the benefits and stability of existing frameworks.
Evaluating whether an existing solution works is also a challenging task, by the way. Certainly, there are things you won't be able to work on at your company. Then you have two options: either change your job, or do the interesting things at home (and, possibly, reuse them at work later). The computer-generated music has nothing to do with my current job, but I did it anyway; the RabbitMQ pooling, on the other hand, was a task at my current job. Overall, if you are really interested in doing interesting stuff, rather than just "completing business goals and getting the paycheck", you should put some thought into what is interesting to you and how you can get to work on it.   Reference: What Is Challenging For Developers? from our JCG partner Bozhidar Bozhanov at Bozho's tech blog. ...

Making the right decisions when optimizing code

You have an optimization task at hand. Which is great – you finally have the chance to do something interesting instead of implementing yet another invoice-processing screen. The optimization task is (hopefully) linked to some non-functional requirement, such as the number of transactions your application has to process per second or the maximum time allowed per transaction.

Now, you have identified the part of your application that is the source of the performance problem. This part of your application uses a particular algorithm. You are considering switching to another implementation and want to measure the performance impact. As discussed in one of our previous articles, you wisely choose the number of user transactions per second as the metric to measure. You run your stress test with the old implementation and write down the number of operations per second achieved, then run the very same stress test with the new implementation and write down the new number. Next, you compare the numbers and make some decisions. Some of these decisions may include the need for further performance measurements and/or optimizations of the new implementation. Let me present a very simplistic example:

- With your old algorithm, 1,000 transactions took 100 seconds
- With your improved algorithm, the same 1,000 transactions took 90 seconds

Great! The new version is faster! The changes you have made to the algorithm gave you 10% better performance. You might now need to make further decisions about the next optimization steps based on that information. But let us bring one more piece of information into account: the time spent on garbage collection. From the GC log you can extract the total GC pause time – that is, the amount of time your application was stopped doing stop-the-world GC work. Then we get the following picture:

- With your old algorithm, 1,000 transactions took 100 seconds, out of which GC pauses used 20 seconds
- With the improved algorithm, 1,000 transactions took 90 seconds, of which GC pauses used 27 seconds

What can we deduce from this information? First of all, your algorithm's own running time has decreased from 100 – 20 = 80 seconds down to 90 – 27 = 63 seconds, a 21% speedup. Secondly, the GC now takes about 30% of your CPU time. Based on that, your further optimization plans should focus not only on running speed but also on decreasing memory usage and GC time. How should you decide which direction to take? To answer that, we should look into your performance requirements. Maybe you have already fulfilled your requirement on how many transactions per second you need to process. But are all the transactions processed in a short enough time? Maybe your original wise idea of measuring only the number of transactions per second was not the best one after all. In technical terms those requirements translate into two different aspects, namely throughput and latency. Your current work has improved throughput, but may have penalized latency by introducing more and/or longer GC pauses. Further optimizations of your algorithm should be steered by the NFRs. Let's imagine you still have not reached your throughput requirements. What could be your next steps? Considering that your implementation spends 30% of its time on GC pauses, this fact alone should raise a red flag.
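As an aside, the GC pause totals used above come straight from the GC log. On a HotSpot JVM of this era, such a log can be produced with startup flags along these lines (myapp.jar stands in for your own application):

bash> java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log -jar myapp.jar

Summing the pause times reported in gc.log over the duration of the stress test gives you totals like the 20 and 27 seconds in the example.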
In a typical case (yes, I know it's like measuring the average temperature in a hospital and making decisions based on that), GC pauses should not take more than 5% of the total time, and if you exceed 10% it is extremely likely that you should do something about it. The first step in reducing GC pauses is to investigate your JVM configuration. Maybe you just need to increase the maximum heap size (-Xmx)? Maybe you should tune your generation sizes (-XX:NewRatio, -XX:SurvivorRatio, …)? Maybe you should experiment with different garbage collectors (-XX:+UseConcMarkSweepGC, -XX:+UseParallelGC, …)? Depending on your application, the right answer could be a combination of the above, just some of them, or none at all. When configuring the JVM does not provide sufficient results, the next step is to look into your data structures. Maybe you can reduce the GC pauses by making changes to your source code. Maybe you can get rid of all the wrapper classes around primitives and significantly reduce the overhead? Maybe you could take a look at the collection classes used and reduce the overhead they pose? Or, for some exotic cases where your algorithm constantly creates and destroys the same objects, perhaps object pooling might be a good idea? Only after all this might you actually want to start reducing the overhead posed by your algorithm itself – which in some exotic cases might lead you to discover that division is too slow and you need a clever way to replace it with a less expensive operation supported by your data structures, or that the memory barrier accompanying each write to java.util.concurrent.atomic.AtomicBoolean is too expensive. But let's leave those cases for another story, in which I will describe some of the weirdest CPU wasters I have dealt with in my life. In conclusion: if you take up a quest to optimize your code, make sure you have thought through the requirements on both throughput and latency, and that you don't stick to just one optimization technique. Hopefully after reading this article you now have more tools in your arsenal.   Reference: Making the right decisions when optimizing code from our JCG partner Nikita Salnikov-Tarnovski at the Plumbr blog. ...

Tips for Writing Maven Plugins

I've spent a lot of time recently writing or working on plugins for Maven. They're simple, rewarding and fun to write. I thought I'd share a couple of tips for making life easier when writing them.

Tip 1: Separate the Task from the Mojo

Initially you'll put all the code for the mojo into the mojo's class (i.e. the class that extends AbstractMojo). Instead, think about separating the task out and keeping only minimal shim code in the mojo (a minimal sketch follows at the end of this post). This will:

- Make it easier to unit test.
- Mean you can easily integrate with other tools, such as Ant.
- Make it easier to convert the simple data types in the mojo to more complex types for your task (e.g. turn a String into a File).
- Separate exception translation from the task.

Tip 2: Consider Externalizing Configuration

Normally you configure a plugin using the <configuration> element in the POM. This is fine for simple cases. When you have a large set of configuration items, or several configuration profiles, it will result in long, hard-to-understand POMs. You can follow the assembly plugin's example and have a standardised directory for putting config in, e.g. src/main/myplugin/config-profile-1.xml.

Tip 3: Match the Mojo to the Phase

Consider which phase you want the mojo to run in. If it is doing things that ought to be split across phases, then split the mojo up and bind each part to the appropriate phase. It'll make it easier to test and to maintain.

Tip 4: Don't Repeat Time-Consuming Work

Your mojo will get run multiple times on small variations of the same source code and config. Does it do a lot of intensive work on every execution? I was working with a mojo that unzipped a file every time it ran; by changing the zip code to only freshen files by checking file modification times, the task went from taking over a minute to execute to less than 10 seconds.

Tip 5: Plan Your Testing

Initially you're probably writing your mojo and manually testing it on the first project you're going to use it on. This makes for a long testing cycle and results in an unreliable mojo. Separating the task from the mojo makes testing the task easy, but you'll also want some smoke tests for the mojo itself. Bugs in mojos can be hard for users to notice, as there's a tendency to assume most mojos are well tested and reliable.

Tip 6: Consider How You Provide Documentation and Help for Your Mojo

IDEs and Maven can be a bit unhelpful here. What does that config item mean? Can I see an example? The solution is to provide a "help" mojo and optionally a Maven site. For example, if you execute "mvn assembly:help" or "mvn surefire:help -Ddetail=true -Dgoal=test" you'll see help.   Reference: Tips for Writing Maven Plugins from our JCG partner Alex Collins at the Java, *NIX, testing, performance, e-gaming technology et al blog. ...
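As promised in Tip 1, here is a minimal sketch of the task/mojo split. The goal name, parameters and the UnpackTask class are made up for illustration, and the sketch assumes the maven-plugin-annotations style of configuration rather than the older javadoc tags. The task is a plain class that can be unit tested without any Maven plumbing; the mojo only binds parameters and translates exceptions:

import java.io.File;
import java.io.IOException;

import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugin.MojoExecutionException;
import org.apache.maven.plugins.annotations.LifecyclePhase;
import org.apache.maven.plugins.annotations.Mojo;
import org.apache.maven.plugins.annotations.Parameter;

@Mojo(name = "unpack", defaultPhase = LifecyclePhase.GENERATE_RESOURCES)
public class UnpackMojo extends AbstractMojo {

  @Parameter(property = "unpack.archive", required = true)
  private File archive;

  @Parameter(defaultValue = "${project.build.directory}/unpacked")
  private File outputDirectory;

  @Override
  public void execute() throws MojoExecutionException {
    try {
      // All the real work lives in the task class below
      new UnpackTask(archive, outputDirectory).run();
    } catch (IOException e) {
      // Exception translation is the mojo's job, not the task's
      throw new MojoExecutionException("Failed to unpack " + archive, e);
    }
  }
}

// The task: plain Java, trivially unit-testable without any Maven plumbing
class UnpackTask {
  private final File archive;
  private final File outputDirectory;

  UnpackTask(File archive, File outputDirectory) {
    this.archive = archive;
    this.outputDirectory = outputDirectory;
  }

  void run() throws IOException {
    // unzip logic goes here; freshen by modification time as per Tip 4
  }
}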