Featured FREE Whitepapers

What's New Here?


Multi-threading in Java Swing with SwingWorker

If you’re writing a desktop or Java Web Start program in Java using Swing, you might feel the need to run some stuff in the background by creating your own threads. There’s nothing stopping you from using standard multi-threading techniques in Swing, and the usual considerations apply. If you have multiple threads accessing the same variables, you’ll need to use synchronized methods or code blocks (or thread-safe classes like AtomicInteger or ArrayBlockingQueue).           However, there is a pitfall for the unwary. As with most user interface APIs, you can’t update the user interface from threads you’ve created yourself. Well, as every Java undergraduate knows, you often can, but you shouldn’t. If you do this, sometimes your program will work and other times it won’t. You can get around this problem by using the specialised SwingWorker class. In this article, I’ll show you how you can get your programs working even if you’re using the Thread class, and then we’ll go on to look at the SwingWorker solution. For demonstration purposes, I’ve created a little Swing program.As you can see, it consists of two labels and a start button. At the moment, clicking the start button invokes a handler method which does nothing. Here’s the Java code: import java.awt.Font; import java.awt.GridBagConstraints; import java.awt.GridBagLayout; import java.awt.event.ActionEvent; import java.awt.event.ActionListener; import java.util.List; import java.util.concurrent.ExecutionException;import javax.swing.JButton; import javax.swing.JFrame; import javax.swing.JLabel; import javax.swing.SwingUtilities; import javax.swing.SwingWorker;public class MainFrame extends JFrame {private JLabel countLabel1 = new JLabel('0'); private JLabel statusLabel = new JLabel('Task not completed.'); private JButton startButton = new JButton('Start');public MainFrame(String title) { super(title);setLayout(new GridBagLayout());countLabel1.setFont(new Font('serif', Font.BOLD, 28));GridBagConstraints gc = new GridBagConstraints();gc.fill = GridBagConstraints.NONE;gc.gridx = 0; gc.gridy = 0; gc.weightx = 1; gc.weighty = 1; add(countLabel1, gc);gc.gridx = 0; gc.gridy = 1; gc.weightx = 1; gc.weighty = 1; add(statusLabel, gc);gc.gridx = 0; gc.gridy = 2; gc.weightx = 1; gc.weighty = 1; add(startButton, gc);startButton.addActionListener(new ActionListener() { public void actionPerformed(ActionEvent arg0) { start(); } });setSize(200, 400); setDefaultCloseOperation(EXIT_ON_CLOSE); setVisible(true); }private void start() {}public static void main(String[] args) { SwingUtilities.invokeLater(new Runnable() {@Override public void run() { new MainFrame('SwingWorker Demo'); } }); } } We’re going to add some code into the start() method which is called in response to the start button being clicked. First let’s try a normal thread. private void start() { Thread worker = new Thread() { public void run() {// Simulate doing something useful. for(int i=0; i<=10; i++) { // Bad practice countLabel1.setText(Integer.toString(i));try { Thread.sleep(1000); } catch (InterruptedException e) {} }// Bad practice statusLabel.setText('Completed.'); } };worker.start(); } As a matter of fact, this code seems to work (at least for me anyway). The program ends up looking like this:This isn’t recommended practice, however. We’re updating the GUI from our own thread, and under some circumstances that will certainly cause exceptions to be thrown. If we want to update the GUI from another thread, we should use SwingUtilities to schedule our update code to run on the event dispatch thread. The following code is fine, but ugly as the devil himself. private void start() { Thread worker = new Thread() { public void run() {// Simulate doing something useful. for(int i=0; i<=10; i++) {final int count = i;SwingUtilities.invokeLater(new Runnable() { public void run() { countLabel1.setText(Integer.toString(count)); } });try { Thread.sleep(1000); } catch (InterruptedException e) {} }SwingUtilities.invokeLater(new Runnable() { public void run() { statusLabel.setText('Completed.'); } });} };worker.start(); } Surely there must be something we can do to make our code more elegant? The SwingWorker Class SwingWorker is an alternative to using the Thread class, specifically designed for Swing. It’s an abstract class and it takes two template parameters, which make it look highly ferocious and puts most people off using it. But in fact it’s not as complex as it seems. Let’s take a look at some code that just runs a background thread. For this first example, we won’t be using either of the template parameters, so we’ll set them both to Void, Java’s class equivalent of the primitive void type (with a lower-case ‘v’). Running a Background Task We can run a task in the background by implementing the doInBackground method and calling execute to run our code. SwingWorker<Void, Void> worker = new SwingWorker<Void, Void>() { @Override protected Void doInBackground() throws Exception { // Simulate doing something useful. for (int i = 0; i <= 10; i++) { Thread.sleep(1000); System.out.println('Running ' + i); }return null; } };worker.execute(); Note that SwingWorker is a one-shot affair, so if we want to run the code again, we’d need to create another SwingWorker; you can’t restart the same one. Pretty simple, hey? But what if we want to update the GUI with some kind of status after running our code? You cannot update the GUI from doInBackground, because it’s not running in the main event dispatch thread. But there is a solution. We need to make use of the first template parameter. Updating the GUI After the Thread Completes We can update the GUI by returning a value from doInBackground() and then over-riding done(), which can safely update the GUI. We use the get() method to retrieve the value returned from doInBackground() So the first template parameter determines the return type of both doInBackground() and get(). SwingWorker<Boolean, Void> worker = new SwingWorker<Boolean, Void>() { @Override protected Boolean doInBackground() throws Exception { // Simulate doing something useful. for (int i = 0; i <= 10; i++) { Thread.sleep(1000); System.out.println('Running ' + i); }// Here we can return some object of whatever type // we specified for the first template parameter. // (in this case we're auto-boxing 'true'). return true; }// Can safely update the GUI from this method. protected void done() {boolean status; try { // Retrieve the return value of doInBackground. status = get(); statusLabel.setText('Completed with status: ' + status); } catch (InterruptedException e) { // This is thrown if the thread's interrupted. } catch (ExecutionException e) { // This is thrown if we throw an exception // from doInBackground. } }};worker.execute();What if we want to update the GUI as we’re going along? That’s what the second template parameter is for. Updating the GUI from a Running Thread To update the GUI from a running thread, we use the second template parameter. We call the publish() method to ‘publish’ the values with which we want to update the user interface (which can be of whatever type the second template parameter specifies). Then we override the process() method, which receives the values that we publish. Actually process() receives lists of published values, because several values may get published before process() is actually called. In this example we just publish the latest value to the user interface. SwingWorker<Boolean, Integer> worker = new SwingWorker<Boolean, Integer>() { @Override protected Boolean doInBackground() throws Exception { // Simulate doing something useful. for (int i = 0; i <= 10; i++) { Thread.sleep(1000);// The type we pass to publish() is determined // by the second template parameter. publish(i); }// Here we can return some object of whatever type // we specified for the first template parameter. // (in this case we're auto-boxing 'true'). return true; }// Can safely update the GUI from this method. protected void done() {boolean status; try { // Retrieve the return value of doInBackground. status = get(); statusLabel.setText('Completed with status: ' + status); } catch (InterruptedException e) { // This is thrown if the thread's interrupted. } catch (ExecutionException e) { // This is thrown if we throw an exception // from doInBackground. } }@Override // Can safely update the GUI from this method. protected void process(List<Integer> chunks) { // Here we receive the values that we publish(). // They may come grouped in chunks. int mostRecentValue = chunks.get(chunks.size()-1);countLabel1.setText(Integer.toString(mostRecentValue)); }};worker.execute();More …. ? You Want More …. ? I hope you enjoyed this introduction to the highly-useful SwingWorker class. You can find more tutorials, including a complete free video course on multi-threading and courses on Swing, Android and Servlets, on my site Cave of Programming.   Reference: Multi-threading in Java Swing with SwingWorker from our JCG partner John Purcell at the Java Advent Calendar blog. ...

Performance of inlined virtual method invocations in Java

Overview One of the benefits of dynamic compilation it the ability to support extensive method inlining on virtual method code. While inlining the code improves performance, the code still has to check the type (in case it has changed since it was optimised) or select between multiple possible implementations. This leads to the question; how does having multiple implementations of a method, called via an interface,  impact performance. Benchmarking This benchmark tests a trivial method invoked as elements of a list.  It compares different numbers of classes in the list to see how it performs, but also varies the amount the loop is unrolled so that the number of possible classes is reduced. e.g. for a list of 12 different classes, in a repeating pattern, with out unrolling the method invocation could be to any of the 12 classes, but when the loop is unrolled to a degree of two, each call has only 6 possible classes (12/2). When unrolled to a degree of three, there is 4 possible classes for each call (12/3), for a loop unroll to a degree of six, there is only 2 possible classes. Note: the possible classes is different for each line of code. These are the results on an 2.5 GHz i5. Times are in nanoseconds. The axis headings refer to the number of different classes used or called from a given line of code.1 used 2 used 3 used 4 used 6 used 8 used 12 used1 per call site 48.4 46.6 46.9 43.7 48.7 54.92 per call site115.880.5 92.8 87 1123 per call site2852832714 per call site669281 2756 per call site5622758 per call site49812 per call site530It appears that the number of classes loaded, or even the number of classes in the List is not as important as the number for possible classes called from a given piece of code.  It also appears that different pieces of code in the same loop can be optimised independantly. You can see that if there is only one possible class a given call can make, the number of classes used in the program doesn’t matter. Another implication is that if you are comparing two alternatives via a common interface, you have to be careful how you run tests so that the first run is not favoured by the fact only one type has been used. To address this I suggest; running all the tests multiple times for at least 2 seconds in total. This should reduce the bais associated with the order you perform the tests.   Reference: Performance of inlined virtual method invocations in Java from our JCG partner Peter Lawrey at the Vanilla Java blog. ...

AChartEngine – A Charting Library for Android Applications

As its name suggests, AChartEngine is a charting library that can be used in Android applications. It supports all the Android SDK versions from 1.6 and up. The 1.6 version offers only pan and button based zoom, while the 2.1 and newer add support for pinch zoom as this became available in the Android SDK 2.x and newer. A while ago, when AChartEngine dropped the support for older Android SDK versions than 2.1, many users asked it back in just a couple of days after the release. At that time, according to the official Android platform distribution available here, there were still about 5% of devices available worldwide that were running versions that are older than 2.x.   Adding charting to an Android application with AChartEngine, is as simple as adding the achartengine-x.y.z.jar to the application classpath and start coding against its APIs. The current stable version is 1.0.0 and the one under development 1.1.0. The jar file is only 110 KB is size, which is quite a small footprint nowadays. However, AChartEngine offers support for many chart types. A little bit of history In late 2008, Android developers were already looking for charting / graphing / plotting libraries. At that time there was no such free / open-source solution available. I started evaluating Android for fun and after developing a couple of Android applications that needed some graphing, I decided I could open-source the charting code under the name of AChartEngine. Version 0.2.0 was launched in March 2009, being the first open-source charting library for Android. At that time, Android SDK was at version 1.1. Features There are three main types of charts that are supported by AChartEngine:XY charts – display data on 2 axis (line, cubic line, area, bar, scatter, bubble, range (high-low)) ‘Round’ charts – pie, doughnut, dial Combined chart – can display a combination of the XY chartsFor a quick tour through some AChartEngine demo application screenshots, please visit the official website and the AChartEngine Google Code page Overall Class Design The diagram below shows the way the classes that handle the visual rendering of the charts are organized.The AbstractChart class describes behavior that is shared by all charts, including drawing background, legend, titles,… The XYChart class describes state and behavior that is common to the XY chart types like the rendering of axes, labels,… The RoundChart is similar to XYChart, but for the chart types that have a circular shape.Design Components The entire design is not limited to the visual / view part only. There are a few more components contributing to the overall picture.The model – datasets / series of data. The view – described above. Renderers – help in customizing the charts appearance (colors, fonts, axis, labels, formatting,…). ChartFactory – gets an instance of a dataset and an instance of a renderer and returns the desired chart embedded into an Intent (for the case when the chart fills an Activity) or a View (when the chart is a part of an Activity, together with other widgets). Tools – interaction tools for pan and zoom.Code sample The code below shows a simple example of how a chart can be initialized and added to an Activity. The package declaration and the imports have been removed for keeping the sample smaller. public class SampleChart extends Activity { private GraphicalView mChart;private XYMultipleSeriesDataset mDataset = new XYMultipleSeriesDataset();private XYMultipleSeriesRenderer mRenderer = new XYMultipleSeriesRenderer();private XYSeries mCurrentSeries;private XYSeriesRenderer mCurrentRenderer;private void initChart() { mCurrentSeries = new XYSeries('Sample Data'); mDataset.addSeries(mCurrentSeries); mCurrentRenderer = new XYSeriesRenderer(); mRenderer.addSeriesRenderer(mCurrentRenderer); }private void addSampleData() { mCurrentSeries.add(1, 2); mCurrentSeries.add(2, 3); mCurrentSeries.add(3, 2); mCurrentSeries.add(4, 5); mCurrentSeries.add(5, 4); }protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_main); }protected void onResume() { super.onResume(); LinearLayout layout = (LinearLayout) findViewById(R.id.chart); if (mChart == null) { initChart(); addSampleData(); mChart = ChartFactory.getCubeLineChartView(this, mDataset, mRenderer, 0.3f); layout.addView(mChart); } else { mChart.repaint(); } } } In order to have the above example work, a simple layout file must be defined and that will need to include a LinearLayout with the android:id=’@+id/chart’. Applications Applications of all types are using AChartEngine for chart rendering. The appbrain.com Android market states that 0.52% of their total number of published applications (around 600K) are using AChartEngine, which means that over 3000 of them are using AChartEngine. A list of the most popular applications using AChartEngine:MotoACTV – fitness tool developed by Motorola and pre-installed on their devices EKG viewers, bioharness applications Path trackers, fitness, biking applications Finance, business applications Others,…ResourcesOfficial website http://achartengine.org Google code website providing downloadable jar, javadocs and demo application, issue tracker and source code SVN http://code.google.com/p/achartengine Search for ‘AChartEngine’ on youtube. There is a bunch of helpful video tutorials. Browse stackoverflow.com for finding solutions or posting questions. Register on the AChartEngine Google group and post ideas. Visit the AChartEngine Facebook page facebook.com/achartengine Contact us at contact@achartengine.orgContributors welcome Contributing to an open-source project may look good in your CV. AChartEngine is an open-source project, so it is the result of a community effort. A suggested path to contributing to AChartEngine could be:Checkout the source code from SVN and try to understand it. Take a look at the opened issues and try fixing some of them. When done, provide patches. Ask for commiter rights. Answer questions on StackOverflow or other websites.  Reference: AChartEngine – A Charting Library for Android Applications from our JCG partner Attila-Mihaly Balazs at the Java Advent Calendar blog. ...

The Java ByteBuffer – a crash course

In my experience, java.nio.ByteBuffer is a source of confusion and subtle bugs when developers first encounter it, because it is not immediately obvious how to use it correctly. It took some repeated reading of the API docs and some experience to realize some subtleties, before I felt comfortable with them. This post is a short crash in how to use them correctly, to hopefully save others some trouble. Since all of this is inferred (rather than based on explicit documentation), and based on experience, I cannot claim the information is necessarily authoritative. I welcome feedback to point out mistakes or alternative points of view. I also welcome suggestions for additional pitfalls/best practices to cover.   I do assume that the reader will be looking at the API documentation to go with this post. I am not going to be exhaustive on all the possible things you can dowith a ByteBuffer. The ByteBuffer abstraction Look at a ByteBuffer as providing a view into some (undefined) underlying storage of bytes. The two most common concrete types of byte buffers are those backed by byte arrays, and those backed by direct (off-heap, native) byte buffers. In both cases, the same interface can be used to read and write contents of the buffer. Some parts of the API of a ByteBuffer is specific to some types of byte buffers. For example, a byte buffer can be read-only, restricting usage to a subset of methods. The array() method will only work for a byte buffer backed by a byte array (which can be tested with hasArray()) and should generally only be used if you know exactly what you are doing. A common mistake is to use array() to “convert” a ByteBuffer into a byte array. Not only does this only work for byte array backed buffers, but it is easily a source of bugs because depending on how the buffer was created, the beginning of the returned array may or may not correspond to the beginning of the ByteBuffer. The result tends to be a subtle bug where the behavior of code differs depending on implementation details of the byte buffer and the code that created it. A ByteBuffer offers the ability to duplicate itself by calling duplicate(). This does not actually copy the underlying bytes, it only creates a new ByteBuffer instance pointing to the same underlying storage. A ByteBuffer representing a subset of another ByteBuffer may be created using slice(). Key differences from byte arraysA ByteBuffer has value semantics with respect to hashCode()/equals() and as a result can be more conveniently used in containers. A ByteBuffer has offers the ability to pass around a subset of a byte buffer as a value without copying the bytes, by instantiating a new ByteBuffer. The NIO API makes extensive use of ByteBuffer:s. The bytes in a ByteBuffer may potentially reside outside of the Java heap. A ByteBuffer has state beyond the bytes themselves, which facilitate relative I/O operations (but with caveats, talked about below). A ByteBuffer offers methods for reading and writing various primitive types like integers and longs (and can do it in different byte orders).Key properties of a ByteBuffer The following three properties of a ByteBuffer are critical (I’m quoting the API documentation on each):A buffer’s capacity is the number of elements it contains. The capacity of a buffer is never negative and never changes. A buffer’s limit is the index of the first element that should not be read or written. A buffer’s limit is never negative and is never greater than its capacity. A buffer’s position is the index of the next element to be read or written. A buffer’s position is never negative and is never greater than its limit.Here is a visualization of an example ByteBuffer which is (in this case) backed by a byte array, and the value of the ByteBuffer is the word “test” (click it to zoom):That ByteBuffer would be equal to (in the sense of equals()) to any other ByteBuffer whose contents in between [position,limit) is the same. Suppose the byte buffer visualized above was bb, and we did this: final ByteBuffer other = bb.duplicate(); other.position(bb.position() + 4); We would now have two ByteBuffer instances both referring to the same underlying byte array, but their contents would be different (other would be empty):The buffer/stream duality of byte buffers There are two ways of accessing the contents of a byte buffer – absolute and relative access. For example, suppose I have a ByteBuffer that I know contains two integers. In order to extract the integers using absolute positioning, one might do this: int first = bb.getInt(0) int second = bb.getInt(4) Alternatively one can extract them using relative positioning: int first = bb.getInt(); int second = bb.getInt(); The second option is often convenient, but at the cost of having a side-effect on the buffer (i.e., changing it). Not the contents itself, but the ByteBuffers view into that content. In this way, ByteBuffers can behave similarly to a stream, if used as such. Best practices and pitfalls flip() the buffer If you are building up a ByteBuffer by repeatedly writing into it, and then want to give it away, you must remember to flip() it. For example, here is a method that copies a byte array into a ByteBuffer, assuming default encoding (note that ByteBuffer.wrap(), which is used here, creates a ByteBuffer that wraps the specified byte array, as opposed to copy the contents of it into a new ByteBuffer): public static ByteBuffer fromByteArray(byte[] bytes) { final ByteBuffer ret = ByteBuffer.wrap(new byte[bytes.length]);ret.put(bytes); ret.flip();return ret; } If we did not flip() it, the returned ByteBuffer would be empty because the position would be equal to the limit. Don’t consume the buffer Be careful not to “consume” a byte buffer when reading it, unless you specifically intend on doing so. For example, consider this method to convert a ByteBuffer into a String, assuming default encoding: public static String toString(ByteBuffer bb) { final byte[] bytes = new byte[bb.remaining()];bb.duplicate().get(bytes);return new String(bytes); } Unfortunately, there is no method provided to do absolute positioning reads of a byte array (but there does exist absolute positioning reads for the primitives). Notice the use of duplicate() when reading the bytes. If we did not, the function would have a side-effect on the input ByteBuffer. The cost of doing this is the extra allocation of a new ByteBuffer just for the purpose of the one call to get(). You could record the position of the ByteBuffer prior to the get() and restore it afterwards, but that has thread-safety issues (see next section). It is worth noting that this only applies when you are trying to treat ByteBuffer:s as values. If you are writing code whose purpose is do side-effect on ByteBuffers, treating them more like streams, you would of course be intending to do so and this section does not apply. Don’t mutate the buffer In the context of general-purpose code which is not intimately specific to a particular use-case, it is (in my opinion) good practice for a method that does an (abstractly) read-only operation (such as reading a byte buffer), to not mutate its input. This is a stronger requirement than “Don’t consume the ByteByffer”. Take the example from the previous section, but with an attempt to avoid the extra allocation of the ByteBuffer: public static String toString(ByteBuffer bb) { final byte[] bytes = new byte[bb.remaining()];bb.mark(); // NOT RECOMMENDED, don't do this bb.get(bytes); bb.reset(); // NOT RECOMMENDED, don't do thisreturn new String(bytes); } In this case, we record the state of the ByteBuffer prior to our call to get() and restore it afterwards (see the API documentation for mark() and reset()). There are two problems with this approach. The first problem is that the function above does not compose. A ByteBuffer only has one “mark”, and your (very general, not context aware) toString() method cannot safely assume that the caller is not trying to use mark() and reset() for its own purposes. For example, imagine this caller which is de-serializing a length-prefixed string: bb.mark(); int length = bb.getInt(); ... sanity check length final String str = ByteBufferUtils.toString(bb); ... do something bb.reset(); // OOPS - reset() will now point 4 bytes off, because toString() modified the mark (As an aside, this is a very contrived and strange example, because I found it hard to come up with a realistic example of code that uses mark()/reset(), which is typically used when treating the buffer in a stream-like faction, which would also feel the need to call toString() on the remainder of said buffer. I’d be interested in hearing what solutions people have come up with here. For example, one could imagine clear policies in a code base that allow mark()/reset() in value oriented contexts like toString() – but even if you did (and it smells likely to be inadvertently violated) you would still suffer the mutation problem mentioned later.) Let’s look at an alternative version of toString() that avoids this problem: public static String toString(ByteBuffer bb) { final byte[] bytes = new byte[bb.remaining()];bb.get(bytes); bb.position(bb.position() - bytes.length); // NOT RECOMMENDED, don't do thisreturn new String(bytes); } In this case we are not modifying the mark, so we do compose. However, we are still committing the “crime” of mutating our input. This is a problem in multi-threaded situations; you don’t want reading something to imply mutating it, unless the abstraction implies it (for example, with a stream or when using ByteBuffers in a stream-like fashion). If you’re passing around a ByteBuffer treating it as a value, putting it in containers, sharing them, etc – mutating them will introduces subtle bugs unless you are guaranteed that two threads never ever use the same ByteBuffer at the same time. Typically, the result of this type of bug is strange corruption of values or unexpected BufferOverFlowException:s. A version that suffers from neither of this appears in the “Don’t consume the buffer” section above, which uses duplicate() to construct a temporary ByteBuffer instance on which it is safe to call get(). compareTo() is subject to byte signedness bytes in Java are signed, contrary to what one typically expects. What is easy to miss though, is the fact that this affects ByteBuffer.compareTo() as well. The Java API documentation for that method reads: “Two byte buffers are compared by comparing their sequences of remaining elements lexicographically, without regard to the starting position of each sequence within its corresponding buffer.” A quick reading might lead one to believe the result is what you would typically expect, but of course given the definition of a byte in Java, this is not the case. The result is that the order of byte buffers that contains values with the highest order bit set, will diverge from what you may be expecting. Google’s excellent Guava library has an UnsignedBytes helper to mitigate your pain. array() is usually the wrong method to use Generally, don’t use array() casually. In order for it to be used correctly you either have to know for a fact that the byte buffer is array backed, or you have to test it with hasArray() and have two separate code paths for either case. Additionally, when you use it, you must use arrayOffset() in order to determine what the zeroth position of the ByteBuffer corresponds to in the byte array. In typical application code, you would not use array() unless you really know what you are doing and you specifically need it. That said, there are cases where it’s useful. For example, supposing you were implementing a ByteBuffer version of UnsignedBytes.compare() (again, from Guava) – you may wish to optimize the case where either or both of the arguments are array backed, to avoid unnecessary copying and frequent calls to the buffers. For such a generic and potentially heavily used method, such an optimization would make sense.   Reference: The Java ByteBuffer – a crash course from our JCG partner Peter Schuller at the (mod :world :scode) blog. ...

Should I use a 32- or a 64-bit JVM?

This is a question I have faced several times during my career in enterprise software development. Every once in awhile I’ve had to hand out recommendations for configuring a specific new environment. And more often than not, part of the question at hand was related to “Should I use a 32- or a 64-bit JVM”. To be honest, in the beginning I just flipped the coin. Instead of giving a reasoned answer. (Sorry, bros!) But by now I have gathered some more insight on this and thought to share it with you.           First stop – the more, the merrier. Right? So – as 64 > 32 then this would be an easy answer: if possible, always choose 64-bit? Well, hold your horses. The downside of the 64-bit architecture is that the same data structures consume more memory. A lot more. Our measurements show that depending on the JVM version and the operating system version along with hardware architecture you end up using 30-50% more heap than on 32-bit. Larger heap can also introduce longer GC pauses affecting application latency – running a full GC on a 4.5GB heap is definitely going to take longer than on a 3GB one. So it will not be correct to jump on the 64-bit bandwagon just because 64 is bigger than 32. But… when should you ever desire to use a 64-bit JVM at all then? In most cases the reason is large heap sizes. On different architectures you quickly face limitations of maximum heap size on 32-bit architectures. The following illustrates these limitations on different platforms:OS Max heap NotesLinux 2GB 3GB on specific kernels, such as hugememWindows 1.5GB Up to 3GB with “/3GB” boot flag and JRE compiled with /LARGEADDRESSAWARE switch)Mac OS X 3.8GB Alert – could not find an ancient Mac, so this is untested by meNow how come is it that bad? After all, I bet you have seen 32-bit machines running on 16G+ RAM and doing just fine. What’s wrong with the JVM that it can allocate less than 10% of this 16G on Windows? Main cause – address space. In a 32-bit system you can theoretically allocate up to 4GB of memory per process. What breaks this on Windows is how process address space is handled. Windows cuts the process address space in half. One half of it is reserved for the kernel (which a user process cannot use) and the other half for the user. It doesn’t matter how much RAM is in the box, a 32-bit process can only use 2GB of RAM. What’s even worse – this address space needs to be contiguous, so in practice you are most often left with just 1.5-1.8GB of heap on Windows boxes. There is a trick you can pull on 32-bit windows to reduce the kernel space and grow the user space. You can use the /3GB parameter in your boot.ini. However, to actually use this opportunity, the JVM must be compiled/linked using the /LARGEADDRESSAWARE switch. This unfortunately is not the case, at least with the Hotspot JVM. Until the latest JDK 1.7 releases the JVM is not compiled with this option. You are luckier if you are running on a jRockit on post-2006 versions. In this case you can enjoy up to 2.8-2.9GB of heap size. So – can we conclude that if your application requires more than ~2-3GB of memory you should always run on 64-bit? Maybe. But you have to be aware of the threats as well. We have already introduced the culprits – increased heap consumption and longer GC pauses. Lets analyze the causes here. Problem 1: 30-50% of more heap is required on 64-bit Why so? Mainly because of the memory layout in 64-bit architecture. First of all – object headers are 12 bytes on 64-bit JVM. Secondly, object references can be either 4 bytes or 8 bytes, depending on JVM flags and the size of the heap. This definitely adds some overhead compared to the 8 bytes on headers on 32-bit and 4 bytes on references. You can also dig into one of our earlier posts for more information about calculating the memory consumption of an object. Problem 2: Longer garbage collection pauses Building up more heap means there is more work to be done by GC while cleaning it up from unused objects. What it means in real life is that you have to be extra cautious when building heaps larger than 12-16GB. Without fine tuning and measuring you can easily introduce full GC pauses spanning several minutes. In applications where latency is not crucial and you can optimize for throughput only this might be OK, but on most cases this might become a showstopper. So what are my alternatives when I need larger heaps and do not wish to introduce the overhead caused by 64-bit architecture? There are several tricks we have covered in one of our earlier blog posts- you can get away by heap partitioning, GC tuning, building on different JVMs or allocating memory off the heap. To conclude, let’s re-state that you should always be aware of the consequences of choosing a 64-bit JVM. But do not be afraid of this option.   Reference: Should I use a 32- or a 64-bit JVM? from our JCG partner Vladimir Sor at the Plumbr Blog blog. ...

Escaping the JVM heap for memory intensive applications

If you’ve ever allocated large Java heaps, you know that at some point – typically starting at around 4 GiB – you will start having issues with your garbage collection pauses. I won’t go into detail about why pauses happen in the JVM, but in short it happens when the JVM does full collections and you have a large heap. As the heap increases, those collections might become longer.           The simplest way to overcome this is to tune your JVM garbage collection parameters to match the memory allocation and deallocation behaviour of your particular application. It is a bit of a dark art and requires careful measurements, but it’s possible to have very large heaps while avoiding mostly old generation garbage collections. If you want to learn more about Garbage Collection tuning, check out this JVM GC tuning guide. If you get really interested about GC in general, this is an excellent book: The Garbage Collection Handbook. There are JVM implementations that guarantee much lower pause times than the Sun VM, such as the Zing JVM – but normally at other costs in your system, such as increased memory usage and single threaded performance. The ease of configuration and low gc guarantees is still very appealing. For the purpose of this article, I will use the example of an in-memory cache or store in Java, mainly because I’ve built a couple in the past while using some of these techniques. We’ll assume we have a basic cache interface definition like so: import java.io.Externalizable;public interface Cache<K extends Externalizable, V extends Externalizable> { public void put(K key, V value); public V get(K key); } We’re requiring that keys and values are Externalizable just for this simple example, wouldn’t be like this IRL. We will show how to have different implementations of this cache that store data in memory in different ways. The simplest way to implement this cache would be using Java collections: import java.io.Externalizable; import java.util.HashMap; import java.util.Map;public class CollectionCache<K extends Externalizable, V extends Externalizable> implements Cache<K, V> { private final Map<K, V> backingMap = new HashMap<K, V>();public void put(K key, V value) { backingMap.put(key, value); }public V get(K key) { return backingMap.get(key); } } This implementation is straighforward. However, as the map size increases, we will be allocating a large number of objects (and deallocating), we are using boxed primitives which takes more space in memory that primitives and the map needs to be resized from time to time. We could certainly improve this implementation simply by using a primitive-based map. It would use less memory and objects but would still take space in the heap and possibly partition the heap, leading to longer pauses if for other reasons we do full GCs. Let’s look at other ways to store similar data without using the heap:Use a separate process to store the data. Could be something like a Redis or Memcached instance that you connect through sockets or unix sockets. It’s fairly straightforward to implement. Offload data to disk, using memory mapped files. The OS is your friend and will do a lot of heavy work predicting what you’ll read next from the file and your interface to it is just like a big blob of data. Use native code and access it through JNI or JNA. You’ll get better performance with JNI and ease of use with JNA. Requires you to write native code. Use direct allocated buffers from the NIO package. Use the Sun specific Unsafe class to access memory directly from your Java code.I will focus on the solutions that use exclusively Java for this article, direct allocated buffers and the Unsafe class. Direct Allocated Buffers Direct Allocated Buffers are extremely useful and used extensively when developing high-performance network applications in Java NIO. By allocating data directly outside the heap, in a number of cases you can write software where that data actually never touches the heap. Creating a new direct allocated buffer is as simple as it gets: int numBytes = 1000; ByteBuffer buffer = ByteBuffer.allocateDirect(numBytes); After creating a new buffer, you can manipulate the buffer in a few different ways. If you’ve never used Java NIO buffers you should definitely take a look as they are really cool. Besides ways to fill, drain and mark different points in the buffer, you can opt to have different view on the buffer instead of a ByteBuffer – e.g. buffer.asLongBuffer() gives you a view on the ByteBuffer where you manipulate elements as longs. So how could these be used in our Cache example? There are a number of ways, the most straightforward way would be to store the serialized/externalized form of the value record in a big array along with a map of keys to offsets and sizes of the record in that array. It could look like this (very liberal approach, missing implementations and assuming fixed size records): import java.io.Externalizable; import java.nio.ByteBuffer; import java.util.HashMap; import java.util.Map;public class DirectAllocatedCache<K extends Externalizable, V extends Externalizable> implements Cache<K,V> { private final ByteBuffer backingMap; private final Map<K, Integer> keyToOffset; private final int recordSize;public DirectAllocatedCache(int recordSize, int maxRecords) { this.recordSize = recordSize; this.backingMap = ByteBuffer.allocateDirect(recordSize * maxRecords); this.keyToOffset = new HashMap<K, Integer>(); }public void put(K key, V value) { if(backingMap.position() + recordSize < backingMap.capacity()) { keyToOffset.put(key, backingMap.position()); store(value); } }public V get(K key) { int offset = keyToOffset.get(key); if(offset >= 0) return retrieve(offset);throw new KeyNotFoundException(); }public V retrieve(int offset) { byte[] record = new byte[recordSize]; int oldPosition = backingMap.position(); backingMap.position(offset); backingMap.get(record); backingMap.position(oldPosition);//implementation left as an exercise return internalize(record); }public void store(V value) { byte[] record = externalize(value); backingMap.put(record); } } As you can see, this code has a number of limitations: fixed record size, fixed backing map size, limited way in which externalization is done, difficult to delete and reuse space, etc. While some of these are possible to overcome with clever ways to represent the record in byte arrays (and representing the keyToOffset map in direct allocated buffers also) or dealing with deletions (we could implement our own SLAB allocator) others such as resizing the backing map are difficult to overcome. An interesting improvement is to implement records as offsets to records and fields, thus reducing the amount of data we copy and do so only on demand. Be aware that the JVM imposes a limit to the amount of memory used by direct allocated buffers. You can tune this with the -XX:MaxDirectMemorySize option. Check out the ByteBuffer javadocs Unsafe Another way to manage memory directly from Java is using the hidden Unsafe class. Technically we’re not supposed to use this and it is implementation specific as it lives in a sun package, but the possibilities offered are endless. What Unsafe gives us is the ability to allocate, deallocate and manage memory directly from Java code. We can also get the actual pointers and pass them between native and java code interchangebly. In order to get an Unsafe instance, we need to cut a few corners: private Unsafe getUnsafeBackingMap() { try { Field f = Unsafe.class.getDeclaredField('theUnsafe'); f.setAccessible(true); return (Unsafe) f.get(null); } catch (Exception e) { } return null; } Once we have the unsafe, we can apply this to our previous Cache example: import java.io.Externalizable; import java.lang.reflect.Field; import java.util.HashMap; import java.util.Map;import sun.misc.Unsafe;public class UnsafeCache<K extends Externalizable, V extends Externalizable> implements Cache<K, V> { private final int recordSize; private final Unsafe backingMap; private final Map<K, Integer> keyToOffset; private long address; private int capacity; private int currentOffset;public UnsafeCache(int recordSize, int maxRecords) { this.recordSize = recordSize; this.backingMap = getUnsafeBackingMap(); this.capacity = recordSize * maxRecords; this.address = backingMap.allocateMemory(capacity); this.keyToOffset = new HashMap<K, Integer>(); }public void put(K key, V value) { if(currentOffset + recordSize < capacity) { store(currentOffset, value); keyToOffset.put(key, currentOffset); currentOffset += recordSize; } }public V get(K key) { int offset = keyToOffset.get(key); if(offset >= 0) return retrieve(offset);throw new KeyNotFoundException(); }public V retrieve(int offset) { byte[] record = new byte[recordSize];//Inefficient for(int i=0; i<record.length; i++) { record[i] = backingMap.getByte(address + offset + i); }//implementation left as an exercise return internalize(record); }public void store(int offset, V value) { byte[] record = externalize(value);//Inefficient for(int i=0; i<record.length; i++) { backingMap.putByte(address + offset + i, record[i]); } }private Unsafe getUnsafeBackingMap() { try { Field f = Unsafe.class.getDeclaredField('theUnsafe'); f.setAccessible(true); return (Unsafe) f.get(null); } catch (Exception e) { } return null; } } There’s a lot of space for improvement and you need to do a number of things manually, but it’s very powerful. You can also explicitly free and reallocate memory that you’ve allocated in this way, which allows you to write some code in the same way you would to with C. Check out the javadocs for Unsafe Conclusion There’s a number of ways to avoid using the heap in Java and in this way, use a lot more memory. You don’t need to do this and I’ve personally seen properly tuned JVMs with 20GiB-30GiB running with no long garbage collection pauses, but it is fairly interesting. If you want to check out how some projects use this for the basic (and honestly untested, almost written on a napkin) cache code I wrote here, have a look at EHCache’s BigMemory or Apache Cassandra which uses Unsafe also for this type of approach.   Reference: Escaping from the JVM heap for memory intensive applications from our JCG partner Ruben Badaro at the Java Advent Calendar blog. ...

Groovy: Multiple Values for a Single Command-line Option

One of the many features that makes Groovy an attractive scripting language is its built-in command-line argument support via CliBuilder. I have written about CliBuilder before in the posts Customizing Groovy’s CliBuilder Usage Statements and Explicitly Specifying ‘args’ Property with Groovy CliBuilder. In this post, I look at Groovy‘s CliBuilder’s support for multiple arguments passed via a single command-line flag. The Groovy API Documentation includes this sentence about CliBuilder: nbsp;      Note the use of some special notation. By adding ‘s’ onto an option that may appear multiple times and has an argument or as in this case uses a valueSeparator to separate multiple argument values causes the list of associated argument values to be returned.As this documentation states, Groovy’s built-in CliBuilder support allows a parsed command line flag to be treated as having multiple values and the convention for referencing this argument is to add an ‘s’ after the ‘short’ name of the command-line option. Doing so makes the multiple values associated with a single flag available as a collection of Strings that can be easily iterated to access the multiple values. In the post Customizing Groovy’s CliBuilder Usage Statements, I briefly looked at the feature supporting multiple values passed to the script via a single command line argument. I described the feature in that post as follows:The use of multiple values for a single argument can also be highly useful. The direct use of Apache Commons CLI’s Option class (and specifically its UNLIMITED_VALUES constant field) allows the developer to communicate to CliBuilder that there is a variable number of values that need to be parsed for this option. The character that separates these multiple values (a common in this example) must also be specified by specifying the character via ‘valueSeparator.’The usefulness of this Apache CLI-powered Groovy feature can be demonstrated by adapting a script for finding class files contained in JAR files that I talked about in the post Searching JAR Files with Groovy. The script in that post searched one directory recursively for a single specified String contained as an entry in the searched JARs. A few minor tweaks to this script changes it so that it can support multiple specified directories to recursively search for multiple expressions. The revised script is shown next. #!/usr/bin/env groovy/** * findClassesInJars.groovy * * findClassesInJars.groovy -d <<root_directories>> -s <<strings_to_search_for>> * * Script that looks for provided String in JAR files (assumed to have .jar * extensions) in the provided directory and all of its subdirectories. */def cli = new CliBuilder( usage: 'findClassesInJars.groovy -d <root_directories> -s <strings_to_search_for>', header: '\nAvailable options (use -h for help):\n', footer: '\nInformation provided via above options is used to generate printed string.\n') import org.apache.commons.cli.Option cli.with { h(longOpt: 'help', 'Help', args: 0, required: false) d(longOpt: 'directories', 'Two arguments, separated by a comma', args: Option.UNLIMITED_VALUES, valueSeparator: ',', required: true) s(longOpt: 'strings', 'Strings (class names) to search for in JARs', args: Option.UNLIMITED_VALUES, valueSeparator: ',', required: true) } def opt = cli.parse(args) if (!opt) return if (opt.h) cli.usage()def directories = opt.ds def stringsToSearchFor = opt.ssimport java.util.zip.ZipFile import java.util.zip.ZipExceptiondef matches = new TreeMap<String, Set<String>>() directories.each { directory -> def dir = new File(directory) stringsToSearchFor.each { stringToFind -> dir.eachFileRecurse { file-> if (file.isFile() && file.name.endsWith('jar')) { try { zip = new ZipFile(file) entries = zip.entries() entries.each { entry-> if (entry.name.contains(stringToFind)) { def pathPlusMatch = '${file.canonicalPath} [${entry.name}]' if (matches.get(stringToFind)) { matches.get(stringToFind).add(pathPlusMatch) } else { def containingJars = new TreeSet<String>() containingJars.add(pathPlusMatch) matches.put(stringToFind, containingJars) } } } } catch (ZipException zipEx) { println 'Unable to open file ${file.name}' } } } } }matches.each { searchString, containingJarNames -> println 'String '${searchString}' Found:' containingJarNames.each { containingJarName -> println '\t${containingJarName}' } } Lines 11 through 28 are where Groovy’s internal CliBuilder is applied. The ‘directories’ (short name of ‘d’) and ‘strings’ (short name of ‘s’) command-line flags are set up in lines 20 and 21. Those lines use the Option.UNLIMITED_VALUES to specify multiple values applicable for each argument and they also use valueSeparator to specify the token separating the multiple values for each flag (comma in these cases). Lines 27-28 obtain the multiple values for each argument. Although the options had short names of ‘d’ and ‘s’, appending ‘s’ to each of them (now ‘ds’ and ‘ss’) allows their multiple values to be accessed. The rest of the script takes advantage of these and iterates over the multiple strings associated with each flag. The next screen snapshot demonstrates the above script being executed.The above screen snapshot demonstrates the utility of being able to provide multiple values for a single command-line flag. Groovy’s built-in support for Apache CLI makes it easy to employ customizable command-line parsing.   Reference: Groovy: Multiple Values for a Single Command-line Option from our JCG partner Dustin Marx at the Inspired by Actual Events blog. ...

ANTLR – Semantic Predicates

Parsing simple grammar with antlr is simple. All you have to do is to use regular expressions to describe your language and let antlr generate lexer and parser. Parsing big or complicated languages occasionally require more, because describing them in regular expressions only can be hard or even impossible. Semantic predicates are java (or C++, or JavaScript, or …) conditions written inside the grammar. Antlr uses them either to choose between multiple alternatives or as additional assertions to be checked. They can be placed inside both lexer and parser, but this post focus only on their usage within the parser. They add a lot of power to antlr.  This post assumes that you have general idea on what is antlr and how generated parser works. If you do not, please read linked posts first. They contain everything needed. First chapter contains two motivational use cases. Second chapter describes syntax, terminology and shows simple failed semantic predicate. The post then explains how semantic predicates influence prediction and generated code. It also shows how to write useful conditions and how to solve initial use cases. Final chapter wraps it all into short conclusion. All examples and grammars used in this post are available on Github.Table of ContentsUse CasesKeywords – nth Significant WhitespacesBasicsSyntax TerminologyDisambiguating Semantic Predicate Validating Semantic Predicate Gated Semantic PredicateFailed PredicatesHoisting and PredictionWhat It Is Consequences When It Is UsedIf Needed Hoisting – Disambiguating Predicate Always Hoisting – Gated Rules Never Hoisting – Middle of a RuleNuancesDisambiguating Predicates – Advanced Loops Uncovered Alternatives Combination of Gated and Disambiguated Predicates Additional ParenthesisBacktracking Writing ConditionsInput Token StreamExamplesLabels and ListsLabel Example Label List Example Undefined Labels Labels and HoistingAccess to Local VariablesSolving Initial Use CasesKeywords – nth Significant WhitespacesWrapping It UpValidating Semantic Predicates Disambiguating Semantic Predicates Gated Semantic PredicatesResourcesUse Cases As we spend some time parsing css-like language, both our use cases describe problem we had to solve while writing css part of that parser. First one is about issue encountered while working on pseudo classes and second deals with tricky whitespaces in selectors. If you never worked with css or are not interested in use cases, skip this chapter.Keywords – nth Some css pseudo classes require parameter which can be either a number, an identifier, a selector or something called nth expression. Nth expressions are allowed only inside some pseudo classes and names of these pseudo classes are not reserved keywords in css. Nth expression is an expression of the form an+b where a and b are optional integers. Examples of valid nth expressions: 5n+2, -5n-2, -5n, -2, -n, n. We wanted our grammar to accept nth expressions, but only as parameters of pseudo classes where it is actually allowed. We wanted it to reject nth expressions as parameters of all remaining pseudo classes. All names normally correspond to IDENT tokens. Creating special token corresponding to nth pseudo class name is unpractical, because they are not reserved keywords. For example, they are also perfectly valid class names or element names. Having special token would force us to replace almost all IDENT occurrences by IDENT | NTH. Therefore, we are left with general identifier IDENT which can be either normal or nth pseudo class name. Standard syntactical regular expressions are unable to distiguish between them, but semantic predicates can. Link to solution.Significant Whitespaces Css has semi-important whitespaces. Semi-important means that most of them represent only ends of tokens and that is where their usefulness ends. For example, whitespaces in declaration are irrelevant. Following declarations are equal: padding : 2; padding:2; Most of CSS grammar behave the above way, so there is strong temptation to throw all whitespaces away. However, if we do that, then the next two selectors end up as the same IDENT COLON IDENT LPAREN COLON IDENT RPAREN COLON IDENT LPAREN COLON IDENT RPAREN LBRACE token stream: div :not(:enabled) :not(:disabled) {} div:not(:enabled):not(:disabled) {} Whitespaces in selectors are significant. The first selector is equivalent to div *:not(:enabled) *:not(:disabled) while the second is not. Note: CSS 2.1 grammar available from antlr site ignores this issue. If you want to use it, you have to fix it first. One solution would be to stop hiding whitespaces. This would force us to add explicit whitespace handing WS* into all possible places of all parser rules. That would be a lot of work and the resulting grammar would be less readable. It is also possible to give up on selectors tree building in antlr parser and write custom hand made tree builder for it. This is how we did it originally and we can safely tell that it works, but requires more time and debugging then the final semantic predicates based solution. Link to solution.Basics We start with semantic predicates syntax and some needed terminology. This chapter also outlines basics of what happens if predicate fails. We will not go into details, those are described in the next chapter.Syntax Semantic predicate is always enclosed inside curly braces followed either by question mark or question mark and double arrow:{ condition }? { condition }?=>First example uses simple 1+2==3 and 2+3==5 conditions. The grammar is stored inside the Basics.g file: LETTER : 'a'..'z' | 'A'..'Z'; word: LETTER {1+2==3}? LETTER;NUMERAL: '0'..'9'; number: {2+3==5}?=> NUMERAL NUMERAL;Terminology Depending on which syntax is used and where it is placed, semantic predicates can be called by one of three different names:disambiguating semantic predicate, validating semantic predicate, gated semantic predicate.Disambiguating Semantic Predicate Disambiguating predicates use the shorter {...}? syntax. However, a predicate is called disambiguating only if it is placed in the beginning of a rule or in the beginning of an alternative. Disambiguating semantic predicate: LETTER : 'a'..'z' | 'A'..'Z'; // beginning of a rule rule: {1+2==3}? LETTER LETTER;// beginning of an alternative alternatives: LETTER ( {2+3==5}? LETTER* | {2+3==5}? LETTER+ );Validating Semantic Predicate Validating predicates also use the shorter {...}? syntax. The difference against disambiguating predicates is only in placement. Validating predicates are placed in the middle of a rule or in the middle of an alternative: LETTER : 'a'..'z' | 'A'..'Z'; word: LETTER {1+2==3}? LETTER;Gated Semantic Predicate Gated semantic predicates use the longer {...}?=> syntax. The condition can be placed anywhere. Gated semantic predicate: NUMERAL: '0'..'9'; number: {2+3==5}?=> NUMERAL NUMERAL;Failed Predicates As we explained in expression language tutorial post, parser starts knowing which rule should correspond to the input and then tries to match it to the input. Matching always starts from left-most element of the rule and continues to the right. If matching encounters semantic predicate, then it tests whether the condition is satisfied. If it is not satisfied, FailedPredicateException exception is thrown. Consider Basics.g grammar shown at the beginning of this chapter: LETTER : 'a'..'z' | 'A'..'Z'; word: LETTER {1+2==3}? LETTER;NUMERAL: '0'..'9'; number: {2+3==5}?=> NUMERAL NUMERAL; If you open generated BasicsParser class, you will find that each rule has corresponding method with the same name. Both of them contains a copy of the predicate and both of them throws an exception if the condition is not satisfied: // inside the generated word() method if ( !((1+2==3)) ) { throw new FailedPredicateException(input, 'word', '1+2==3'); } // inside the generated number() method if ( !((2+3==5)) ) { throw new FailedPredicateException(input, 'number', '2+3==5'); } Prediction, e.g. what happens if the parser encounters rule with both multiple alternatives and predicates, for example start: word | number rule, is described in the next chapter.Hoisting and Prediction Depending on where and how you use semantic predicates, the parser may try to avoid failed predicate exceptions. Used strategy is called ‘hoisting’ and it is what makes predicates useful. This chapter explains what hoisting is and what consequences it has. Then we will explain when it is used and when it is not used.What It Is A parser that encountered rule with multiple alternatives has to decide which of these alternatives should be used. If some of them start with a predicate, parser may use that predicate to help with the decision. Consider the grammar stored in DisambiguatingHoistingNeeded.g file: LETTER : 'a'..'z' | 'A'..'Z'; word: {1+2==3}? LETTER LETTER; sameWord: {1+2!=3}? LETTER LETTER;start: word | sameWord; Both word() and sameWord() methods of the generated parser contains the usual failed predicate check. DisambiguatingHoistingNeededParser class extract: //inside the word() method if ( !((1+2==3)) ) { throw new FailedPredicateException(input, 'word', '1+2==3'); } //inside the sameWord() method if ( !((1+2!=3)) ) { throw new FailedPredicateException(input, 'sameWord', '1+2!=3'); } In addition, the code corresponding to the start rule contains copies of both word and sameWord semantic predicates. The part that chooses which rule to use next contains following code (comments are mine): int LA1_2 = input.LA(3); //predicate copied from the word rule if ( ((1+2==3)) ) { alt1=1; } //predicate copied from the sameWord rule else if ( ((1+2!=3)) ) { alt1=2; } else { NoViableAltException nvae = new NoViableAltException('', 1, 2, input); throw nvae; } The act of copying the predicate into prediction part of the generated parser is called hoisting.Consequences If there would be no hoisting, predicates would act as assertions only. We could use them to validate some conditions and that would be it. The above grammar would be illegal – it has two syntactically equivalent alternatives if you ignore predicates. As hoisting copies predicates all over the grammar, it has also several limiting consequences for them. It is not just something that happens on the background you can safely ignore:each predicate can run several times, order in which predicates run may be hard to predict, local variables or parameters may not be available in hoisted copies.Conditions must be without side effects, repeatable and their evaluation order should not matter. If they are hoisted into other rules, then they can not reference local variables or parameters. Only predicates placed in the beginning of a rule are hoisted into other rules. Hoisting in the case of alternatives is only within the rule. Therefore, you can break the third rule if the predicate is not placed in the beginning of the rule.When It Is Used Hoisting is used only when the parser has to decide between multiple rules or alternatives and some of them begin with a predicate. If it is gated predicate e.g., condition inside the {...}?=> syntax, then the predicate is hoisted no matter what. If it is disambiguating predicate e.g., condition inside the {...}? syntax, then the predicate is hoisted only if it is actually needed. The term ‘actually needed’ means that multiple alternatives could match the same input. Otherwise said, it is used only if multiple alternatives are ambiguous for some input. Predicates placed in the middle of rules or in the middle of alternatives are never hoisted.If Needed Hoisting – Disambiguating Predicate Consider the rule start in the DisambiguatingHoistingNotNeeded.g grammar: LETTER : 'a'..'z' | 'A'..'Z'; NUMBER : '0'..'9'; word: {1+2==3}? LETTER LETTER; differentWord: {1+2!=3}? LETTER NUMBER;start: word | differentWord; The rule start has to choose between the word and differentWord rules. Both of them start with predicate, but the predicate is not needed in order to differentiate between them. The second token of the word is LETTER while the second token of the differentWord is NUMBER. Hoisting will not be used. Instead, the grammar will look into upcoming 2 tokens to distinguish between these rules. To verify, open the start() method of the generated DisambiguatingHoistingNotNeededParser class in our sample project: neither 1+2==3 nor 1+2!=3 condition was copied into it. int alt1=2; switch ( input.LA(1) ) { case LETTER: { switch ( input.LA(2) ) { case LETTER: { alt1=1; } break; case NUMBER: { alt1=2; } break; default: NoViableAltException nvae = new NoViableAltException('', 1, 1, input);throw nvae; } On the other hand, consider the rule start in the DisambiguatingHoistingNeeded.g grammar: LETTER : 'a'..'z' | 'A'..'Z'; word: {1+2==3}? LETTER LETTER; sameWord: {1+2!=3}? LETTER LETTER;start: word | sameWord; The rule start has to choose between the word and sameWord rules. These two rules match exactly the same sequence of tokens and differ only by the predicate. Hoisting will be used. To verify, open the start() method of the generated DisambiguatingHoistingNeededParser class in our sample project. It contains copies of both 1+2==3 and 1+2!=3 conditions. int alt1=2; switch ( input.LA(1) ) { case LETTER: { switch ( input.LA(2) ) { case LETTER: { int LA1_2 = input.LA(3);if ( ((1+2==3)) ) { alt1=1; } else if ( ((1+2!=3)) ) { alt1=2; } else { /* ... */ } } break; default: // ... } } break; default: // ... } The exactly same thing is happening with disambiguating predicates in alternatives. This will not be hoisted (DisambiguatingHoistingNotNeeded.g grammar): LETTER : 'a'..'z' | 'A'..'Z'; alternatives: LETTER ( {2+3==5}? LETTER | {2+3==5}? NUMBER ); This will be hoisted (DisambiguatingHoistingNeeded.g grammar): LETTER : 'a'..'z' | 'A'..'Z'; alternatives: LETTER ( {2+3==5}? LETTER | {2+3==5}? LETTER );Always Hoisting – Gated Rules Look at the start rule in the GatedHoisting.g grammar: LETTER : 'a'..'z' | 'A'..'Z'; NUMBER: '0'..'9';word: {1+2==3}?=> LETTER LETTER; differentWord: {1+2!=3}?=> LETTER NUMBER;start: word | differentWord; The rule start has to choose between the word and differentWord words. Both of them starts with predicate and that predicate is not needed in order to differentiate between them. However, hoisting will be used because we used gated semantic predicate. To verify, open the start() method of the generated GatedHoisting class in our sample project. It contains copies of both 1+2==3 and 1+2!=3 conditions. int LA1_0 = input.LA(1);if ( (LA1_0==LETTER) && (((1+2==3)||(1+2!=3)))) { int LA1_1 = input.LA(2);if ( (LA1_1==LETTER) && ((1+2==3))) { alt1=1; } else if ( (LA1_1==NUMBER) && ((1+2!=3))) { alt1=2; } else { NoViableAltException nvae = new NoViableAltException('', 1, 1, input);throw nvae; } } else { NoViableAltException nvae = new NoViableAltException('', 1, 0, input);throw nvae; } The exactly same thing is happening with gated predicates in alternatives. This will be hoisted (GatedHoisting.g grammar): LETTER : 'a'..'z' | 'A'..'Z'; NUMBER: '0'..'9';alternatives: LETTER ( {2+3==5}?=> LETTER | {2+3==5}?=>NUMBER );Never Hoisting – Middle of a Rule Hoisting is never used if the predicate is located in the middle of a rule or an alternative. It does not matter which predicate type is used. Therefore, if your rules differ only by the predicate, that predicate must be placed in the beginning of a rule or an alternative. Non-hoisted gated predicate (GatedNoHoisting.g): LETTER: 'a'..'z' | 'A'..'Z'; NUMBER: '0'..'9';//gated predicate in the middle of a rule word: LETTER {1+2==3}?=> LETTER; differentWord: LETTER {1+2!=3}?=> NUMBER;start: word | differentWord; Another non-hoisted gated predicate (GatedNoHoisting.g): LETTER: 'a'..'z' | 'A'..'Z'; NUMBER: '0'..'9';//gated predicate in the middle of an alternative alternatives: LETTER ( LETTER {2+3==5}?=> LETTER | LETTER {2+3==5}?=> NUMBER ); Generated parser is in GatedNoHoistingParser class. The most important point is that if your rules differ only by the predicate and that predicate is placed in the middle of a rule, antlr will refuse to generate corresponding parser. Next expandable box contains several examples of syntactically incorrect grammars along with antr errors they cause. Incorrect grammar (SyntacticallyIncorrect.g): LETTER : 'a'..'z' | 'A'..'Z'; word: LETTER {1+2==3}? LETTER; sameWord: LETTER {1+2!=3}? LETTER;start: word | sameWord; Error in console: warning(200): org\meri\antlr\predicates\SyntacticallyIncorrect.g:28:6: Decision can match input such as 'LETTER LETTER' using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input error(201): org\meri\antlr\predicates\SyntacticallyIncorrect.g:28:6: The following alternatives can never be matched: 2 Another incorrect grammar (SyntacticallyIncorrect.g): alternativesStart: LETTER ( LETTER {1+2==3}? | LETTER {1+2!=3}? ); Error in console: warning(200): org\meri\antlr\predicates\SyntacticallyIncorrect.g:31:27: Decision can match input such as 'LETTER' using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input error(201): org\meri\antlr\predicates\SyntacticallyIncorrect.g:31:27: The following alternatives can never be matched: 2 Yet another incorrect grammar (SyntacticallyIncorrect.g): LETTER : 'a'..'z' | 'A'..'Z'; gatedWord: LETTER {1+2==3}?=> LETTER; gatedSameWord: LETTER {1+2!=3}?=> LETTER;gatedStart: gatedWord | gatedSameWord; Error in console: warning(200): org\meri\antlr\predicates\SyntacticallyIncorrect.g:40:11: Decision can match input such as 'LETTER LETTER' using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input error(201): org\meri\antlr\predicates\SyntacticallyIncorrect.g:40:11: The following alternatives can never be matched: 2 Last incorrect grammar (SyntacticallyIncorrect.g): LETTER : 'a'..'z' | 'A'..'Z'; gatedAlternativesStart: LETTER ( LETTER {1+2==3}?=> LETTER | LETTER {1+2!=3}?=> LETTER ); Error in console: warning(200): org\meri\antlr\predicates\SyntacticallyIncorrect.g:43:32: Decision can match input such as 'LETTER LETTER' using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input error(201): org\meri\antlr\predicates\SyntacticallyIncorrect.g:43:32: The following alternatives can never be matched: 2Nuances Previous ‘When It Is Used’ sub-chapter shown how predicates behave in clearly hoisted and clearly non-hoisted situations. We selected examples to show as clear and simple situations as possible. This sub-chapter contains different set of examples. We picked most potentially confusing we have been aware of. All used examples are located in Nuances.g file.Disambiguating Predicates – Advanced Hoisted disambiguating predicates are used only if multiple alternatives are ambiguous for current input. Otherwise said, hoisted copy of the predicate is run only if the actual input could be parsed by multiple alternatives. Example: alternatives in the following rule are not syntactically equivalent, because they do not match the same set of inputs. First alternative matches exactly two letters and second alternative matches any number of letters: advancedDisambiguating: LETTER ( {1+2==3}? LETTER LETTER | {1+2!=3}? LETTER* ); If the input starts with exactly one LETTER, then it can not be parsed by the first alternative. As only second alternative matches it, predicate will not be used. Parser will use second alternative and if 1+2!=3 condition happen to be false, parser will throw failed predicate exception. However, if the input starts with two letters, then it could be matched by both alternatives and predicate will be used. This is how generated code looks: int alt4=2; switch ( input.LA(1) ) { case LETTER: { switch ( input.LA(2) ) { case LETTER: { int LA4_3 = input.LA(3); //predicate is used only if first two tokens are LETTER if ( ((1+2==3)) ) { alt4=1; } else if ( ((1+2!=3)) ) { alt4=2; } else { // ... irrelevant code ... } } break; //if the second token is not LETTER, predicate is not used case EOF: { alt4=2; } break; default: // ... } } break; //if the first token is not LETTER, predicate is not used case EOF: // ... default: // ... } Compare it to very similar gated rule: compareGated: LETTER ( {1+2==3}?=> LETTER LETTER | {1+2!=3}?=> LETTER* ); Parser will use the predicate no mater what. The second alternative will never be entered, because the predicate 1+2!=3 is never satisfied: int alt6=2; int LA6_0 = input.LA(1);if ( (LA6_0==LETTER) && (((1+2==3)||(1+2!=3)))) { int LA6_1 = input.LA(2);if ( (LA6_1==LETTER) && (((1+2==3)||(1+2!=3)))) { int LA6_3 = input.LA(3); Gated predicate causes antlr to throw different kind of exception in this case. As we will show later in this post, different hoisting of gated and disambiguating predicates can make much bigger difference with more complicated predicates. Namely, it can make difference between accepting and rejecting the input.LoopsAlthough it does not look like that at the first sight, loops are alternatives too. They use prediction to guess whether they should do one more round or whether they should end. Use predicate to stay in the loop only while it returns true: LETTER : 'a'..'z' | 'A'..'Z'; loop: ( {somePredicate()}?=> LETTER )*; The loop rule will match letters until the function somePredicate() returns false or until the rule runs out of LETTER tokens. loop1: do { int alt1=2; int LA1_0 = input.LA(1); // predicate is used during the prediction if ( (LA1_0==LETTER) && ((somePredicate()))) { alt1=1; } //matching: either jump out or match another LETTER switch (alt1) { case 1: { if ( !((somePredicate())) ) { throw new FailedPredicateException(...); } // ... match LETTER ... } break;default: break loop1; } } while (true); Disambiguating predicate can not be used for this purpose. Next predicate will not be used to decide whether parser should stay in the loop or not: LETTER : 'a'..'z' | 'A'..'Z'; loopDisambiguating: ( {somePredicate()}? LETTER )*; Technically, the loop is deciding between LETTER and <nothing> alternatives. Those are syntactically different and prediction uses disambiguating predicates only if it has to decide between syntactically ambiguous alternatives. The loopDisambiguating rule will match letters until it runs out of LETTER tokens. If the function somePredicate() returns false during that time, the rule will throw FailedPredicateException exception. Generated code is very similar to the previous one, only the prediction part changes. Predicate is not used: loop2: do { int alt2=2; //prediction ignores the predicate switch ( input.LA(1) ) { case LETTER: { alt2=1; } break; } //matching: either jump out or match another LETTER switch (alt2) { case 1: { if ( !((somePredicate())) ) { throw new FailedPredicateException(...); } // ... match LETTER ... } break;default: break loop2; } } while (true);Uncovered Alternatives It is perfectly ok to leave some alternatives uncovered. Alternatives with predicates will work as expected. Gated predicate is always hoisted and disambiguated predicate is hoisted only if there are multiple ambiguous alternatives. Gated predicate is always hoisted: uncoveredGated: LETTER ( {3+4==7}?=> LETTER | NUMBER ); Hoisted disambiguated predicate: uncoveredDisambiguatingHoisted: LETTER ( {2+5==7}? LETTER* | LETTER+ ); Non hoisted disambiguated predicate: uncoveredDisambiguatingNotHoisted: LETTER ( {2+4==6}? LETTER | NUMBER );Combination of Gated and Disambiguated PredicatesIf one alternative is gated and the other is disambiguated, then gated predicate is always hoisted and disambiguated predicate is hoisted only if it is actually needed. Gated predicate is hoisted while disambiguated predicate is not hoisted: combinationDisambiguatingNotHoisted: LETTER ( {1+4==5}?=> LETTER | {1+4!=5}? NUMBER ); Both predicates are hoisted: combinationDisambiguatingHoisted: LETTER ( {1+5==6}?=> LETTER* | {1+5!=6}? LETTER+ );Additional Parenthesis If you close disambiguating predicate in parenthesis, the predicate is still be treated like disambiguating predicate. Another way to write disambiguating predicate: stillDisambiguating: ({2+2==4}?) LETTER; testStillDisambiguating: stillDisambiguating | LETTER; If you put additional parenthesis around gated predicate, the predicate will be ignored: ignored: ({3+3==6}?)=>LETTER;Backtracking Predicates run even if the parser is backtracking e.g, if it is inside syntactical predicate. If the parser is backtracking and the predicate fails, backtracking fails too. Failed predicate exception is thrown only if the parser is not backtracking. The backtrack rule initiates backtracking (Backtracking.g): LETTER : 'a'..'z' | 'A'..'Z'; word: LETTER {1+2==3}? LETTER; number: LETTER {2+3!=5}? LETTER;backtrack: (number)=> number | word; Since backtracking is possible, generated predicate check is different: if ( !((2+3!=5)) ) { if (state.backtracking>0) {state.failed=true; return retval;} throw new FailedPredicateException(input, 'number', '2+3!=5'); } Backtracking is yet another reason why conditions must not have side effects, must be repeatable and their evaluation order must not matter.Writing Conditions This chapter shows how to write advanced conditions for semantic predicates. First, we will show how to access and use input token stream. We will also explain how to reference labeled tokens. Last part is about local variables in non-hoisted conditions. Unless specified otherwise, all used examples are located in Environnement.g file.Input Token Stream Each generated parser has public TokenStream input field. This field provides access to the whole input token stream and also to current position in that token stream. Its most important method is the Token LT(int k) method. The parameter k contains relative position of the token you are interested in. The number 1 means ‘look ahead one token’, 2 means ‘second token ahead’ and so on. Negative numbers reference passed tokens: -1 will return previous token, -2 returns the one before it and so on. Do not use 0. Its meaning is undefined and the default parser returns null. Note: relative referencing works correctly even when the grammar is in backtracking state. -1 is always previous token and 1 is always the next token.Examples Disambiguating: if the word starts with letter a, then it must have at least two letters: word: LETTER ( { input.LT(-1).getText().equals('a')}? LETTER+ | { !input.LT(-1).getText().equals('a')}? LETTER* ); Gated: if the second numeral of the number is 9, then it must have exactly 3 numerals: number: NUMERAL ( {input.LT(1).getText().equals('9')}?=> NUMERAL NUMERAL | {!input.LT(1).getText().equals('9')}?=> NUMERAL* ); Note: the choice of predicate slightly matter in this case. It influences what kind of error will be thrown if the input does not match the rule.Labels and Lists Predicates can reference and use any previously defined label or label list the same way as actions can.Label Example If the first letter of the word is a, then the word must have at least two letters: labeledWord: a=LETTER ( { $a.getText().equals('a')}? LETTER+ | { !$a.getText().equals('a')}? LETTER* );Label List ExampleIf the word starts with less then 3 letters, then it must end with a number: labeledListWord: a+=LETTER+ ( { $a.size() < 3 }?=> NUMERAL | { $a.size() >= 3}?=> ); Note: the choice of predicate does matter in this case. The above example works correctly only if it uses gated {...}?=> predicate instead of disambiguating {...}? one. The NUMERAL and <nothing> are syntactically different. Disambiguating predicate would not be used for prediction e.g., it would not be hoisted. The parser would base its decision solely on the next token (is it NUMERAL?). The condition would be used as an afterwards assertion to check whether the number of letters was right. Such grammar would throw an exception on the abcd9 input, while ours would accept it.Undefined Labels Predicate can NOT reference not-yet-defined labels. The parser is generated, but the first attempt to use the rule throws null pointer exception in runtime: //this would cause null pointer exception nullPointerAtPredicate: LETTER { $a.getText().equals('a') }? a=LETTER;Labels and Hoisting Into Other Rules As the label has to be defined before being used in the predicate and predicates are copied into other rules only if they are located in the very beginning of a rule, you do not have to worry about hoisting into other rules.Access to Local Variables Antlr allows you to define custom local variables and use them withing one rule. If you are sure that predicate will not be copied into other rules, it can use them. Of course, using local variables if the predicate can be copied into other rules will result in faulty parser. Create local variables and use them in the predicate. If the word starts with less then 10 letters, then it must end with a number localVariables @init {int num=0;} //define local variable num : (LETTER { num++; })* //raise the num variable by 1 for each letter ( // what should follow depends on the variable value { num < 10 }?=> NUMERAL | { num >= 10}?=> ); Note: The same warning as before applies. We must use gated predicates. You must be especially careful not to use local variables in potentialy hoisted predicates. For example, Antlr Reference book recommends following rule to match only numbers composed of less then four numerals (ANTLRReference3Error.g): localVariablesWarning @init {int n=1;} // n becomes a local variable : ( {n<=4}?=> NUMERAL {n++;} )+ // enter loop only if n<=4 ; The above rule works well in isolation, that is when it is not used in other rules. Unfortunately, if you include it into other rules, the predicate may be hoisted into that other rule (ANTLRReference3Error.g): // syntax error in generated parser syntaxError: localVariablesWarning | LETTER; The n<=4 predicate will be copied into the syntaxError rule. The variable n is not accessible inside that rule and generated parser will be syntactically incorrect.Solving Initial Use Cases Finally, we are going to solve both use cases described in the motivational chapter.Keywords – nth Link to original use case. We created function isInsideNth that returns true only if previous token matched name of some nth pseudo class. The function is used as condition inside gated predicate. Generated parser will assume that input contains nth expression if and only if it is inside nth pseudo class. UseCasesNth.g file: @parser::members { private static Set<String> NTH_PSEUDOCLASSES = new HashSet<String>(); static { NTH_PSEUDOCLASSES.add('nth-child'); NTH_PSEUDOCLASSES.add('nth-last-child'); NTH_PSEUDOCLASSES.add('nth-of-type'); NTH_PSEUDOCLASSES.add('nth-last-of-type'); }public boolean isInsideNth() { return isNthPseudoClass(input.LT(-1)); }private boolean isNthPseudoClass(Token a) { if (a == null) return false; String text = a.getText(); if (text == null) return false; return NTH_PSEUDOCLASSES.contains(text.toLowerCase()); }}LPAREN: '('; RPAREN: ')'; COLON: ':'; COMMA: ','; IDENT : ('a'..'z' | 'A'..'Z')+;//pseudoparameters and nth with dummy syntax pseudoparameters: IDENT (COMMA IDENT)*; nth: IDENT; //real nth syntax ommited for simplicity sake// pseudoclass pseudo : COLON COLON? IDENT (( { isInsideNth()}?=> LPAREN nth RPAREN | LPAREN pseudoparameters RPAREN )?) ; An alternative solution with labels and rewrite rule: //different solution - note that we need to use rewrite rules in this case pseudoDifferentSolution : COLON COLON? name=IDENT (( { isNthPseudoClass($name)}?=> LPAREN nthParameters=nth RPAREN | LPAREN parameters=pseudoparameters RPAREN )?) -> $name $nthParameters? $parameters? ;Significant Whitespaces Link to original use case. Css selectors can be composed of multiple parts separated by combinators >, +, ~ and <space>. Each part called simple selector starts with an optional element name and may be followed by multiple pseudo classes, attributes and similar structures. Ignoring the space as combinator problem, simplified simple selector grammar can look like this: COLON: ':'; STAR: '*'; NUMBER: ('0'..'9')+; IDENT : ('a'..'z' | 'A'..'Z')+;//some options have been removed from following rules for simplicity sake elementName: IDENT | STAR | NUMBER; pseudoClass: COLON COLON? IDENT; elementSubsequent: pseudoClass;simpleSelectorWrong: (elementName elementSubsequent*) | elementSubsequent+ ; The above simpleSelectorWrong rule matches valid simple selectors: h1, h1:first-child:hover, :first-child:hover and :hover. Unfortunately, as whitespaces are thrown away, the above rule matches more then that. For example, it would match also h1:first-child :hover which should be interpreted exactly the same way as h1:first-child *:hover selector e.g., as two simple selectors joined by <space>. We created method that returns true only if there is no hidden token between previous and next tokens. Unless configured otherwise, all tokens are instance of CommonToken class. Since common token knows its start and stop index, we can cast and compare them to see whether there was something between them. New parser methods (UseCasesSelectors.g): @parser::members { public boolean onEmptyCombinator(TokenStream input) { return !directlyFollows(input.LT(-1), input.LT(1)); }private boolean directlyFollows(Token first, Token second) { CommonToken firstT = (CommonToken) first; CommonToken secondT = (CommonToken) second;if (firstT.getStopIndex() + 1 != secondT.getStartIndex()) return false;return true; } } Fixed simple selector uses gated predicate to check whether it should or should not continue adding subsequent elements (UseCasesSelectors.g): simpleSelector: ( elementName ({!onEmptyCombinator(input)}?=>elementSubsequent)* ) | ( elementSubsequent ({!onEmptyCombinator(input)}?=>elementSubsequent)* ); We have to use gated predicates in this case. If we would use disambiguating predicate, generated parser would not use our predicate to decide whether to stay inside the loop or not. It is because the loop is technically deciding between elementSubsequent and <nothing> alternatives and those are syntactically different. The {...}? predicate would not be used during the prediction, it would just occasionally throw exceptions.Wrapping It Up Semantic predicates are java conditions written inside the grammar. They are copied into generated parser as they are, without any changes. If token matching algorithm reaches semantic predicate and that predicate fails, FailedPredicateException exception is thrown. If a rule or an alternative starts with semantic predicate, that semantic predicate can be used during prediction phase. Failed predicates during the prediction phase never throw exceptions, but they may disable some alternatives. This is called hoisting. Conditions must be without side effects, repeatable and their evaluation order should not matter. If they are hoisted into other rules, then they can not reference local variables or parameters. Semantic predicates are used in three different ways: as validating semantic predicates, as disambiguating semantic predicates and as gated semantic predicates.Validating Semantic Predicates Validating semantic predicates act as assertions only. As a result, validating semantic predicates are never hoisted. Condition is closed inside curly braces followed by question mark { condition }?. It must be placed either in the middle of a rule or in the middle of an alternative: LETTER : 'a'..'z' | 'A'..'Z'; word: LETTER {1+2==3}? LETTER;Disambiguating Semantic Predicates Disambiguating semantic predicates help to choose between syntactically equivalent alternatives. As a result, disambiguating semantic predicates are hoisted only if the parser has to choose between multiple ambiguous alternatives. Disambiguating semantic predicates use exactly the same syntax as validating predicates. Condition is closed inside curly braces followed by question mark { condition }?. However, they must be placed either in the beginning of a rule or in the beginning of an alternative: LETTER : 'a'..'z' | 'A'..'Z'; // beginning of an alternative alternatives: LETTER ( {2+3==5}? LETTER* | {2+3==5}? LETTER+ );Gated Semantic Predicates Gated semantic predicates are used to dynamically turn on and off portions of grammar. As a result, all gated predicates placed in the beginning of a rule or an alternative are hoisted. Gated predicates placed in the middle of a rule or an alternative are never hoisted. Condition is closed inside curly braces followed by question mark and double arrow { condition }?=>: NUMERAL: '0'..'9'; number: {2+3==5}?=> NUMERAL NUMERAL;ResourcesWincent Wiki Stack Overflow Question The Definitive ANTLR Reference  Reference: ANTLR – Semantic Predicates from our JCG partner Maria Jurcovicova at the This is Stuff blog. ...

Checking out what is new with Servlet 3.0

With the JEE6 specification hitting the market, some major changes have taken place with respect to how you would approach developing applications in the enterprise application world. In this article i would be touching upon a few changes that were done with respect to web application development.                 First things first, say good bye to the web.xml deployment descriptor (at least for parts of it). Well its not like it is deprecated, but with the rise of the usage of annotations and their usage, the new specification allows us to define our configuration using annotations, though some thing such as welcome file lists, context params etc will still need to go inside your web.xml . Annotations available for use are;@WebServlet @WebFilter @WebInitParam @WebListener @MultipartConfigIn this article i would be checking out the @WebServlet and @WebFilter annotations. Let us see how we would usually map a servlet in the web.xml era; <servlet> <servlet-name>myservlet</servlet-name> <servlet-class>com.example.MyServlet</servlet-class> </servlet><servlet-mapping> <servlet-name>myservlet</servlet-name> <url-pattern>/hello</url-pattern> </servlet-mapping> With the Servlet 3.0 spec, now configuring a Servlet is as easy as annotating a class that extends HttpServlet. Lets see how that looks like; @WebServlet('/student') public class StudentServlet extends HttpServlet{/** * */ private static final long serialVersionUID = 2276157893425171437L;@Override protected void doPost(HttpServletRequest arg0, HttpServletResponse arg1) throws ServletException, IOException { StringBuilder response = new StringBuilder(500); response.append('<html><body>').append('Registered Student : ').append(arg0.getParameter('txtName')).append('</body></html>'); arg1.getOutputStream().write(response.toString().getBytes()); arg1.getOutputStream().flush(); arg1.getOutputStream().close(); } } All you need is the @WebServlet annotation. In order for this to work, the class should reside either in the WEB-INF/classes folder or within a jar residing in the WEB-INF/lib folder. Next up lets see how we would configure a filter with annotations. package com.blog.example.servlettest;import java.io.IOException;import javax.servlet.Filter; import javax.servlet.FilterChain; import javax.servlet.FilterConfig; import javax.servlet.ServletException; import javax.servlet.ServletRequest; import javax.servlet.ServletResponse; import javax.servlet.annotation.WebFilter;@WebFilter('/student') public class StudentFilter implements Filter{@Override public void destroy() { }@Override public void doFilter(ServletRequest arg0, ServletResponse arg1, FilterChain arg2) throws IOException, ServletException {if(arg0.getParameter('txtName')==null || arg0.getParameter('txtName').isEmpty()) { arg1.getWriter().append('Invalid name supplied'); arg1.getWriter().flush(); arg1.getWriter().close(); } else { arg2.doFilter(arg0, arg1); } }@Override public void init(FilterConfig arg0) throws ServletException { // TODO Auto-generated method stub}} Again very easy. Just a mere annotation to notify it as a filter. Note that here we implement the Filter interface. The value or the urlPatterns should be available. Using both is illegal as per the specification. In the coming weeks i will cover the other new annotations available with JEE6 and wrap up with a comprehensive example using them together. If JEE6 will replace Spring framework or not is not a question by itself, but i believe we would be seeing some fierce competition between the two. The annotations vs xml debate is more or less resolved with people with preference for each holding their own grounds. I believe a little bit from both worlds would be beneficial for an application. You can download and run a sample example which i have uploaded here. If you are using JBoss-AS7 all you need to do is run the application server on standalone mode and do a mvn package jboss-as:deploy and point the browser to http://localhost:{port}/servlet3.0. That is it for today. Thank you for reading and if you have any comments or suggestions for improvement, please do leave by a comment. Have a good day all!!   Reference: Checking out what is new with Servlet 3.0 from our JCG partner Dinuka Arseculeratne at the My Journey Through IT blog. ...

Functional Java collections

There is a lot of functional hype these days so I would give a short overview on what is out there at least when it comes to collections in Java. Personally I like standard collections API but i some cases can be awkward and add additional verboseness. This should not be a problem in latter version of Java 8+. There we would probably worry about not creating callback hell but hey there is no silver bullet for most stuff why should there be one for programming?             The Guava Way Guava project is one of Google’s core libraries where there are plenty of different core language aspects and problems covered. There are utilities and extensions for everyday usage like : collections, primitives, caching, common annotations, string processing, I/O, Math, Reflections and many others. We will only take a look at the Collections goodies so lets see some of them : // list ImmutableList<String> of = ImmutableList.of("a", "b", "c", "d"); // Same one for map ImmutableMap<String, String> map = ImmutableMap.of("key1", "value1", "key2", "value2"); //list of ints List<Integer> theList = Ints.asList(1, 2, 3, 4, 522, 5, 6);The Guava Collections are compatible with the JDK collections since they mostly extend or implement the standard classes. There are several cool utilities that are part of the API and have similar names with the ones from java.util.Collections. Basically any programmer who knows the JDK collections should be able to transfer to Guava easily. The ones for List is called Lists, one for Set is Sets, for Map is Maps and so on for the rest. For example: //create new List List<someLongName> list = Lists.newArrayList(); //create new LinkedHashMap Map<someKeyType, SomeValueType> map = Maps.newLinkedHashMap();//initalize Array List on the spot List<String> someList = Lists.newArrayList("one", "two", "three");//set inital size for readability as well as performance List<Type> exactly100 = Lists.newArrayListWithCapacity(100); List<Type> approx100 = Lists.newArrayListWithExpectedSize(100);Methods corresponding to a particular interface are grouped in a very intuitive manner. There are also some extremely good ways of building cache with various of features : Cache<Integer, Customer> cache = CacheBuilder.newBuilder() .weakKeys() .maximumSize(10000) .expireAfterWrite(10, TimeUnit.MINUTES) .build(new CacheLoader<Integer, Customer>() {@Override public Customer load(Integer key) throws Exception {return retreveCustomerForKey(key); } }); Since Guava is available in most of the maven repositories its very easy to add it to your build LAMBDAJ The idea behind the project is to manipulate collections in a functional and statically typed way. This is achieved in a way the avoids repetitions of simple tasks we usually do with collections. Repetition makes programmers to go with copy/pasting and creates makes them create bugs by doing so. Accessing collections without explicit looping provides way of filtering, sorting, extraction, grouping, transforming, invoking a method on each item or sum up the elements or fields of those element in a collections. Additionally to all these features lambdaj is also a DSL in a way since it adds very cool ‘sugar’ features to the syntax making it more readable in pseudo-english. This is done with static methods so in order to use them we include them directly: import static ch.lambdaj.Lambda.*; As it comes to checking and matching lambdaj relies deeply on Hamcrestmatchers. So for example to create a check for an odd integers and then filter a list with that check: Matcher<Integer> odd = new Predicate<Integer>() { public boolean apply(Integer item) { return item % 2 == 1; } }; List<Integer> oddNumbers = filter(odd, asList(1, 2, 3, 4, 5)); and as expected the list will return the list [1,3,5]. Lambdaj take a step further with it’s DSL, for example : List<Beneficiary> beneficiaries = with(transactions) .retain(having(on(Transaction.class) .getQunatity(), lessThan(100))) .extract(on(Transaction.class).getBeneficiary()) .sort(on(Beneficiary.class).getName()); Performance costs Although the best way to make your application fast is to have the cleanest code as possible there comes a time when you must optimize.In order to do that there is some info provided by the creators on the memory usage and time. Lambdaj has a performance wiki page with code examples. There are also some test done by other programmers where they compare lambdaj with JDK8 for example. There are also some measurements on memory usage of Guava. As for performance of Guava most of it’s functionality is standard JDK classes builders and utilities so the overhead is minimal. At the end of the day it’s up to you to decide how much effect each of these libraries will have on your project and if that is positive. I’m for the idea that almost every project must have Guava on it’s classpath. Related links summaryGuavahttp://code.google.com/p/guava-libraries/ lambdaj http://code.google.com/p/lambdaj/ Hamcrest http://hamcrest.org/ Guava Links http://www.tfnico.com/presentations/google-guava Guava examples https://github.com/mitemitreski/guava-examples Guava presentation http://blog.mitemitreski.com/2012/07/google-guava-for-cleaner-code.html  Reference: Functional Java collections from our JCG partner Mite Mitresky at the Java Advent Calendar blog. ...
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below: