What's New Here?


Oracle Drops Collection Literals in JDK 8

In a posting on OpenJDK JEP 186, Oracle’s Brian Goetz announced that Oracle will not be pursuing collection literals as a language feature in JDK 8. A collection literal is a syntactic expression form that evaluates to an aggregate type such as an array, List or Map. Project Coin proposed collection literals, which would also complement the library additions in Java SE 8. The assumption was that collection literals would increase productivity, code readability, and code safety. As an alternative, Oracle suggests a library-based proposal built on the concept of static methods on interfaces. The implementation would ideally be via new dedicated immutable classes. The major points behind this library-based approach are the following:

- The basic solution of this feature works only for Sets, Lists and Maps, so it is not very satisfying or popular. The advanced solution, covering an extensible set of other collection types, is open-ended, messy, and virtually guaranteed to way overrun its design budget.
- The library-based changes would remove much of the requirement for the “collection literals” change discussed in Project Coin.
- The library-based approach gives X% of the benefit for 1% of the cost, where X >> 1.
- Value types are coming, and the behavior of collection literals with value types is not known. It is better not to attempt collection literals before value types arrive.
- Oracle’s language-design bandwidth is better spent on addressing foundational issues underlying a library-based version. This includes more efficient varargs, array constants in the constant pool, immutable arrays, and support for caching (and reclaiming under pressure) intermediate immutable results.

According to Oracle’s Brian Goetz, the real pain is in Maps, not Lists, Sets or Arrays. The library-based solutions are more acceptable for Lists, Sets and Arrays, but this approach still lacks a reasonable way to describe pair literals for Maps.
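To make the idea concrete, here is a hypothetical sketch of what such a library-based factory might look like – a static method declared directly on an interface (legal as of Java 8) that builds an immutable List. The ImmutableLists name is purely illustrative; JDK 8 itself ships no such factory.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a static factory method declared directly on an
// interface, returning an unmodifiable List. Names are illustrative only.
interface ImmutableLists {
    static <T> List<T> of(T... elements) {
        // Defensive copy, then wrap so callers cannot mutate the result.
        return Collections.unmodifiableList(Arrays.asList(elements.clone()));
    }
}

public class CollectionFactoryDemo {
    public static void main(String[] args) {
        List<String> names = ImmutableLists.of("a", "b", "c");
        System.out.println(names);        // [a, b, c]
        System.out.println(names.size()); // 3
    }
}
```

Java 9 later standardized essentially this shape as List.of, Set.of and Map.of, which suggests the library-based direction described here is the one that eventually shipped.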
Static methods on interfaces make the library-based solution more practical, and value types would make a library-based solution for Map far more practical too. A proof-of-concept patch for the library-based solution is also available. Reference: Oracle Drops Collection Literals in JDK 8 from our JCG partner Kaushik Pal at the TechAlpine – The Technology world blog....

Testing Lucene’s index durability after crash or power loss

One of Lucene’s useful transactional features is index durability, which ensures that, once you successfully call IndexWriter.commit, even if the OS or JVM crashes or power is lost, or you kill -KILL your JVM process, after rebooting, the index will be intact (not corrupt) and will reflect the last successful commit before the crash. Of course, this only works if your hardware is healthy and your IO devices implement fsync properly (flush their write caches when asked by the OS). If you have data-loss issues, such as a silent bit-flipper in your memory, IO or CPU paths, thanks to the new end-to-end checksum feature (LUCENE-2446), available as of Lucene 4.8.0, Lucene will now detect that as well during indexing or CheckIndex. This is similar to the ZFS file system’s block-level checksums, but not everyone uses ZFS yet (heh), and so Lucene now does its own checksum verification on top of the file system. Be sure to enable checksum verification during merge by calling IndexWriterConfig.setCheckIntegrityAtMerge. In the future we’d like to remove that option and always validate checksums on merge, and we’ve already done so for the default stored fields format in LUCENE-5580 and (soon) the term vectors format in LUCENE-5602, as well as set up the low-level IO APIs so other codec components can do so as well, with LUCENE-5583, for Lucene 4.8.0.

FileDescriptor.sync and fsync

Under the hood, when you call IndexWriter.commit, Lucene gathers up all newly written filenames since the last commit, and invokes FileDescriptor.sync on each one to ensure all changes are moved to stable storage. At its heart, fsync is a complex operation, as the OS must flush any dirty pages associated with the specified file from its IO buffer cache, work with the underlying IO device(s) to ensure their write caches are also flushed, and also work with the file system to ensure its integrity is preserved.
You can separately fsync the bytes or metadata for a file, and also the directory(ies) containing the file. This blog post is a good description of the challenges. Recently we’ve been scrutinizing these parts of Lucene, and all this attention has uncovered some exciting issues! In LUCENE-5570, to be fixed in Lucene 4.7.2, we discovered that the fsync implementation in our FSDirectory implementations is able to bring new 0-byte files into existence. This normally isn’t a problem by itself, because IndexWriter shouldn’t fsync a file that it didn’t create. However, it exacerbates debugging when there is a bug in IndexWriter or in the application using Lucene (e.g., directly deleting index files that it shouldn’t). In these cases it’s confusing to discover these 0-byte files so much later, versus hitting a FileNotFoundException at the point when IndexWriter tried to fsync them. In LUCENE-5588, to be fixed in Lucene 4.8.0, we realized we must also fsync the directory holding the index, otherwise it’s possible on an OS crash or power loss that the directory won’t link to the newly created files, or that you won’t be able to find your file by its name. This is clearly important because Lucene lists the directory to locate all the commit points (segments_N files), and of course also opens files by their names. Since Lucene does not rely on file metadata like access time and modify time, it is tempting to use fdatasync (or FileChannel.force(false) from Java) to fsync just the file’s bytes. However, this is an optimization and at this point we’re focusing on bugs. Furthermore, it’s likely this won’t be any faster, since the metadata must still be sync’d by fdatasync if the file length has changed, which is always the case in Lucene since we only append to files when writing (we removed IndexOutput.seek in LUCENE-4399).
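As an illustration of what this involves at the Java level, here is a minimal sketch (not Lucene’s actual implementation): a file’s bytes and metadata can be forced to stable storage via FileChannel.force, and on POSIX systems the parent directory can be fsync’d the same way so that the directory entry for the new file also survives a crash.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: fsync a file's bytes and metadata, then fsync
// its parent directory so the directory entry itself is durable.
public class FsyncDemo {
    static void syncFileAndDirectory(Path file) throws IOException {
        // force(true) syncs contents AND metadata (fsync rather than fdatasync).
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
        // On POSIX systems a directory can be opened read-only and forced;
        // on Windows this may throw, so real code guards or skips this step.
        try (FileChannel dir = FileChannel.open(file.getParent(), StandardOpenOption.READ)) {
            dir.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fsync-demo", ".bin");
        Files.write(tmp, new byte[]{1, 2, 3});
        syncFileAndDirectory(tmp);
        System.out.println("synced " + tmp.getFileName());
    }
}
```

This mirrors the shape of the fix described above: without the second step, a crash can leave a fully sync’d file that the directory no longer links to.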
In LUCENE-5574, to be fixed as of Lucene 4.7.2, we found that a near-real-time reader, on closing, could delete files even if the writer it was opened from has been closed. This is normally not a problem by itself, because Lucene is write-once (never writes to the same file name more than once), as long as you use Lucene’s APIs and don’t modify the index files yourself. However, if you implement your own index replication by copying files into the index, and if you don’t first close your near-real-time readers, then it is possible closing them would remove the files you had just copied. During any given indexing session, Lucene writes many files and closes them, many files are deleted after being merged, etc., and only later, when the application finally calls IndexWriter.commit, will IndexWriter then re-open the newly created files in order to obtain a FileDescriptor so we can fsync them. This approach (closing the original file, and then opening it again later in order to sync), versus never closing the original file and syncing that same file handle you used for writing, is perhaps risky: the javadocs for FileDescriptor.sync are somewhat vague as to whether this approach is safe. However, when we check the documentation for fsync on Unix/Posix and FlushFileBuffers on Windows, they make it clear that this practice is fine, in that the open file descriptor is really only necessary to identify which file’s buffers need to be sync’d. It’s also hard to imagine an OS that would separately track which open file descriptors had made which changes to the file. Nevertheless, out of paranoia or an abundance of caution, we are also exploring a possible patch on LUCENE-3237 to fsync only the originally opened files. 
Testing that fsync really works

With all these complex layers in between your application’s call to IndexWriter.commit and the laws of physics ensuring little magnets were flipped or a few electrons were moved into a tiny floating gate in a NAND cell, how can we reliably test that the whole series of abstractions is actually working? In Lucene’s randomized testing framework we have a nice evil Directory implementation called MockDirectoryWrapper. It can do all sorts of nasty things, like throw random exceptions, sometimes slow down opening, closing and writing of some files, refuse to delete still-open files (like Windows), refuse to close when there are still open files, etc. This has helped us find all sorts of fun bugs over time. Another thing it does on close is to simulate an OS crash or power loss by randomly corrupting any un-sync’d files and then confirming the index is not corrupt. This is useful for catching Lucene bugs where we are failing to call fsync when we should, but it won’t catch bugs in our implementation of sync in our FSDirectory classes, such as the frustrating LUCENE-3418 (first appeared in Lucene 3.1 and finally fixed in Lucene 3.4). So, to catch such bugs, I’ve created a basic test setup, making use of a simple Insteon on/off device, along with custom Python bindings I created long ago to interact with Insteon devices. I already use these devices all over my home for controlling lights and appliances, so also using this for Lucene is a nice intersection of two of my passions! The script loops forever, first updating the sources, compiling, checking the index for corruption, then kicking off an indexing run with some randomization in the settings, and finally, waiting a few minutes and then cutting power to the box. Then, it restores power, waits for the machine to be responsive again, and starts again. So far it’s done 80 power cycles and no corruption yet. Good news!
To “test the tester”, I tried temporarily changing fsync to do nothing, and indeed after a couple of iterations the index became corrupt. So the test setup does seem to “work”. Currently the test uses Linux on a spinning-magnets hard drive with the ext4 file system. This is just a start, but it’s better than no proper testing for Lucene’s fsync. Over time I hope to test different combinations of OS’s, file systems, IO hardware, etc. Reference: Testing Lucene’s index durability after crash or power loss from our JCG partner Michael McCandless at the Changing Bits blog....

Attempt to map WCF to Java terms

By writing this post I’m taking a huge risk of being rejected by both the .NET and Java communities. This is an attempt to explain what WCF, which stands for Windows Communication Foundation, is in Java terms. WCF-to-Java mapping is not really trivial. I’m lacking understanding of to what extent a WCF consumer should be aware of the type of communication with the service: request/response or asynchronous messaging. I have difficulties imagining this is completely transparent for the consumer… unless the WCF framework “removes” the asynchronicity of messaging and takes care of waiting for the response message(s). If the latter happens, then there is actually no asynchronous messaging! As usual with Java (and I was truly missing it working with .NET), there are Specifications of technologies and there are various Implementations of these specifications. Although normally the applications are tested with, and therefore claim to support, explicit Implementations of the Specifications used, in theory the final selection of those is done during deployment or just before the application starts. Whenever we talk about a service, we have the actual service and its consumers. Let’s start with consumers. For sending asynchronous messages they’d better be written against JMS – the specification for the Java Message Service. Consumers of JMS just need to know the logical name of the target queue or topic. For request/response communication, consumers should be written against a plain Interface of the service. This Interface is agnostic to the technologies used on the service side and in the transportation layer. To obtain an explicit implementation of the Interface at run-time, the consumer uses an externally configurable Factory. This factory will use something like JAX-WS for Web Services, JAX-RS for RESTful services, RMI for remote EJBs (Enterprise Java Beans) or a plain object (POJO) for in-process services. Are you still here? Then let’s move to the service side.
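The consumer-side pattern described above can be sketched in a few lines of Java. Everything here is illustrative – QuoteService and QuoteServiceFactory are hypothetical names, and a real factory would read external configuration and return a JAX-WS, JAX-RS or RMI client proxy instead of the in-process POJO shown:

```java
// Hypothetical sketch of a consumer coded against a plain service interface,
// with the concrete transport chosen at run time by a configurable factory.
interface QuoteService {
    String quoteOf(String symbol);
}

class QuoteServiceFactory {
    // In a real system the implementation would come from external
    // configuration (JNDI, a properties file, a DI container, ...).
    static QuoteService lookup(String implementation) {
        if ("in-process".equals(implementation)) {
            return symbol -> symbol + ": 42.0"; // plain POJO implementation
        }
        // other branches could return a JAX-WS, JAX-RS or RMI client proxy
        throw new IllegalArgumentException("unknown implementation: " + implementation);
    }
}

public class ConsumerDemo {
    public static void main(String[] args) {
        // The consumer never learns which transport sits behind the interface.
        QuoteService service = QuoteServiceFactory.lookup("in-process");
        System.out.println(service.quoteOf("ACME"));
    }
}
```

The point of the sketch is that swapping the transport is a configuration change, not a code change, which is exactly the property the WCF consumer model aims for.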
If the service consumes messages, it can be implemented using JMS directly or as a Message-Driven Bean (a flavor of EJB). The latter option provides you with all that transactionality and scalability from the Application Server (something like IIS). If the service should provide responses (including failures), the golden rule is to let it implement a plain Interface – the one which will be used by the service consumer. Then, either by adding annotations to the Interface Implementation code or by using external configuration in the Application Server, your implementation becomes accessible as a Web Service or a Session EJB. Actually, nowadays most of the Servers are capable of exposing Session EJBs as Web Services. If you use the Proxy pattern, you also have a clean, unspoiled implementation of the Interface, which can be used by in-process consumers. This is a very lengthy explanation. A shorter translation of “All cross-layer entities are WCF services” would be: “All entities are defined by their Interfaces and written against Interfaces of other entities. Implementations of the entities are Plain Old Java Objects (POJOs), possibly wrapped by EJB Proxies”. Reference: Attempt to map WCF to Java terms from our JCG partner Viktor Sadovnikov at the jv-ration blog....

MongoDB 2.6 is out

Introduction

MongoDB is evolving rapidly. The 2.2 version introduced the aggregation framework as an alternative to the Map-Reduce query model. Generating aggregated reports is a recurrent requirement for enterprise systems and MongoDB shines in this regard. If you’re new to it you might want to check this aggregation framework introduction or the performance tuning and the data modelling guides. Let’s reuse the data model I first introduced while demonstrating the blazing fast MongoDB insert capabilities: { "_id" : ObjectId("5298a5a03b3f4220588fe57c"), "created_on" : ISODate("2012-04-22T01:09:53Z"), "value" : 0.1647851116706831 }

MongoDB 2.6 Aggregation enhancements

In the 2.4 version, if I run the following aggregation query: db.randomData.aggregate( [ { $match: { "created_on" : { $gte : new Date(Date.UTC(2012, 0, 1)), $lte : new Date(Date.UTC(2012, 0, 10)) } } }, { $group: { _id : { "minute" : { $minute : "$created_on" } }, "values": { $addToSet: "$value" } } }]); I hit the 16MB aggregation result limitation: { "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)", "code" : 16389, "ok" : 0 } MongoDB documents are limited to 16MB, and prior to the 2.6 version, the aggregation result was a BSON document. The 2.6 version replaced it with a cursor instead. Running the same query on 2.6 yields the following result: db.randomData.aggregate( [ { $match: { "created_on" : { $gte : new Date(Date.UTC(2012, 0, 1)), $lte : new Date(Date.UTC(2012, 0, 10)) } } }, { $group: { _id : { "minute" : { $minute : "$created_on" } }, "values": { $addToSet: "$value" } } }]) .objsLeftInBatch(); 14 I used the cursor-based objsLeftInBatch method to test the aggregation result type, and the 16MB limitation no longer applies to the overall result. The cursor’s inner results are regular BSON documents, hence they are still limited to 16MB, but this is way more manageable than the previous overall result limit.
The 2.6 version also addresses the aggregation memory restrictions. A full collection scan such as: db.randomData.aggregate( [ { $group: { _id : { "minute" : { $minute : "$created_on" } }, "values": { $addToSet: "$value" } } }]) .objsLeftInBatch(); can end up with the following error: { "errmsg" : "exception: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.", "code" : 16945, "ok" : 0 } So, we can now perform large sort operations using the allowDiskUse parameter: db.randomData.aggregate( [ { $group: { _id : { "minute" : { $minute : "$created_on" } }, "values": { $addToSet: "$value" } } }] , { allowDiskUse : true }) .objsLeftInBatch(); The 2.6 version also allows us to save the aggregation result to a different collection using the newly added $out stage: db.randomData.aggregate( [ { $match: { "created_on" : { $gte : new Date(Date.UTC(2012, 0, 1)), $lte : new Date(Date.UTC(2012, 0, 10)) } } }, { $group: { _id : { "minute" : { $minute : "$created_on" } }, "values": { $addToSet: "$value" } } }, { $out : "randomAggregates" } ]); db.randomAggregates.count(); 60 New operators have been added, such as $let, $map and $cond, to name a few. The next example will append AM or PM to the time info of each specific event entry.
var dataSet = db.randomData.aggregate( [ { $match: { "created_on" : { $gte : new Date(Date.UTC(2012, 0, 1)), $lte : new Date(Date.UTC(2012, 0, 2)) } } }, { $project: { "clock" : { $let: { vars: { "hour": { $substr: ["$created_on", 11, -1] }, "am_pm": { $cond: { if: { $lt: [ {$hour : "$created_on" }, 12 ] }, then: 'AM', else: 'PM' } } }, in: { $concat: [ "$$hour", " ", "$$am_pm"] } } } } }, { $limit : 10 } ]); dataSet.forEach(function(document) { printjson(document); }); Resulting in: "clock" : "16:07:14 PM" "clock" : "22:14:42 PM" "clock" : "21:46:12 PM" "clock" : "03:35:00 AM" "clock" : "04:14:20 AM" "clock" : "03:41:39 AM" "clock" : "17:08:35 PM" "clock" : "18:44:02 PM" "clock" : "19:36:07 PM" "clock" : "07:37:55 AM"

Conclusion

The MongoDB 2.6 version comes with a lot of other enhancements, such as bulk operations or index intersection. MongoDB is constantly evolving, offering a viable alternative for document-based storage. At such a development rate, it’s no wonder it was named the 2013 database of the year. Reference: MongoDB 2.6 is out from our JCG partner Vlad Mihalcea at the Vlad Mihalcea’s Blog blog....

Yet another way to handle exceptions in JUnit: catch-exception

There are many ways of handling exceptions in JUnit (3 ways of handling exceptions in JUnit. Which one to choose?, JUnit ExpectedException rule: beyond basics). In this post I will introduce the catch-exception library that I was recommended to give a try. In short, catch-exception is a library that catches exceptions in a single line of code and makes them available for further analysis.

Install via Maven

In order to get started quickly, I used my Unit Testing Demo project with a set of test dependencies (JUnit, Mockito, Hamcrest, AssertJ) and added catch-exception: <dependency> <groupId>com.googlecode.catch-exception</groupId> <artifactId>catch-exception</artifactId> <version>1.2.0</version> <scope>test</scope> </dependency> So the dependency tree looks as follows: [INFO] --- maven-dependency-plugin:2.1:tree @ unit-testing-demo --- [INFO] com.github.kolorobot:unit-testing-demo:jar:1.0.0-SNAPSHOT [INFO] +- org.slf4j:slf4j-api:jar:1.5.10:compile [INFO] +- org.slf4j:jcl-over-slf4j:jar:1.5.10:runtime [INFO] +- org.slf4j:slf4j-log4j12:jar:1.5.10:runtime [INFO] +- log4j:log4j:jar:1.2.15:runtime [INFO] +- junit:junit:jar:4.11:test [INFO] +- org.mockito:mockito-core:jar:1.9.5:test [INFO] +- org.assertj:assertj-core:jar:1.5.0:test [INFO] +- org.hamcrest:hamcrest-core:jar:1.3:test [INFO] +- org.hamcrest:hamcrest-library:jar:1.3:test [INFO] +- org.objenesis:objenesis:jar:1.3:test [INFO] \- com.googlecode.catch-exception:catch-exception:jar:1.2.0:test

Getting started

System under test (SUT): class ExceptionThrower { void someMethod() { throw new RuntimeException("Runtime exception occurred"); } void someOtherMethod() { throw new RuntimeException("Runtime exception occurred", new IllegalStateException("Illegal state")); } void yetAnotherMethod(int code) { throw new CustomException(code); } } The basic catch-exception BDD-style approach example with AssertJ assertions: import org.junit.Test; import static com.googlecode.catchexception.CatchException.*; import static
com.googlecode.catchexception.apis.CatchExceptionAssertJ.*; public class CatchExceptionsTest { @Test public void verifiesTypeAndMessage() { when(new ExceptionThrower()).someMethod(); then(caughtException()) .isInstanceOf(RuntimeException.class) .hasMessage("Runtime exception occurred") .hasMessageStartingWith("Runtime") .hasMessageEndingWith("occurred") .hasMessageContaining("exception") .hasNoCause(); } } Looks good. Concise, readable. No JUnit runners. Please note that I specified which method of ExceptionThrower I expect to throw an exception. As you can imagine, I can check multiple exceptions in one test, although I would not recommend this approach, as it may feel like violating the single responsibility of a test. By the way, if you are working with Eclipse this may be handy for you: Improve content assist for types with static members while creating JUnit tests in Eclipse

Verify the cause

I think there is no comment needed for the below code: import org.junit.Test; import static com.googlecode.catchexception.CatchException.*; import static com.googlecode.catchexception.apis.CatchExceptionAssertJ.*; public class CatchExceptionsTest { @Test public void verifiesCauseType() { when(new ExceptionThrower()).someOtherMethod(); then(caughtException()) .isInstanceOf(RuntimeException.class) .hasMessage("Runtime exception occurred") .hasCauseExactlyInstanceOf(IllegalStateException.class) .hasRootCauseExactlyInstanceOf(IllegalStateException.class); } }

Verify a custom exception with Hamcrest

To verify a custom exception I used the Hamcrest matcher code from my previous post: class CustomException extends RuntimeException { private final int code; public CustomException(int code) { this.code = code; } public int getCode() { return code; } } class ExceptionCodeMatcher extends TypeSafeMatcher<CustomException> { private int expectedCode; public ExceptionCodeMatcher(int expectedCode) { this.expectedCode = expectedCode; } @Override protected boolean matchesSafely(CustomException item) { return
item.getCode() == expectedCode; } @Override public void describeTo(Description description) { description.appendText("expects code ") .appendValue(expectedCode); } @Override protected void describeMismatchSafely(CustomException item, Description mismatchDescription) { mismatchDescription.appendText("was ") .appendValue(item.getCode()); } } And the test: import org.junit.Test; import static com.googlecode.catchexception.CatchException.*; import static org.junit.Assert.*; public class CatchExceptionsTest { @Test public void verifiesCustomException() { catchException(new ExceptionThrower(), CustomException.class).yetAnotherMethod(500); assertThat((CustomException) caughtException(), new ExceptionCodeMatcher(500)); } }

Summary

catch-exception looks really good. It is easy to get started with quickly. I see some advantages over the method rule in JUnit. If I have a chance, I will investigate the library more thoroughly, hopefully in a real-world project. The source code of this article can be found here: Unit Testing Demo. In case you are interested, please have a look at my other posts: 3 ways of handling exceptions in JUnit. Which one to choose?; JUnit ExpectedException rule: beyond basics; HOW-TO: Test dependencies in a Maven project (JUnit, Mockito, Hamcrest, AssertJ); Improve content assist for types with static members while creating JUnit tests in Eclipse. Reference: Yet another way to handle exceptions in JUnit: catch-exception from our JCG partner Rafal Borowiec at the Codeleak.pl blog....

Java Rocks More Than Ever

On the TIOBE index, Java and C have been sharing the #1 and #2 ranks for a long time now, and with the recent GA release of JDK 8, things are not going to get any worse for our community. Java simply rocks! And it’s the best platform to build almost any of your applications out there. But why does Java rock so much? Is it the JVM? Is it the backwards-compatibility? Is it the easy syntax? Or the millions of free and commercial software packages available to build your software? All of this and much more.

The Top 10 Reasons why Java Rocks More Than Ever

ZeroTurnaround’s RebelLabs often publish awesome blog posts, which we can only recommend. In this case, we’ve discovered a very well-written series of blog posts explaining why Java is so great in 10 steps, by ZeroTurnaround’s Geert Bevin. The articles include:

Part 1: The Java Compiler. The compiler is one of the things we take for granted in any language, without thinking about its great features. In Java, unlike C++, you can simply compile your code without thinking too much about linking, optimisation and all sorts of other usual compiler features. This is partially due to the JIT (Just In Time compiler), which does further compilation work at runtime. Read the full article here.

Part 2: The Core API. The JDK’s core API consists of a very solid, stable and well-understood set of libraries. While many people complain about the lack of functionality in this area (resorting to Google Guava or Apache Commons), people often forget that the core API is still the one that is underneath all those extensions. Again, from a C++ perspective, this is a truly luxurious situation. Read the full article here.

Part 3: Open Source. In this section, ZeroTurnaround’s Geert Bevin’s mind-set aligns well with our own at Data Geekery when it comes to the spirit of Open Source – no matter whether this is about free-as-in-freedom or free-as-in-beer, the point is that so many things about Java are “open”. We’re all in this together.
Read the full article here.

Part 4: The Java Memory Model. Again, a very interesting point of view from someone with a solid C++ background. We take many things for granted, as Java has had a very good threading and memory model from the beginning, which was corrected only once, in JDK 1.5 in 2004, and which has built a solid foundation for newer APIs like actor-based ones, Fork/Join, etc. Read the full article here.

Part 5: High-Performance JVM. The JVM is the most obvious thing to talk about: it has allowed so many languages to work on so many hardware environments, and it runs so fast nowadays! Read the full article here.

Part 6: Bytecode. … and the JVM also rocks because of bytecode, of course. Bytecode is a vendor-independent abstraction of machine code, which is very predictable and can be generated, manipulated, and transformed by various technologies. We’ve recently had a guest post by Dr. Ming-Yee Iu who has shown how bytecode transformations can be used to emulate LINQ in Java. Let’s hear it for bytecode! Read the full article here.

Part 7: Intelligent IDEs. 15 years ago, developing software worked quite differently. People could write assembler or C programs with vi or Notepad. But when you’re writing a very complex enterprise-scale Java program, you wouldn’t want to miss IDEs nowadays. We’ve blogged about various reasons why SQLJ has died; the lack of proper IDE support was one of them. Read the full article here.

Part 8: Profiling Tools. Remember when Oracle released Java Mission Control for free developer use with JDK 7u40? Profiling is something very, very awesome. With modern profilers, you can know exactly where your bottleneck is by simply measuring every aspect of your JVM. You don’t have to guess, you can know. How powerful is that?
Read the full article here.

Part 9: Backwards Compatibility. While backwards-compatibility has its drawbacks too, it is still very impressive how long the Java language, the JVM, and the JDK have existed so far without introducing any major backwards-compatibility regressions. The only thing that comes to mind is the introduction of keywords like assert and enum. Could you imagine introducing the Java 8 Streams API, lambda expressions, default methods, generics, enums, and loads of other features without ever breaking anything? That’s just great! Read the full article here.

Part 10: Maturity With Innovation. In fact, this article is a summary of all the others, saying that Java has been a very well-designed and mature platform from the beginning without ever ceasing to innovate. And it’s true. With Java 8, a great next step has been published that will – again – change the way the enterprise perceives software development for good. Read the full article here.

Java Rocks More Than Ever

It does, and it’s a great platform with a bright future for all its community participants. Reference: Java Rocks More Than Ever from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog....

15 Must Read Java 8 Tutorials

Java 8 was released last month and is just chock-full of new features and behind-the-scenes optimizations. The internet has been doing quite a good job covering all these new additions – both the good and the bad. I thought it’d be good to do a round-up of what we think are some of the best tutorials out there, to help get you quickly up to speed on what’s new and what you need to know.

Java 8 New Features List

Let’s start with the basics – the official OpenJDK list of new features in the Java 8 core library, JVM and the JDK. This is a must read. The OpenJDK 8 new features list

Lambda Expressions

Hailed as the biggest change to the language in the last decade, Java 8’s lambda expressions finally deliver core elements of functional programming, made popular by languages such as Scala and Clojure, right to your doorstep. This is really one of those cases where I suggest going with the official documentation and tutorials first - The official Java Lambda expressions tutorial; An extensive Lambda expressions tutorial with examples

Parallel Array Operations

2 is better than 1 (it’s kitsch song time!). Java 8 now lets you operate on arrays and collections in parallel to maximize use of your hardware’s resources with a simple and intuitive new set of APIs. Check ‘em out - Parallel Array operations; Parallel operations benchmark

Concurrent Counters

This is a personal favourite of mine. I always thought Java did such a great job at providing powerful idioms for safe multi-threaded operations. That’s why I always felt it was such a shame it did not provide an intrinsic idiom for multi-threaded counters. I just get shivers when I think of all the bugs and man hours that could have been saved. Well, not anymore. Concurrent counters are finally here! Concurrent counters in Java 8

Date Time APIs

Java 8 finally makes using date and time operations in your code simple and intuitive, on par with what we’re used to with Joda Time.
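To give a quick hands-on taste of several of the features mentioned above, here is a small self-contained sketch (illustrative code, not taken from the linked tutorials) combining a lambda-based stream pipeline, a parallel array operation, a LongAdder concurrent counter, and the new date/time API:

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.Arrays;
import java.util.concurrent.atomic.LongAdder;

public class Java8Taste {
    public static void main(String[] args) {
        // Lambda expressions + Streams: sum the squares of some numbers.
        int sumOfSquares = Arrays.stream(new int[]{1, 2, 3, 4})
                                 .map(n -> n * n)
                                 .sum();
        System.out.println(sumOfSquares); // 30

        // Parallel array operation: sort using multiple cores.
        int[] data = {5, 3, 1, 4, 2};
        Arrays.parallelSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 4, 5]

        // Concurrent counter: LongAdder scales better than AtomicLong
        // under heavy write contention.
        LongAdder counter = new LongAdder();
        Arrays.stream(data).parallel().forEach(i -> counter.increment());
        System.out.println(counter.sum()); // 5

        // New date/time API: immutable, fluent types.
        LocalDate release = LocalDate.of(2014, Month.MARCH, 18);
        System.out.println(release.plusDays(10)); // 2014-03-28
    }
}
```

Each of the four snippets corresponds to one of the tutorial topics above, so you can use it as a starting point while reading them.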
The new date time API; A deep look into the Java 8 date time APIs; Bonus: Why Joda wasn’t used in Java 8. And before we move on to other subjects, here’s a very thorough tutorial on all the new language and library additions in Java 8.

Nashorn JavaScript

Nashorn is the new JavaScript engine built into the Java 8 core library, which enables you to execute JavaScript right from the comfort of your JVM, without having to set up another node.js container. How to use Nashorn in your code; Using Nashorn to run CoffeeScript

Garbage Collection

Java 8 made significant changes to the internals of the GC engine, going as far as removing the permgen space. Java 8 GC – here’s what you need to know

Java 8 Security

Code security is right up there with brushing your teeth or doing your cardio. You don’t really like doing it for the most part, but you know that bad things will happen if you don’t. Here’s a good (and concise) round-up of the new security features in Java 8. So go ahead and brush your teeth - Java 8 Security Enhancements

HashMap Collisions

While not really a tutorial, I thought it’d be worth mentioning that Java 8 has finally gone ahead and improved how HashMaps operate under stress – something that’s been talked about for years now and finally got done. Good job! Fixing frequent HashMap collisions. Reference: 15 Must Read Java 8 Tutorials from our JCG partner Tal Weiss at the Takipi blog....

Tracking Exceptions – Part 4 – Spring’s Mail Sender

If you’ve read any of the previous blogs in this series, you may remember that I’m developing a small but almost industrial-strength application that searches log files for exceptions. You may also remember that I now have a class that can contain a whole bunch of results that will need sending to anyone who’s interested. This will be done by implementing my simple Publisher interface shown below:

public interface Publisher {

  public <T> boolean publish(T report);
}

If you remember, the requirement was:

7. Publish the report using email or some other technique.

In this blog I’m dealing with the concrete part of the requirement: sending a report by email. As this is a Spring app, the simplest way of sending an email is to use Spring’s email classes. Unlike those stalwarts of the Spring API, template classes such as JdbcTemplate and JmsTemplate, the Spring email classes are based around a couple of interfaces and their implementations. The interfaces are:

  MailSender
  JavaMailSender extends MailSender
  MailMessage

…and the implementations are:

  JavaMailSenderImpl implements JavaMailSender
  SimpleMailMessage implements MailMessage

Note that these are the ‘basic’ classes; you can send nicer looking, more sophisticated email content using classes such as MimeMailMessage, MimeMessageHelper, ConfigurableMimeFileTypeMap and MimeMessagePreparator. Before getting down to some code, there’s the little matter of project configuration. To use the Spring email classes, you need the following entry in your Maven POM file:

<dependency>
  <groupId>javax.mail</groupId>
  <artifactId>mail</artifactId>
  <version>1.4</version>
</dependency>

This ensures that the underlying JavaMail classes are available to your application. Once the JavaMail classes are configured in the build, the next thing to do is to set up the Spring XML config.
<!-- Spring mail configuration -->
<bean id="mailSender" class="org.springframework.mail.javamail.JavaMailSenderImpl">
  <property name="host" value="${mail.smtp.host}"/>
</bean>

<!-- this is a template message that we can pre-load with default state -->
<bean id="mailMessage" class="org.springframework.mail.SimpleMailMessage">
  <property name="to" value="${mail.to}"/>
  <property name="from" value="${mail.from}"/>
  <property name="subject" value="${mail.subject}"/>
</bean>

For the purposes of this app, which sends out automated reports, I’ve included two Spring beans: mailSender and mailMessage. mailSender is a JavaMailSenderImpl instance configured to use a specific SMTP mail server, with all other properties, such as the TCP port, left as defaults. The second Spring bean is mailMessage, an instance of SimpleMailMessage. This time I’ve pre-configured three properties: ‘to’, ‘from’ and ‘subject’. This is because, being automated messages, these values are always identical. You can of course configure these programmatically, something you’d probably need to do if you were creating a mail GUI. All this XML makes the implementation of the Publisher very simple:

@Service
public class EmailPublisher implements Publisher {

  private static final Logger logger = LoggerFactory.getLogger(EmailPublisher.class);

  @Autowired
  private MailSender mailSender;

  @Autowired
  private SimpleMailMessage mailMessage;

  @Override
  public <T> boolean publish(T report) {

    logger.debug("Sending report by email...");
    boolean retVal = false;
    try {
      String message = (String) report;
      mailMessage.setText(message);
      mailSender.send(mailMessage);
      retVal = true;
    } catch (Exception e) {
      logger.error("Can't send email... " + e.getMessage(), e);
    }
    return retVal;
  }
}

The Publisher class contains one method: publish, which takes a generic argument T report.
This, as I’ve said before, has to be the same type as the argument returned by the Formatter implementation from my previous blog. There are really only three steps in this code to consider. Firstly, the generic T is cast to a String (this is where it’ll all fall over if the argument T report isn’t a String). The second step is to attach the email’s text to the mailMessage and then send the message using mailSender.send(…). The final step is to fulfil the Publisher contract by returning true, unless the email fails to send, in which case the exception is logged and the return value is false. In terms of developing the code, that’s about it. The next step is to sort out the scheduling, so that the report is generated on time, but more on that later…

The code for this blog is available on GitHub at: https://github.com/roghughe/captaindebug/tree/master/error-track

If you want to look at other blogs in this series, take a look here:

  Tracking Application Exceptions With Spring
  Tracking Exceptions With Spring – Part 2 – Delegate Pattern
  Error Tracking Reports – Part 3 – Strategy and Package Private

Reference: Tracking Exceptions – Part 4 – Spring’s Mail Sender from our JCG partner Roger Hughes at the Captain Debug’s Blog.
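For readers who want to see the publish() control flow in isolation, here is a plain-Java sketch of the same pattern with Spring and JavaMail stripped out. The Sender interface and the emailPublisher factory are hypothetical stand-ins for the autowired MailSender, not part of the real application:

```java
// Illustrative stand-in for the Spring-wired EmailPublisher: same contract
// (cast the report, hand it to a sender, swallow failures and return false).
public class PublisherSketch {

    interface Publisher {
        <T> boolean publish(T report);
    }

    /** Hypothetical sender abstraction standing in for Spring's MailSender. */
    interface Sender {
        void send(String text) throws Exception;
    }

    static Publisher emailPublisher(Sender sender) {
        return new Publisher() {
            @Override
            public <T> boolean publish(T report) {
                try {
                    String message = (String) report; // fails here if T isn't a String
                    sender.send(message);
                    return true;
                } catch (Exception e) {
                    // the real class logs the failure; the contract is just "return false"
                    return false;
                }
            }
        };
    }

    public static void main(String[] args) {
        Publisher ok = emailPublisher(text -> { /* pretend the SMTP call succeeded */ });
        Publisher broken = emailPublisher(text -> { throw new Exception("SMTP down"); });

        System.out.println(ok.publish("report body"));     // true
        System.out.println(broken.publish("report body")); // false
        System.out.println(ok.publish(42));                // false: ClassCastException caught
    }
}
```

The last call shows the weak spot mentioned above: passing anything other than a String trips the cast, and the failure surfaces only as a logged exception and a false return.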

Hadoop MapReduce Concepts

What do you mean by MapReduce programming?

MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. The MapReduce programming model is inspired by functional languages and targets data-intensive computations. The input data format is application-specific, and is specified by the user. The output is a set of <key,value> pairs. The user expresses an algorithm using two functions, Map and Reduce. The Map function is applied on the input data and produces a list of intermediate <key,value> pairs. The Reduce function is applied to all intermediate pairs with the same key. It typically performs some kind of merging operation and produces zero or more output pairs. Finally, the output pairs are sorted by their key value. In the simplest form of MapReduce programs, the programmer provides just the Map function. All other functionality, including the grouping of the intermediate pairs which have the same key and the final sorting, is provided by the runtime.

Phases of the MapReduce model

The top-level unit of work in MapReduce is a job. A job usually has a map and a reduce phase, though the reduce phase can be omitted. For example, consider a MapReduce job that counts the number of times each word is used across a set of documents. The map phase counts the words in each document, then the reduce phase aggregates the per-document data into word counts spanning the entire collection. During the map phase, the input data is divided into input splits for analysis by map tasks running in parallel across the Hadoop cluster. By default, the MapReduce framework gets input data from the Hadoop Distributed File System (HDFS). The reduce phase uses results from map tasks as input to a set of parallel reduce tasks. The reduce tasks consolidate the data into final results. By default, the MapReduce framework stores results in HDFS.
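The word-count job just described can be sketched in plain Java, with no Hadoop dependency, to make the map, shuffle and reduce steps concrete. The two sample documents are invented so that 'eat' and 'oats' each appear twice, matching the example discussed further on:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountSketch {

    // Map phase: emit one (word, 1) pair per word in a line of input
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Shuffle + reduce phase: group pairs by key, then sum each key's value list
    static Map<String, Integer> shuffleAndReduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new, // sorted by key, like the framework's final sort
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        // Two hypothetical input documents (one "split" each)
        List<String> documents = List.of("Horses eat oats", "Cows eat grass and oats");

        Map<String, Integer> counts =
                shuffleAndReduce(documents.stream().flatMap(WordCountSketch::map));

        System.out.println(counts); // {and=1, cows=1, eat=2, grass=1, horses=1, oats=2}
    }
}
```

In a real Hadoop job the map and shuffleAndReduce steps run as distributed Mapper and Reducer tasks, with the framework handling the grouping between them; the data flow, however, is exactly this.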
Although the reduce phase depends on output from the map phase, map and reduce processing is not necessarily sequential. That is, reduce tasks can begin as soon as any map task completes. It is not necessary for all map tasks to complete before any reduce task can begin. MapReduce operates on key-value pairs. Conceptually, a MapReduce job takes a set of input key-value pairs and produces a set of output key-value pairs by passing the data through the map and reduce functions. The map tasks produce an intermediate set of key-value pairs that the reduce tasks use as input. The keys in the map output pairs need not be unique. Between the map processing and the reduce processing, a shuffle step sorts all map output values with the same key into a single reduce input (key, value-list) pair, where the ‘value’ is a list of all values sharing the same key. Thus, the input to a reduce task is actually a set of (key, value-list) pairs. Though each set of key-value pairs is homogeneous, the key-value pairs in each step need not have the same type. For example, the key-value pairs in the input set (KV1) can be (string, string) pairs, with the map phase producing (string, integer) pairs as intermediate results (KV2), and the reduce phase producing (integer, string) pairs for the final results (KV3).

Example demonstrating MapReduce concepts

The example demonstrates the basic MapReduce concept by calculating the number of occurrences of each word in a set of text files. The MapReduce input data is divided into input splits, and the splits are further divided into input key-value pairs.
In this example, the input data set is two documents, document1 and document2. The InputFormat subclass divides the data set into one split per document, for a total of two splits.

Note: The MapReduce framework divides the input data set into chunks called splits using the org.apache.hadoop.mapreduce.InputFormat subclass supplied in the job configuration. Splits are created by the local Job Client and included in the job information made available to the Job Tracker.

The Job Tracker creates a map task for each split. Each map task uses a RecordReader provided by the InputFormat subclass to transform the split into input key-value pairs. A (line number, text) key-value pair is generated for each line in an input document. The map function discards the line number and produces a per-line (word, count) pair for each word in the input line. The reduce phase produces (word, count) pairs representing aggregated word counts across all the input documents. For the input data shown, the output from the map phase contains multiple key-value pairs with the same key: the ‘oats’ and ‘eat’ keys appear twice. Recall that the MapReduce framework consolidates all values with the same key before entering the reduce phase, so the input to reduce is actually (key, value-list) pairs, which the reduce phase then aggregates into the final per-word counts.

MapReduce Job Life Cycle

Following is the life cycle of a typical MapReduce job and the roles of the primary actors. The full life cycle is more complex, so here we will concentrate on the primary components. Hadoop can be configured in different ways, but the basic configuration consists of:

  A single master node running the Job Tracker
  Multiple worker nodes running Task Trackers

Following are the life cycle components of a MapReduce job.

Local Job Client: The local Job Client prepares the job for submission and hands it off to the Job Tracker.
Job Tracker: The Job Tracker schedules the job and distributes the map work among the Task Trackers for parallel processing.

Task Tracker: Each Task Tracker spawns a Map Task. The Job Tracker receives progress information from the Task Trackers.

Once map results are available, the Job Tracker distributes the reduce work among the Task Trackers for parallel processing. Each Task Tracker spawns a Reduce Task to perform the work. The Job Tracker receives progress information from the Task Trackers. All map tasks do not have to complete before reduce tasks begin running; reduce tasks can begin as soon as map tasks begin completing. Thus, the map and reduce steps often overlap.

Functionality of different components in a MapReduce job

Job Client: The Job Client performs the following tasks:

  Validates the job configuration
  Generates the input splits (basically splitting the input job into chunks)
  Copies the job resources (configuration, job JAR file, input splits) to a shared location, such as an HDFS directory, where they are accessible to the Job Tracker and Task Trackers
  Submits the job to the Job Tracker

Job Tracker: The Job Tracker performs the following tasks:

  Fetches input splits from the shared location where the Job Client placed the information
  Creates a map task for each split
  Assigns each map task to a Task Tracker (worker node)

After the map tasks are complete, the Job Tracker:

  Creates reduce tasks up to the maximum enabled by the job configuration
  Assigns each map result partition to a reduce task
  Assigns each reduce task to a Task Tracker

Task Tracker: A Task Tracker manages the tasks of one worker node and reports status to the Job Tracker. When a map or reduce task is assigned to it, the Task Tracker:

  Fetches the job resources locally
  Spawns a child JVM on the worker node to execute the map or reduce task
  Reports status to the Job Tracker

Debugging MapReduce

Hadoop keeps logs of important events during program execution.
By default, these are stored in the logs/ subdirectory of the hadoop-version/ directory from which you run Hadoop. Log files are named hadoop-username-service-hostname.log. The most recent data is in the .log file; older logs have their date appended to them. The username in the log filename refers to the username under which Hadoop was started; this is not necessarily the same username you are using to run programs. The service name refers to which of the several Hadoop programs is writing the log; these can be jobtracker, namenode, datanode, secondarynamenode, or tasktracker. All of these are important for debugging a whole Hadoop installation, but for individual programs the tasktracker logs will be the most relevant. Any exceptions thrown by your program will be recorded in the tasktracker logs.

The log directory will also have a subdirectory called userlogs, which contains another subdirectory for every task run. Each task records its stdout and stderr to two files in this directory. Note that on a multi-node Hadoop cluster these logs are not centrally aggregated; you should check each TaskTracker node's logs/userlogs/ directory for its output. Debugging in the distributed setting is complicated and requires logging in to several machines to access log data. If possible, programs should be unit tested by running Hadoop locally. The default configuration deployed by Hadoop runs in “single instance” mode, where the entire MapReduce program is run in the same instance of Java that called JobClient.runJob(). Using a debugger like Eclipse, you can then set breakpoints inside the map() or reduce() methods to discover your bugs.

Is a reduce job mandatory?

Some jobs can complete all their work during the map phase, so the job can be a map-only job. To stop a job after the map completes, set the number of reduce tasks to zero.

Conclusion

This module described the MapReduce execution platform at the heart of the Hadoop system.
By using MapReduce, applications can achieve a high degree of parallelism. The MapReduce framework also provides a high degree of fault tolerance for applications running on it by limiting the communication that can occur between nodes.

Reference: Hadoop MapReduce Concepts from our JCG partner Kaushik Pal at the TechAlpine – The Technology world blog.

CSRF protection in Spring MVC, Thymeleaf, Spring Security application

Cross-Site Request Forgery (CSRF) is an attack which forces an end user to execute unwanted actions on a web application in which he/she is currently authenticated. Preventing CSRF attacks in a Spring MVC / Thymeleaf application is fairly easy if you use Spring Security 3.2 and above.

How to test?

To test, I created an application with a restricted area where I can send a form. The form’s source code:

<form class="form-narrow form-horizontal" method="post" th:action="@{/message}" th:object="${messageForm}" action="http://localhost:8080/message">
  <fieldset>
    <legend>Send a classified message</legend>
    <div class="form-group" th:classappend="${#fields.hasErrors('payload')}? 'has-error'">
      <label for="payload" class="col-lg-2 control-label">Payload</label>
      <div class="col-lg-10">
        <input type="text" class="form-control" id="payload" placeholder="Payload" th:field="*{payload}" name="payload"/>
        <span class="help-block" th:if="${#fields.hasErrors('payload')}" th:errors="*{payload}">May not be empty</span>
      </div>
    </div>
    <div class="form-group">
      <div class="col-lg-offset-2 col-lg-10">
        <button type="submit" class="btn btn-default">Send</button>
      </div>
    </div>
  </fieldset>
</form>

Knowing that the action URL was http://localhost:8080/message, I created a separate page with an HTTP request referencing that URL (with all its parameters):

<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
</head>
<body>
  <form action="http://localhost:8080/message" method="post">
    <input type="hidden" name="payload" value="Hacked content!"/>
    <input type="submit" value="Hack!" />
  </form>
</body>
</html>

I logged in to the application and executed the above code. Of course the server allowed me to execute the request, because my application is vulnerable to CSRF attacks. To learn more about testing for CSRF, visit this link: Testing for CSRF.

How to secure?
If you are using XML configuration with Spring Security, the CSRF protection must be enabled:

<security:http auto-config="true" disable-url-rewriting="true" use-expressions="true">
  <security:csrf />
  <security:form-login login-page="/signin" authentication-failure-url="/signin?error=1"/>
  <security:logout logout-url="/logout" />
  <security:remember-me services-ref="rememberMeServices" key="remember-me-key"/>
  <!-- Remaining configuration -->
</security:http>

In the case of Java configuration, it is enabled by default. As of Thymeleaf 2.1, a CSRF token will be automatically added to forms as a hidden input:

<form class="form-narrow form-horizontal" method="post" action="/message">
  <!-- Fields -->
  <input type="hidden" name="_csrf" value="16e9ae08-76b9-4530-b816-06819983d048" />
</form>

Now, when you try to repeat the attack, you will see an Access Denied error. One thing to remember, though, is that enabling CSRF protection means that logging out also requires a CSRF token. I used JavaScript to submit a hidden form:

<a href="/logout" th:href="@{#}" onclick="$('#form').submit();">Logout</a>
<form style="visibility: hidden" id="form" method="post" action="#" th:action="@{/logout}"></form>

Summary

In this short article, I showed how easily you can utilize CSRF protection while working with Spring MVC (3.1+), Thymeleaf (2.1+) and Spring Security (3.2+). As of Spring Security 4, CSRF will be enabled by default when XML configuration is used as well. Please note that the HTTP session is used to store the CSRF token, but this can be easily changed. For more details, see the references. I included the CSRF configuration in my Spring MVC Archetype. Please check it out!

Resources

  Thymeleaf – Integration with RequestDataValueProcessor
  Spring Security – CSRF Attacks
  OWASP – Cross-Site Request Forgery (CSRF)

Reference: CSRF protection in Spring MVC, Thymeleaf, Spring Security application from our JCG partner Rafal Borowiec at the Codeleak.pl blog.
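As a footnote, the synchronizer-token pattern that Spring Security implements can be sketched in a few lines of plain Java. This is an illustration of the concept only, not Spring Security's actual code; the class and method names here are invented for the sketch:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CsrfTokenSketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in for the HTTP session attribute Spring Security uses by default
    private final Map<String, String> sessionStore = new ConcurrentHashMap<>();

    /** Issue an unguessable token for a session and remember it server-side. */
    String issueToken(String sessionId) {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        sessionStore.put(sessionId, token);
        return token;
    }

    /** A state-changing request is accepted only if it echoes the session's token. */
    boolean isValid(String sessionId, String submittedToken) {
        String expected = sessionStore.get(sessionId);
        return expected != null && submittedToken != null
                && MessageDigest.isEqual( // constant-time comparison
                        expected.getBytes(StandardCharsets.UTF_8),
                        submittedToken.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        CsrfTokenSketch csrf = new CsrfTokenSketch();
        String token = csrf.issueToken("session-1");
        System.out.println(csrf.isValid("session-1", token));          // true
        System.out.println(csrf.isValid("session-1", "forged-value")); // false
    }
}
```

The attacker's page from the test above can forge the payload parameter, but it cannot read the victim's token out of the session, so its request fails the isValid check; that is the whole defence.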
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
