
Database Migrations in Java EE using Flyway

The database schema of any Java EE application evolves along with its business logic. This makes database migrations an important part of any Java EE application. Do you still perform them manually, along with your application deployment? Is it still a lock-step process, or is it run as two separate scripts, one for application deployment and one for database migrations? Learn how Flyway simplifies database migrations and seamlessly integrates with your Java EE application in this webinar with Axel Fontaine (@axelfontaine). You’ll learn about:

- Need for a database migration tool in a Java EE application
- Seamless integration with the Java EE application lifecycle
- SQL scripts and Java-based migrations
- Getting Started guides
- Comparison with Liquibase
- And much more!

A fun fact about this, and the jOOQ hanginar, is that both were conceived on the wonderful cruise as part of JourneyZone. Happy to report that these are now complete! Enjoy!

Reference: Database Migrations in Java EE using Flyway from our JCG partner Arun Gupta at the Miles to go 2.0 … blog.

TDD Against the Clock

A couple of weeks ago I ran a “TDD Against the Clock” session. The format is simple: working in pairs and following a strict red-green-refactor TDD cycle, we complete a simple kata. However, we add one key constraint: the navigator starts a five minute timer. As soon as the five minutes is up:

- If the code compiles and the tests are green, commit!
- Otherwise, revert!

Either way, the pairs swap roles when the timer goes off. If the driver completes a red-green-refactor cycle faster than five minutes, they can commit ahead of time but the roles still swap. The kata we chose for this session was the bowling kata. This is a nice, simple problem that I expected we could get a decent way through in each of the two 45 minute sessions.

Hard Time

The five minute time constraint sounds fiendish, doesn’t it? How can you possibly get anything done in five minutes? Well, you can, if you tackle something small enough. This exercise is designed to force you to think in small increments of functionality. It’s amazing how little you can type in five minutes. But if you think typing speed is the barrier, you’re not thinking hard enough about the right way to tackle the problem.

There comes a point in the bowling kata where you go from dealing with single frames and simple scores to spares (or strikes) for the first time. This always requires a jump because what you had before won’t suit what you need now. How to tackle this jump incrementally is part of the challenge when working within a five minute deadline. One of our group had an idea but knew it was tough to get it done in five minutes. He typed like a demon trying to force his solution in: he still ran out of time. Typing speed is not the problem (no matter how much it seems like it is). You need a better approach: you need to think more, not type more.

Good Behaviour

After a few cycles, we found hardly anybody hit the five minute deadline any more. It’s fascinating how quickly everyone realised that it was better to spend a five minute cycle discussing than to get lost half-way through a change and end up reverting. Similarly, when you find the change you wanted to make in this cycle is too hard or too time consuming, it’s better to throw away what you have, swap pairs and refactor before you try to write the failing test again. These are all good behaviours that are useful in day-to-day life, where it’s all too easy to keep chasing down a rat hole. Learning to work in small, independent increments and making that a subconscious part of how you work will make you a better programmer.

Wrong School

The biggest trouble we found is that the bowling kata isn’t well suited to what I consider “normal”, outside-in TDD (London School TDD). Most of the time I use TDD as a design tool, to help me uncover the right roles and responsibilities. However, with the bowling kata the most elegant solution is the one Uncle Bob drives towards, which is just simple types with no object modelling. This is fine for an algorithm like scoring a game of bowling, which has an ultimate truth and isn’t likely to change. But in the normal day-to-day world we’re designing for flexibility and constant change. This is where a good object model of the domain makes things easier to reason about and simpler to change. This is typically where outside-in TDD will help you.

A couple of the group were determined to implement an OO version of the bowling kata. It isn’t easy, as it doesn’t lend itself naturally to being built incrementally towards a good object model. However, with enough stubbornness it can be done. This led to an interesting discussion of whether you can TDD algorithms and whether TDD is better suited to problems where an object model is the desired outcome. Obviously you can TDD algorithms incrementally; whether it’s worthwhile I’m not so sure. Typically you’re implementing an algorithm because there is a set of rules to follow. Implementing each rule one at a time might help keep you focussed, but you always need to be aware of the algorithm as a whole.

Using TDD to drive an OO design is different. There can be many, similarly correct object models that vary only by subtle nuances. TDD can help guide your design and choose between the nuances. While you still need to think of the overall system design, TDD done outside-in is very deliberately trying to limit the things you need to worry about at any given stage: focus on one pair of interactions at a time. This is where TDD is strongest: providing a framework for completing a large task in small, manageable increments.

Even if the problem we chose wasn’t ideal, overall I found the TDD against the clock session a great way to practice the discipline of keeping your commits small, with constant refactoring, working incrementally towards a better design. How do you move a mountain? Simply move it one teaspoonful at a time.

Reference: TDD Against the Clock from our JCG partner David Green at the Actively Lazy blog.
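For readers who have not met the bowling kata, the “simple types with no object modelling” style discussed under Wrong School can be sketched in a few lines. This is only an illustration, not the code written in the session and not a transcription of Uncle Bob’s solution; the class name BowlingScore and the choice of an int array of rolls as input are assumptions made here.

```java
/**
 * Illustrative sketch only: scores a complete ten-frame game using
 * nothing but an array of rolls (no Frame or Game objects).
 * The class and method names are hypothetical.
 */
final class BowlingScore {
    private BowlingScore() {}

    /** Scores a finished game given every roll in the order it was bowled. */
    static int score(final int[] rolls) {
        int total = 0;
        int i = 0; // index of the first roll of the current frame
        for (int frame = 0; frame < 10; frame++) {
            if (rolls[i] == 10) {
                // Strike: ten plus the next two rolls as a bonus.
                total += 10 + rolls[i + 1] + rolls[i + 2];
                i += 1;
            } else if (rolls[i] + rolls[i + 1] == 10) {
                // Spare: ten plus the next roll as a bonus.
                total += 10 + rolls[i + 2];
                i += 2;
            } else {
                // Open frame: just the pins knocked down.
                total += rolls[i] + rolls[i + 1];
                i += 2;
            }
        }
        return total;
    }
}
```

Twelve strikes score 300 and twenty-one rolls of five score 150 with this sketch. The jump the post describes, from simple frames to spares and strikes, shows up here as the two bonus branches, which is the part that is hard to reach in five-minute steps.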

Determining File Types in Java

Programmatically determining the type of a file can be surprisingly tricky, and many content-based file identification approaches have been proposed and implemented. There are several implementations available in Java for detecting file types, and most of them are largely or solely based on files’ extensions. This post looks at some of the most commonly available implementations of file type detection in Java.

Several approaches to identifying file types in Java are demonstrated in this post. Each approach is briefly described, illustrated with a code listing, and then associated with output that demonstrates how different common files are typed based on extensions. Some of the approaches are configurable, but all examples shown here use “default” mappings as provided out-of-the-box unless otherwise stated.

About the Examples

The screen snapshots shown in this post are of each listed code snippet run against certain subject files created to test the different implementations of file type detection in Java.
Before covering these approaches and demonstrating the type each approach detects, I list the files under test, what they are named, and what they really are.

File Name         File Extension   File Type   Type Matches Extension Convention?
actualXml.xml     xml              XML         Yes
blogPostPDF       (none)           PDF         No
blogPost.pdf      pdf              PDF         Yes
blogPost.gif      gif              GIF         Yes
blogPost.jpg      jpg              JPEG        Yes
blogPost.png      png              PNG         Yes
blogPostPDF.txt   txt              PDF         No
blogPostPDF.xml   xml              PDF         No
blogPostPNG.gif   gif              PNG         No
blogPostPNG.jpg   jpg              PNG         No
dustin.txt        txt              Text        Yes
dustin.xml        xml              Text        No
dustin            (none)           Text        No

Files.probeContentType(Path) [JDK 7]

Java SE 7 introduced the highly utilitarian Files class and that class’s Javadoc succinctly describes its use: “This class consists exclusively of static methods that operate on files, directories, or other types of files” and, “in most cases, the methods defined here will delegate to the associated file system provider to perform the file operations.” The java.nio.file.Files class provides the method probeContentType(Path) that “probes the content type of a file” through use of “the installed FileTypeDetector implementations” (the Javadoc also notes that “a given invocation of the Java virtual machine maintains a system-wide list of file type detectors”).

    /**
     * Identify file type of file with provided path and name
     * using JDK 7's Files.probeContentType(Path).
     *
     * @param fileName Name of file whose type is desired.
     * @return String representing identified type of file with provided name.
     */
    public String identifyFileTypeUsingFilesProbeContentType(final String fileName)
    {
        String fileType = "Undetermined";
        final File file = new File(fileName);
        try
        {
            fileType = Files.probeContentType(file.toPath());
        }
        catch (IOException ioException)
        {
            out.println(
                "ERROR: Unable to determine file type for " + fileName
                    + " due to exception " + ioException);
        }
        return fileType;
    }

When the above Files.probeContentType(Path)-based approach is executed against the set of files previously defined, the output appears as shown in the next screen snapshot.

The screen snapshot indicates that the default behavior for Files.probeContentType(Path) on my JVM seems to be tightly coupled to the file extension. The files with no extensions show “null” for file type and the other listed file types match the files’ extensions rather than their actual content. For example, all three files with names starting with “dustin” are really the same single-sentence text file, but Files.probeContentType(Path) states that they are each a different type and the listed types are tightly correlated with the different file extensions for essentially the same text file.

MimetypesFileTypeMap.getContentType(String) [JDK 6]

The class MimetypesFileTypeMap was introduced with Java SE 6 to provide “data typing of files via their file extension” using “the .mime.types format.” The class’s Javadoc explains where in a given system the class looks for MIME types file entries. My example uses the ones that come out-of-the-box with my JDK 8 installation. The next code listing demonstrates use of javax.activation.MimetypesFileTypeMap.

    /**
     * Identify file type of file with provided name using
     * JDK 6's MimetypesFileTypeMap.
     *
     * See Javadoc documentation for MimetypesFileTypeMap class
     * for details on how to configure mapping of file types or extensions.
     */
    public String identifyFileTypeUsingMimetypesFileTypeMap(final String fileName)
    {
        final MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
        return fileTypeMap.getContentType(fileName);
    }

The next screen snapshot demonstrates the output from running this example against the set of test files.

This output indicates that the MimetypesFileTypeMap approach returns the MIME type of application/octet-stream for several files, including the XML files and the text files without a .txt suffix. We see also that, like the previously discussed approach, this approach in some cases uses the file’s extension to determine the file type and so misreports the file’s actual type when that type is different from what its extension conventionally implies.

URLConnection.getContentType()

I will be covering three methods in URLConnection that support file type detection. The first is URLConnection.getContentType(), a method that “returns the value of the content-type header field.” Use of this instance method is demonstrated in the next code listing and the output from running that code against the common test files is shown after the code listing.

    /**
     * Identify file type of file with provided path and name
     * using JDK's URLConnection.getContentType().
     *
     * @param fileName Name of file whose type is desired.
     * @return Type of file for which name was provided.
     */
    public String identifyFileTypeUsingUrlConnectionGetContentType(final String fileName)
    {
        String fileType = "Undetermined";
        try
        {
            final URL url = new URL("file://" + fileName);
            final URLConnection connection = url.openConnection();
            fileType = connection.getContentType();
        }
        catch (MalformedURLException badUrlEx)
        {
            out.println("ERROR: Bad URL - " + badUrlEx);
        }
        catch (IOException ioEx)
        {
            out.println("Cannot access URLConnection - " + ioEx);
        }
        return fileType;
    }

The file detection approach using URLConnection.getContentType() is highly coupled to files’ extensions rather than the actual file type.
When there is no extension, the String returned is “content/unknown.”

URLConnection.guessContentTypeFromName(String)

The second file detection approach provided by URLConnection that I’ll cover here is its method guessContentTypeFromName(String). Use of this static method is demonstrated in the next code listing and associated output screen snapshot.

    /**
     * Identify file type of file with provided path and name
     * using JDK's URLConnection.guessContentTypeFromName(String).
     *
     * @param fileName Name of file whose type is desired.
     * @return Type of file for which name was provided.
     */
    public String identifyFileTypeUsingUrlConnectionGuessContentTypeFromName(final String fileName)
    {
        return URLConnection.guessContentTypeFromName(fileName);
    }

URLConnection’s guessContentTypeFromName(String) approach to file detection shows “null” for files without file extensions and otherwise returns file type String representations that closely mirror the files’ extensions. These results are very similar to those provided by the Files.probeContentType(Path) approach shown earlier, with the one notable difference being that URLConnection’s guessContentTypeFromName(String) approach identifies files with .xml extension as being of file type “application/xml” while Files.probeContentType(Path) identifies these same files’ types as “text/xml”.

URLConnection.guessContentTypeFromStream(InputStream)

The third approach I cover that is provided by URLConnection for file type detection is via the class’s static method guessContentTypeFromStream(InputStream). A code listing employing this approach and associated output in a screen snapshot are shown next.

    /**
     * Identify file type of file with provided path and name
     * using JDK's URLConnection.guessContentTypeFromStream(InputStream).
     *
     * @param fileName Name of file whose type is desired.
     * @return Type of file for which name was provided.
     */
    public String identifyFileTypeUsingUrlConnectionGuessContentTypeFromStream(final String fileName)
    {
        String fileType;
        // try-with-resources ensures the stream is closed after the guess is made
        try (InputStream stream = new FileInputStream(new File(fileName)))
        {
            fileType = URLConnection.guessContentTypeFromStream(stream);
        }
        catch (IOException ex)
        {
            out.println("ERROR: Unable to process file type for " + fileName + " - " + ex);
            fileType = "null";
        }
        return fileType;
    }

All the file types are null! The reason for this appears to be explained by the Javadoc for the InputStream parameter of the URLConnection.guessContentTypeFromStream(InputStream) method: “an input stream that supports marks.” It turns out that the instances of FileInputStream in my examples do not support marks (their calls to markSupported() all return false).

Apache Tika

All of the examples of file detection covered in this post so far have been approaches provided by the JDK. There are third-party libraries that can also be used to detect file types in Java. One example is Apache Tika, a “content analysis toolkit” that “detects and extracts metadata and text from over a thousand different file types.” In this post, I look at using Tika’s facade class and its detect(String) method to detect file types. The instance method call is the same in the three examples I show, but the results are different because each instance of the Tika facade class is instantiated with a different Detector. The instantiation of Tika instances with different Detectors is shown in the next code listing.

    /** Instance of Tika facade class with default configuration. */
    private final Tika defaultTika = new Tika();

    /** Instance of Tika facade class with MimeTypes detector. */
    private final Tika mimeTika = new Tika(new MimeTypes());

    /** Instance of Tika facade class with Type detector. */
    private final Tika typeTika = new Tika(new TypeDetector());

With these three instances of Tika instantiated with their respective Detectors, we can call the detect(String) method on each instance for the set of test files.
The code for this is shown next.

    /**
     * Identify file type of file with provided name using
     * Tika's default configuration.
     *
     * @param fileName Name of file for which file type is desired.
     * @return Type of file for which file name was provided.
     */
    public String identifyFileTypeUsingDefaultTika(final String fileName)
    {
        return defaultTika.detect(fileName);
    }

    /**
     * Identify file type of file with provided name using
     * Tika with a MimeTypes detector.
     *
     * @param fileName Name of file for which file type is desired.
     * @return Type of file for which file name was provided.
     */
    public String identifyFileTypeUsingMimeTypesTika(final String fileName)
    {
        return mimeTika.detect(fileName);
    }

    /**
     * Identify file type of file with provided name using
     * Tika with a Type detector.
     *
     * @param fileName Name of file for which file type is desired.
     * @return Type of file for which file name was provided.
     */
    public String identifyFileTypeUsingTypeDetectorTika(final String fileName)
    {
        return typeTika.detect(fileName);
    }

When the three above Tika detection examples are executed against the same set of files as used in the previous examples, the output appears as shown in the next screen snapshot.

We can see from the output that the default Tika detector reports file types similarly to some of the other approaches shown earlier in this post (very tightly tied to the file’s extension). The other two demonstrated detectors state that the file type is application/octet-stream in most cases. Because I called the overloaded version of detect that accepts a String, the file type detection is “based on known file name extensions.”

If the overloaded detect(File) method is used instead of detect(String), the identified file type results are much better than the previous Tika examples and the previous JDK examples.
In fact, the “fake” extensions don’t fool the detectors as much, and the default Tika detector is especially good in my examples at identifying the appropriate file type even when the extension is not the normal one associated with that file type. The code for using Tika.detect(File) and the associated output are shown next.

    /**
     * Identify file type of file with provided name using
     * Tika's default configuration.
     *
     * @param fileName Name of file for which file type is desired.
     * @return Type of file for which file name was provided.
     */
    public String identifyFileTypeUsingDefaultTikaForFile(final String fileName)
    {
        String fileType;
        try
        {
            final File file = new File(fileName);
            fileType = defaultTika.detect(file);
        }
        catch (IOException ioEx)
        {
            out.println("Unable to detect type of file " + fileName + " - " + ioEx);
            fileType = "Unknown";
        }
        return fileType;
    }

    /**
     * Identify file type of file with provided name using
     * Tika with a MimeTypes detector.
     *
     * @param fileName Name of file for which file type is desired.
     * @return Type of file for which file name was provided.
     */
    public String identifyFileTypeUsingMimeTypesTikaForFile(final String fileName)
    {
        String fileType;
        try
        {
            final File file = new File(fileName);
            fileType = mimeTika.detect(file);
        }
        catch (IOException ioEx)
        {
            out.println("Unable to detect type of file " + fileName + " - " + ioEx);
            fileType = "Unknown";
        }
        return fileType;
    }

    /**
     * Identify file type of file with provided name using
     * Tika with a Type detector.
     *
     * @param fileName Name of file for which file type is desired.
     * @return Type of file for which file name was provided.
     */
    public String identifyFileTypeUsingTypeDetectorTikaForFile(final String fileName)
    {
        String fileType;
        try
        {
            final File file = new File(fileName);
            fileType = typeTika.detect(file);
        }
        catch (IOException ioEx)
        {
            out.println("Unable to detect type of file " + fileName + " - " + ioEx);
            fileType = "Unknown";
        }
        return fileType;
    }

Caveats and Customization

File type detection is not a trivial feat to pull off. The Java approaches demonstrated in this post provide basic file detection that is often highly dependent on a file name’s extension. If files are named with conventional extensions that are recognized by the file detection approach, these approaches are typically sufficient. However, if unconventional file type extensions are used, or the extensions are for files with types other than that conventionally associated with that extension, most of these approaches to file detection break down without customization. Fortunately, most of these approaches provide the ability to customize the mapping of file extensions to file types. The Tika approach using Tika.detect(File) was generally the most accurate in the examples shown in this post when the extensions were not the conventional ones for the particular file types.

Conclusion

There are numerous mechanisms available for simple file type detection in Java. This post reviewed some of the standard JDK approaches for file detection and some examples of using Tika for file detection.

Reference: Determining File Types in Java from our JCG partner Dustin Marx at the Inspired by Actual Events blog.
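As a footnote to the URLConnection.guessContentTypeFromStream(InputStream) section above: the all-null results came from passing a stream that does not support marks. A minimal sketch of one way around that, assuming it is acceptable to wrap the file’s stream in a BufferedInputStream (which does support marks), is shown below. The class name StreamTypeGuesser and its guess method are hypothetical and were not part of the original examples, and the set of signatures the JDK recognises is small, so this is not a substitute for Tika’s content-based detection.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: content-based guess that ignores the file name entirely. */
final class StreamTypeGuesser {
    private StreamTypeGuesser() {}

    /**
     * Returns the JDK's guess for the content type based on the file's
     * leading bytes, or null when the JDK does not recognise them.
     */
    static String guess(final Path path) throws IOException {
        // BufferedInputStream.markSupported() returns true, so the JDK can
        // peek at the leading bytes and then reset the stream.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            return URLConnection.guessContentTypeFromStream(in);
        }
    }
}
```

Called on a PNG image that has been given a misleading extension, such as the blogPostPNG.gif test file described earlier, this sketch should be able to report image/png because only the content is inspected. Files whose leading bytes match none of the JDK’s built-in signatures still come back as null.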

Using JDK 8 Streams to Convert Between Collections of Wrapped Objects and Collections of Wrapper Objects

I have found Decorators and Adapters to be useful from time to time as I have worked with Java-based applications. These “wrappers” work well in a variety of situations and are fairly easy to understand and implement, but things can become a bit more tricky when a hierarchy of objects rather than a single object needs to be wrapped. In this blog post, I look at how Java 8 streams make it easier to convert between collections of objects and collections of objects that wrap those objects.

For this discussion, I’ll use two simple Java classes: a Movie class and a class that “wraps” it called MovieWrapper. The Movie class was used in my post on JDK 8 enhancements to Java collections. The Movie class and the class that wraps it are shown next.

    package dustin.examples.jdk8.streams;

    import java.util.Objects;

    /**
     * Basic characteristics of a motion picture.
     *
     * @author Dustin
     */
    public class Movie
    {
        /** Title of movie. */
        private final String title;

        /** Year of movie's release. */
        private final int yearReleased;

        /** Movie genre. */
        private final Genre genre;

        /** MPAA Rating. */
        private final MpaaRating mpaaRating;

        /** Rating. */
        private final int imdbTopRating;

        public Movie(final String newTitle, final int newYearReleased,
                     final Genre newGenre, final MpaaRating newMpaaRating,
                     final int newImdbTopRating)
        {
            this.title = newTitle;
            this.yearReleased = newYearReleased;
            this.genre = newGenre;
            this.mpaaRating = newMpaaRating;
            this.imdbTopRating = newImdbTopRating;
        }

        public String getTitle() { return this.title; }

        public int getYearReleased() { return this.yearReleased; }

        public Genre getGenre() { return this.genre; }

        public MpaaRating getMpaaRating() { return this.mpaaRating; }

        public int getImdbTopRating() { return this.imdbTopRating; }

        @Override
        public boolean equals(Object other)
        {
            if (!(other instanceof Movie))
            {
                return false;
            }
            final Movie otherMovie = (Movie) other;
            return Objects.equals(this.title, otherMovie.title)
                && Objects.equals(this.yearReleased, otherMovie.yearReleased)
                && Objects.equals(this.genre, otherMovie.genre)
                && Objects.equals(this.mpaaRating, otherMovie.mpaaRating)
                && Objects.equals(this.imdbTopRating, otherMovie.imdbTopRating);
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(this.title, this.yearReleased, this.genre, this.mpaaRating, this.imdbTopRating);
        }

        @Override
        public String toString()
        {
            return "Movie: " + this.title + " (" + this.yearReleased + "), " + this.genre + ", " + this.mpaaRating + ", "
                + this.imdbTopRating;
        }
    }

    package dustin.examples.jdk8.streams;

    /**
     * Wraps a movie like a Decorator or Adapter might.
     *
     * @author Dustin
     */
    public class MovieWrapper
    {
        private Movie wrappedMovie;

        public MovieWrapper(final Movie newMovie)
        {
            this.wrappedMovie = newMovie;
        }

        public Movie getWrappedMovie() { return this.wrappedMovie; }

        public void setWrappedMovie(final Movie newMovie) { this.wrappedMovie = newMovie; }

        public String getTitle() { return this.wrappedMovie.getTitle(); }

        public int getYearReleased() { return this.wrappedMovie.getYearReleased(); }

        public Genre getGenre() { return this.wrappedMovie.getGenre(); }

        public MpaaRating getMpaaRating() { return this.wrappedMovie.getMpaaRating(); }

        public int getImdbTopRating() { return this.wrappedMovie.getImdbTopRating(); }

        @Override
        public String toString()
        {
            return this.wrappedMovie.toString();
        }
    }

With the Movie and MovieWrapper classes defined above, I now look at converting a collection of one of these into a collection of the other. Before JDK 8, a typical approach to convert a collection of Movie objects into a collection of MovieWrapper objects would be to iterate over the source collection of Movie objects and add each one to a new collection of MovieWrapper objects. This is demonstrated in the next code listing.

Converting Collection of Wrapped Objects Into Collection of Wrapper Objects

    // movies previously defined as Set<Movie>
    final Set<MovieWrapper> wrappedMovies1 = new HashSet<>();
    for (final Movie movie : movies)
    {
        wrappedMovies1.add(new MovieWrapper(movie));
    }

With JDK 8 streams, the operation above can now be implemented as shown in the next code listing.

Converting Collection of Wrapped Objects Into Collection of Wrapper Objects (JDK 8)

    // movies previously defined as Set<Movie>
    final Set<MovieWrapper> wrappedMovies2 =
        movies.stream().map(movie -> new MovieWrapper(movie)).collect(Collectors.toSet());

Converting in the other direction (from collection of wrapper objects to collection of wrapped objects) can be similarly compared to demonstrate how JDK 8 changes this. The next two code listings show the old way and the JDK 8 way.

Converting Collection of Wrapper Objects Into Collection of Wrapped Objects

    final Set<Movie> newMovies1 = new HashSet<>();
    for (final MovieWrapper wrappedMovie : wrappedMovies1)
    {
        newMovies1.add(wrappedMovie.getWrappedMovie());
    }

Converting Collection of Wrapper Objects Into Collection of Wrapped Objects (JDK 8)

    final Set<Movie> newMovies2 =
        wrappedMovies2.stream().map(MovieWrapper::getWrappedMovie).collect(Collectors.toSet());

Like some of the examples in my post Stream-Powered Collections Functionality in JDK 8, the examples in this post demonstrate the power of aggregate operations provided in JDK 8. The advantages of these aggregate operations over traditional iteration include greater conciseness in the code, arguably (perhaps eventually) greater readability, and the advantages of internal iteration (including easier potential streams-supported parallelization). A good example of using streams and more complex Functions to convert between collections of less cohesively related objects is shown in Transform object into another type with Java 8.

Reference: Using JDK 8 Streams to Convert Between Collections of Wrapped Objects and Collections of Wrapper Objects from our JCG partner Dustin Marx at the Inspired by Actual Events blog.
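The listings in this post are fragments that depend on Movie, Genre and MpaaRating. For readers who want something they can compile on its own, here is a minimal self-contained sketch of the same two stream conversions. The names WrapDemo, Film and FilmWrapper are hypothetical stand-ins introduced here (a Film carries only a title); they are not the post’s classes.

```java
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

/** Self-contained illustration of wrapping and unwrapping a Set with streams. */
final class WrapDemo {
    private WrapDemo() {}

    /** Simplified stand-in for the post's Movie class (assumption: title only). */
    static final class Film {
        final String title;

        Film(final String title) { this.title = title; }

        @Override
        public boolean equals(final Object other) {
            return other instanceof Film && Objects.equals(title, ((Film) other).title);
        }

        @Override
        public int hashCode() { return Objects.hash(title); }
    }

    /** Simplified stand-in for the post's MovieWrapper class. */
    static final class FilmWrapper {
        private final Film wrapped;

        FilmWrapper(final Film wrapped) { this.wrapped = wrapped; }

        Film getWrapped() { return wrapped; }
    }

    /** Wrapped objects to wrapper objects: map each element through the constructor. */
    static Set<FilmWrapper> wrap(final Set<Film> films) {
        return films.stream().map(FilmWrapper::new).collect(Collectors.toSet());
    }

    /** Wrapper objects back to wrapped objects: map each element through the getter. */
    static Set<Film> unwrap(final Set<FilmWrapper> wrappers) {
        return wrappers.stream().map(FilmWrapper::getWrapped).collect(Collectors.toSet());
    }
}
```

Because the wrapper holds the original object, unwrap(wrap(films)) yields a set equal to films. The constructor reference FilmWrapper::new is equivalent to the lambda form film -> new FilmWrapper(film); which of the two reads better is a matter of taste.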

Even Doctors Will Be Data Scientists

We all know how it works. You walk into a doctor’s office complaining about some pain in your leg or otherwise. They take your temperature, get you on the scale, check your blood pressure, and perhaps even get out the rubber hammer. These measurements are simply snapshots at one particular instant in time and may be subject to error. This limited dataset fails to capture temporal variations or the many other important factors that are required to assess the patient’s health status.

After reviewing the few measurements collected, the consultation between the patient and doctor begins. Based on the rudimentary physical analysis, along with the discussion with the patient, the physician will assert the condition that they believe is present, followed by a recommended treatment. This approach, which is common throughout the world, is much more based on instinct and gut feeling than a scientific approach to analyzing data. Accordingly, it seems that most decisions are made based on the opinion of the physician instead of a data-proven truth.

This type of opinion-based medicine is a problem in both doctor-patient care and in medical research. This is a symptom of a lack of data, as well as years of training physicians to perform without complete data. The data collected in a typical office visit is only a fraction of the data that could be collected if health were viewed as a data problem. And, if health were redefined as a data problem, physicians would likely need different skills to process and analyze the data.

Vinod Khosla is one of the most successful venture capitalists in the history of Silicon Valley. He was an original founder of Sun Microsystems, and has since gone on to finance a variety of start-up companies as a venture capitalist. While he is not a medical expert, he is a data expert. In his speech at Stanford Medicine X, Khosla highlights three major issues in medicine today:

- Doctors are human: Doctors, like everyone else, have cognitive limitations. Some are naturally smarter than others or have deeper knowledge about a particular topic. The latter leads to biases in how they think, act, and prescribe. Most shockingly, Khosla notes that doctors often decide on a patient diagnosis in the first 30 seconds of the observation. Said another way, they base their diagnosis on a gut reaction to the symptoms that they can see or are described to them.
- Opinions dominate medicine: Khosla asserts that medicine is much more based on opinion than data. He cites the Cleveland Clinic Doctors’ Review of Initial Diagnosis study, asserting that Cleveland Clinic doctors disagree with initial diagnoses 11 percent of the time. In 22 percent of cases, minor changes to treatment are recommended. And in a startling 18 percent of cases, major changes to treatment are recommended. As Khosla states, “This means it’s not medical science.”
- Disagreement is common among physicians: Doctors disagree a lot. It’s so dramatic that, Khosla states, “whether or not you have surgery is a function of whom you ask.” Medicine is currently a process of trial and error, coupled with professional opinion.

The Data era in medicine will be defined by a shift from intuition and opinion to data. We can collect more data in a day now than we could in a year not too long ago. Collecting data and applying it to solve healthcare problems will transform the cost and effectiveness of medicine. The question is how quickly we can get there.

Medical schools must evolve as technology advances. Most technology-based advancements in medical schools have been focused on utilizing advanced tools and equipment, as opposed to addressing the core knowledge needed by a physician in the data era. The curriculum for the first two years of medical school varies by school, but it is heavy on the sciences, the human body, and the human condition. This has been typical since the first medical schools in the 1200s.
Despite all this time, investment, and history, the newly minted physician is unprepared for practicing in the data era. The data era requires an augmentation in curriculum to include key skills required for data-based analysis:

- Mathematics
- Statistics
- Probability
- Data Analysis and Tools

The skills of physicians will necessarily evolve in the data era, and that has to begin in medical schools. This focus will expedite the move away from opinion-based medicine to a future that the ill prefer: prescriptions based on hardened data analysis.

This week, IBM is announcing a set of tools, technology, and processes to bring data science to the masses. Said another way, armed with IBM technology, everyone is a data scientist. We are democratizing the access to data in your organization. Every organization sees Hadoop as providing an open-source, rapidly evolving platform that is capable of collecting and economically storing a large corpus of data, waiting to be tapped. Yet most organizations are not yet fully realizing the value of Hadoop due to the lack of skilled data scientists and developers to extract valuable insight. IBM will make everyone a data scientist. We take the first steps this week by:

- Introducing new modules for In-Hadoop analytics including SQL, Machine Learning, and R.
- Confirming our commitment to open source with IBM BigInsights Open Platform with Apache Hadoop, to include new innovations like Apache Spark. We are excited to be a founding member of the Open Data Platform.
- Rolling out expanded data science training for Machine Learning and Apache Spark via BigDataUniversity. Today, over 230,000 professionals and students are being trained at BigDataUniversity and we are on our way to 1 million trained.

We all look forward to how things will be in 15 years. You walk into a doctor’s office, and the physician immediately knows why you are there. In fact, she had discussed some data irregularities that she had spotted at your annual physical exam, six months prior.
She doesn’t need to take your temperature, as she receives that data direct from your home every day. You also take your own blood pressure monthly and that is transmitted directly to your physician. Instead, the discussion immediately turns to the possible treatments, along with the probability of success with each one. Recent data from other patients with a similar history and physiology indicate that regular medication will solve the issue 95 percent of the time. With this quick diagnosis, involving no opinions, you are on your way after ten minutes, confident that the problem has been solved. This is medicine in the data era, administered by a physician steeped in mathematics and statistics. In the data era, even doctors become data scientists.

This post is adapted from my book, Big Data Revolution: What farmers, doctors, and insurance agents teach us about discovering big data patterns, Wiley, 2015. Find more on the web at http://www.bigdatarevolutionbook.com

Reference: Even Doctors Will Be Data Scientists from our JCG partner Rob Thomas at the Rob’s Blog blog....

JBoss Fuse – Some less known trick

TL;DR

- expose Java static calls as Karaf shell native commands
- override OSGi headers at deploy time
- override OSGi headers after deploy time with OSGi Fragments

Expose java static calls as Karaf shell native commands

As a software engineer who has to collaborate with support staff and customers, I very often find myself needing to extract additional information from a system I don’t have access to. The usual approaches, valid for all kinds of software, are extracting logs, invoking interactive commands to obtain specific output or, in the most complex case, deploying a PoC unit that is supposed to verify a specific behavior. JBoss Fuse, and Karaf, the platform it is based on, already do a great job of exposing all that data. You have:

- extensive logs and integration with Log4j
- an extensive list of JMX operations (which you can also invoke over HTTP with Jolokia)
- a large list of shell commands

But sometimes this is not enough. If you have seen my previous post about how to use Byteman on JBoss Fuse, you can imagine all the other cases:

- you need to print values that are not logged or returned in the code
- you might need to short-circuit some logic to hit a specific execution branch of your code
- you want to inject a line of code that wasn’t there at all

Byteman is still a very good option too, but Karaf has a facility we can use to run custom code. Karaf allows you to write code directly in its shell, and allows you to record these bits of code as macros you can re-invoke. Such a macro will look like a native Karaf shell command! Let’s see a real example I had to implement: verifying whether the JVM running my JBoss Fuse instance was resolving a specific DNS name as expected.
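For comparison, this is roughly what the same check looks like as plain Java run outside the Karaf shell. It is only a sketch: the class name DnsCheck and the helper resolve are invented for this illustration, and the only JDK call it relies on is java.net.InetAddress.getAllByName.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of JBoss Fuse or Karaf.
final class DnsCheck {

    // Returns every IP address the JVM resolves for the given host name,
    // or an empty list when the name cannot be resolved.
    static List<String> resolve(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Resolution failed: report "no addresses" instead of propagating.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // The host name is taken from the command line; "localhost" is just a default.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

The drawback of this approach is exactly the one discussed here: it runs in its own JVM, so it does not tell you what the JVM hosting JBoss Fuse actually resolves.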
The standard JDK has a method you can invoke to resolve a DNS name: InetAddress.getAllByName(String). Since that method is simple enough, meaning it doesn’t require a complex or structured input, I thought I could turn it into an easy-to-reuse command:

    # add all public static methods on a java class as commands to the namespace "my_context":
    # bundle 0 is because system libs are served by that bundle classloader
    addcommand my_context (($.context bundle 0) loadClass

That funky line is explained in this way:

- addcommand is the Karaf shell functionality that accepts new commands.
- my_context is the namespace/prefix you will attach your command to. In my case, “dns” would have made a good namespace.
- ($.context bundle 0) invokes Java code. In particular, we are invoking the $.context instance, a built-in instance exposed by the Karaf shell to give access to the OSGi framework, whose type is org.apache.felix.framework.BundleContextImpl, and we are invoking its method called bundle, passing it the argument 0, which represents the id of the OSGi classloader responsible for loading the JDK classes. That call returns an instance of org.apache.felix.framework.Felix that we can use to load the specific class definition we need.

As the inline comment says, an invocation of addcommand exposes all the public static methods on that class. So we are now allowed to invoke those methods, and in particular the one that can resolve DNS entries:

    JBossFuse:karaf@root> my_context:getAllByName ""

This functionality is described on the Karaf documentation page.

Override OSGi Headers at deploy time

If you work with Karaf, you are working with OSGi, love it or hate it. A typical step in each OSGi workflow is playing (or fighting) with OSGi headers. If you are in total control of your project, this might be more or less easy, depending on the relationship between your deployment units. See Christian Posta’s post for a glimpse of some less than obvious examples.
Within those conditions, a very typical situation is the one where you have to use a bundle, yours or someone else’s, and that bundle’s headers are not correct. What you very often end up doing is re-packaging that bundle so that you can alter the content of its MANIFEST and add the OSGi headers that you need. Karaf has a facility in this regard, called the wrap protocol. You might already know it as a shortcut to deploy a non-bundle jar on Karaf, but it’s actually more than just that. What it really does, as the name suggests, is wrap. But it can wrap both non-bundles and bundles! This means we can also use it to alter the metadata of an already packaged bundle we are about to install.

Let’s give an example, again taken from real-life experience. Apache HttpClient is not totally OSGi friendly. We can install it on Karaf with the wrap: protocol and export all its packages.

    JBossFuse:karaf@root> install -s 'mvn:org.apache.httpcomponents/httpclient/4.2.5'
    Bundle ID: 257
    JBossFuse:karaf@root> exports | grep -i 257
    257 No active exported packages. This command only works on started bundles, use osgi:headers instead
    JBossFuse:karaf@root> install -s 'wrap:mvn:org.apache.httpcomponents/httpclient/\
    4.2.5$Export-Package=*; version=4.2.5'
    Bundle ID: 259
    JBossFuse:karaf@root> exports | grep -i 259
    259 org.apache.http.client.entity; version=4.2.5
    259 org.apache.http.conn.scheme; version=4.2.5
    259 org.apache.http.conn.params; version=4.2.5
    259 org.apache.http.cookie.params; version=4.2.5
    ...

And we can see that it works with plain bundles too:

    JBossFuse:karaf@root> la -l | grep -i camel-core
    [ 142] [Active ] [ ] [ ] [ 50] mvn:org.apache.camel/camel-core/2.12.0.redhat-610379
    JBossFuse:karaf@root> install -s 'wrap:mvn:org.apache.camel/camel-core/2.12.0.redhat-610379\
    $overwrite=merge&Bundle-SymbolicName=paolo-s-hack&Export-Package=*; version=1.0.1'
    Bundle ID: 269
    JBossFuse:karaf@root> headers 269
    camel-core (269)
    ----------------
    ...
    Bundle-Vendor = Red Hat, Inc.
    Bundle-Activator = org.apache.camel.impl.osgi.Activator
    Bundle-Name = camel-core
    Bundle-DocURL =
    Bundle-Description = The Core Camel Java DSL based router
    Bundle-SymbolicName = paolo-s-hack
    Bundle-Version = 2.12.0.redhat-610379
    Bundle-License =
    Bundle-ManifestVersion = 2
    ...
    Export-Package = org.apache.camel.fabric; uses:="org.apache.camel.util, org.apache.camel.model, org.apache.camel, org.apache.camel.processor,,, org.apache.camel.spi"; version=1.0.1,...

Here you can see that Bundle-SymbolicName and the version of the exported packages carry the values I set. Again, the functionality is described in the Karaf docs, and you might find the wrap protocol reference useful.

Override OSGi Headers after deploy time with OSGi Fragments

The previous trick is powerful, but it probably requires you to remove the original bundle if you don’t want to risk having half of the classes exposed by one classloader and the remaining ones (those packages you might have added in the overridden Export) by another one. There is actually a better way to override OSGi headers, and it comes directly from a standard OSGi functionality: OSGi Fragments. If you are not familiar with the concept, the definition taken directly from the OSGi wiki is:

A Bundle fragment, or simply a fragment, is a bundle whose contents are made available to another bundle (the fragment host). Importantly, fragments share the classloader of their parent bundle.

That page also gives a further hint about what I will describe:

Sometimes, fragments are used to ‘patch’ existing bundles.

We can use this strategy to:

- inject .jars in the classpath of our target bundle
- alter headers of our target bundle

I have used the first case to fix a badly configured bundle that was looking for an XML configuration descriptor that it didn’t include, and that I provided by deploying a light Fragment Bundle that contained just that.
But the use case I want to show you here is an improvement to the way Byteman is deployed on JBoss Fuse/Karaf. If you remember my previous post, since Byteman classes needed to be available from every other deployed bundle and potentially need access to every class available, we had to add Byteman packages to the org.osgi.framework.bootdelegation property, which instructs the OSGi Framework to expose the listed packages through the virtual system bundle (id = 0). You can verify what it is currently serving with headers 0; I won’t include the output here since it’s a long list of JDK extension and framework classes. If you add your packages, org.jboss.byteman.rule,org.jboss.byteman.rule.exception in my case, these packages will also be listed in the output of that command.

The problem with this solution is that this is a boot time property. If you want to use Byteman to manipulate the bytecode of an already running instance, you have to restart it after you have edited this property. OSGi Fragments can help here and avoid a preconfiguration at boot time. We can build a custom empty bundle, with no real content, that attaches to the system bundle and extends the list of packages it serves.

    <Export-Package>
        org.jboss.byteman.rule,org.jboss.byteman.rule.exception
    </Export-Package>
    <Fragment-Host>
        system.bundle; extension:=framework
    </Fragment-Host>

That’s an excerpt of the maven-bundle-plugin configuration; see here for the full working Maven project, although the project is really just 30 lines of pom.xml:

    JBossFuse:karaf@root> install -s mvn:test/byteman-fragment/1.0-SNAPSHOT

Once you have that configuration, you are ready to use Byteman to, for example, inject a line in the java.lang.String default constructor.
    # find your Fuse process id
    # (awk is used instead of cut because ps pads its columns with a variable number of spaces)
    PROCESS_ID=$(ps aux | grep karaf | grep -v grep | awk '{print $2}')

    # navigate to the folder where you have extracted Byteman
    cd /data/software/redhat/utils/byteman/byteman-download-
    # export Byteman env variable:
    export BYTEMAN_HOME=$(pwd)
    cd bin/

    # attach Byteman to Fabric8 process, no output expected unless you enable those verbose flags
    sh -b -Dorg.jboss.byteman.transform.all $PROCESS_ID
    # add these flags if you have any kind of problem and want to see what's going on:
    # -Dorg.jboss.byteman.debug -Dorg.jboss.byteman.verbose

    # install our Byteman custom rule, we are passing it directly inline with some bash trick
    sh /dev/stdin <<OPTS
    # smoke test rule that uses also a custom output file
    RULE DNS StringSmokeTest
    CLASS java.lang.String
    METHOD <init>()
    AT ENTRY
    IF TRUE
    DO traceln(" works: " );
    traceOpen("PAOLO", "/tmp/byteman.txt");
    traceln("PAOLO", " works in files too " );
    traceClose("PAOLO");
    ENDRULE
    OPTS

Now, to verify that Byteman is working, we can just invoke the java.lang.String constructor in the Karaf shell:

    JBossFuse:karaf@root> new java.lang.String
     works:

And as per our rule, you will also see the content in /tmp/byteman.txt.

Inspiration for this third trick comes from both the OSGi wiki and this interesting page from the Spring guys.

Reference: JBoss Fuse – Some less known trick from our JCG partner Paolo Antinori at the Someday Never Comes blog....

The Optimum Method to Concatenate Strings in Java

Recently I was asked this question – Is it bad for performance to use the + operator to concatenate Strings in Java? This got me thinking about the different ways in Java to concatenate Strings and how they would all perform against each other. These are the methods I’m going to investigate:          Using the + operator Using a StringBuilder Using a StringBuffer Using String.concat() Using String.join (new in Java8)I also experimented with String.format() but that is so hideously slow that I will leave it out of this post for now. Before we go any further we should separate two use cases:Concatenating two Strings together as a single call, for example in a logging message. Because this is only one call you would have thought that performance is hardly an issue but the results are still interesting and shed light on the subject. Concatenating two Strings in a loop.  Here performance is much more of an issue especially if your loops are large.My initial thoughts and questions were as follows:The + operator is implemented with StringBuilder, so at least in the case of concatenating two Strings it should produce similar results to StringBuilder. What exactly is going on under the covers? StringBuilder should be the most efficient method, after all the class was designed for the very purpose of concatenating Strings and supersedes StringBuffer. But what is the overhead of creating the StringBuilder when compared with String.concat()? StringBuffer was the original class for concatenating Strings – unfortunately its methods are synchronized. There really is no need for the synchronization and it was subsequently replaced by StringBuilder which is not synchronized.  The question is, does the JIT optimise away the synchronisation? String.concat() ought to work well for 2 strings but does it work well in a loop? 
String.join() has more functionality than StringBuilder, how does it affect performance if we instruct it to join Strings using an empty delimiter?

The first question I wanted to get out of the way was how the + operator works. I’d always understood that it used a StringBuilder under the covers, but to prove this we need to examine the byte code. The easiest way to look at byte code these days is with JITWatch, which is a really excellent tool created to understand how your code is compiled by the JIT. It has a great view where you can see your source code side by side with byte code (also machine code if you want to go to that level).

Here’s the byte code for a really simple method plus2() and we can see that indeed on line 6 a StringBuilder is created and appends the variables a (line 14) and b (line 18). I thought it would be interesting to compare this against a handcrafted use of the StringBuilder, so I created another method build2() with results below.

The byte code generated here is not quite as compact as for the plus2() method. The StringBuilder is stored into the variable cache (line 13) rather than just left on the stack. I’m not sure why this should be, but the JIT might be able to do something with this; we’ll have to see how the timings look. In any case it would be very surprising if the results of concatenating 2 strings with the plus operator and the StringBuilder were significantly different.

I wrote a small JMH test to determine how the different methods performed. Let’s first look at the two Strings test.
See code below:

    package org.sample;

    import org.openjdk.jmh.annotations.*;
    import org.openjdk.jmh.infra.Blackhole;

    import java.util.UUID;
    import java.util.concurrent.TimeUnit;

    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Fork(1)
    @State(Scope.Thread)
    public class LoopStringsBenchmark {

        private String[] strings;

        // Builds 100 random Strings of 10 characters each before the benchmark runs.
        @Setup
        public void setupTest() {
            strings = new String[100];
            for (int i = 0; i < 100; i++) {
                strings[i] = UUID.randomUUID().toString().substring(0, 10);
            }
        }

        @Benchmark
        public void testPlus(Blackhole bh) {
            String combined = "";
            for (String s : strings) {
                combined = combined + s;
            }
            bh.consume(combined);
        }

        @Benchmark
        public void testStringBuilder(Blackhole bh) {
            StringBuilder sb = new StringBuilder();
            for (String s : strings) {
                sb.append(s);
            }
            bh.consume(sb.toString());
        }

        @Benchmark
        public void testStringBuffer(Blackhole bh) {
            StringBuffer sb = new StringBuffer();
            for (String s : strings) {
                sb.append(s);
            }
            bh.consume(sb.toString());
        }

        @Benchmark
        public void testStringJoiner(Blackhole bh) {
            bh.consume(String.join("", strings));
        }

        @Benchmark
        public void testStringConcat(Blackhole bh) {
            String combined = "";
            for (String s : strings) {
                // concat() returns a new String, so the result must be assigned back
                combined = combined.concat(s);
            }
            bh.consume(combined);
        }
    }

The results look like this:

The clear winner here is String.concat(). Not really surprising, as it doesn’t have to pay the performance penalty of creating a StringBuilder / StringBuffer for each call. It does, though, have to create a new String each time (which will be significant later), but for the very simple case of joining two Strings it is faster. Another point is that, as we expected, plus and StringBuilder are equivalent despite the extra byte code produced. StringBuffer is only marginally slower than StringBuilder, which is interesting and shows that the JIT must be doing some magic to optimise away the synchronisation.

The next test creates an array of 100 Strings with 10 characters each.
The benchmark compares how long it takes for the different methods to concatenate the 100 Strings together. See code below:

The results look quite different this time:

Here the plus method really suffers. The overhead of creating a StringBuilder every time you go round the loop is crippling. You can see this clearly in the byte code:

You can see that a new StringBuilder is created (line 30) every time the loop is executed. It is arguable that the JIT ought to spot this and be able to optimise, but it doesn’t, and using + becomes very slow. Again StringBuilder and StringBuffer perform exactly the same, but this time they are both faster than String.concat(). The price that String.concat() pays for creating a new String on each iteration of the loop eventually mounts up and a StringBuilder becomes more efficient. String.join() does pretty well given all the extra functionality you can add to this method but, as expected, for pure concatenation it is not the best option.

Summary

If you are concatenating Strings in a single line of code I would use the + operator, as it is the most readable and performance really doesn’t matter that much for a single call. Also beware of String.concat(), as you will almost certainly need to carry out a null check which is not necessary with the other methods. When you are concatenating Strings in a loop you should use a StringBuilder. You could use a StringBuffer, but I wouldn’t necessarily trust the JIT in all circumstances to optimise away the synchronization as efficiently as it would in a benchmark. All my results were achieved using JMH and they come with the usual health warning.

Reference: The Optimum Method to Concatenate Strings in Java from our JCG partner Daniel Shaya at the Rational Java blog....
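The null-check caveat for String.concat() mentioned in the summary is easy to see in a few lines. The class below is an illustrative sketch only (NullConcatDemo and its method names are invented for this example and are not part of the benchmark): the + operator, StringBuilder.append() and String.join() all render a null reference as the text "null", while concat() throws a NullPointerException.

```java
// Hypothetical demo class showing how each approach treats a null argument.
final class NullConcatDemo {

    static String withPlus(String a, String b) {
        return a + b; // null is converted to the text "null"
    }

    static String withBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString(); // appends "null"
    }

    static String withJoin(String a, String b) {
        return String.join("", a, b); // null elements are added as "null"
    }

    static String withConcat(String a, String b) {
        return a.concat(b); // throws NullPointerException when b is null
    }

    public static void main(String[] args) {
        String missing = null;
        System.out.println(withPlus("value=", missing));    // prints value=null
        System.out.println(withBuilder("value=", missing)); // prints value=null
        System.out.println(withJoin("value=", missing));    // prints value=null
        try {
            withConcat("value=", missing);
        } catch (NullPointerException e) {
            System.out.println("concat needs a null check first");
        }
    }
}
```

Whether printing "null" is actually the behaviour you want is a separate question; the point is only that concat() is the one method in this list that fails outright.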

MySQL as Kubernetes Service, Access from WildFly Pod

Java EE 7 and WildFly on Kubernetes using Vagrant (Tech Tip #71) explained how to run a trivial Java EE 7 application on WildFly hosted using Kubernetes and Docker. The Java EE 7 application was the hands-on lab that have been delivered around the world. It uses an in-memory database that is bundled with WildFly and allows to understand the key building blocks of Kubernetes. This is good to get you started with initial development efforts but quickly becomes a bottleneck as the database is lost when the application server goes down. This tech tip will show how to run another trivial Java EE 7 application and use MySQL as the database server. It will use Kubernetes Services to explain how MySQL and WildFly can be easily decoupled. Lets get started! Make sure to have a working Kubernetes setup as explained in Kubernetes using Vagrant. The complete source code used in this blog is available at Start MySQL Kubernetes pod First step is to start the MySQL pod. This can be started by using the MySQL Kubernetes configuration file:kubernetes> ./cluster/ create -f ../kubernetes-java-sample/mysql.json KUBE_MASTER_IP: Running: ./cluster/../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --auth-path=/Users/arungupta/.kubernetes_vagrant_auth create -f ../kubernetes-java-sample/mysql.json mysqlThe configuration file used is at Check the status of MySQL pod:kubernetes> ./cluster/ get pods KUBE_MASTER_IP: Running: ./cluster/../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --auth-path=/Users/arungupta/.kubernetes_vagrant_auth get pods POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS mysql mysql mysql:latest name=mysql PendingWait till the status changes to “Running”. 
It will look like:KUBE_MASTER_IP: Running: ./cluster/../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --auth-path=/Users/arungupta/.kubernetes_vagrant_auth get pods POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS mysql mysql mysql:latest name=mysql RunningIt takes a few minutes for MySQL server to be in that state, so grab a coffee or a quick fast one miler! Start MySQL Kubernetes service Pods, and the IP addresses assigned to them, are ephemeral. If a pod dies then Kubernetes will recreate that pod because of its self-healing features, but it might recreate it on a different host. Even if it is on the same host, a different IP address could be assigned to it. And so any application cannot rely upon the IP address of the pod. Kubernetes services is an abstraction which defines a logical set of pods. A service is typically back-ended by one or more physical pods (associated using labels), and it has a permanent IP address that can be used by other pods/applications. For example, WildFly pod can not directly connect to a MySQL pod but can connect to MySQL service. In essence, Kubernetes service offers clients an IP and port pair which, when accessed, redirects to the appropriate backends.Lets start MySQL service.kubernetes> ./cluster/ create -f ../kubernetes-java-sample/mysql-service.json KUBE_MASTER_IP: Running: ./cluster/../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --auth-path=/Users/arungupta/.kubernetes_vagrant_auth create -f ../kubernetes-java-sample/mysql-service.json mysqlThe configuration file used is at In this case, only a single MySQL instance is started. But multiple MySQL instances can be easily started and WildFly Pod will continue to refer to all of them using MySQL Service. 
Check the status/IP of the MySQL service:kubernetes> ./cluster/ get services KUBE_MASTER_IP: Running: ./cluster/../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --auth-path=/Users/arungupta/.kubernetes_vagrant_auth get services NAME LABELS SELECTOR IP PORT kubernetes component=apiserver,provider=kubernetes <none> 443 kubernetes-ro component=apiserver,provider=kubernetes <none> 80 mysql name=mysql name=mysql 3306 skydns k8s-app=skydns k8s-app=skydns 53Start WildFly Kubernetes Pod WildFly Pod must be started after MySQL service has started. This is because the environment variables used for creating JDBC resource in WildFly are only available after the service is up and running. Specifically, the JDBC resource is created as:data-source add --name=mysqlDS --driver-name=mysql --jndi-name=java:jboss/datasources/ExampleMySQLDS --connection-url=jdbc:mysql://$MYSQL_SERVICE_HOST:$MYSQL_SERVICE_PORT/sample?useUnicode=true&characterEncoding=UTF-8 --user-name=mysql --password=mysql --use-ccm=false --max-pool-size=25 --blocking-timeout-wait-millis=5000 --enabled=true$MYSQL_SERVICE_HOST and $MYSQL_SERVICE_PORT environment variables are populated by Kubernetes as explained here. This is shown at Start WildFly pod:kubernetes> ./cluster/ create -f ../kubernetes-java-sample/wildfly.json KUBE_MASTER_IP: Running: ./cluster/../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --auth-path=/Users/arungupta/.kubernetes_vagrant_auth create -f ../kubernetes-java-sample/wildfly.json wildflyThe configuration file used is at Check the status of pods:KUBE_MASTER_IP: Running: ./cluster/../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --auth-path=/Users/arungupta/.kubernetes_vagrant_auth get pods POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS mysql mysql mysql:latest name=mysql Running wildfly wildfly arungupta/wildfly-mysql-javaee7:knetes name=wildfly PendingWait until WildFly pod’s status is changed to Running. 
This could be a few minutes, so may be time to grab another quick miler! Once the container is up and running, you can check /opt/jboss/wildfly/standalone/configuration/standalone.xml in the WildFly container and verify that the connection URL indeed contains the correct IP address. Here is how it looks on my machine:[jboss@wildfly ~]$ grep 3306 /opt/jboss/wildfly/standalone/configuration/standalone.xml <connection-url>jdbc:mysql://;characterEncoding=UTF-8</connection-url>The updated status (after the container is running) would look like as shown:kubernetes> ./cluster/ get pods KUBE_MASTER_IP: Running: ./cluster/../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --auth-path=/Users/arungupta/.kubernetes_vagrant_auth get pods POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS mysql mysql mysql:latest name=mysql Running wildfly wildfly arungupta/wildfly-mysql-javaee7:knetes name=wildfly RunningAccess the Java EE 7 Application Note down the HOST IP address of the WildFly container and access the application as: curl to see the output as:<?xml version="1.0" encoding="UTF-8" standalone="yes"?><collection><employee><id>1</id><name>Penny</name></employee><employee><id>2</id><name>Sheldon</name></employee><employee><id>3</id><name>Amy</name></employee><employee><id>4</id><name>Leonard</name></employee><employee><id>5</id><name>Bernadette</name></employee><employee><id>6</id><name>Raj</name></employee><employee><id>7</id><name>Howard</name></employee><employee><id>8</id><name>Priya</name></employee></collection>Or viewed in the browser as:Debugging Kubernetes and Docker Login to the Minion-1 VM:kubernetes> vagrant ssh minion-1 Last login: Tue Feb 10 23:20:13 2015 from in as root:[vagrant@kubernetes-minion-1 ~]$ su - Password: [root@kubernetes-minion-1 ~]#Default root password for VM images created by Vagrant is “vagrant”. 
List of Docker containers running on this VM can be seen as:[root@kubernetes-minion-1 ~]# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7fc1fca102bf arungupta/wildfly-mysql-javaee7:knetes "/opt/jboss/wildfly/ 28 minutes ago Up 28 minutes k8s_wildfly.6c5f240e_wildfly.default.api_1230e74a-b178-11e4-8464-0800279696e1_509268aa 4aa49c0ebb70 kubernetes/pause:go "/pause" 43 minutes ago Up 43 minutes>8080/tcp,>9090/tcp k8s_POD.bca60d1a_wildfly.default.api_1230e74a-b178-11e4-8464-0800279696e1_0bff6efa c36e99cd4557 mysql:latest "/ mysq 47 minutes ago Up 47 minutes k8s_mysql.278e3c40_mysql.default.api_f3d07101-b175-11e4-8464-0800279696e1_ddbcaf62 ed4611b5c276 google/cadvisor:0.8.0 "/usr/bin/cadvisor" 56 minutes ago Up 56 minutes k8s_cadvisor.8d424740_cadvisor-agent.file-6bb810db-kubernetes-minion-1.file_80331227d28e618b4cef459507a31796_36d83f7d 50a3428612f0 kubernetes/pause:go "/pause" 58 minutes ago Up 58 minutes>3306/tcp k8s_POD.c783ea16_mysql.default.api_f3d07101-b175-11e4-8464-0800279696e1_e46a8424 11a95eeda794 kubernetes/pause:go "/pause" 59 minutes ago Up 59 minutes>8080/tcp k8s_POD.252debe0_cadvisor-agent.file-6bb810db-kubernetes-minion-1.file_80331227d28e618b4cef459507a31796_734d54ebLast 10 lines of the WildFly log (after application has been accessed a few times) can be seen as:23:12:25,004 INFO [stdout] (ServerService Thread Pool -- 50) Hibernate: INSERT INTO EMPLOYEE_SCHEMA(ID, NAME) VALUES (8, 'Priya') 23:12:27,928 INFO [org.jboss.resteasy.spi.ResteasyDeployment] (MSC service thread 1-5) Deploying class org.javaee7.samples.employees.MyApplication 23:12:27,987 INFO [org.wildfly.extension.undertow] (MSC service thread 1-5) JBAS017534: Registered web context: /employees 23:12:28,073 INFO [] (ServerService Thread Pool -- 28) JBAS018559: Deployed "employees.war" (runtime-name : "employees.war") 23:12:28,203 INFO [] (Controller Boot Thread) JBAS015961: Http management interface listening on 23:12:28,203 INFO [] (Controller Boot Thread) JBAS015951: 
Admin console listening on 23:12:28,204 INFO [] (Controller Boot Thread) JBAS015874: WildFly 8.2.0.Final "Tweek" started in 26772ms - Started 280 of 334 services (92 services are lazy, passive or on-demand) 23:23:16,091 INFO [stdout] (default task-16) Hibernate: select as id1_0_, as name2_0_ from EMPLOYEE_SCHEMA employee0_ 23:24:07,322 INFO [stdout] (default task-17) Hibernate: select as id1_0_, as name2_0_ from EMPLOYEE_SCHEMA employee0_ 23:35:07,030 INFO [stdout] (default task-18) Hibernate: select as id1_0_, as name2_0_ from EMPLOYEE_SCHEMA employee0_Similarly, MySQL log is seen as:2015-02-10 22:52:55 1 [Note] Server hostname (bind-address): '*'; port: 3306 2015-02-10 22:52:55 1 [Note] IPv6 is available. 2015-02-10 22:52:55 1 [Note] - '::' resolves to '::'; 2015-02-10 22:52:55 1 [Note] Server socket created on IP: '::'. 2015-02-10 22:52:56 1 [Note] Event Scheduler: Loaded 0 events 2015-02-10 22:52:56 1 [Note] Execution of init_file '/tmp/mysql-first-time.sql' started. 2015-02-10 22:52:56 1 [Note] Execution of init_file '/tmp/mysql-first-time.sql' ended. 2015-02-10 22:52:56 1 [Note] mysqld: ready for connections. Version: '5.6.23' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL) 2015-02-10 23:12:21 1 [Warning] IP address '' could not be resolved: Name or service not knownEnjoy!Reference: MySQL as Kubernetes Service, Access from WildFly Pod from our JCG partner Arun Gupta at the Miles to go 2.0 … blog....

Why is my JVM having access to less memory than specified via -Xmx?

“Hey, can you drop by and take a look at something weird”. This is how I started to look into a support case leading me towards this blog post. The particular problem at hand was related to different tools reporting different numbers about the available memory. In short, one of the engineers was investigating the excessive memory usage of a particular application which, by his knowledge was given 2G of heap to work with. But for whatever reason, the JVM tooling itself seemed to have not made up their mind on how much memory the process really has. For example jconsole guessed the total available heap to be equal to 1,963M while jvisualvm claimed it to be equal to 2,048M. So which one of the tools was correct and why was the other displaying different information? It was indeed weird, especially seeing that the usual suspects were eliminated – the JVM was not pulling any obvious tricks as:-Xmx and -Xms were equal so that the reported numbers were not changed during runtime heap increases JVM was prevented from dynamically resizing memory pools by turning off adaptive sizing policy (-XX:-UseAdaptiveSizePolicy)Reproducing the difference First step toward understanding the problem was zooming in to the tooling implementation. Access to available memory information via standard APIs is as simple as following: System.out.println("Runtime.getRuntime().maxMemory()="+Runtime.getRuntime().maxMemory()); And indeed, this was what the tooling at hand seemed to be using. First step towards having an answer to question like this is to have reproducible test case. 
For this purpose I wrote the following snippet:

    package eu.plumbr.test;

    import java.lang.management.ManagementFactory;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;

    public class HeapSizeDifferences {

        static Collection<Object> objects = new ArrayList<Object>();
        static long lastMaxMemory = 0;

        public static void main(String[] args) {
            try {
                List<String> inputArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
                System.out.println("Running with: " + inputArguments);
                while (true) {
                    printMaxMemory();
                    consumeSpace();
                }
            } catch (OutOfMemoryError e) {
                freeSpace();
                printMaxMemory();
            }
        }

        // Prints the maximum memory only when it differs from the last reported value.
        static void printMaxMemory() {
            long currentMaxMemory = Runtime.getRuntime().maxMemory();
            if (currentMaxMemory != lastMaxMemory) {
                lastMaxMemory = currentMaxMemory;
                System.out.format("Runtime.getRuntime().maxMemory(): %,dK.%n", currentMaxMemory / 1024);
            }
        }

        static void consumeSpace() {
            objects.add(new int[1_000_000]);
        }

        static void freeSpace() {
            objects.clear();
        }
    }

The code is allocating chunks of memory via new int[1_000_000] in a loop and checking for the memory currently known to be available for the JVM runtime. Whenever it spots a change to the last known memory size, it reports it by printing the output of Runtime.getRuntime().maxMemory(), similar to the following:

    Running with: [-Xms2048M, -Xmx2048M]
    Runtime.getRuntime().maxMemory(): 2,010,112K.

Indeed, even though I had told the JVM to use 2G of heap, the runtime somehow is not able to find 85M of it. You can double-check my math by converting the output of Runtime.getRuntime().maxMemory() to MB, dividing the 2,010,112K by 1024. The result you will get equals 1,963M, differing from 2,048M by exactly 85M.
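The arithmetic can be checked with a few lines of Java. This is just a small sketch that uses the numbers printed above; the class and method names (HeapMath, toMegabytes, missingKilobytes) are invented for the illustration.

```java
// Hypothetical helper that redoes the conversion from the text.
final class HeapMath {

    // Converts a size in kilobytes to whole megabytes.
    static long toMegabytes(long kilobytes) {
        return kilobytes / 1024;
    }

    // Difference between the configured -Xmx (in MB) and the reported maximum (in KB).
    static long missingKilobytes(long xmxMegabytes, long reportedKilobytes) {
        return xmxMegabytes * 1024 - reportedKilobytes;
    }

    public static void main(String[] args) {
        long reportedK = 2_010_112L; // value printed by Runtime.getRuntime().maxMemory() / 1024
        long missingK = missingKilobytes(2048, reportedK);

        System.out.println(toMegabytes(reportedK) + "M reported"); // prints 1963M reported
        System.out.println(missingK + "K missing");                 // prints 87040K missing
        System.out.println(toMegabytes(missingK) + "M missing");    // prints 85M missing
    }
}
```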
Finding the root cause

After being able to reproduce the case, I noted the following: running with different GC algorithms also seemed to produce different results:

    GC algorithm               Runtime.getRuntime().maxMemory()
    -XX:+UseSerialGC           2,027,264K
    -XX:+UseParallelGC         2,010,112K
    -XX:+UseConcMarkSweepGC    2,063,104K
    -XX:+UseG1GC               2,097,152K

Except for G1, which uses exactly the 2G I had given to the process, every other GC algorithm seemed to consistently lose a semi-random amount of memory. Now it was time to dig into the source code of the JVM, where in the source of CollectedHeap I discovered the following:

    // Support for java.lang.Runtime.maxMemory(): return the maximum amount of
    // memory that the vm could make available for storing 'normal' java objects.
    // This is based on the reserved address space, but should not include space
    // that the vm uses internally for bookkeeping or temporary storage
    // (e.g., in the case of the young gen, one of the survivor
    // spaces).
    virtual size_t max_capacity() const = 0;

The answer was rather well hidden, I have to admit. But the hint was still there for the truly curious minds to find: in some cases one of the survivor spaces might be excluded from the heap size calculation.

From here it was tailwinds all the way. Turning on GC logging revealed that indeed, with a 2G heap, the Serial, Parallel and CMS algorithms each size a survivor space at exactly the missing difference. For example, for the ParallelGC run above, the GC log showed the following:

    Running with: [-Xms2g, -Xmx2g, -XX:+UseParallelGC, -XX:+PrintGCDetails]
    Runtime.getRuntime().maxMemory(): 2,010,112K.

    ... rest of the GC log skipped for brevity ...

     PSYoungGen      total 611840K, used 524800K [0x0000000795580000, 0x00000007c0000000, 0x00000007c0000000)
      eden space 524800K, 100% used [0x0000000795580000,0x00000007b5600000,0x00000007b5600000)
      from space 87040K, 0% used [0x00000007bab00000,0x00000007bab00000,0x00000007c0000000)
      to   space 87040K, 0% used [0x00000007b5600000,0x00000007b5600000,0x00000007bab00000)
     ParOldGen       total 1398272K, used 1394966K [0x0000000740000000, 0x0000000795580000, 0x0000000795580000)

From this you can see that the Eden space is set to 524,800K, both survivor spaces (from and to) are set to 87,040K, and the Old space is sized at 1,398,272K. Adding together Eden, Old and one of the survivor spaces gives exactly 2,010,112K, confirming that the missing 85M, or 87,040K, was indeed the remaining survivor space.

Summary

After reading the post you are now equipped with new insight into Java API implementation details. The next time some tool shows the total available heap size as slightly less than the heap size specified via -Xmx, you know the difference is equal to the size of one of your survivor spaces.

I have to admit that this fact is not particularly useful in day-to-day programming, but that was not the point of the post. Instead, I wrote it to describe a characteristic I always look for in good engineers: curiosity. Good engineers always want to understand how and why something works the way it does. Sometimes the answer remains hidden, but I still recommend that you try to find it. Eventually the knowledge built along the way will start paying dividends.

Reference: Why is my JVM having access to less memory than specified via -Xmx? from our JCG partner Nikita Salnikov Tarnovski at the Plumbr Blog.
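As a follow-up to the heap size article above: the per-pool limits that the GC log revealed can also be read from inside a running JVM through the standard java.lang.management API. The sketch below is hypothetical (HeapPools and describeHeapPools are names invented here) and makes no claim about what your JVM will print. Pool names, and whether a survivor pool covers one space or both, depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {

    // Returns one line per heap memory pool with its maximum size in K
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            String size = max < 0 ? "undefined" : String.format("%,dK", max / 1024);
            lines.add(pool.getName() + ": " + size);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeHeapPools()) {
            System.out.println(line);
        }
        System.out.format("Runtime.getRuntime().maxMemory(): %,dK%n",
                Runtime.getRuntime().maxMemory() / 1024);
    }
}
```

Running it with the same flags as in the article (for example -Xms2g -Xmx2g -XX:+UseParallelGC) lets you compare the per-pool figures against the value of maxMemory() on your own JVM.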

“NoSQL Injection” – What 40000 Unsecured MongoDB Databases Mean for our Industry

The news is all over reddit:

“Major security alert as 40,000 MongoDB databases left unsecured on the internet”

Security is a feature that is often neglected until it’s too late. And when it’s too late, it is often hard to bake it into a well-established architecture without major refactoring efforts. Every system, and thus also every database, is always vulnerable. Most databases, however, do offer a significant number of features to implement a security layer, and MongoDB is no different from any other DBMS here. So, how could this massive security hole happen?

Security is a cultural thing. Either a company has security in its DNA, or it doesn’t. The same is true for scalability, or user experience, or any other aspect of software engineering. I’ve worked for companies that are at completely opposite ends of security awareness. Some (in the E-Banking field) were ultra-paranoid, implementing thorough security checks in around 7 layers of the application. Others were rather lenient, with management focusing much more on marketing than anything else. I have no empirical evidence, but I did observe a certain correlation between the security awareness of a company and how backend-oriented that company was, E-Banking being a very backend-oriented business.

Backend developers are more security aware

This is an over-generalisation and probably doesn’t do justice to many excellent frontend developers out there, but security is where the data is. Where the algorithms are. Where people reason about constraints, workflows, batch jobs, accounting, money, … algorithms. These folks focus on all the users. On the system. And they want to protect it. On the flip side, they might neglect usability.

There is only little security awareness where the user experience is. Where people reason about layout, formatting, usability, style, … user interfaces. These folks focus on single users. On their experience. And they want to make things easy for the user.
(And again, the same is true for scalability.)

It is no coincidence that backend technology evolves extremely slowly. Java: 20 years and we’ve just finally gotten lambdas. SQL: 30 years and we still don’t have easy ways to reuse code. At the same time, frontend technology evolves at the “speed of reddit”. The next hype is just 100 karma away, and we’ll throw all the previous tech out of the window, just to be part of the game. Clearly, security is something that has to be reasoned about way too thoroughly for it to survive in the fast-paced frontend world.

What does MongoDB have to do with it?

The current event isn’t actually directly related to MongoDB (you could probably find just as many unprotected MySQL instances out there). But it strongly correlates with MongoDB’s sales and marketing strategies. MongoDB has done very aggressive and successful marketing in the past, claiming that the reign of the RDBMS is over, just as the reign of the RDBMS had supposedly been over before, when the astonishing object databases surfaced on this planet. Well, we all know where object or XML databases went.

This time, the anti-RDBMS marketing resonated mostly with frontend developers, obviously, because JSON is their favourite data representation format, and MongoDB promised to be able to store data directly from the DOM into the DB. Not only did this mean “the end of the DBA” for some software vendors, but many vendors also hoped that they could omit operations, and perhaps even backend development. What obviously worked well for prototyping and simple applications doesn’t scale well to applications with sensitive data.

The Solution

The solution is obvious. Homogeneity kills your business. You should hire a variety of different types of personnel. You should have skilled frontend developers, backend developers, operations people, DBAs, and security experts on your team.
You should make them all work together, hear each of their opinions, review each other’s code, and learn from each other. Each one of them has a strong focus on and interest in an entirely different, yet equally important, aspect of your application. Do not neglect any of these aspects. Because if you do, and if it’s security, and if you lose sensitive customer data, well, you’re not going to stay in business; you’ll be sued in court.

Got hooked on the security topic? Continue reading about …

- SQL injection and how your application is probably not safe yet
- How the right database abstraction layer can help you prevent SQL injection

Reference: “NoSQL Injection” – What 40000 Unsecured MongoDB Databases Mean for our Industry from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.
Java Code Geeks and all content copyright © 2010-2015, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.