HPROF – Memory leak analysis tutorial

This article provides a tutorial on how to analyze a JVM memory leak by generating and analyzing a Sun HotSpot JVM HPROF heap dump file. A real-life case study is used for that purpose: a Weblogic 9.2 memory leak affecting the Weblogic Admin server.

Environment specifications:
Java EE server: Oracle Weblogic Server 9.2 MP1 Middleware
OS: Solaris 10
Java VM: Sun HotSpot 1.5.0_22
Platform type: Middle tier

Monitoring and troubleshooting tools:
Quest Foglight (JVM and garbage collection monitoring)
jmap (HPROF / heap dump generation tool)
Memory Analyzer 1.1 via IBM Support Assistant (HPROF heap dump analysis)

Step #1 – WLS 9.2 Admin server JVM monitoring and leak confirmation
The Quest Foglight Java EE monitoring tool was quite useful to identify a Java heap leak from our Weblogic Admin server. As you can see below, the Java heap memory is growing over time. If you are not using any monitoring tool for your Weblogic environment, my recommendation is to at least enable verbose:gc on your HotSpot VM. Please visit my Java 7 verbose:gc tutorial for more detailed instructions on this subject.

Step #2 – Generate a heap dump from your leaking JVM
Following the discovery of a JVM memory leak, the goal is to generate a heap dump file (binary format) using the Sun JDK jmap utility. ** Please note that jmap heap dump generation will cause your JVM to become unresponsive, so ensure that no more traffic is sent to your affected / leaking JVM before running the jmap utility. **

<JDK HOME>/bin/jmap -heap:format=b <Java VM PID>

This command will generate a heap dump binary file (heap.bin) of your leaking JVM. The size of the file and the elapsed time of the generation process will depend on your JVM size and machine specifications / speed. For our case study, a binary heap dump file of ~2 GB was generated in about 1 hour of elapsed time. Sun HotSpot 1.5/1.6/1.7 will also generate a heap dump file automatically as a result of an OutOfMemoryError if you add -XX:+HeapDumpOnOutOfMemoryError to your JVM start-up arguments.
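As a side note, on Java 6 and later HotSpot JVMs a heap dump can also be triggered programmatically through the HotSpotDiagnostic MBean. The following is a minimal sketch of that approach, not something used in this case study (it would not apply to the 1.5.0_22 VM described here), and the output path is just an illustrative assumption:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    public static void main(String[] args) throws Exception {
        // Look up the HotSpot diagnostic MBean exposed by the platform MBean server
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        // Write a binary HPROF dump; "true" restricts the dump to live (reachable) objects
        diagnostic.dumpHeap("/tmp/heap.bin", true);
    }
}

As with jmap, expect the JVM to pause while the dump is being written.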
Step #3 – Load your heap dump file in the Memory Analyzer tool
It is now time to load your heap dump file in the Memory Analyzer tool. The loading process will take several minutes depending on the size of your heap dump and the speed of your machine.

Step #4 – Analyze your heap dump
The Memory Analyzer provides you with many features, including a Leak Suspects report. For this case study, the Java heap histogram was used as a starting point to analyze the leaking objects and their source. For our case study, java.lang.String and char[] data were found to be the leaking objects. The next question is what the source of the leak is, i.e. the references holding on to those leaking objects. Simply right-click over your leaking objects and select List Objects > with incoming references. As you can see, javax.management.ObjectName objects were found to be the source of the leaking String and char[] data. The Weblogic Admin server communicates with and pulls stats from its managed servers via MBeans / JMX, which creates a javax.management.ObjectName for any MBean object type. The question now is why Weblogic 9.2 is not releasing such objects properly…

Root cause: Weblogic javax.management.ObjectName leak!
Following our heap dump analysis, a review of the Weblogic known issues was performed, which revealed the following Weblogic 9.2 bug:

Weblogic Bug ID: CR327368
Description: Memory leak of javax.management.ObjectName objects on the Administration Server used to cause OutOfMemory errors on the Administration Server.
Affected Weblogic version(s): WLS 9.2
Fixed in: WLS 10 MP1
http://download.oracle.com/docs/cd/E11035_01/wls100/issues/known_resolved.html

This finding was quite conclusive given the perfect match between our heap dump analysis, our WLS version and this known problem description.

Conclusion
I hope this tutorial and case study have helped you understand how you can pinpoint the source of a Java heap leak using jmap and the Memory Analyzer tool. Please don't hesitate to post any comment or question. I also provide free Java EE consultation, so simply email me a download link to your heap dump file and I will analyze it for you and create an article on this blog describing your problem, root cause and resolution.

Reference: HPROF – Memory leak analysis tutorial from our JCG partner Pierre-Hugues Charbonneau at the Java EE Support Patterns & Java Tutorial blog.

Java Thread CPU analysis on Windows

This article provides a tutorial on how to quickly pinpoint the Java thread contributors to a high CPU problem on the Windows OS. Windows, like other operating systems such as Linux, Solaris and AIX, allows you to monitor CPU utilization not only at the process level but also for individual threads executing a task within a process. For this tutorial, we created a simple Java program that will allow you to learn this technique in a step-by-step manner.

Troubleshooting tools
The following tools will be used in this tutorial:
- Windows Process Explorer (to pinpoint high CPU thread contributors)
- JVM thread dump (for thread correlation and root cause analysis at code level)

High CPU simulator Java program
The simple program below just loops and creates new String objects. It will allow us to perform this CPU-per-thread analysis. I recommend that you import it into an IDE of your choice, e.g. Eclipse, and run it from there. You should observe an increase of CPU on your Windows machine as soon as you execute it.

package org.ph.javaee.tool.cpu;

/**
 * HighCPUSimulator
 * @author Pierre-Hugues Charbonneau
 * http://javaeesupportpatterns.blogspot.com
 */
public class HighCPUSimulator {

    private final static int NB_ITERATIONS = 500000000;

    // ~1 KB data footprint
    private final static String DATA_PREFIX =
            "datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata"
            + "datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata"
            + "datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata"
            + "datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata"
            + "datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata"
            + "datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata"
            + "datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata"
            + "datadatadatadatadatadatadata";

    /**
     * @param args
     */
    public static void main(String[] args) {

        System.out.println("HIGH CPU Simulator 1.0");
        System.out.println("Author: Pierre-Hugues Charbonneau");
        System.out.println("http://javaeesupportpatterns.blogspot.com/");

        try {
            for (int i = 0; i < NB_ITERATIONS; i++) {
                // Perform some String manipulations to slow down and expose the looping process...
                String data = DATA_PREFIX + i;
            }
        } catch (Throwable any) {
            System.out.println("Unexpected Exception! " + any.getMessage() + " [" + any + "]");
        }

        System.out.println("HighCPUSimulator done!");
    }
}

Step #1 – Launch Process Explorer
The Process Explorer tool visually shows CPU usage dynamically, which is good for live analysis. If you need historical data on CPU per thread, you can also use Windows perfmon with the % Processor Time and Thread Id data counters. You can download Process Explorer from the link below:
http://technet.microsoft.com/en-us/sysinternals/bb896653
In our example, you can see that the Eclipse javaw.exe process is now using ~25% of total CPU utilization following the execution of our sample program.

Step #2 – Launch the Process Explorer Threads view
The next step is to display the Threads view of the javaw.exe process. Simply right-click on the javaw.exe process and select Properties. The Threads view will be opened as per the snapshot below:
- The first column is the Thread Id (decimal format)
- The second column is the CPU utilization % used by each thread
- The third column is another counter indicating whether the thread is currently running on the CPU

In our example, we can see that our primary culprit is Thread Id #5996, using ~25% of the CPU.
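A side note before moving on: the JRockit thread dump shown later reports the decimal thread id directly (tid=5996), but HotSpot thread dumps print the native thread id as a hexadecimal nid field, so the decimal id from Process Explorer has to be converted first. A quick sketch of that conversion (the class name is made up for illustration):

public class ThreadIdConverter {

    public static void main(String[] args) {
        int processExplorerThreadId = 5996;  // decimal thread id reported by Process Explorer
        String nid = "0x" + Integer.toHexString(processExplorerThreadId);
        // Prints 0x176c -> search for "nid=0x176c" in a HotSpot thread dump
        System.out.println(nid);
    }
}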
Step #3 – Generate a JVM thread dump
At this point, Process Explorer will no longer be useful. The goal was to pinpoint one or more Java threads consuming most of the Java process CPU utilization, which is what we achieved. In order to go to the next level in your analysis you will need to capture a JVM thread dump. This will allow you to correlate the Thread Id with the thread stack trace so you can pinpoint what type of processing is consuming such high CPU. JVM thread dump generation can be done in a few ways; if you are using the JRockit VM you can simply use the jrcmd tool (on HotSpot JVMs, the JDK's jstack utility, jstack <pid>, serves the same purpose). Once you have the thread dump data, simply search for the Thread Id and locate the thread stack trace that you are interested in. For our example, the thread "Main Thread", which was fired from Eclipse, was exposed as the primary culprit, which is exactly what we wanted to demonstrate.

Main Thread id=1 idx=0x4 tid=5996 prio=5 alive, native_blocked
    at org/ph/javaee/tool/cpu/HighCPUSimulator.main (HighCPUSimulator.java:31)
    at jrockit/vm/RNI.c2java(IIIII)V(Native Method)
    -- end of trace

Step #4 – Analyze the culprit thread(s) stack trace and determine the root cause
At this point you should have everything you need to move forward with the root cause analysis. You will need to review each thread stack trace and determine what type of problem you are dealing with. This final step is typically where you will spend most of your time; the problem can be as simple as infinite looping or as complex as garbage collection related issues. In our example, the thread dump revealed that the high CPU originates from our sample Java program around line 31. As expected, it revealed the looping condition that we engineered on purpose for this tutorial.

for (int i = 0; i < NB_ITERATIONS; i++) {
    // Perform some String manipulations to slow down and expose the looping process...
    String data = DATA_PREFIX + i;
}

I hope this tutorial has helped you understand how you can analyze and pinpoint the root cause of Java CPU problems on the Windows OS. Please stay tuned for more updates; the next article will provide you with a Java CPU troubleshooting guide, including how to tackle that last analysis step, along with common problem patterns.

Reference: Java Thread CPU analysis on Windows from our JCG partner Pierre-Hugues Charbonneau at the Java EE Support Patterns & Java Tutorial blog.

The True Story of the Grid Engine Dream

At Sun, they called me Mr. Grid Engine. I was the product manager of Sun Grid Engine for a decade, until January 2010. This blog is a story of my journey.

Where are we today? Defying many predictions, Grid Engine, formerly CODINE, is the Distributed Resource Management software most widely used for grid computing, particularly in High Performance Computing (HPC), and it refuses to yield to the cloud business model. Top supercomputers cannot deliver Infrastructure as a Service (HPC IaaS) yet. One day they will. If you want a metaphor, the Volkswagen Beetle was the cult car that made the entire company successful. Similarly, Grid Engine can be viewed as an imperfect but fascinating product. It can become the launching pad for something much bigger, with a much wider adoption. Grid Engine is, yes – not a typo – a cult product, an enviable positioning which defies logic. Apple, Java and Facebook are cult products. Grid Engine has the potential to grow from this reputation to well beyond where it is today. The commercially supported distributions are Univa Grid Engine and Oracle Grid Engine; both claim to be the successor of Sun Grid Engine. The open source distributions are Son of Grid Engine, whose latest release was 8.0.0e on April 19, 2012, and Open Grid Scheduler.

Here is a quote from a recent Inc. magazine article, "8 Core Beliefs of Extraordinary Bosses": Average bosses see business as a conflict between companies, departments and groups. They… demonize competitors as "enemies," and treat customers as "territory" to be conquered. Extraordinary bosses see business as a symbiosis where the most diverse firm is most likely to survive and thrive. They naturally create teams that adapt easily to new markets and can quickly form partnerships with other companies, customers … and even competitors.

Grid Engine started with an extraordinary entrepreneur, Dr. Wolfgang Gentzsch, who founded Genias Software in 1991, later renamed Gridware in 1999. Wolfgang says there is only one Grid Engine community, which forms an ecosystem that he calls a symbiosis of diversity. It all originated at Genias. It implied a huge physical and mental effort, going through Sun's acquisition of Gridware in 2000 and later, when Oracle took over Sun in 2010, the creation of the GE ecosystem. After Wolfgang left Sun – many fine people at Sun had to leave at that time – it was frustrating to see how our efforts to have two Sun Grid Engine products (one available by subscription and one available as free open source) failed because of a management veto. On one hand we were under pressure to be profitable as a unit; on the other hand, our customers appeared to have no reason to pay even one cent for a subscription or license. Oracle still has IP control of Grid Engine. Both Univa and Oracle decided to make no more contributions to the open source. While Oracle's open source policies are clear, Univa, a champion of open source for many years, has surprised the community. This has created an agitated thread on the Grid Engine discussion group.

Quoting from Inc. again: Extraordinary bosses see change as an inevitable part of life. While they don't value change for its own sake, they know that success is only possible if employees and organization embrace new ideas and new ways of doing business.

The paradox is that the companies that make really big money do things as if they were not interested in money. Quora has a new thread titled "What is Mark Zuckerberg's true attitude towards money?" Here is a quote from the highest-ranking answer: Mark's main motivations were pretty clearly based around materially changing the world and building technology that was used by everyone on the planet…. My impression back then was that if he had to choose, he'd rather be the most important/influential person in the world rather than the richest. And I think that's visible in how he directed the company to focus on user growth and product impact rather than revenue or business considerations. Even today, while Facebook makes a ton of money, it could probably make magnitudes more if that were its primary goal.

What do people usually say? "Well, I am not Zuckerberg," or "I am not Steve Jobs," or "We can never make Grid Engine a business of this magnitude." "Yes we can!" That is my answer. Oracle is part of the Grid Engine ecosystem. They are one of the most powerful high tech companies in the world. In September 2008, Larry Ellison dismissed the concept of the cloud. In 2012, cloud computing is one of the main initiatives at Oracle. The Univa web site points out that Oracle does not have a Grid Engine roadmap. This can change at any moment, as Big Data becomes the buzz of the decade. Since the Oracle takeover, Grid Engine has been part of the Ops Center, a business unit whose culture is not in sync with Grid Engine. This may change. Rackspace announced at the OpenStack Design Summit and Conference that it is ready to run its public cloud service on OpenStack, open source software they own and have made accessible to anyone. 55 companies worldwide, including IBM, support implementations. Oracle may get some inspiration here for Grid Engine.

The Grid Engine ecosystem has extremely gifted contributors. In addition to the Univa team (Gary Tyreman, Bill Bryce, Fritz Ferstl), we have a superb team from open source, including Chris Dagdigian from BioTeam (the creator of the legendary http://gridengine.info domain) and Daniel Templeton, now with Cloudera. We have Rayson Ho, Ron Chen, Dave Love, Chi Chan, Reuti and many more from the Son of Grid Engine project. Wolfgang Gentzsch tops the list to restore the soul of Grid Engine. He has done it before. He will do it again. I believe in miracles. When Steve Jobs returned to Apple, they had a month or so left before having to dismantle and liquidate the company. But…

Reference: The True Story of the Grid Engine Dream from our JCG partner Miha Ahronovitz at The Memories of a Product Manager blog. (Copyright 2012 – Ahrono Associates)

DBUnit, Spring and Annotations for Database testing

If you have ever tried writing database tests in Java you might have come across DBUnit. DBUnit allows you to set up and tear down your database so that it contains consistent rows that you can write tests against. You usually specify the rows that you want DBUnit to insert by writing a simple XML document, for example:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
    <Person id="0" title="Mr" firstName="Dave" lastName="Smith"/>
    <Person id="1" title="Mrs" firstName="Jane" lastName="Doe"/>
</dataset>

You can also use XML files in the same format to assert that a database contains specific rows. DBUnit works especially well with in-memory databases, and if you work with Spring, setting them up is pretty straightforward. Here is a good article describing how to get started.

Working directly with DBUnit is fine, but after a while it can become apparent how many of your tests follow the same pattern of setting up the database and then testing the result. To cut down on this duplication you can use the spring-test-dbunit project. This project is hosted on GitHub and provides a new set of annotations that can be added to your test methods. Version 1.0.0 has just been released and is now available in the Maven central repository:

<dependency>
    <groupId>com.github.springtestdbunit</groupId>
    <artifactId>spring-test-dbunit</artifactId>
    <version>1.0.0</version>
    <scope>test</scope>
</dependency>

Once installed, three new annotations are available for use in your tests: @DatabaseSetup, @DatabaseTearDown and @ExpectedDatabase. All three can be used either on the test class or on individual test methods. The @DatabaseSetup and @DatabaseTearDown annotations are used to put your database into a consistent state, either before the test runs or after it has finished. You specify the dataset to use as the annotation value, for example:

@Test
@DatabaseSetup("sampleData.xml")
public void testFind() throws Exception {
    // test code
}

The @ExpectedDatabase annotation is used to verify the state of the database after the test has finished. As with the previous annotations, you must specify the dataset to use.

@Test
@DatabaseSetup("sampleData.xml")
@ExpectedDatabase("expectedData.xml")
public void testRemove() throws Exception {
    // test code
}

You can use @ExpectedDatabase in a couple of different modes depending on how strict the verification should be (see the JavaDocs for details). For the annotations to be processed you need to make sure that your test is using the DbUnitTestExecutionListener. See the project readme for full details. If you want to learn more, there is an example project on GitHub and some walk-through instructions available here. A minimal, fully wired test class is also sketched right after this article's reference.

Reference: Database testing using DBUnit, Spring and Annotations from our JCG partner Phillip Webb at Phil Webb's Blog.
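As promised, here is a sketch of how the pieces above might be wired together in one test class. The listener and annotations come from spring-test-dbunit and Spring's test module; the context file name, the PersonRepository bean and its methods are hypothetical placeholders introduced only for this illustration, and the Spring context is assumed to define a DataSource:

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.TestExecutionListeners;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.test.context.support.DependencyInjectionTestExecutionListener;

import com.github.springtestdbunit.DbUnitTestExecutionListener;
import com.github.springtestdbunit.annotation.DatabaseSetup;
import com.github.springtestdbunit.annotation.ExpectedDatabase;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("/test-context.xml") // hypothetical Spring context with a DataSource and the repository bean
@TestExecutionListeners({ DependencyInjectionTestExecutionListener.class,
        DbUnitTestExecutionListener.class })
public class PersonRepositoryTest {

    @Autowired
    private PersonRepository personRepository; // hypothetical repository under test

    @Test
    @DatabaseSetup("sampleData.xml")
    public void testFind() throws Exception {
        // The two Person rows from sampleData.xml have been inserted by DBUnit before this runs
        assertEquals(2, personRepository.findAll().size());
    }

    @Test
    @DatabaseSetup("sampleData.xml")
    @ExpectedDatabase("expectedData.xml")
    public void testRemove() throws Exception {
        // After the test, the database state is verified against expectedData.xml
        personRepository.remove(1);
    }
}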

Using the final keyword on method parameters

After some confusion of my own about the specific meaning of final method parameters, this blog entry will try to clarify it. At the very least, the final keyword on a method parameter can be seen as an indicator to the Java compiler that this parameter cannot be reassigned to another reference. Java parameter handling is always call by value (yes, even when dealing with objects), and here is why: it is true that Java handles a reference to the object when dealing with non-primitive data types. The object itself is not passed from the caller to the target function! Instead, a reference is passed that points to the desired object. But this reference is not equal to the one on the caller side, since it is just a copy. What is passed to a function is a copied reference as a value – OK, everyone still on board? :-) Maybe Java should use the more fitting description "call by copied reference as value". To sum up: Java exclusively passes ALL method parameters (primitive data types or references to objects) in call-by-value style! As proof, let's have a look at the following demo code and its output.

/**
 * Call by Value Test Application.
 *
 * @author Christopher Meyer
 * @version 0.1
 * Apr 21, 2012
 */
public class CBVTest {

    public static void main(String[] args) {
        Integer mainInternInteger = new Integer(1);

        /*
         * Even references are copied during calls!
         *
         * Explanation: Objects are never passed, only references to them, BUT
         * references are copied! So only reference COPIES reach the method.
         * Neither changes to the reference inside nor outside the method will
         * influence the counterpart.
         *
         * Maybe it should be called "Call by Copied Reference as Value".
         */
        class RunMe implements Runnable {

            Integer runnerInternInteger;

            public RunMe(Integer i) {
                runnerInternInteger = i;

                /*
                 * The following operation will have no effect on the main
                 * thread, since the reference to "i" is a copied one.
                 * Interfering with the "caller" reference is prevented.
                 */
                i = new Integer(3);
            }

            @Override
            public void run() {
                while (true) {
                    System.out.println(runnerInternInteger.intValue() + "\t (runner intern value)");
                }
            }
        }

        Thread runner = new Thread(new RunMe(mainInternInteger));
        runner.start();

        // Create a new object and assign it to "mainInternInteger".
        mainInternInteger = new Integer(2);
        while (true) {
            System.out.println(mainInternInteger.intValue() + "\t (main intern value)");
        }
    }
}

The output of the code looks like this:

...
2    (main intern value)
2    (main intern value)
2    (main intern value)
2    (main intern value)
1    (runner intern value)
2    (main intern value)
1    (runner intern value)
2    (main intern value)
1    (runner intern value)
2    (main intern value)
1    (runner intern value)
1    (runner intern value)
1    (runner intern value)
1    (runner intern value)
1    (runner intern value)
...

So neither the assignment to the passed parameter (i = new Integer(3)) nor the reassignment from the calling class (mainInternInteger = new Integer(2)) has any influence on the other. So what is final worth if such reassignment cannot leak out of the method anyway? If it is added to the constructor of RunMe (public RunMe(final Integer i)), the reassignment i = new Integer(3) fails with: "Exception in thread 'main' java.lang.RuntimeException: Uncompilable source code – final parameter i may not be assigned." It prevents failures related to unintentional reassignment: accidental assignments to the handled parameter will always fail. final forces a developer to produce accurate code.
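To make the distinction explicit, here is a small additional sketch: final only forbids reassigning the parameter variable itself; it does not make the referenced object immutable. The class and method names are made up for illustration:

public class FinalParameterDemo {

    static void process(final StringBuilder sb) {
        sb.append(" world");           // allowed: mutating the object the reference points to
        // sb = new StringBuilder();   // compile error: the final parameter sb may not be assigned
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("hello");
        process(text);
        System.out.println(text);      // prints "hello world"
    }
}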
The final keyword is not part of the method signature. So whether a parameter is declared final or not, the compiled code will be identical (everyone can easily check this by using diff). This also means that a method can't be overloaded by declaring the method parameter once final and once not. Since the byte code remains identical, it has absolutely no influence on performance either. To confuse things even more, keep in mind that inner classes can only access local variables (and parameters) of the enclosing scope that are declared final – for example when dealing with anonymous inner classes for threads. If the reason for this isn't clear to you, consider multiple variables in the same context with identical names that can be altered.

Reference: Using the final keyword on method parameters from our JCG partner Christopher Meyer at the Java security and related topics blog.
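As a small illustration of that last point about inner classes (these are the pre-Java 8 rules; since Java 8 an "effectively final" variable is also accepted), here is a minimal sketch with made-up names:

public class InnerClassCaptureDemo {

    public static void main(String[] args) {
        final String message = "captured"; // must be final to be readable from the anonymous class

        Runnable task = new Runnable() {
            @Override
            public void run() {
                // The anonymous class reads the captured local variable
                System.out.println(message);
            }
        };
        new Thread(task).start();
    }
}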

Keep As Much Stuff As Possible In The Application Itself

There’s a lot of Ops work to every project. Setting up server machines, and clusters of them, managing the cloud instances, setting up the application servers, HAProxy, load balancers, database clusters, message queues, search engine, DNS, alerts, and whatnot. That’s why the Devops movement is popular – there’s a lot more happening outside the application that is vital to its success. But unix/linux is tedious. I hate it, to be honest. Shell script is ugly and I would rather invent a new language and write a compiler for it, that write a shell script. I know many “hackers” will gasp at this statement, but let’s face it – it should be used only as a really last resort, because it will most likely stay out of the application’s repository, it is not developer friendly, and it’s ugly (yes, you can version it, you can write it with good comments, and still…) But enough for my hate for shell scripting (and command-line executed perl scripts for that matter). That’s not the only thing that should be kept to minimum. (Btw, this is the ‘whining’ paragraph’, you can probably skip it). The “Getting Started” guide of all databases, message queues, search engines, servers, etc. says “easy to install”. Sure, you just apt-get install it, then go to /usr/lib/foo/bar and change a configuration file, then give permissions to a newly-created user that runs it, oh, and you customize the shell-script to do something, and you’re there. Oh, and /usr/lib/foo/bar – that’s different depending on how you install it and who has installed it. I’ve seen tomcat installed in at least 5 different ways. One time all of its folders (bin, lib, conf, webapps, logs, temp) were in a completely different location on the server. And of course somebody decided to use the built-in connection pool, so the configuration has to be done in the servlet container itself. Use the defaults. Put that application server there and leave it alone. But we need a message queue. And a NoSQL database in addition to MySQL. And our architects say “no, this should not be run in embedded mode, it will couple the components”. So a whole new slew of configurations and installations for stuff that can very easily be run inside our main application virtual machine/process. And when you think the external variables are just too many – then comes URL rewriting. “Yes, that’s easy, we will just add another rewrite rule”. 6 months later some unlucky developer will be browsing through the code wondering for hours why the hell this doesn’t open. And then he finally realizes it is outside the application, opens the apache configuration file, and he sees wicked signs all over. To summarize the previous paragraph – there’s just too much to do on the operations side, and it is (obviously) not programming. Ops people should be really strict about versioning configuration and even versioning whole environment setups (Amazon’s cloud gives a nice option to make snapshots and then deploy them on new instances). But then, when somethings “doesn’t work”, it’s back to the developers to find the problem in the code. And it’s just not there. That’s why I have always strived to keep as much stuff as possible in the application itself. NoSQL store? Embedded, please. Message queue? Again. URL rewrites – your web framework does that. Application server configurations? None, if possible, you can do them per-application. Modifications of the application server startup script? No, thanks. Caching? It’s in-memory anyway, why would you need a separate process. 
Every external configuration that is needed goes into a single file that resides outside the application, and Ops (or devs, or devops) can change that configuration. No more hidden stones to find in /usr/appconf, Apache or wherever. Consolidate as much as possible in the piece that you are familiar and experienced with – the code. Obviously, not everything can be there. Some databases you can't run embedded, or you really want to have on separate machines. You need a load balancer, and it has to be in front of the application. You need to pass initialization parameters for the virtual machine / process in the startup script. But stick to the bare minimum. If you need to make something transparent to the application, do it with a layer of code, not with scripts and configurations. I don't know if that aligns with the DevOps philosophy, because it is more "dev" and less "ops", but it actually allows developers to do the ops part, because it is kept down to a minimum. And it does not involve ugly scripting languages and two-line-long shell commands. I know I sound like a big *nix noob. And I truly am. But as most of these hacks can be put into the application and be more predictable and easier to read and maintain – I prefer to stay that way. If that is not possible – let them live outside it, but really version them, even in the same repository as the code, and document them. The main purpose of all this is to improve maintainability and manageability. You have a lot of tools, infrastructure and processes around your code, so make use of them for as much as possible.

Reference: Keep As Much Stuff As Possible In The Application Itself from our JCG partner Bozhidar Bozhanov at Bozho's tech blog.

High Performance Webapps – Data URIs

I continue to write tips for performance optimization of websites. The last post was about jQuery objects. This post is about data URIs. Data URIs are an interesting concept on the Web. Please read "Data URIs explained" if you don't know what they mean. Data URIs are a technique for embedding resources as base64-encoded data, avoiding the need for extra HTTP requests. They give you the ability to embed files, especially images, inside other files, especially CSS. Images are not the only thing data URIs support, but embedded inline images are the most interesting part of this technique. It allows separate images to be fetched in a single HTTP request rather than multiple HTTP requests, which can be more efficient. Decreasing the number of requests results in better page performance. "Minimize HTTP requests" is actually the first rule of the "Yahoo! Exceptional Performance Best Practices", and it specifically mentions data URIs: "Combining inline images into your (cached) stylesheets is a way to reduce HTTP requests and avoid increasing the size of your pages… 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these first time visitors is key to a better user experience."

The data URI format is specified as:

data:[<mime type>][;charset=<charset>][;base64],<encoded data>

We are only interested in images, so the mime types can be e.g. image/gif, image/jpeg or image/png. The charset should be omitted for images. The encoding is indicated by ;base64. An example of a valid data URI:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot">

HTML fragments with inline images like the above example are not really interesting because they are not cached. Data URIs in CSS files (style sheets), however, are cached along with the CSS files, and that brings benefits. Some advantages, as described in Wikipedia:
- HTTP request and header traffic is not required for embedded data, so data URIs consume less bandwidth whenever the overhead of encoding the inline content as a data URI is smaller than the HTTP overhead. For example, the required base64 encoding for an image 600 bytes long would be 800 bytes, so if an HTTP request required more than 200 bytes of overhead, the data URI would be more efficient.
- For transferring many small files (less than a few kilobytes each), this can be faster. TCP transfers tend to start slowly. If each file requires a new TCP connection, the transfer speed is limited by the round-trip time rather than the available bandwidth. Using HTTP keep-alive improves the situation, but may not entirely alleviate the bottleneck.
- When browsing a secure HTTPS web site, web browsers commonly require that all elements of a web page be downloaded over secure connections, or the user will be notified of reduced security due to a mixture of secure and insecure elements. On badly configured servers, HTTPS requests have significant overhead over common HTTP requests, so embedding data in data URIs may improve speed in this case.
- Web browsers are usually configured to make only a certain number (often two) of concurrent HTTP connections to a domain, so inline data frees up a download connection for other content.

Furthermore, data URIs are better than sprites. Images organized as CSS sprites (many small images combined into one big one) are difficult to maintain. Maintenance costs are high.
Imagine you want to change some small images in the sprite – their position, size, color or whatever. There are tools for generating sprites, but later changes are not easy. Changes in size especially cause a shift of all positions and a lot of CSS changes. And don't forget – a sprite still requires one HTTP request :-).

What browsers support data URIs? Data URIs are supported by all modern browsers: Gecko-based (Firefox, SeaMonkey, Camino, etc.), WebKit-based (Safari, Google Chrome), Opera, Konqueror, and Internet Explorer 8 and higher. For Internet Explorer 8, data URIs must be smaller than 32 KB; Internet Explorer 9 does not have this limitation. IE versions 5-7 lack support for data URIs, but there is MHTML for when you need data URIs in IE7 and under.

Are there tools helping with automatic data URI embedding? Yes, there are. The most popular is the command line tool CSSEmbed. Especially if you need to support old IE versions, you can use this tool, which can deal with MHTML. The Maven plugin for web resource optimization, which is part of the PrimeFaces Extensions project, now has support for data URIs too. The plugin allows you to embed data URIs for referenced images in style sheets at build time. This Maven plugin doesn't support MHTML. That is problematic because you would need to include CSS files with conditional comments separately – one for IE7 and under and one for all other browsers.

How does the conversion to data URIs work? The plugin reads the content of CSS files. A special java.io.Reader implementation looks for #{resource[…]} tokens in the CSS files. This is the syntax for image references in JSF 2. A token starts with #{resource[ and ends with ]}; the content inside contains the image path in JSF syntax. Theoretically we could also support other tokens (they are configurable), but we're not interested in that kind of support :-). Examples:

.ui-icon-logosmall {
    background-image: url("#{resource['images/logosmall.gif']}") !important;
}

.ui-icon-aristo {
    background-image: url("#{resource['images:themeswitcher/aristo.png']}") !important;
}

In the next step, the image resource for each background image is located. Image directories are specified according to the JSF 2 specification and suit WAR as well as JAR projects. These are ${project.basedir}/src/main/webapp/resources and ${project.basedir}/src/main/resources/META-INF/resources. The plugin tries to find every image in those directories. If an image is not found in the specified directories, it is not transformed. Otherwise, the image is encoded into a base64 string. The encoding is performed only if the data URI string is less than 32 KB, in order to support the IE8 browser; images larger than that are not transformed. The resulting data URIs look like:

.ui-icon-logosmall {
    background-image: url("data:image/gif;base64,iVBORw0KGgoAAAANSUhEUgA ... ASUVORK5CYII=") !important;
}

.ui-icon-aristo {
    background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgA ... BJRU5ErkJggg==") !important;
}

Configuration in pom.xml is simple. To enable this feature, set the useDataUri flag to true. Example:

<plugin>
    <groupId>org.primefaces.extensions</groupId>
    <artifactId>resources-optimizer-maven-plugin</artifactId>
    <configuration>
        <useDataUri>true</useDataUri>
        <resourcesSets>
            <resourcesSet>
                <inputDir>${project.build.directory}/webapp-resources</inputDir>
            </resourcesSet>
        </resourcesSets>
    </configuration>
</plugin>
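The encoding step itself is conceptually simple. The following standalone sketch shows the general idea of turning an image file into a data URI string – it is not the plugin's actual implementation, it assumes Java 7/8 APIs (java.nio.file.Files, java.util.Base64), the image path is hypothetical, and it falls back to an assumed MIME type if the platform cannot detect one:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

public class DataUriEncoder {

    public static String toDataUri(Path image) throws Exception {
        byte[] bytes = Files.readAllBytes(image);
        String mimeType = Files.probeContentType(image); // may return null on some platforms
        if (mimeType == null) {
            mimeType = "image/png"; // assumed fallback for this sketch
        }
        String base64 = Base64.getEncoder().encodeToString(bytes);
        return "data:" + mimeType + ";base64," + base64;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical image path used for illustration only
        String dataUri = toDataUri(Paths.get("src/main/webapp/resources/images/logosmall.gif"));
        System.out.println("Encoded length: " + dataUri.length());
        // Keep an eye on the 32 KB limit mentioned above if IE8 must be supported
    }
}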
Enough theory. Now I will describe the practical part. I will show some measurements and screenshots, and give tips on how large images should be, where the CSS should be placed, how big a CSS file with data URIs gets, and whether a GZIP filter can help here. Read on.

The first question is whether it's worth putting data URIs in style sheets at all. Yes, it's worth it. First, I would like to point you to the great article "Data URIs for CSS Images: More Tests, More Questions", where you can run the tests for all three scenarios from your own location. Latency differs depending on your location, but you can see a tendency: a web page containing data URIs loads faster. This also reveals one of the main tricks to achieve better performance with data URIs: split your CSS into two files – one with the main rules and one with data URIs only – and place the second one in the footer. "In the footer" means close to the closing HTML body tag. Page rendering then feels faster because of progressive rendering. In the second article you can see that this technique really accelerates page rendering. A style sheet in the footer leads to a nice effect: large images download in parallel with the data URI style sheet. Why? Well, the browser assumes that stuff placed in the footer cannot have any impact on the page structure above the included files, so it doesn't block resource loading. I have also read that in this case all browsers (except old IE versions) render the page immediately, without waiting until the CSS with data URIs has been loaded. The same is valid for JavaScript files, as far as I know. Is it valid at all to put CSS files in the page footer? Well, it's not recommended in the HTML specification, but it is valid in practice and it's not bad at all in special cases. There is an interesting discussion about this on Stack Overflow: "How bad is it to put a CSS include in the middle of the body?"

The second tip is to use data URIs only for small images, up to 1-2 KB. It's not worth using data URIs for large images. A large image has a very long data URI string (the base64-encoded string), which can increase the size of the CSS file. Large files can block the loading of other files. Remember, browsers have connection limits; they can normally open 2-8 connections to the same domain, which means only 2-8 files can be loaded in parallel at any time. After reading some comments on the internet, I found confirmation of my assumption about 1-2 KB images. We can soften this behavior by using a GZIP filter. A GZIP filter reduces the size of resources. I have read that sometimes the size of an image encoded as a data URI is even smaller than the size of the original image. A GZIP filter is applied to web resources like CSS, JavaScript and (X)HTML files, but it's not recommended to apply it to images or PDF files, for example. So unencoded images don't go through the filter, but CSS files do. In 99% of cases, if you gzip your CSS file, the resulting size is about the same as with regular image URL references! And that was the third tip – use a GZIP filter.

I would now like to show my test results. My test environment: Firefox 11 on Kubuntu Oneiric. I prepared the PrimeFaces Extensions showcase with 31 images which I added to the start page. These images display small theme icons in PNG format. Every image has the same dimensions, 30 x 27 px, and sizes in the range of 1.0-4.6 KB. The CSS file without data URIs was 4.8 KB and with data URIs 91.6 KB. The CSS files were included quite normally in the HTML head section, by the way. I deployed the showcases with and without data URIs on my VPS with a Jetty 8 server, at first without a GZIP filter.
I cleared the browser cache and opened Firebug for each showcase. Here are the results:

Without data URIs: 65 requests. Page loading time 3.84 s (onload: 4.14 s). That means the document ready event occurred after 3.84 s and window onload after 4.14 s. Subsequent calls for the same page (resources fetched from the browser cache) took 577 ms, 571 ms, 523 ms, …

With data URIs: 34 requests. Page loading time 3.15 s (onload: 3.33 s). That means fewer requests (remember the 31 embedded images); the document ready event occurred after 3.15 s and window onload after 3.33 s. Subsequent calls for the same page (resources fetched from the browser cache) took 513 ms, 529 ms, 499 ms, …

There isn't much difference for subsequent calls (page refreshes), but there is a significant difference for first-time visits. The onload event in particular occurs faster with data URIs. No wonder: images are loaded after the document is ready, and because they cannot all be loaded in parallel (the number of open connections is limited), they get blocked. I took some pictures from the Google Chrome Web Inspector: the first showed the timing for an image (vader.png) in the regular case without a data URI, and the second showed the same image encoded as a data URI. In the second picture there isn't any blocking at all. Tests with a GZIP filter didn't have much impact in my case (I don't know why; maybe I don't have enough resources). Average times after a couple of tests with an empty cache:

Without data URIs: 65 requests. Page loading time 3.18 s (onload: 3.81 s).
With data URIs: 34 requests. Page loading time 3.03 s (onload: 3.19 s).

Reference: High Performance Webapps. Use Data URIs. Practice and High Performance Webapps. Use Data URIs. Theory from our JCG partner Oleg Varaksin at the Thoughts on software development blog.

Aleri – Complex Event Processing

Sybase's Aleri streaming platform is one of the more popular products in the CEP market segment. It is used in Sybase's trading platform – the RAP edition – which is widely used in capital markets to manage positions in a portfolio. Today, in the first of this multi-part series, I want to provide an overview of the Aleri platform and some code samples where required. In the second part, I will present the Aleri Studio, the Eclipse-based GUI that simplifies the task of modeling a CEP workflow and monitoring the Aleri server through a dashboard.

In my previous blog post on Complex Event Processing, I demonstrated the use of Esper, the open source CEP software, and the Twitter4J API to handle a stream of tweets from Twitter. A CEP product is much more than handling just one stream of data, though. A single stream of data could easily be handled through standard asynchronous messaging platforms and does not pose very challenging scalability or latency issues. But when it comes to consuming more than one real-time stream of data, analyzing it in real time, and correlating between the streams, nothing beats a CEP platform. The sources feeding a streaming platform can vary in speed, volume and complexity. A true enterprise-class CEP should deal with various real-time high-speed data, like stock tickers, and slower but voluminous offline batch uploads with equal ease. Apart from providing standard interfaces, a CEP should also provide an easy programming language to query the streaming data and to generate continuous intelligence through features such as pattern matching and snapshot querying.

Sybase Trading Platform – the RAP edition.

To keep it simple and at a high level, CEP can be broken down into three basic parts. The first is the mechanism to grab/consume source data. Next is the process of investigating that data, identifying events and patterns, and then interacting with target systems by providing them with actionable items. The actionable events take different forms and formats depending on the application you are using CEP for. An action item could be selling an equity position based on calculated risk in a risk-monitoring application, indicating potential fraud events in a money-laundering application, or alerting on a catastrophic event in a monitoring system by reading thousands of sensors in a chemical plant. There are literally thousands of scenarios where a manual and offline inspection of data is simply not an option.

After you go through the following section, you may want to try Aleri yourself. This link, http://www.sybase.com/aleriform, takes you directly to the Aleri download page. An evaluation copy valid for 90 days is freely available from Sybase's official website. A good amount of documentation, an excellent tutorial and some sample code on the website should help you get started quickly. If you are an existing user of any CEP product, I encourage you to compare Aleri with that product and share your findings with the community or comment on this blog. By somewhat dated estimates, Tibco CEP is the biggest CEP vendor in the market. I am not sure how much market share another leading product, StreamBase, has. There is also a webinar you can listen to on Youtube.com that explains CEP benefits in general and some key features of StreamBase in particular. For newcomers, this serves as an excellent introduction to CEP and a capital markets use case.
An application on the Aleri CEP is built by creating a model using the Studio (the GUI), using Splash (the language), or using the Aleri Modeling Language (ML) – the final stage before it is deployed. Following is a list of the key features of Splash:
- Data types – supports standard data types and XML; also supports typedef for user-defined data types.
- Access control – granular access control enabling access to a stream or to modules (containing many streams).
- SQL – another way of building a model. Building an Aleri Studio model can take longer because of its visual paradigm; someone proficient with SQL should be able to do it much faster using Aleri SQL, which is very similar to the regular SQL we all know.
- Joins – supported joins are inner, left, right and full.
- Filter expressions – include where, having and group having.
- ML – Aleri SQL produces the data model in the Aleri Modeling Language (ML). A proficient ML user might use only ML (in place of Aleri Studio and Aleri SQL) to build a model.
- The pattern matching language – includes constructs such as 'within' to indicate an interval (sliding window), 'from' to indicate the stream of data, and the interesting 'fby' that indicates a sequence (followed by).
- User-defined functions – the user-defined function interface provided in Splash allows you to create functions in C++ or Java and use them within a Splash expression in the model.

Advanced pattern matching capabilities are explained through examples here. The following three code segments and their explanations are taken directly from Sybase's documentation on Aleri. The first example checks to see whether a broker sends a buy order on the same stock as one of his or her customers, then inserts a buy order for the customer, and then sells that stock. It creates a "buy ahead" event when those actions have occurred in that sequence.

within 5 minutes
from
    BuyStock[Symbol=sym; Shares=n1; Broker=b; Customer=c0] as Buy1,
    BuyStock[Symbol=sym; Shares=n2; Broker=b; Customer=c1] as Buy2,
    SellStock[Symbol=sym; Shares=n1; Broker=b; Customer=c0] as Sell
on Buy1 fby Buy2 fby Sell
{
    if ((b = c0) and (b != c1)) {
        output [Symbol=sym; Shares=n1; Broker=b];
    }
}

This example checks for three events, one following the other, using the fby relationship. Because the same variable sym is used in three patterns, the values in the three events must be the same. Different variables might have the same value, though (e.g., n1 and n2). It outputs an event if the Broker and Customer from the Buy1 and Sell events are the same, and the Customer from the Buy2 event is different.

The next example shows Boolean operations on events. The rule describes a possible theft condition, when there has been a product reading on a shelf (possibly through RFID), followed by a non-occurrence of a checkout on that product, followed by a reading of the product at a scanner near the door.

within 12 hours
from
    ShelfReading[TagId=tag; ProductName=pname] as onShelf,
    CounterReading[TagId=tag] as checkout,
    ExitReading[TagId=tag; AreaId=area] as exit
on onShelf fby not(checkout) fby exit
output [TagId=t; ProductName=pname; AreaId=area];

The next example shows how to raise an alert if a user tries to log in to an account unsuccessfully three times within 5 minutes.
from
    LoginAttempt[IpAddress=ip; Account=acct; Result=0] as login1,
    LoginAttempt[IpAddress=ip; Account=acct; Result=0] as login2,
    LoginAttempt[IpAddress=ip; Account=acct; Result=0] as login3,
    LoginAttempt[IpAddress=ip; Account=acct; Result=1] as login4
on (login1 fby login2 fby login3) and not(login4)
output [Account=acct];

People wishing to break into computer systems often scan a number of TCP/IP ports for an open one, and attempt to exploit vulnerabilities in the programs listening on those ports. Here's a rule that checks whether a single IP address has attempted connections on three ports, and whether those have been followed by the use of the "sendmail" program.

within 30 minutes
from
    Connect[Source=ip; Port=22] as c1,
    Connect[Source=ip; Port=23] as c2,
    Connect[Source=ip; Port=25] as c3,
    SendMail[Source=ip] as send
on (c1 and c2 and c3) fby send
output [Source=ip];

Aleri provides many interfaces out of the box for easy integration with source and target systems. Through these interfaces/adapters the Aleri platform can communicate with standard relational databases, messaging frameworks like IBM MQ, sockets and file system files. Data in various formats like CSV, FIX, Reuters market data, SOAP, HTTP and SMTP is easily consumed by Aleri through standardized interfaces. The following techniques are available for integrating Aleri with other systems:
- A pub/sub API provided in Java, C++ and .NET – a standard pub/sub mechanism.
- A SQL interface with SELECT, UPDATE, DELETE and INSERT statements, used through an ODBC or JDBC connection.
- Built-in adapters for market data and FIX.

In the next part of this series we will look at the Aleri Studio, the GUI that helps us build a CEP application the easy way.

Aleri, the complex event processing platform from Sybase, was reviewed at a high level in my last post. This week, let's review the Aleri Studio, the user interface to the Aleri platform, and the use of the pub/sub API, one of many ways to interface with the Aleri platform. The Studio is an integral part of the platform and comes packaged with the free evaluation copy. If you haven't already done so, please download a copy from here. The fairly easy installation process of the Aleri product gets you up and running in a few minutes. The Aleri Studio is an authoring platform for building the model that defines interactions and sequencing between various data streams. It can also merge multiple streams to form one or more new streams. With this Eclipse-based studio, you can test the models you build by feeding them with test data and monitor the activity inside the streams in real time. Let's look at the various types of streams you can define in Aleri and their functionality.

Source stream – Only this type of stream can handle incoming data. The operations that can be performed by the incoming data are insert, update, delete and upsert. Upsert, as the name suggests, updates data if the key defining a row is already present in the stream; otherwise, it inserts a record into the stream.
Aggregate stream – This stream creates a summary record for each group defined by a specific attribute. This provides functionality equivalent to 'group by' in ANSI SQL.
Copy stream – This stream is created by copying another stream but with a different retention rule.
Compute stream – This stream allows you to use a function on each row of data to get a new computed element for each row of the data stream.
Extend stream – This stream is derived from another stream with additional column expressions.
Filter stream – You can define a filter condition for this stream. Just like extend and compute streams, this stream applies filter conditions to other streams to derive a new stream.
Flex stream – Significant flexibility in handling streaming data is achieved through custom-coded methods. Only this stream allows you to write your own methods to meet special needs.
Join stream – Creates a new stream by joining two or more streams on some condition. Both inner and outer joins can be used to join streams.
Pattern stream – Pattern matching rules are applied with this stream.
Union stream – As the name suggests, this joins two or more streams with the same row data structure. Unlike the join stream, this stream includes all the data from all the participating streams.

By using some of these streams and the pub API of Aleri, I will demonstrate the segregation of the Twitter live feed into two different streams. The Twitter live feed is consumed by a listener from the Twitter4J library. If you just want to try the Twitter4J library first, please follow my earlier post "Tracking user sentiments on Twitter". The data received by the Twitter4J listener is fed to a source stream in our model by using the publication API from Aleri. In this exercise we will separate tweets based on their content. Building on the example from my previous post, we will divide the incoming stream into two streams: one stream will get any tweets that contain 'lol' and the other gets tweets with a smiley ':)' face in the text. First, let's list the tasks we need to perform to make this a working example:
- Create a model with three streams.
- Validate that the model is error free.
- Create a static data file.
- Start the Aleri server and feed the static data file to the stream manually to confirm the model works correctly.
- Write Java code to consume the Twitter feed.
- Use the publish API to publish the tweets to the Aleri platform.
- Run the demo and see the live data as it flows through the various streams.

This image is a snapshot of the Aleri Studio with the three streams – the one on the left named 'tweets' is a source stream, and the two on the right named 'lolFilter' and 'smileyFilter' are of the filter type. The source stream accepts incoming data, while the filter streams receive the filtered data. Here is how I defined the filter conditions – like(tweets.text, '%lol%'). 'tweets' is the name of the stream and 'text' is the field in the stream we are interested in. %lol% means: select any tweets that have the string 'lol' in the content. Each stream has only 2 fields – id and text. The id and text map to the id and text message sent by Twitter. Once you define the model, you can check it for errors by clicking on the check mark in the ribbon at the top. Errors, if any, will show up in the panel at the bottom right of the image. Once your model is error free, it's time to test it. The following image shows the test interface of the Studio. Try running your model with a static data file first. The small red square at the top indicates that the Aleri server is currently running. The console window at the bottom right shows server messages, like successful starts and stops. The Run-Test tab in the left pane is where you pick a static data file to feed the source stream.
The pane on the right shows all the currently running streams and the live data processed by them. The image below shows the format of the data file used to test the model:

tweets ALERI_OPS="i" id="1" text="324test 1234" ;
tweets ALERI_OPS="i" id="2" text="test 12345" ;
tweets ALERI_OPS="i" id="3" text="test 1234666" ;
tweets ALERI_OPS="i" id="4" text="test 1234888" ;
tweets ALERI_OPS="i" id="5" text="test 1234999" ;

The source code for this exercise is at the bottom. Remember that you need the Twitter4j library on the build path and the Aleri server running before you run the program. Because I have not added any timer to the execution thread, the only way to stop the execution is to abort it. For brevity and to keep the code lines short, I have deleted all the exception handling and logging. The code uses only the publishing part of the pub/sub API of Aleri; I will demonstrate the subscribe side of the API in my next blog post. (The StatusListener_1 class referenced in main() is not included in the listing; a minimal sketch of it appears at the end of this post.)

package com.sybase.aleri;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import twitter4j.Status;
import twitter4j.StatusDeletionNotice;
import twitter4j.StatusListener;
import twitter4j.TwitterException;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;
import twitter4j.conf.Configuration;
import twitter4j.conf.ConfigurationBuilder;

import com.aleri.pubsub.SpGatewayConstants;
import com.aleri.pubsub.SpObserver;
import com.aleri.pubsub.SpPlatform;
import com.aleri.pubsub.SpPlatformParms;
import com.aleri.pubsub.SpPlatformStatus;
import com.aleri.pubsub.SpPublication;
import com.aleri.pubsub.SpStream;
import com.aleri.pubsub.SpStreamDataRecord;
import com.aleri.pubsub.SpStreamDefinition;
import com.aleri.pubsub.SpSubscription;
import com.aleri.pubsub.SpSubscriptionCommon;
import com.aleri.pubsub.impl.SpFactory;
import com.aleri.pubsub.impl.SpUtils;
import com.aleri.pubsub.test.ClientSpObserver;

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.HashMap;
import java.util.Vector;
import java.util.TimeZone;

public class TwitterTest_2 {

    // Make sure that the Aleri server is running prior to running this program.
    static {
        // Creates the publishing platform
        createPlatform();
    }

    // Important objects from the publish API
    static SpStream stream;
    static SpPlatformStatus platformStatus;
    static SpPublication pub;

    public static void main(String[] args) throws TwitterException, IOException {
        TwitterTest_2 tt2 = new TwitterTest_2();
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true);
        // Use your Twitter id and passcode
        cb.setUser("Your user name");
        cb.setPassword("Your Password");

        // Creating the Twitter4j listener
        Configuration cfg = cb.build();
        TwitterStream twitterStream = new TwitterStreamFactory(cfg).getInstance();
        StatusListener_1 listener;
        listener = new StatusListener_1();
        twitterStream.addListener(listener);
        // Runs the sample stream that comes with Twitter4j
        twitterStream.sample();
    }

    private static int createPlatform() {
        int rc = 0;
        // Aleri platform configuration - a better alternative is to use your properties file
        String host = "localhost";
        int port = 22000;
        // Aleri configured to run with empty userid and pwd strings
        String user = "";
        String password = "";
        // Name of the source stream - the one that gets the data from Twitter4j
        String streamName = "tweets";
        String name = "TwitterTest_2";
        SpPlatformParms parms = SpFactory.createPlatformParms(host, port, user, password, false, false);
        platformStatus = SpFactory.createPlatformStatus();
        SpPlatform sp = SpFactory.createPlatform(parms, platformStatus);
        stream = sp.getStream(streamName);
        pub = sp.createPublication(name, platformStatus);
        // Then get the stream definition containing the schema information.
        SpStreamDefinition sdef = stream.getDefinition();
        /*
        int numFieldsInRecord = sdef.getNumColumns();
        Vector colTypes = sdef.getColumnTypes();
        Vector colNames = sdef.getColumnNames();
        */
        return 0;
    }

    static SpStream getStream() {
        return stream;
    }

    static SpPlatformStatus getPlatformStatus() {
        return platformStatus;
    }

    static SpPublication getPublication() {
        return pub;
    }

    static int publish(SpStream stream, SpPlatformStatus platformStatus, SpPublication pub, Collection fieldData) {
        int rc = 0;
        int i = pub.start();

        SpStreamDataRecord sdr = SpFactory.createStreamDataRecord(stream, fieldData,
                SpGatewayConstants.SO_UPSERT, SpGatewayConstants.SF_NULLFLAG, platformStatus);

        Collection dataSet = new Vector();
        dataSet.add(sdr);
        System.out.println("\nAttempting to publish the data set to the Platform for stream <"
                + stream.getName() + ">.");

        rc = pub.publishTransaction(dataSet, SpGatewayConstants.SO_UPSERT,
                SpGatewayConstants.SF_NULLFLAG, 1);

        // commit blocks the thread until data is consumed by the platform
        System.out.println("before commit() call to the Platform.");
        rc = pub.commit();

        return 0;
    }
}

Reference: Aleri – Complex Event Processing – Part I, Understanding Aleri – Complex Event Processing – Part II from our JCG partner Mahesh Gadgil at the Simple yet Practical blog....
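A note on StatusListener_1: the listing above references this class but does not show it. What follows is only a minimal sketch of what such a listener might look like, not the author's original code. It assumes a Twitter4j 2.x-style StatusListener interface (newer versions may require extra callbacks such as onStallWarning), and it assumes the tweets stream expects its two columns in the order id, then text; whether the platform wants the id as a long, an int, or a String depends on the column type defined in your model.

package com.sybase.aleri;

import java.util.Collection;
import java.util.Vector;

import twitter4j.Status;
import twitter4j.StatusDeletionNotice;
import twitter4j.StatusListener;

// Hypothetical listener: on every tweet, build an (id, text) record and hand it
// to TwitterTest_2.publish(), which upserts it into the Aleri "tweets" stream.
public class StatusListener_1 implements StatusListener {

    public void onStatus(Status status) {
        // Assumed field order: id first, then text, matching the stream schema
        // described in the post. Adjust the id type to your model's column type.
        Collection fieldData = new Vector();
        fieldData.add(Long.valueOf(status.getId()));
        fieldData.add(status.getText());
        TwitterTest_2.publish(TwitterTest_2.getStream(),
                TwitterTest_2.getPlatformStatus(),
                TwitterTest_2.getPublication(),
                fieldData);
    }

    // The remaining callbacks are not needed for this demo.
    public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) { }

    public void onTrackLimitationNotice(int numberOfLimitedStatuses) { }

    public void onScrubGeo(long userId, long upToStatusId) { }

    public void onException(Exception ex) {
        ex.printStackTrace();
    }
}

With a listener along these lines, each incoming tweet becomes a single upsert into the tweets source stream, and the lolFilter and smileyFilter streams pick up matching rows from there.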

Maven Does Not Suck . . . but the Maven Docs Do

I'm not going to go into the whole Maven debate, but suffice it to say that I'm a strong proponent of everything best practice, and, to me, Maven is an embodiment of best practice. By this I mean that Maven is built around a specific best practice build methodology. Note, I said a specific best practice build methodology. In the real world, there are more than a handful of build methodologies that could qualify for the best practice accolade, but Maven assumes a single one of them. This does not mean that the others are not good; it just means that if you use Maven, you're going to need to buy in to the conventions it assumes . . . or suffer. This is true for any Convention Over Configuration (CoC) tool, and Maven is pretty darn CoC. I think the occasionally discussed notion of Maven as a design pattern for builds is a powerful metaphor. It's useful because it emphasises that Maven, like all design patterns, is a reusable solution to the process of building software. It's a best practice solution that has been refined by a community of smart people over years of heavy use. The most obvious benefits of leveraging a design pattern for building software are the same as those for writing software. Namely:
You get a bunch of functionality without having to write it yourself
An engineer who understands the pattern as applied to one project can instantly understand the pattern as applied to another project
Nominally, the first bullet is about productivity and the second is about simplicity. Obviously, everybody wants to be more productive, i.e. accomplishing more with fewer lines of code. But I actually think the second point, simplicity, is far more important. In my opinion, the entire field of engineering boils down, most elegantly, to the concept of "managing complexity". By complexity, I refer directly to that headache you get when bombarded with piles of spaghetti code. Design patterns help eliminate this intellectual discord by sealing off a big chunk of complexity behind a higher-level notation. In case you've forgotten, this is what frees our minds up for the bigger and cooler tasks that inevitably reside on the next level. It is this point of view that makes me rank learning a new project's ad hoc build as one of the most annoying aspects of my profession. Even if an Ant or make build is very cleanly implemented, follows a localized best practice, and automates a broad scope of the software lifecycle, it still punishes new developers with a mountain of raw data, i.e. lines of script code. Note, it's only the ad hoc-ness that is a problem here. This is certainly not a knock on these tools. Ant in particular is very good at automating your tasks and providing a reusable set of build widgets. But it does nothing to provide a reusable solution to the entire process of building software, and, accordingly, it does nothing to ease new developers on their road to comprehending the build. So, as I see it, it's the conventions that matter most with a CoC tool like Maven. You have to know and follow the assumed conventions in order to be successful with Maven. Projects that don't follow the conventions quickly run afoul of Maven. First, they struggle to implement their own build process with a tool that assumes a build process of its own.
It's easy to fall into being upset that you can't easily do what you've been doing, but the preceding paragraphs are meant to suggest that it's actually you who needs to change, at least if you plan to continue on with Maven. When you choose Maven, you need to accept the conventions. If you can't, I suggest you stick with Ant, which is flexible enough to meet you on your terms. Just remember that you are losing the ability to leverage the design pattern aspect of Maven to manage the complexity of your build. And if you think your build doesn't have complexity issues, ask yourself these questions:
Can every engineer on our team easily build all the components of our software system?
Do our engineers have the confidence to modify build scripts without angst?
Do our engineers flee the room when someone is needed to address a build problem?
So, if you're with me so far, you'd probably agree that following the conventions assumed by Maven is a critical prerequisite for entering Maven nirvana. And this is what leads me to conclude that the Maven docs suck. They are not only inadequate, but perhaps detrimental; they mostly document the configuration while utterly failing on the critical topic of conventions. The emphasis on configuration, which I assume is largely by accident, leads newbies into thinking it's okay, and perhaps even normal, to configure Maven. By documentation, I mostly mean all that stuff you find when you visit the Maven or Codehaus plugin pages. For instance, consider the extremely core maven-assembly-plugin. Browse through the docs on the Maven site and you'll find that they are almost entirely about configuration. The problem, as I've stated and restated, is that you don't really want to configure Maven; you want to follow the conventions. Configuration should only be an option of last resort. Use the configuration to change where one plugin puts things, and the next plugin can't find that stuff. Use a profile to tell Maven where to find something, and then nothing else can find that thing without the profile. Configuring Maven gets you into a bit of a configuration feedback loop, and geometric growth of configuration does not lend itself to pom readability. Even if you can get Maven to do what you need by configuring it to death, you quickly end up with an incomprehensible build. So, avoid configuration! Stick instead to the conventional path. Your engineers will know and love their build, and you will easily leverage the many benefits offered by the Maven ecosystem, from the rich plugin libraries to the repository servers and build servers. But how does one go about learning the Maven conventions? It's all about community. Luckily, it's a pretty friendly community. Here are some of the most important resources I use when trying to determine how things should be done in Maven:
Sonatype Blog
Stackoverflow
Maven Users List
Additionally, in an effort to be a friendly community member, I'm using this blog entry as an introduction to a series of Maven entries. Each of these entries will outline an important Maven convention; I'll detail the convention as well as offer example poms. So, keep in touch if you want to learn about Maven conventions. Reference: Maven Does Not Suck . . .
but the Maven Docs Do from our W4G partner Chad Davis at the zeroInsertionForce blog....

Which Private Cloud is Best and How to Select One

This litmus test is proposed to compare private clouds:
1. How long does it take to place in production an application delivered as a service in your private cloud (comparing apples to apples)? Less than 1 hour? Less than 1 day? Less than 1 week? More than 1 week?
2. What is the skill level required for (1)? Rate 1: any user; 2: any sysadmin, no training; 3: only trained computer-science sysadmins.
3. Does it have a ready-to-use billing system that can be used internally and externally? Most reply that it has "hooks" to external "unnamed billing systems". The reply is either Yes or No.
4. How does server scalability work? Manual or automatic? Where are the additional servers located? (a) Are more servers on site, or on other sites inside the same organization, added as a function of aggregated demand? Or (b) are servers added from public sites, for additional cost, whenever they are needed? If (b), how are the outside bills allocated to internal and external users?
Now read on to see why.
OpenStack vs Eucalyptus vs OpenNebula is an animated discussion on LinkedIn. Here is my take.
Don't compete on features
This discussion assumes that the winner will be determined by technical features. This is wrong, of course. Experience shows that the executives who back the product – Eucalyptus has Marten Mickos – and who know all the right people will win. OpenStack has yet to produce a startup backed by big, well-connected names. If you think this is not important, read the blog from Andreessen Horowitz http://bit.ly/ww37ZZ. You will see how Opsware was transformed from one product with a single customer, full of holes and bugs, into something that HP bought for $1.6B.
Flaunting product features to win the war with competitors is a mistake, because no one knows the winning features anyway. Marten Mickos tweeted a quote: "Remember that not getting what you want is sometimes a wonderful stroke of luck." I had a look at Andrew Chen's blog post Don't compete on features http://bit.ly/xu9iZn. He says there are three key ramifications for teams building the first version of a product:
Don't compete on features. Find an interesting way to position yourself differently – not better, just differently – than your competitors and build a small featureset that addresses that use case well.
If your product initially doesn't find a fit in the market (as is common), don't react by adding additional new features to "fix" the problem. That rarely works.
Make sure your product reflects the market positioning. If your product is called the Ultimate Driving Machine, …, bring that positioning into the core of your product so that it's immediately obvious to anyone using it…. Your product will be fundamentally differentiated from the start.
I was the product manager of Sun Grid Engine for a decade, and the most frequent request I had was to produce comparisons with competing products such as LSF, PBSpro, and so on. Each time such a document was produced, it was leaked to competitors, who immediately added (or claimed they added) the features we claimed as exclusive. Some of the features were so esoteric (see A glimpse into the new features of Sun Grid Engine 6.2 Update 5, due in December 2009) that you could count the users who demanded them on your fingers. The vast majority of users did not need them.
Private Clouds versus wishful-thinking Private Clouds
Tom Morse has an excellent web site where he lists most private cloud offerings that are claimed to be products: http://www.cloudcomputeinfo.com/private-clouds It is a very nice piece of work.
Here are the companies he lists:
http://aws.amazon.com/vpc/
http://www.bmc.com/solutions/cloud-computing
http://www.ca.com/us/cloud-solutions.aspx
http://www.cisco.com/web/about/ent/cloud/index.html
http://www.cloud.com/?ntref=prod_top
http://www.cloupia.com/en/
http://content.dell.com/us/en/enterprise/cloud-computing.aspx
http://www.enomaly.com/
http://www.eucalyptus.com/
http://www.hexagrid.com/
http://www.hp.com
http://www.hpchost.com/pages-Private_Cloud.html
http://tinyurl.com/3wvj864 (IBM)
http://www.microsoft.com/virtualization/en/us/private-cloud.aspx
http://nebula.com/
http://nimbula.com/
http://opennebula.org/start
http://openstack.org/
http://www.os33.com/
http://www.platform.com/private-cloud-computing
http://www.redhat.com/solutions/cloud/foundations/
http://silver.tibco.com/run.php
http://www.suse.com/solutions/platform.html#cloud
http://www.vmware.com/solutions/cloud-computing/private-cloud/
However, this list pays for its completeness by including wishful-thinking companies that believe they are a private cloud – like, in my opinion, Cisco. In December 2011, Cisco claimed it had integrated third-party cloud software into its solutions, creating complicated, labyrinthine implementations. On February 1, Padmasree Warrior, Cisco's CTO, claimed at the Cisco Live Europe event that "…Cisco also has plans to build out its cloud offerings, with a four-pillar strategy to help customers build private, public and hybrid clouds on its Unified Computing System (UCS)". This statement – a surprise for many engineers at the execution level in Cisco, who read on the Internet what their company is up to – contradicts the claim that Cisco has a private cloud solution now.
The litmus test to identify the real Private Cloud
Can someone produce a one-table comparison of all the private cloud offerings on Paul Morse's web site? I do not mean comparing features, just a few categories:
1. How long does it take to make an application delivered as a service (comparing apples to apples)? Less than 1 hour? Less than 1 day? Less than 1 week? More than 1 week?
2. What is the skill level required for (1)? Rate 1: any user; 2: any sysadmin, no training; 3: only trained computer-science sysadmins.
3. Does it have a ready-to-use billing system that can be used internally and externally? (Most reply that it has "hooks" to external "unnamed billing systems".) The reply is either Yes or No.
4. How does the scalability work? (a) Are more servers on site, or on other sites inside the same organization, added automatically as a function of aggregated demand? Or (b) are servers added from public sites for additional cost? If (b), how are the outside bills allocated to internal and external users?
I don't think there is even one person among the people I know – and I know some very competent people – who is able to answer these questions for each product on Paul Morse's rather complete private clouds list. IMHO, if the resulting data center cannot provide satisfactory replies to questions (1) through (4) without exception, no matter what product is used, we do not have a cloud, but just another, slightly less cumbersome to run, data center.
Note that none of the litmus test questions mention virtualization. Virtualization is just one tool, not an end in itself.
Reference: Which Private Cloud is Best and How to Select One from our JCG partner Miha Ahronovitz at the The memories of a Product Manager blog. (Copyright 2012 – Ahrono Associates)...