

Elasticsearch: Text analysis for content enrichment

A text search solution is only as powerful as the text analysis capabilities it offers. Lucene is an open source information retrieval library that offers many text analysis possibilities. In this post, we will cover some of the main text analysis features that ElasticSearch offers to enrich your search content.

Content Enrichment

Take a typical eCommerce site as an example: serving the right content in search to the end customer is very important for the business, and the text analysis strategy provided by the search solution plays a big role in it. As a search user, I would expect some typical search behaviour to apply to my query automatically. The search:

- should look for synonyms matching my query text
- should match singular and plural words, or words sounding similar to the entered query text
- should not allow searching on protected words
- should allow searching for words mixed with numeric or special characters
- should not allow searching on HTML tags
- should allow searching based on the proximity of the letters and the number of matching letters

Enriching the content here means adding the above search capabilities to your content while indexing and searching it.

Lucene Text Analysis

Lucene is an information retrieval (IR) library that provides full-text indexing and searching. For a quick reference, check the post Text Analysis inside Lucene. In Lucene, a document contains fields of text. Analysis is the process of converting field text into terms. These terms are used to match a search query. There are three main building blocks in the analysis process:

- Analyzer: An Analyzer is responsible for building a TokenStream which can be consumed by the indexing and searching processes.
- Tokenizer: A Tokenizer is a TokenStream and is responsible for breaking up incoming text into Tokens. In most cases, an Analyzer will use a Tokenizer as the first step in the analysis process.
- TokenFilter: A TokenFilter is also a TokenStream and is responsible for modifying Tokens that have been created by the Tokenizer.

A common usage style of TokenStreams and TokenFilters inside an Analyzer is the chaining pattern, which lets you build complex analyzers from simple Tokenizer/TokenFilter building blocks. Tokenizers start the analysis process by demarcating the character input into tokens (mostly these correspond to words in the original text). TokenFilters then take over the remainder of the analysis, initially wrapping a Tokenizer and successively wrapping nested TokenFilters.

ElasticSearch Text Analysis

ElasticSearch uses Lucene's built-in text analysis capabilities and allows you to enrich your search content. As stated above, text analysis is divided into filters, tokenizers and analyzers. ElasticSearch offers quite a few built-in analyzers with preconfigured tokenizers and filters. For a detailed list of existing analyzers, check the complete list for Analysis.

Update Analysis Settings

ElasticSearch allows you to dynamically update index settings and mapping.
To set the index settings from the Java API client:

    Settings settings = settingsBuilder().loadFromSource(jsonBuilder()
        .startObject()
            // Add analyzer settings
            .startObject("analysis")
                .startObject("filter")
                    .startObject("test_filter_stopwords_en")
                        .field("type", "stop")
                        .field("stopwords_path", "stopwords/stop_en")
                    .endObject()
                    .startObject("test_filter_snowball_en")
                        .field("type", "snowball")
                        .field("language", "English")
                    .endObject()
                    .startObject("test_filter_worddelimiter_en")
                        .field("type", "word_delimiter")
                        .field("protected_words_path", "worddelimiters/protectedwords_en")
                        .field("type_table_path", "typetable")
                    .endObject()
                    .startObject("test_filter_synonyms_en")
                        .field("type", "synonym")
                        .field("synonyms_path", "synonyms/synonyms_en")
                        .field("ignore_case", true)
                        .field("expand", true)
                    .endObject()
                    .startObject("test_filter_ngram")
                        .field("type", "edgeNGram")
                        .field("min_gram", 2)
                        .field("max_gram", 30)
                    .endObject()
                .endObject()
                .startObject("analyzer")
                    .startObject("test_analyzer")
                        .field("type", "custom")
                        .field("tokenizer", "whitespace")
                        .field("filter", new String[]{
                            "lowercase",
                            "test_filter_worddelimiter_en",
                            "test_filter_stopwords_en",
                            "test_filter_synonyms_en",
                            "test_filter_snowball_en"})
                        .field("char_filter", "html_strip")
                    .endObject()
                .endObject()
            .endObject()
        .endObject().string()).build();

    CreateIndexRequestBuilder createIndexRequestBuilder =
        client.admin().indices().prepareCreate(indexName);
    createIndexRequestBuilder.setSettings(settings);

You can also set your index and settings in your configuration file. The paths mentioned in the above example are relative to the config directory of the installed elasticsearch server. The above example creates custom filters and analyzers for your index. ElasticSearch also ships with existing combinations of different filters and tokenizers, so you can select the right combination for your data.

Synonyms

Synonyms are words with the same or similar meaning.
Synonym expansion is where we take variants of a word and make them available to the search engine at indexing and/or query time. To add a synonym filter to the settings for the index:

    .startObject("test_filter_synonyms_en")
        .field("type", "synonym")
        .field("synonyms_path", "synonyms/synonyms_en")
        .field("ignore_case", true)
        .field("expand", true)
    .endObject()

Check the Synonym Filter for the complete syntax. You can add synonyms in Solr or WordNet format. Have a look at the Solr Synonym Format for further examples:

    # If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
    ipod, i-pod, i pod => ipod, i-pod, i pod
    # If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
    ipod, i-pod, i pod => ipod

Check the wordlist for a list of words and synonyms matching your requirements.

Stemming

Word stemming is defined as the ability to include word variations. For example, any noun would include its variations (whose importance is directly proportional to the degree of variation). With word stemming, we use quantified methods for the rules of grammar to add word stems and rank them according to their degree of separation from the root word. To add a stemming filter to the settings for the index:

    .startObject("test_filter_snowball_en")
        .field("type", "snowball")
        .field("language", "English")
    .endObject()

Check the Snowball Filter syntax for details. Stemming programs are commonly referred to as stemming algorithms or stemmers. Stemming in Lucene can be algorithmic or dictionary based. Snowball, based on Martin Porter's Snowball algorithm, provides stemming functionality and is used as the stemmer in the above example. Check the list of snowball stemmers for the different supported languages. Synonyms and stemming can sometimes return strange results depending on the order of text processing. Make sure to apply the two in the order that matches your requirements.

Stop words

Stop words are words that you do not want to be indexed or queried on.
To add a stop word filter to the settings:

    .startObject("test_filter_stopwords_en")
        .field("type", "stop")
        .field("stopwords_path", "stopwords/stop_en")
    .endObject()

Check the complete syntax for the stop words filter. Check the Snowball stop words list for the English language to derive your own list. Check the Solr shared list of stop words for the English language.

Word Delimiter

The word delimiter filter allows you to split a word into sub-words for further processing. To add a word delimiter filter to the settings:

    .startObject("test_filter_worddelimiter_en")
        .field("type", "word_delimiter")
        .field("protected_words_path", "worddelimiters/protectedwords_en")
        .field("type_table_path", "typetable")
    .endObject()

Words are commonly split on non-alphanumeric characters, case transitions, intra-word delimiters and so on. Check the complete syntax and the different available options for the Word Delimiter Filter. The list of protected words allows you to protect business-relevant words from being split in the process.

N-grams

An n-gram is a contiguous sequence of n letters from a given sequence of text. To add an edge n-gram filter to the settings:

    .startObject("test_filter_ngram")
        .field("type", "edgeNGram")
        .field("min_gram", 2)
        .field("max_gram", 30)
    .endObject()

Based on your configuration, the input text will be broken down at indexing time into multiple tokens of the lengths configured above. This lets you return results based on matching n-gram tokens and on proximity. Check the detailed syntax in the Edge NGram Filter documentation.

HTML Strip Char Filter

Most websites have HTML content that should be indexable. Indexing and querying the HTML markup itself is not desired for most sites. ElasticSearch allows you to filter out the HTML tags, so that they are not indexed and are not available for querying.
    .startObject("analyzer")
        .startObject("test_analyzer")
            .field("type", "custom")
            .field("tokenizer", "whitespace")
            .field("filter", new String[]{
                "lowercase",
                "test_filter_worddelimiter_en",
                "test_filter_stopwords_en",
                "test_filter_synonyms_en",
                "test_filter_snowball_en"})
            .field("char_filter", "html_strip")
        .endObject()
    .endObject()

Check the complete syntax of the HTML Strip Char Filter for details. In addition to the common filters mentioned above, there are many more filters available that allow you to enrich your search content in the desired way, based on end user requirements and your business data.

Reference: Elasticsearch: Text analysis for content enrichment from our JCG partner Jaibeer Malik at the Jai’s Weblog blog.
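The chaining idea described in the Lucene Text Analysis section (a tokenizer followed by successive token filters) can be illustrated without any Lucene or ElasticSearch dependency. The following is only a plain-Java sketch: the class AnalysisChainSketch, its method names and the tiny hard-coded stop word list are all hypothetical and are not part of either library's API. It mimics a whitespace tokenizer, a lowercase filter, a stop word filter and an edge n-gram filter with min_gram 2 and max_gram 30.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration of a tokenizer followed by chained token filters (not the Lucene API). */
final class AnalysisChainSketch {

    /** "Tokenizer" step: split the input on whitespace, skipping empty tokens. */
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** "TokenFilter" step: lowercase every token. */
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** "TokenFilter" step: drop every token contained in the stop word set. */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    /** "TokenFilter" step: emit the prefixes of each token with length minGram..maxGram. */
    static List<String> edgeNGrams(List<String> tokens, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            int upper = Math.min(maxGram, t.length());
            for (int len = minGram; len <= upper; len++) {
                out.add(t.substring(0, len));
            }
        }
        return out;
    }

    /** The "Analyzer": chains the tokenizer and the filters in a fixed order. */
    static List<String> analyze(String text) {
        // Hypothetical stop word list, for illustration only.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        List<String> tokens = whitespaceTokenize(text);
        tokens = lowercase(tokens);
        tokens = removeStopWords(tokens, stopWords);
        return edgeNGrams(tokens, 2, 30);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The iPod Nano"));
    }
}
```

The order of the steps matters here just as it does in the filter array of a custom analyzer: lowercasing runs before the stop word lookup, so "The" is removed even though the stop list only contains "the".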

Rare Uses of a “ControlFlowException”

Control flows are a “relic” from imperative programming, which has leaked into various other programming paradigms, including Java’s object-oriented paradigm. Apart from the useful and ubiquitous branch and loop structures, there are also primitives (e.g. GOTO) and non-locals (e.g. exceptions). Let’s have a closer look at these controversial control flow techniques.

GOTO

goto is a reserved word in the Java language. goto is also a valid instruction in JVM bytecode. Yet, in Java, it isn’t easily possible to perform goto operations. One example taken from this Stack Overflow question can be seen here:

Jumping forward

    label: {
        // do stuff
        if (check) break label;
        // do more stuff
    }

In bytecode:

    2  iload_1 [check]
    3  ifeq 6          // Jumping forward
    6  ..

Jumping backward

    label: do {
        // do stuff
        if (check) continue label;
        // do more stuff
        break label;
    } while(true);

In bytecode:

    2  iload_1 [check]
    3  ifeq 9
    6  goto 2          // Jumping backward
    9  ..

Of course, these tricks are useful only on very rare occasions, and even then, you might want to reconsider. Because we all know what happens when we use goto in our code (drawing taken from xkcd: http://xkcd.com/292/).

Breaking out of control flows with exceptions

Exceptions are a good tool to break out of a control flow structure in the event of an error or failure. But regular jumping downwards (without error or failure) can also be done using exceptions:

    try {
        // Do stuff
        if (check) throw new Exception();
        // Do more stuff
    }
    catch (Exception notReallyAnException) {}

This feels just as kludgy as the tricks involving labels mentioned before.

Legitimate uses of exceptions for control flow

However, there are some other very rare occasions where exceptions are a good tool to break out of a complex, nested control flow (without error or failure). This may be the case when you’re parsing an XML document using a SAXParser.
Maybe your logic is going to test for the occurrence of at least three <check/> elements, in which case you may want to skip parsing the rest of the document. Here is how to implement the above:

Create a ControlFlowException:

    package com.example;

    import org.xml.sax.SAXException;

    public class ControlFlowException extends SAXException {}

Note that usually, you might prefer a RuntimeException for this, but the SAX contracts require handler implementations to throw SAXException instead.

Use that ControlFlowException in a SAX handler:

    package com.example;

    import java.io.File;

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class Parse {
        public static void main(String[] args) throws Exception {
            SAXParser parser = SAXParserFactory
                .newInstance()
                .newSAXParser();

            try {
                parser.parse(new File("test.xml"), new Handler());
                System.out.println(
                    "Less than 3 <check/> elements found.");
            }
            catch (ControlFlowException e) {
                System.out.println(
                    "3 or more <check/> elements found.");
            }
        }

        private static class Handler extends DefaultHandler {

            int count;

            // ControlFlowException is a checked SAXException,
            // so it has to be declared here
            @Override
            public void startElement(
                    String uri,
                    String localName,
                    String qName,
                    Attributes attributes) throws SAXException {

                if ("check".equals(qName) && ++count >= 3)
                    throw new ControlFlowException();
            }
        }
    }

When to use exceptions for control flow

The above practice seems reasonable with SAX, as SAX contracts expect such exceptions to happen, even if in this case they’re not exceptions but regular control flow. Here are some indications about when to use the above practice in real-world examples:

- You want to break out of a complex algorithm (as opposed to a simple block).
- You can implement “handlers” to introduce behaviour into complex algorithms.
- Those “handlers” explicitly allow throwing exceptions in their contracts.
- Your use case does not pull the weight of actually refactoring the complex algorithm.

A real-world example: Batch querying with jOOQ

In jOOQ, it is possible to “batch store” a collection of records. Instead of running a single SQL statement for every record, jOOQ collects all SQL statements and executes a JDBC batch operation to store them all at once. As each record encapsulates its generated SQL rendering and execution for a given store() call in an object-oriented way, it would be quite tricky to extract the SQL rendering algorithm in a reusable way without breaking (or exposing) too many things. Instead, jOOQ’s batch operation implements this simple pseudo-algorithm:

    // Pseudo-code attaching a "handler" that will
    // prevent query execution and throw exceptions
    // instead:
    context.attachQueryCollector();

    // Collect the SQL for every store operation
    for (int i = 0; i < records.length; i++) {
        try {
            records[i].store();
        }

        // The attached handler will result in this
        // exception being thrown rather than actually
        // storing records to the database
        catch (QueryCollectorException e) {

            // The exception is thrown after the rendered
            // SQL statement is available
            queries.add(e.query());
        }
    }

A real-world example: Exceptionally changing behaviour

Another example from jOOQ shows how this technique can be useful to introduce exceptional behaviour that is applicable only in rare cases. As explained in issue #1520, some databases have a limitation regarding the number of possible bind values per statement. These are:

- SQLite: 999
- Ingres 10.1.0: 1024
- Sybase ASE 15.5: 2000
- SQL Server 2008: 2100

In order to circumvent this limitation, jOOQ has to inline all bind values once the maximum has been reached. As jOOQ’s query model heavily encapsulates SQL rendering and variable binding behaviour by applying the composite pattern, it is not possible to know the number of bind values before traversing a query model tree.
For more details about jOOQ’s query model architecture, consider this previous blog post: http://blog.jooq.org/2012/04/10/the-visitor-pattern-re-visited

So the solution is to render the SQL statement and count the bind values that are effectively going to be rendered. A canonical implementation would be this pseudo-code:

    String sql;

    query.renderWith(countRenderer);
    if (countRenderer.bindValueCount() > maxBindValues) {
        sql = query.renderWithInlinedBindValues();
    }
    else {
        sql = query.render();
    }

As can be seen, a canonical implementation will need to render the SQL statement twice. The first rendering is used only to count the number of bind values, whereas the second rendering will generate the true SQL statement. The problem here is that the exceptional behaviour should only be put in place once the exceptional event (too many bind values) occurs. A much better solution is to introduce a “handler” that counts bind values in a regular “rendering attempt”, throwing a ControlFlowException for those few exceptional “attempts” where the number of bind values exceeds the maximum:

    // Pseudo-code attaching a "handler" that will
    // abort query rendering once the maximum number
    // of bind values was exceeded:
    context.attachBindValueCounter();
    String sql;
    try {

        // In most cases, this will succeed:
        sql = query.render();
    }
    catch (ReRenderWithInlinedVariables e) {
        sql = query.renderWithInlinedBindValues();
    }

The second solution is better, because:

- We only re-render the query in the exceptional case.
- We don’t finish rendering the query to calculate the actual count, but abort early for re-rendering. I.e. we don’t care if we have 2000, 5000, or 100000 bind values.

Conclusion

As with all exceptional techniques, remember to use them at the right moment. If in doubt, think again.

Reference: Rare Uses of a “ControlFlowException” from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.
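The bind value counting approach described in this article can be tried out in isolation. What follows is a minimal, self-contained sketch and not jOOQ's actual implementation: the classes TooManyBindValues and BindValueRenderer, the fixed INSERT statement and the naive inlining (no escaping or quoting) are assumptions made purely for illustration. Since no SAX contract applies here, the signal is a RuntimeException, and it is created without a stack trace because it represents control flow rather than a failure.

```java
import java.util.List;

/** Hypothetical control-flow signal; not an error, so no stack trace is captured. */
final class TooManyBindValues extends RuntimeException {
    TooManyBindValues() {
        // message = null, cause = null, suppression disabled, stack trace not writable
        super(null, null, false, false);
    }
}

/** Hypothetical renderer that falls back to inlined values when a limit is exceeded. */
final class BindValueRenderer {

    private final int maxBindValues;

    BindValueRenderer(int maxBindValues) {
        this.maxBindValues = maxBindValues;
    }

    /** Regular rendering attempt: aborts as soon as the limit is exceeded. */
    private String renderWithBindValues(List<Object> values) {
        StringBuilder sql = new StringBuilder("INSERT INTO t VALUES (");
        int count = 0;
        for (Object ignored : values) {
            if (++count > maxBindValues) {
                throw new TooManyBindValues();
            }
            if (count > 1) {
                sql.append(", ");
            }
            sql.append('?');
        }
        return sql.append(')').toString();
    }

    /** Exceptional rendering: inlines the values (naively, for illustration only). */
    private String renderInlined(List<Object> values) {
        StringBuilder sql = new StringBuilder("INSERT INTO t VALUES (");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(values.get(i));
        }
        return sql.append(')').toString();
    }

    /** Renders once in the common case and re-renders only when the signal is thrown. */
    String render(List<Object> values) {
        try {
            return renderWithBindValues(values);
        } catch (TooManyBindValues e) {
            return renderInlined(values);
        }
    }
}
```

With a limit of 2, rendering two values yields placeholders, while a third value triggers the early abort and the inlined form. In the common case the statement is rendered exactly once.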

Try the jBPM Console NG (Beta)

Hi everyone out there! This is another post about the jBPM Console NG. After 6 months of heavy work I’m happy to be writing this post for the developer community to try it out. In this post I will explain how to build the application from the sources. The main idea behind this is to know how to set up your environment and modify the application while you’re testing it. You will basically learn all you need to know to contribute to the project.

Introduction

The jBPM Console NG aims to provide a collaborative Task & Process Management environment to facilitate the adoption of the BPM Suite in a company. Downloading the sources and compiling the application will allow you to try the application and modify it in case you want to extend it or fix bugs. The application is under the Apache License V2, so it can be used and modified according to this license.

Working with the Source Code

The first step in order to get everything running is to get the source code using Git. These are the things that you need to have installed on your computer in order to proceed:

- JDK 6
- Maven 3.x
- Git
- Any IDE (Eclipse, IntelliJ, Netbeans) with the Maven plugin installed
- JBoss Application Server 7.1.1 (optional)

Once you have all these tools installed, we can proceed to get the source code from the GitHub repository: https://github.com/droolsjbpm/jbpm-console-ng/

To get a clone of the repository to work with, run the following from the terminal:

    git clone https://github.com/droolsjbpm/jbpm-console-ng.git

Once it’s done, you can compile the source code. Here you have two alternatives:

- Compile the project for development purposes with:

      mvn clean install

- Compile the project to generate the distribution wars for JBoss and Tomcat, plus the documentation:

      mvn clean install -PfullProfile

Sit back and relax! The first time you do this step, Maven needs to download tons of libraries, so you will need to wait.
Running the application in Hosted Mode

Once the project is compiled, the jbpm-console-ng-showcase can be executed in what GWT calls “Hosted Mode” (also known as Developer Mode). To start up the application in hosted mode, do the following:

- The jBPM Console NG Showcase contains the final application distribution code:

      cd jbpm-console-ng-showcase/

- Run in hosted mode using the GWT Maven Plugin:

      mvn gwt:run

This will start up Jetty plus the GWT Development Mode screen, which lets you copy the URL where the application is hosted so you can try it.

[Figure: GWT Hosted Mode]

Copying that URL into your browser will open the application. (For hosted mode you need to have the GWT plugin installed in your browser; don’t worry, it’s automatically installed if you don’t have it.) I strongly recommend using Firefox for development mode, or Chrome (usually slower), because for development we scope the compilations to work on Firefox and Chrome (Gecko and WebKit browsers).

Running the application in JBoss AS 7

If you want to deploy the application on JBoss, you need to go with the second compilation option (-PfullProfile), which will take some extra time to compile the application for all the browsers and all the languages (English, Spanish, etc.). In order to deploy the application to your JBoss AS 7 instance, you need to move the generated war file jbpm-console-ng/jbpm-console-ng-distribution-wars/target/jbpm-console-ng-jboss-as7.war into the <jboss-as>/standalone/deployments directory and then rename it to jbpm-console-ng.war. The name of the war file will be used as the root context for the application. For JBoss you also need to configure the users and roles. The users that will be available for your installation of the jBPM Console NG need to be set up, and they are handled by JBoss Security Domains.
In order to set up the security domains, you need to do the following:

- Edit <jboss-as>/standalone/configuration/standalone.xml and add a new security domain:

      <security-domain name="jbpm-console-ng" cache-type="default">
          <authentication>
              <login-module code="UsersRoles" flag="required">
                  <module-option name="usersProperties" value="${jboss.server.config.dir}/users.properties"/>
                  <module-option name="rolesProperties" value="${jboss.server.config.dir}/roles.properties"/>
              </login-module>
          </authentication>
      </security-domain>

- Add the users.properties and roles.properties files.

  Content of the users.properties file:

      maciek=Merck
      salaboy=salaboy
      katy=katy
      john=john

  Content of the roles.properties file:

      maciek=jbpm-console-user,kie-user,analyst,HR,PM,Reviewer
      salaboy=jbpm-console-user,user,analyst,PM,IT,Reviewer
      katy=jbpm-console-user,HR
      john=jbpm-console-user,Accounting

The only requirement for the roles file is to include the jbpm-console-user role for all the users. Note that this is the simplest way of configuring a security domain, but you can go for more advanced options, like configuring the security domain to use an LDAP server or a database to authenticate your users and roles (https://docs.jboss.org/author/display/AS7/Security+subsystem+configuration).

Then you are ready to go. You can start JBoss as follows:

- Go into the bin directory:

      cd <jboss-as>/bin/

- Start the application server:

      ./standalone.sh

On Openshift

In order to deploy the application to Openshift you obviously need an Openshift account. Once you have set up your account, you will need to do almost the same configuration as for the JBoss Application Server. In the Openshift git repository that you clone, there is a specific directory for this configuration: .openshift/config. There you will find the standalone.xml file, and you can place the users.properties and roles.properties files there.
So in the standalone.xml file you will need to configure the security domain as we did before and add the users.properties and roles.properties files. Besides this configuration, you will need to set up a system property for storing the knowledge repository:

    <system-properties>
        <property name="org.kie.nio.git.dir" value="~/jbossas-7/tmp/data"/>
    </system-properties>

The Application

Now you are ready to use the application. If you point your browser to the URL provided by the hosted mode, or to http://localhost:8080/jbpm-console-ng/, you will be able to access the login form. As you will see, before entering the application you need to provide your credentials. Once you are in, the application is divided into the following sections:

[Figure: Cycle]

In the Authoring section you will be able to access the Process Designer to model your business processes. The Process Management section will allow you to list the available business processes, start new instances, and monitor those instances. The Work section will enable you to access the Task Lists (Calendar and Grid View) to work on the tasks assigned to you. In order to use the BAM section you will need to deploy the BAM dashboard application, but I will describe that in a future post. Feel free to try it out and write a comment back if you find something wrong.

Contributions

Your feedback means a lot, but if you want to contribute, you can fork the jbpm-console-ng repository on GitHub: https://github.com/droolsjbpm/jbpm-console-ng/

I would appreciate it if you could test the Task Lists and Process Management screens and write feedback on this post, so I can iteratively improve what we have. I will be writing another post to describe the screens and also to list a set of small tasks that you can contribute.

Reference: Try the jBPM Console NG (Beta)! (for developers) from our JCG partner Mauricio Salatino at the Drools & jBPM blog.

Login Tokens In Email Links

Your system is probably sending some emails. Sometimes these emails contain links to the public part of the site, sometimes they have links to the authentication-protected part. Either way, if the email is sent to registered users (as opposed to just subscribed emails), you should not make the user type in a username and password. Even if it’s the public part of the site, the user may then want to do something with the content you display that requires authentication, e.g. add it to favourites. It’s a bit tricky with public content though, and you might not want to have it there, as users tend to forward “digest” or “best this week” emails, and then the recipients will be able to impersonate them without even knowing.

Anyway, when users click on a link in an email sent by your system, they could be logged in automatically. That’s a product requirement I think many systems should have, and it sounds pretty reasonable. But I’ve rarely seen it. Let’s assume you like it and you want to implement it in your system. How do you do that, given that there are certainly some security implications?

- For each email sent, generate a token. The token should be the HMAC of the user’s email (or username, or id) concatenated with the token generation timestamp.
- Use an application-wide key for the HMAC, configurable per environment. Also use a good hash algorithm, like SHA-256 (and not MD5).
- Append the token to each link in the email. You are probably using an email template, so just have ?emailLoginToken=${token} for each link. Also pass the email and the timestamp as parameters.
- Have a filter/interceptor that looks for that specific parameter name and, if it is found, invokes the authentication-by-token logic (described below).
- Redo the HMAC with the parameters passed (email and timestamp) and compare it to the token parameter.
- Check if the token is still valid: you can have an expiration period and not allow tokens to be used more than 2 weeks after generating them.
- If everything is OK (the token passed and the result of redoing the HMAC are the same), log the user in (send a session cookie). But add a flag (in the session) that the login was via a token. Then do not allow changing the password, the email or any other “sensitive” action without password confirmation. Use the same conditions as in your “remember me” login, if you have implemented that.
- Have all such links go through https.

There’s another, slightly different scenario. Instead of using an HMAC and sending all the parameters, just generate the token as the hash of the email and timestamp and store it (and the timestamp) in the database, with a foreign key to the users table. In this case, the authenticity confirmation comes from the fact that the token is found in your database, rather than from verifying an HMAC. Then, when the token is presented as a parameter, simply look it up in the database. You should still pass the user’s email as a parameter and compare the passed email parameter to the email of the user corresponding to the persisted token (to avoid guesses). On successful login you can delete the token. You may decide not to do this, and have a scheduled job that cleans up tokens older than X weeks. If you delete it immediately, it will be more secure. If you leave it, the user can open the same mail again and the login will still work. This approach lets you invalidate tokens on demand. For example, if users decide to change their password or email, you can invalidate all their tokens.

It seems like a tough process, but it’s fairly easy to implement. The tough part is taking all the points into account and not compromising security. Note that security is an important aspect here. As pointed out on reddit, email may be insecure: when the user downloads emails via unsecured POP3 (with Outlook, Thunderbird, etc.), an attacker may obtain the links and impersonate the user.
That’s why you should restrict the actions the user can perform when logged in via a token. This may not matter that much, since most users use webmail, secured POP3, or an internal email server. But you should not do this for banking software, for example. Having tokens in links is probably less problematic than password reset links, though, which can also be intercepted the same way. But I’ll probably discuss that in a separate article. One extra step you can take when addressing security is to restrict token authentication to the IP addresses most often used by the user.

Overall, the above approach has more pros than security risks, so I believe any mainstream site should implement it (keeping all the security implications in mind).

Reference: Login Tokens In Email Links from our JCG partner Bozhidar Bozhanov at the Bozho’s tech blog blog.
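The HMAC variant of the steps listed in this article can be sketched with the JDK's own javax.crypto classes. This is a minimal illustration under stated assumptions, not a drop-in implementation: the class name EmailLoginTokens, the "|" separator between email and timestamp, the hex encoding of the token and the way the key is passed in are all choices made for this sketch. The two-week expiry follows the example period mentioned in the list.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper that creates and verifies HMAC-SHA256 email login tokens. */
final class EmailLoginTokens {

    // Tokens older than two weeks are rejected.
    private static final long MAX_AGE_MILLIS = 14L * 24 * 60 * 60 * 1000;

    private final byte[] key;

    /** The key is assumed to come from per-environment application configuration. */
    EmailLoginTokens(byte[] key) {
        this.key = key.clone();
    }

    /** Token = hex(HMAC-SHA256(key, email + "|" + timestamp)); the separator is an assumption. */
    String createToken(String email, long issuedAtMillis) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] raw = mac.doFinal(
                (email + "|" + issuedAtMillis).getBytes(StandardCharsets.UTF_8));
            return toHex(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    /** Redoes the HMAC from the passed parameters and checks the expiration period. */
    boolean isValid(String email, long issuedAtMillis, String token, long nowMillis) {
        if (email == null || token == null) {
            return false;
        }
        long age = nowMillis - issuedAtMillis;
        if (age < 0 || age > MAX_AGE_MILLIS) {
            return false;
        }
        byte[] expected = createToken(email, issuedAtMillis).getBytes(StandardCharsets.UTF_8);
        byte[] actual = token.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual avoids leaking the mismatch position through timing.
        return MessageDigest.isEqual(expected, actual);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

A filter or interceptor would read the emailLoginToken, email and timestamp parameters, call isValid with the current time, and only then create the session with the "logged in via token" flag set. The database-backed variant replaces the recomputation with a lookup of the stored token.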

Spring MVC, Ajax and JSON Part 1 – Setting The Scene

I’ve been thinking about writing a blog on Spring, Ajax and JSON for a good while, but I’ve never got around to it. This was mainly because it was quite complicated and the technique required has been in a state of flux. When I decided to write this blog, I had a scout around the Internet, and if you look at places such as Stack Overflow you’ll see many different and often contradictory answers to the question “how do I write a Spring Ajax/JSON application?” I think that this is the fault of the guys at Spring, in that they’ve been really busy improving Spring’s support for JSON. Not only that, the guys at jQuery have also been busy, which means that overall things have changed dramatically over the last couple of years and the answers to this “how do I write a Spring Ajax/JSON application?” question are out of date.

If you take a look at Keith Donald’s original Spring 3 MVC Ajax application you’ll see that it’s hideously complex. There’s tons of boilerplate code and several hacky bits of JavaScript required to support JSON integration. In the latest release of Spring, all that’s changed; as I said, the guys at Spring and jQuery have been busy and things are now a lot simpler.

When writing this kind of application there are a few steps to consider. Firstly, you need to load a page that’s capable of making an Ajax request into a browser. Secondly, you have to write some code to service the Ajax request and, last of all, the page has to present its results.

In order to demonstrate Spring MVC, Ajax and JSON I’m going to use a shopping web site scenario. In this scenario, when the user clicks on the eCommerce page link, the app loads some of the items from the catalogue and displays them on the page. The user then checks a number of items and presses ‘Confirm Purchase’. Now, this is where Ajax and JSON come in: on pressing ‘Confirm Purchase’ the browser makes an Ajax request to the server, sending it the item ids.
The server then retrieves the items from the database and returns them as JSON to the browser. The browser then processes the JSON, displaying the items on the screen.

In writing the code, the first step is to create a Spring MVC project using the project templates available on the Spring Dashboard. Once you have a blank project, there are a couple of changes that you need to make to the project’s POM file. Firstly, you need to add the Jackson JSON Processor dependencies. Next, you need to update the version of Spring to 3.2.2. This is because the template project generator still produces a version 3.1.1 project.

    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>2.0.4</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.0.4</version>
    </dependency>

    <org.springframework-version>3.2.2.RELEASE</org.springframework-version>

If you look at my code, available on Github, you’ll see that I’ve also added the Tomcat Maven plugin and set the Java compiler source/target versions to 1.7. These are both optional steps.

The next thing to do is to create an Item class to define the items that the user will purchase from our imaginary catalogue.
    import java.math.BigDecimal;

    public class Item {

      private final int id;
      private final String description;
      private final String name;
      private final BigDecimal price;

      // Instances are created through getInstance(...) only
      private Item(int id, String name, String description, BigDecimal price) {
        this.id = id;
        this.name = name;
        this.description = description;
        this.price = price;
      }

      public final BigDecimal getPrice() {
        return price;
      }

      public final int getId() {
        return id;
      }

      public final String getDescription() {
        return description;
      }

      public final String getName() {
        return name;
      }

      public static Item getInstance(int id, String name, String description, BigDecimal price) {
        Item item = new Item(id, name, description, price);
        return item;
      }
    }

The code above defines our simple Item. Its attributes include id, name, description and price.

The next step in this shopping scenario is to write the code that displays the items on the screen, so that the user can make his/her selections and submit them to the server. As you might expect, this involves writing a JSP that includes a form and handling the request from the form with a Spring MVC controller. I'm going to talk about the controller code first because it determines how the JSP is written.

      /**
       * Create the form
       */
      @RequestMapping(value = "/shopping", method = RequestMethod.GET)
      public String createForm(Model model) {

        logger.debug("Displaying items available in the store...");
        addDisplayItemsToModel(model);
        addFormObjectToModel(model);
        return FORM_VIEW;
      }

      // Display data: the items read from the catalogue
      private void addDisplayItemsToModel(Model model) {
        List<Item> items = catalogue.read();
        model.addAttribute("items", items);
      }

      // Means of submission: the object the form binds to
      private void addFormObjectToModel(Model model) {
        UserSelections userSelections = new UserSelections();
        model.addAttribute(userSelections);
      }

The controller method that gets our form displayed on the screen is createForm(…).
This method is annotated by the usual RequestMapping annotation that tells Spring to map all GET requests with a URL of '/shopping' to this location. The method has three steps: it first reads the catalogue to obtain a list of items to display; then it creates a UserSelections object, which is used by the form when it submits the items a user has bought; and finally it directs us to the shopping.jsp. These setup steps are pretty normal for this kind of form creation method: first you add the display data to the model and then you add the form's means of submission to the model; however, quite often these two steps are combined.

In the controller code you'll also see a catalogue object that's used to get hold of the items. In a real application this would be equivalent to creating a service layer component that reads data using a DAO and all the paraphernalia usually associated with this kind of application. In this case it simply creates a list of items from hardcoded arrays and is not important.

This controller code ties up very nicely to the JSP snippet below:

    <form:form modelAttribute="userSelections" action="confirm" method="post">
      <c:forEach items="${items}" var="item">
        <div class="span-4 border">
          <p><c:out value="${item.name}" /></p>
        </div>
        <div class="span-8 border">
          <p><c:out value="${item.description}" /></p>
        </div>
        <div class="span-4 border">
          <p>£<c:out value="${item.price}" /></p>
        </div>
        <div class="span-4 append-4 last">
          <p><form:checkbox value="${item.id}" path="selection"/></p>
        </div>
      </c:forEach>
      <div class="prepend-12 span-4 append-12">
        <p><input class="command" type="submit" name="action" value="Confirm Purchase" accesskey="A" /></p>
      </div>
    </form:form>

There are a couple of points to note here. Firstly, I'm making life easy for myself by using the Spring form tag (<form:form … >) and secondly I'm using Blueprint to format my page.
In setting up the form tag, the first thing to consider is the form tag's attributes: modelAttribute, action and method. modelAttribute is used to bind the UserSelections class supplied by the controller to the HTML form. The action attribute is a URL that tells the browser where to submit its data, and the method attribute tells the browser to POST the submission to the server.

In the next part of the JSP I've used a forEach loop to display the items previously retrieved from the catalogue. The important line here is the form:checkbox tag. This creates, as you may suspect, an HTML checkbox using the item's id and a selection 'path'. To me the term 'path' sounds confusing. What the guys at Spring actually mean is "on submission, take the value stored in the checkbox's value attribute (item.id) and, if selected, store it in the UserSelections object using the setSelection(…) method". This is done in the background, probably by parsing the HttpServletRequest object and then doing some jiggery-pokery with Java reflection. The point to note is how the names in the JSP tie into the attribute names of the UserSelections class.

I've found the Spring form tag pretty useful in a large majority of cases; however, in making things simple it does have occasional limitations in what data it can bind to which HTML object. When you bump into one of these limitations, use the more verbose Spring bind tag in conjunction with the form tag.

Okay, so when you run this code you get a screen displayed that looks something like this:

The thing is, I know that I haven't talked about Ajax and JSON in this blog, but I needed to set the scene. In the second part of this blog I'll definitely get down to implementing the nitty-gritty part of the scenario: obtaining and displaying JSON data via an Ajax call.
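To make the 'path' binding described above a little more concrete, here is a minimal sketch of what the UserSelections form object might look like. The class isn't shown in this post, so everything except the setSelection(…) name is an assumption on my part (in particular the List<String> type), and the small demo class only imitates by hand what Spring's data binding would do with the ticked checkboxes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical form-backing object: only the name setSelection(...) comes from the post.
class UserSelections {

    // One entry per ticked checkbox; the element type is an assumption.
    private List<String> selection = Collections.emptyList();

    // Matches path="selection" in the JSP (getter side).
    public List<String> getSelection() {
        return selection;
    }

    // Matches path="selection" in the JSP (setter side).
    public void setSelection(List<String> selection) {
        // Defensive copy so later changes to the caller's list don't leak in.
        this.selection = new ArrayList<String>(selection);
    }
}

class UserSelectionsDemo {
    public static void main(String[] args) {
        // Imitates, by hand, a submission with the checkboxes for items 1 and 3 ticked.
        UserSelections form = new UserSelections();
        form.setSelection(Arrays.asList("1", "3"));
        System.out.println(form.getSelection());
    }
}
```

The property name is what matters here: 'selection' in the JSP lines up with getSelection()/setSelection(…) on the form object.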
For the full source code to this blog, see GitHub - https://github.com/roghughe/captaindebug/tree/master/ajax-json

Reference: Spring MVC, Ajax and JSON Part 1 – Setting The Scene from our JCG partner Roger Hughes at the Captain Debug's Blog blog. ...

Git Explained: For Beginners

I've been working with Git for about two years now, but only for my personal projects and those I have on GitHub. At work we still use TFS and SVN (as of now). Recently Paolo Perrotta came to our company to hold a course about Agile planning, and since Git was quite new to most of my mates, he also quickly explained Git in the context of refactoring. I really liked his approach to explaining it, and that's why I'd like to replicate his explanation here.

Just before we start...

How is Git different from other VCS (Version Control Systems)? Probably the most obvious difference is that Git is distributed (unlike SVN or TFS, for instance). This means you'll have a local repository which lives inside a special folder named .git, and you'll normally (but not necessarily) have a remote, central repository where different collaborators may contribute their code. Note that each of those contributors has an exact clone of the repository on their local workstation.

Git itself can be imagined as something that sits on top of your file system and manipulates files. Even better, you can imagine Git as a tree structure where each commit creates a new node in that tree. Nearly all Git commands actually serve to navigate this tree and to manipulate it accordingly.

As such, in this tutorial I'd like to take a look at how Git works by viewing a Git repository from the point of view of the tree it constructs. To do so I walk through some common use cases like:

- adding/modifying a new file
- creating and merging a branch with and without merge conflicts
- viewing the history/changelog
- performing a rollback to a certain commit
- sharing/synching your code to a remote/central repository

Terminology

Here's the Git terminology:

master - the repository's main branch. Depending on the work flow it is the one people work on or the one where the integration happens.
clone - copies an existing Git repository, normally from some remote location to your local environment.
commit - submitting files to the repository (the local one); in other VCS it is often referred to as "checkin".
fetch or pull - is like "update" or "get latest" in other VCS. The difference between fetch and pull is that pull does both: it fetches the latest code from a remote repo and also performs the merge.
push - is used to submit the code to a remote repository.
remote - these are "remote" locations of your repository, normally on some central server.
SHA - every commit or node in the Git tree is identified by a unique SHA key. You can use them in various commands in order to manipulate a specific node.
head - is a reference to the node to which our working space of the repository currently points.
branch - is just like in other VCS, with the difference that a branch in Git is actually nothing more special than a particular label on a given node. It is not a physical copy of the files as in other popular VCS.

Workstation Setup

I do not want to go into the details of setting up your workstation, as there are numerous tools which partly vary between platforms. For this post I perform all of the operations on the command line. Even if you're not a shell person you should give it a try (it never hurts).

To set up command line Git access, simply go to git-scm.com/downloads where you'll find the required downloads for your OS. More detailed information can be found here as well.

After everything is set up and you have "git" in your PATH environment variable, the first thing you have to do is configure Git with your name and email:

    $ git config --global user.name "Juri Strumpflohner"
    $ git config --global user.email "myemail@gmail.com"

Let's get started: Create a new Git Repository

Before starting, let's create a new directory where the Git repository will live and cd into it:

    $ mkdir mygitrepo
    $ cd mygitrepo

Now we're ready to initialize a brand new Git repository.
    $ git init
    Initialized empty Git repository in c:/projects/mystuff/temprepos/mygitrepo/.git/

We can check for the current status of the Git repository by using

    $ git status
    # On branch master
    #
    # Initial commit
    #
    nothing to commit (create/copy files and use "git add" to track)

Create and commit a new file

The next step is to create a new file and add some content to it.

    $ touch hallo.txt
    $ echo Hello, world! > hallo.txt

Again, checking for the status now reveals the following:

    $ git status
    # On branch master
    #
    # Initial commit
    #
    # Untracked files:
    #   (use "git add <file>..." to include in what will be committed)
    #
    #       hallo.txt
    nothing added to commit but untracked files present (use "git add" to track)

To "register" the file for committing we need to add it to Git using

    $ git add hallo.txt

Checking for the status now indicates that the file is ready to be committed:

    $ git status
    # On branch master
    #
    # Initial commit
    #
    # Changes to be committed:
    #   (use "git rm --cached <file>..." to unstage)
    #
    #       new file: hallo.txt
    #

We can now commit it to the repository:

    $ git commit -m "Add my first file"
     1 file changed, 1 insertion(+)
     create mode 100644 hallo.txt

It is common practice to use the present tense in commit messages. So rather than writing "added my first file" we write "add my first file".

So if we now step back for a second and take a look at the tree, we would have the following.

There is one node where the "label" master points to.

Add another file

Let's add another file:

    $ echo "Hi, I'm another file" > anotherfile.txt
    $ git add .
    $ git commit -m "add another file with some other content"
     1 file changed, 1 insertion(+)
     create mode 100644 anotherfile.txt

Btw, note that this time I used git add . which adds all files in the current directory (.).
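Since the rest of this post keeps coming back to the tree, here is a toy model of it. This is plain Java and has nothing to do with how Git is actually implemented; the names ToyCommit, ToyRepo, commit, branch and checkout are invented purely for illustration. It only captures the two ideas from the terminology section: a commit is a node that points to its parent, and a branch is nothing more than a label sitting on a node.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: illustrates "commits are nodes, branches are labels".
class ToyCommit {
    final String message;
    final ToyCommit parent; // null for the very first commit

    ToyCommit(String message, ToyCommit parent) {
        this.message = message;
        this.parent = parent;
    }
}

class ToyRepo {
    // Branch name -> node the label currently sits on.
    private final Map<String, ToyCommit> branches = new HashMap<String, ToyCommit>();
    private String current = "master"; // the branch our working space follows

    ToyRepo() {
        branches.put("master", null); // empty repository: label exists, no node yet
    }

    // Adds a node on top of the current branch and moves that label forward.
    ToyCommit commit(String message) {
        ToyCommit node = new ToyCommit(message, branches.get(current));
        branches.put(current, node);
        return node;
    }

    // A new branch is just a second label on the node we are currently on.
    void branch(String name) {
        branches.put(name, branches.get(current));
    }

    // Switching branches only changes which label future commits will move.
    void checkout(String name) {
        if (!branches.containsKey(name)) {
            throw new IllegalArgumentException("no such branch: " + name);
        }
        current = name;
    }

    ToyCommit tip(String name) {
        return branches.get(name);
    }
}

class ToyRepoDemo {
    public static void main(String[] args) {
        ToyRepo repo = new ToyRepo();
        repo.commit("Add my first file");
        repo.commit("add another file with some other content");

        repo.branch("my-feature-branch");
        repo.checkout("my-feature-branch");
        repo.commit("modify file adding hi");

        // master stayed where it was; the feature label moved one node ahead of it.
        System.out.println(repo.tip("master").message);
        System.out.println(repo.tip("my-feature-branch").parent == repo.tip("master"));
    }
}
```

Creating a branch in this model copies no files at all, it just adds a map entry, which is the point the terminology section makes about Git branches being cheap labels.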
From the point of view of the tree, we now have another node and master has moved on to that one.

Create a (feature) branch

Branching and merging is what makes Git so powerful and what it has been optimized for, being a distributed version control system (VCS). Indeed, feature branches are very popular with Git. A feature branch is created for every new kind of functionality you're going to add to your system, and it is normally deleted once the feature is merged back into the main integration branch (normally the master branch). The advantage is that you can experiment with new functionality in a separate, isolated "playground" and quickly switch back and forth to the original "master" branch when needed. Moreover, the work can easily be discarded again (in case it is not needed) by simply dropping the feature branch. But let's get started. First of all I create the new feature branch:

    $ git branch my-feature-branch

Executing

    $ git branch
    * master
      my-feature-branch

we get a list of branches. The * in front of master indicates that we're currently on that branch. Let's switch to my-feature-branch instead:

    $ git checkout my-feature-branch
    Switched to branch 'my-feature-branch'

Again:

    $ git branch
      master
    * my-feature-branch

Note that you can directly use the command git checkout -b my-feature-branch to create and check out a new branch in one step.

What's different from other VCS is that there is only one working directory. All of your branches live in the same one and there is not a separate folder for each branch you create. Instead, when you switch between branches, Git will replace the content of your working directory to reflect the one in the branch you're switching to.

Let's modify one of our existing files

    $ echo "Hi" >> hallo.txt
    $ cat hallo.txt
    Hello, world!
    Hi

…and then commit it to our new branch:

    $ git commit -a -m "modify file adding hi"
    [my-feature-branch 2fa266a] modify file adding hi
     1 file changed, 1 insertion(+)

Note, this time I used git commit -a -m to add and commit a modification in one step. This works only on files that have already been added to the Git repo before. New files won't be added this way and need an explicit git add as seen before.

What about our tree?

So far everything seems pretty normal and we still have a straight line in the tree, but note that master has now remained where it was and we moved forward with my-feature-branch.

Let's switch back to master and modify the same file there as well.

    $ git checkout master
    Switched to branch 'master'

As expected, hallo.txt is unmodified:

    $ cat hallo.txt
    Hello, world!

Let's change and commit it on master as well (this will generate a nice conflict later).

    $ echo "Hi I was changed in master" >> hallo.txt
    $ git commit -a -m "add line on hallo.txt"
    [master c8616db] add line on hallo.txt
     1 file changed, 1 insertion(+)

Our tree now visualizes the branch:

Merge and resolve conflicts

The next step would be to merge our feature branch back into master. This is done by using the merge command:

    $ git merge my-feature-branch
    Auto-merging hallo.txt
    CONFLICT (content): Merge conflict in hallo.txt
    Automatic merge failed; fix conflicts and then commit the result.

As expected, we have a merge conflict in hallo.txt.

    Hello, world!
    <<<<<<< HEAD
    Hi I was changed in master
    =======
    Hi
    >>>>>>> my-feature-branch

Let's resolve it by editing the file so that it keeps both lines:

    Hello, world!
    Hi I was changed in master
    Hi

...and then commit it:

    $ git commit -a -m "resolve merge conflicts"
    [master 6834fb2] resolve merge conflicts

The tree reflects our merge.

Jump to a certain commit

Let's assume we want to jump back to a given commit. We can use the git log command to get all the SHA identifiers that uniquely identify each node in the tree.
    $ git log
    commit 6834fb2b38d4ed12f5486ebcb6c1699fe9039e8e
    Merge: c8616db 2fa266a
    Author: = <juri.strumpflohner@gmail.com>
    Date: Mon Apr 22 23:19:32 2013 +0200

        resolve merge conflicts

    commit c8616db8097e926c64bfcac4a09306839b008dc6
    Author: Juri <juri.strumpflohner@gmail.com>
    Date: Mon Apr 22 09:39:57 2013 +0200

        add line on hallo.txt

    commit 2fa266aaaa61c51bd77334516139597a727d4af1
    Author: Juri <juri.strumpflohner@gmail.com>
    Date: Mon Apr 22 09:24:00 2013 +0200

        modify file adding hi

    commit 03883808a04a268309b9b9f5c7ace651fc4f3f4b
    Author: Juri <juri.strumpflohner@gmail.com>
    Date: Mon Apr 22 09:13:49 2013 +0200

        add another file with some other content

    commit aad15dea687e46e9104db55103919d21e9be8916
    Author: Juri <juri.strumpflohner@gmail.com>
    Date: Mon Apr 22 08:58:51 2013 +0200

        Add my first file

Take one of the identifiers (it doesn't have to be the whole one, a unique prefix is enough) and jump to that node by using the checkout command:

    $ git checkout c8616db
    Note: checking out 'c8616db'.

    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by performing another checkout.

    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -b with the checkout command again. Example:

      git checkout -b new_branch_name

    HEAD is now at c8616db... add line on hallo.txt

Note the comment Git prints out. What does that mean? Detached head means "head" is no longer pointing to a branch "label" but instead to a specific commit in the tree.

You can think of the HEAD as the "current branch". When you switch branches with git checkout, the HEAD revision changes to point to the tip of the new branch. … It is possible for HEAD to refer to a specific revision that is not associated with a branch name. This situation is called a detached HEAD.
(Source: Stack Overflow post)

Basically, when I now change hallo.txt and commit the change, the tree looks as follows:

As you can see, the newly created node has no label on it. The only reference that currently points towards it is head. However, if we now switch to master again, the previous commit will be lost, as no label is left that would let us jump back to that tree node (unless we note down its SHA).

    $ git checkout master
    Warning: you are leaving 1 commit behind, not connected to
    any of your branches:

      576bcb8 change file undoing previous changes

    If you want to keep them by creating a new branch, this may be a good time
    to do so with:

      git branch new_branch_name 576bcb8239e0ef49d3a6d5a227ff2d1eb73eee55

    Switched to branch 'master'

And in fact, Git is kind enough to remind us about this. The tree now looks again as in figure 6.

Rollback

Jumping back is nice, but what if we want to undo everything back to the state before the merge of the feature branch? It is as easy as

    $ git reset --hard c8616db
    HEAD is now at c8616db add line on hallo.txt

The generic syntax here is git reset --hard <tag/branch/commit id>.

Sharing/Synching your Repository

Ultimately we want to share our code, normally by synching it to a central repository. To do so, we have to add a remote.

    $ git remote add origin git@github.com:juristr/intro.js.git

To see whether I succeeded, simply type:

    $ git remote -v

which lists all of the added remotes. Now we need to publish our local branch master to the remote repository. This is done like this:

    $ git push -u origin master

And we're done.

The really powerful thing is that you can add multiple different remotes. This is often used in combination with cloud hosting solutions for deploying your code on your server.
For instance, you could add a remote named "deploy" which points to some cloud hosting server repository, like

    $ git remote add deploy git@somecloudserver.com:juristr/myproject

and then whenever you want to publish your branch you execute

    $ git push deploy

Cloning

It works similarly if you'd like to start from an existing remote repository. The first step that needs to be done is to "check out" the source code, which is called cloning in Git terminology. So we would do something like

    $ git clone git@github.com:juristr/intro.js.git
    Cloning into 'intro.js'...
    remote: Counting objects: 430, done.
    remote: Compressing objects: 100% (293/293), done.
    remote: Total 430 (delta 184), reused 363 (delta 128)
    Receiving objects: 100% (430/430), 419.70 KiB | 102 KiB/s, done.
    Resolving deltas: 100% (184/184), done.

This will create a folder (in this case) named "intro.js", and if we enter it

    $ cd intro.js/

and check for the remotes, we see that the corresponding tracking information of the remote repository is already set up:

    $ git remote -v
    origin git@github.com:juristr/intro.js.git (fetch)
    origin git@github.com:juristr/intro.js.git (push)

We can now start the commit/branch/push cycle just as usual.

Resources and Links

The scenarios above were the simplest, but at the same time probably also the most used ones. But there's a lot more Git is capable of. To get more details you may want to consult the links below.

- http://gitready.com/
- Book: Pro Git by Scott Chacon
- Try Git in 15 minutes
- Introduction to Git with Scott Chacon of GitHub
- My personal Git Cheat Sheet where I continuously add stuff I want to remember

Reference: Git Explained: For Beginners from our JCG partner Juri Strumpflohner at the Juri Strumpflohner's TechBlog blog. ...

Apache Camel 2.11 released

Last week Apache Camel 2.11 was released. This blog post is a summary of the most noticeable new features and improvements. For a detailed description, see the Camel 2.11 release notes.

1) New components

As usual each new release contains a number of new components, contributed by our large user base. Thanks guys. For example there is camel-cmis, which allows you to integrate with content management systems such as Alfresco, or any of the systems supported by Apache Chemistry, which is what we use in camel-cmis. We also got a new camel-couchdb for integrating with our fellow Apache CouchDB project. Also very exciting is the new camel-elasticsearch component, to integrate with the excellent Elasticsearch project.

In the hawt new project hawtio we are also working on some great new stuff with Elasticsearch around log aggregation and analytics, with a shiny HTML5 graphical user interface using Kibana. If you haven't seen hawtio yet, then make sure to check it out.

Then James Strachan created the new camel-rx component, for integrating Camel beautifully, as Erik Meijer said, with the fantastic Netflix port of the Reactive Extensions (Rx) library. And when we get Java 8 as well, this is going to rock.

Then I created the new camel-servletlistener component to allow bootstrapping Apache Camel applications in web applications with no other dependencies. Usually people would need to use Spring or another library to do this. To demonstrate this, we have a new servlet-tomcat-no-spring example. See also my blog entry from earlier this year about this new functionality: Camel web applications without Spring.

For the upcoming Camel 2.12 we will work on an alternative example using the new blueprint-web module that allows Spring XML like configurations but with only two JARs as dependencies (blueprint-noosgi, blueprint-web). This works beautifully, and we already use it in the hawtio project to easily bootstrap the hawtio web application from a blueprint XML file.
Scott Sullivan stepped up and created the new camel-sjms component, a light-weight JMS component which only depends on the JMS API. This component is expected to be further improved and hardened in the upcoming releases. The camel-sjms component is not a 1 to 1 replacement for the existing camel-jms component. They are two independent components. We want the freedom in camel-sjms to implement the functionality we think is most needed, and also to avoid the many, many options that Spring JMS exposes and that hence crept into camel-jms as well.

We also have a new component for integration with Redis: the camel-spring-redis component.

And last year I created the camel-urlrewrite component, which allows people to do Camel routes for proxying HTTP services with URL rewrites. I have previously blogged about this as well.

We also created a new control bus component, which allows you to send messages to a control-bus endpoint to control routes. This may make it easier for people to start/stop their Camel routes. This component is expected to be improved in the future, so that you can, for example, get performance statistics and other information as well.

2) SQL component can now consume as well

The SQL component has been improved, so you can now consume as well. This allows you to pick up new data from table(s) and route the data in Camel routes. This is best illustrated by the new camel-example-sql that we created. Oh, and we also added support for using named parameters in the SQL queries.

3) Groovy DSL

The Groovy DSL in Camel has been totally overhauled, thanks to community contributions. The DSL is now fully up to date and uses the Groovy-ish style that makes it much more Groovy like. We also added a new Camel Maven Archetype to create a new Camel Groovy project.

4) CDI improvements

In the earlier phase of development of Camel 2.11 we worked on improving the camel-cdi component. We are not there yet, but it's a big step in the right direction.
We are also waiting a bit for the Apache DeltaSpike project to do new releases so we can finish the last pieces. So expect this to be improved in upcoming releases as well.

5) camel-netty scales better

We also worked on improving the camel-netty component to be faster. Most noticeably, the Netty producer now pools channels for reuse.

6) JAXB controlling namespace prefixes

For people stuck in XML land and using JAXB, we made it easier to control namespace prefixes, so you can fully control the prefix names in use. This allows you to conform the XML to a naming style, or to use a specific prefix name when you must; usually that is the case when a legacy system expects prefix names to be hardcoded.

7) Guice 3.0

People who are fans of Guice will be glad to hear we have upgraded camel-guice to use Guice 3.0 as is. The old guiceyfruit dependency, which was needed when using Guice 1.x, is now gone.

8) Backlog tracer

We introduced a new backlog tracer, which allows tooling to trace Camel messages at runtime, on demand. There are new camel-backlog-tracer command(s) for Apache Karaf / ServiceMix, which allow you to trace messages on your running Camel applications at runtime. You can even enable a predicate filter, to only trace matched messages etc.

9) OSGi upgrades

Apache Camel 2.11 now requires OSGi 4.3 and Apache Aries 1.0 if you use the camel-blueprint component. This means that you should use Apache Karaf 2.3 or better as the container.

10) Miscellaneous

We have improved the startup of Apache Camel a bit, and the simple language is now faster when invoking OGNL like expressions. We disabled the type converter utilization statistics, as there is a slight performance impact under heavy load. And we managed to let the camel-jms component re-create temporary queues when doing request/reply over JMS with temporary queues after the connection has been re-connected (e.g. automatic self-heal). The camel-cxfrs component has a simpler binding, making it easier to use.
And as usual we have a ton of bug fixes, minor improvements and new features. See the release notes for full details.

Apache Camel 2.11 is available for download from the Apache web site, and Maven users can get it from Maven Central. The Camel team is now busy working on Apache Camel 2.12, where we have some exciting new work on improved documentation for Camel components.

Reference: Apache Camel 2.11 released from our JCG partner Claus Ibsen at the Claus Ibsen riding the Apache Camel blog. ...

Java PDF Libraries

Recently I had the task of selecting some Java PDF libraries for PDF generation. But it wasn't a simple task. The first thing which came to my mind was iText. It's a well-known Java library with a good reputation. But… there is a showstopper: iText version 5+ is released under the AGPL license. In practice that means that to use iText in a closed-source commercial product we would have to buy a commercial license.

I've created the following small checklist which covers the project's needs:

- liberal license
- support for the maximum number of project features (e.g. absolute element positioning)
- good documentation
- a huge number of samples
- possibility to render HTML to PDF

I've reviewed the following libraries:

- iText 5.0+: AGPL license
- iText 4.2: MPL/LGPL licenses
- PDF Box: Apache License, Version 2.0
- JPedal: has an LGPL release that provides a full Java PDF viewer
- FOP: Apache License, Version 2.0
- gnujpdf: LGPL License
- PJX: GPLv2 License
- PDFjet: strange open source license model
- jPod: BSD License
- PDF Renderer: not actively maintained

iText review

- iText 2.1.7: the latest official release under the MPL & LGPL license;
- iText 4.2.0: the latest unofficial release under the MPL & LGPL license;
- iText 5.0.0 and higher: released under the AGPL license.

Notes about iText 5.0+ vs iText 4.2

Beginning with iText version 5.0 the developers have moved to the AGPL to improve their ability to sell commercial licenses. To assist those who want to stick with the old license, the final MPL/LGPL version was made more easily available and forked on GitHub.

Apache™ FOP

Apache™ FOP (Formatting Objects Processor) is a print formatter driven by XSL formatting objects (XSL-FO) and an output independent formatter. It is a Java application that reads a formatting object (FO) tree and renders the resulting pages to a specified output. This lib doesn't have enough flexibility for absolute page element positioning. But it might be really valuable as a content converter.

Apache PDFBox

A very interesting project. It has a very impressive number of features.
And, most importantly, it's in active development.

Summary

I've selected iText v.4.2, which has an acceptable license and a huge community. But its most important feature is very good documentation (actually it's a book, iText in Action, 2nd Edition) and tons of samples. Almost all samples for iText v.5 can be easily applied to iText v.4.2. Other libraries don't have as many samples/demos, and for a quick start that's very important.

Here is the Maven dependency info:

    <dependency>
      <groupId>com.lowagie</groupId>
      <artifactId>itext</artifactId>
      <version>4.2.0</version>
    </dependency>

PDFBox is selected as the backup library, i.e. I will use it when iText has some limitations.

Resources

- http://stackoverflow.com/questions/14213195/itext-latest-maven-dependency
- http://java-source.net/open-source/pdf-libraries
- http://javatoolbox.com/categories/pdf

Reference: Java PDF Libraries from our JCG partner Orest Ivasiv at the Knowledge Is Everything blog. ...

Code Ownership – Who Should Own the Code?

A key decision in building and managing any development team is agreeing on how ownership of the code will be divided up: who is going to work on what code; how much work can be, and should be, shared across the team; and who will be responsible for code quality. The approach that you take has an immediate impact on the team's performance and success, and a long-term impact on the shape and quality of the code.

Martin Fowler describes three different models for code ownership on a team:

Strong code ownership: every module is owned exclusively by someone, developers can only change the code that they own, and if they need to change somebody else's code, they need to talk to that owner and get the owner's agreement first (except maybe in emergencies).

Weak code ownership: modules are still assigned to owners, but developers are allowed to change code owned by other people. Owners are expected to keep an eye on any changes that other people make, and developers are expected to ask for permission first before making changes to somebody else's code.

This can be thought of as a shared custody model, where an individual is forced to share ownership of their code with others; or Code Stewardship, where the team owns all of the code, but one person is held responsible for the quality of specific code, and for helping other people make changes to it, reviewing and approving all major changes, or pairing up with other developers as necessary. Brad Appleton says the job of a code steward is not to make all of the changes to a piece of code, but to "safeguard the integrity + consistency of that code (both conceptually and structurally) and to widely disseminate knowledge and expertise about it to others".

Collective Code Ownership: the code base is owned or shared by the entire team, and everyone is free to make whatever changes they need (or want) to make, including refactoring or rewriting code that somebody else originally wrote.
This is a model that came out of Extreme Programming, where the Whole Team is responsible together for the quality and integrity of the code and for understanding and keeping the design.

Arguments against Strong/Individual Code Ownership

Fowler and other XP advocates such as Kent Beck don't like strong individual code ownership, because it creates artificial barriers and dependencies inside the team. Work will stall and pause if you need to wait for somebody to make or even approve a change, and one owner can often become the critical path for the entire team. This could encourage developers to come up with their own workarounds and compromises. For example, instead of changing an API properly (which would involve a change to somebody else's code), they might shoehorn in a change, like stuffing something into an existing field. Or they might take a copy of somebody's code and add whatever they need to it, making maintenance harder in the future.

Other arguments against strong ownership are that it can lead to defensiveness and protectionism on the part of some developers ("hey, don't touch my code!"), where they take any criticism of the code as a personal attack, creating tension on the team and discouraging reviewers from offering feedback and discouraging refactoring efforts; and to local over-optimization, if developers are given too much time to polish and perfect their precious code without thinking of the bigger picture.

And of course there is the "hit by a truck factor" to consider: the impact that a person leaving the team will have on productivity if they're the only one who works on a piece of code.

Ward Cunningham, one of the original XPers, also believes that there is more pride of ownership when code is shared, because everyone's work is always on display to everyone else on the team.

Arguments against Collective Code Ownership

But there are also arguments against Collective Code Ownership.
A post by Mike Spille lists some problems that he has seen when teams try to “over-share” code:

Inconsistency. No overriding architecture is discernible, just individual solutions to individual problems. Lots of duplication of effort results, often leading to inconsistent behavior.
Bugs. People “refactoring” code they don’t really understand break something subtle in the original code.
Constant rounds of “The Blame Game”. People have a knee jerk reaction to bugs, saying “It worked when I wrote it, but since Joe refactored it….well, that’s his problem now.”
Slow delivery. Nobody has any expertise in any given domain, so people are spending more time trying to understand other people’s code, less time writing new code.

Matthias Friedrich, in Thoughts on Collective Code Ownership, believes that Collective Code Ownership can only work if you have the right conditions in place:

Team members are all on a similar skill level
Programmers work carefully and trust each other
The code base is in a good state
Unit tests are in place to detect problematic changes (although unit tests only go so far)

Remember that Collective Code Ownership came out of Extreme Programming. Successful team ownership depends on everyone sharing an understanding of the domain and the design, and maintaining a high level of technical discipline: not only writing really good automated tests as a safety net, but everyone following consistent code conventions and standards across the code base, and working in pairs because hopefully one of you knows the code, or at least with two heads you can try to help each other understand it and make fewer mistakes.

Another problem with Collective Code Ownership is that ownership is spread so thin. Justin Hewlett talks about the Tragedy of the Commons problem: people will take care of their own yard, but how many people will pick up somebody else’s litter in the park, or on a street – even if they walk in that park or down that street every day?
If the code belongs to everyone, then there is always “someone else” who can take care of it – whoever that “someone else” may be. As a developer, you’re under pressure, and you may never touch this piece of code again, so why not get whatever you need done as quickly as possible and get on to the next thing on your list, and let “somebody else” worry about refactoring or writing that extra unit test or…?

Code Ownership in the Real World

I’ve always worked on or with teams that follow individual (strong or weak) code ownership, except for an experiment in pure XP and Collective Code Ownership on one team over 10 years ago. One (or maybe two) people own different pieces of the code and do all or most of the heavy lifting work on that code, because it only makes sense to have the people who understand the code best do most of the work, or the most important work. It’s not just because you want the work “done right” – sometimes you don’t really have a choice over who is going to do the work.

As Ralf Sudelbucher points out, Collective Code Ownership assumes that all coding work is interchangeable within a team, which is not always true. Some work isn’t interchangeable because of technology: different parts of a system can be written in different languages, with different architectures. You have to learn the language and the framework before you can start to understand the other problems that need to be solved. Or it might be because of the problem space. Sure, there is always coding on any project that is “just typing”: journeyman work that is well understood, like scaffolding work or writing another web form or another CRUD screen or fixing up a report or converting a file format, work that has to be done and can be taken on by anyone who has been on the team for a while and who understands where to find stuff and how things are done – or who pairs up with somebody who knows this.
But other software development involves solving hard domain problems and technical problems that require a lot of time to understand properly – where it can take days, weeks, months or sometimes even years to immerse yourself in the problem space well enough to know what to do, and where not just anyone can jump in and start coding, or even be of much help in a pair programming situation.

“The worst disasters occur when you turn loose sorcerers’ apprentices on code they don’t understand. In a typical project, not everyone can know everything – except in some mature domains where there have been few business paradigm shifts in the past decade or two.”
Jim Coplien, Code Ownership

I met someone who manages software development for a major computer animation studio. His team has a couple of expert developers who did their PhDs and postgraduate work in animating hair – that’s all that they do, and even if you are really smart you’ll need years of study and experience just to understand how they do what they do. Lots of scientific and technical engineering domains are also like this – maybe not so deeply specialized, but they involve non-trivial work that can’t be easily or competently done by generalists, even competent generalists. Examples include programming medical devices or avionics or robotics or weapons control; any business domain where you are working at the leading edge of problem solving, applying advanced statistical models to big data analysis or financial trading algorithms or risk-management models; supercomputing and high-scale computing and parallel programming; writing an operating system kernel; solving cryptography problems; or doing a really good job of User Experience (UX) design. Not everyone understands the problems that need to be solved, not everyone cares about the problems and not everyone can do a good job of solving them.
Ownership and Doing it Right

If you want the work done right, or need it to be done right the first time, it should be done by someone who has worked on the code before, who knows it and who has proven that they can get the job done. Not somebody who has only a superficial familiarity with the code. Research work by Microsoft and others has shown that as more people touch the same piece of code, there is more chance of misunderstandings and mistakes – and that the people who have done the most work on a piece of code are the ones who make the fewest mistakes.

Fowler comes back to this in a later post about “Shifting to Code Ownership”, where he shares a story from a colleague who shifted a team from collective code ownership to weak individual code ownership because weaker or less experienced programmers were making mistakes in core parts of the code and impacting quality, velocity and the team’s morale. They changed their ownership model so anyone could work around the code base, but if they needed to change core code, they had to do this with the help of someone who knew that part of the code well.

In deciding on an ownership approach, you have to make a trade-off between flexibility and quality, team ownership and individual ownership. With individual ownership you can have siloing problems and dependencies on critical people, and you’ll have to watch out for trucks. But you can get more done, faster, better and by fewer people.

Reference: Code Ownership – Who Should Own the Code? from our JCG partner Jim Bird at the Building Real Software blog.

JPA – Should I become a laziness extremist?

When you speak with developers about mapping objects to relational databases, they very often complain about poor JPA performance, unpredictable behavior of JPA providers, etc. Usually at some point in the conversation you will hear: “Let’s drop this technology altogether; we saw something much better at a conference last month. We will use it in our projects instead of JPA and develop them happily ever after.” Sounds familiar? There is nothing wrong with learning new technologies; in fact you should do it constantly, to improve your skills and knowledge. But when you have problems with one of them, will you take the easy path to another technology, or ask yourself: “Am I using it the right way?”

Let’s look at a JPA usage example. Suppose that we have a simple database, mapped to the entities Employee, Department and Employer, and we have to display all employee names, regardless of their employer (and department). Nothing easier – a simple JPQL query will do that:

select employee from Employee employee order by employee.name

Many developers finish at this point, and celebrate with friends another successful JPQL query in their life, but some of us have this strange feeling that something creepy is lurking beneath the shiny surface. The SQL queries produced by the JPA provider (e.g. Hibernate) will reveal the truth:

select [...] from EMPLOYEE employee0_ order by employee0_.EMPLOYEE_NAME

Nothing special so far, but here comes the naked truth:

select [...] from DEPARTMENT department0_ left outer join EMPLOYER employer1_ on department0_.EMPLOYER_ID=employer1_.EMPLOYER_ID where department0_.DEPARTMENT_ID=?
select [...] from EMPLOYER employer0_ where employer0_.EMPLOYER_ID=?
select [...] from DEPARTMENT department0_ left outer join EMPLOYER employer1_ on department0_.EMPLOYER_ID=employer1_.EMPLOYER_ID where department0_.DEPARTMENT_ID=?
select [...] from DEPARTMENT department0_ left outer join EMPLOYER employer1_ on department0_.EMPLOYER_ID=employer1_.EMPLOYER_ID where department0_.DEPARTMENT_ID=?
select [...] from DEPARTMENT department0_ left outer join EMPLOYER employer1_ on department0_.EMPLOYER_ID=employer1_.EMPLOYER_ID where department0_.DEPARTMENT_ID=?

What the heck?! What are these queries for?! Well, the reason lies in the default value of the fetch attribute for @ManyToOne annotations, which is EAGER. My database holds 2 Employers; one of them has 4 Departments, while the second one has none. When an Employee is loaded, the JPA provider by default loads all EAGER associations (in our case both Department and Employer), thus we have the additional queries. As you can see above, the JPA provider is clever enough to load both Employer and Department at once, when it is possible. You’ve just found a magical JPQL query fetching all the database content at once. Does this situation remind you of something from the past?

What can we do about it? My friend, all you need is laziness: don’t use EAGER unless it is REALLY needed (and remember that @ManyToOne and @OneToOne annotations use it by default). You may call me a lunatic, or a laziness extremist, at this point and ask: Have you ever encountered LazyInitializationException, Bro!? Have you heard of all the mess with lazy loading problems!? Performance degradation, etc. … Of course I have, but don’t you think that if we are getting into such trouble with JPA, maybe we are using it the wrong way?!

What we usually do in Web Applications is present or edit some data in the UI, and usually it is only a small subset of specific entities’ properties. Doing it requires fetching the entity tree from the database – without batting an eye, we ask the Entity Manager: give me all Employees, sorted by name, with all related entities, and then complain about degraded performance! We don’t care what we fetch from the database, because the Entity Manager will do the donkey work for us. We get LazyInitializationException, so what! We will use the Open Entity Manager in View pattern, and silence this stupid exception! Give me a break! Don’t you think it’s a dead end?
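To make the arithmetic behind those extra queries concrete, here is a minimal sketch in plain Java. It is a toy model and not JPA or Hibernate code: the class FetchDemo, the EmployeeRow type, the employee names and the ids are all hypothetical, chosen only so that the data matches the setup described above (2 Employers, one with 4 Departments, the other with none). It also assumes that a Department query joined to EMPLOYER makes that Employer available without a further query. In a real mapping, laziness would be requested with @ManyToOne(fetch = FetchType.LAZY); the sketch only counts the SELECT statements each strategy would need.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not JPA: counts the SELECT statements needed to list all employees.
class FetchDemo {

    // One EMPLOYEE row: a name plus foreign keys (departmentId is null when there is no department).
    static final class EmployeeRow {
        final String name;
        final Integer departmentId;
        final int employerId;

        EmployeeRow(String name, Integer departmentId, int employerId) {
            this.name = name;
            this.departmentId = departmentId;
            this.employerId = employerId;
        }
    }

    // Hypothetical data: employer 1 owns departments 10..13, employer 2 has no departments.
    static final List<EmployeeRow> EMPLOYEES = List.of(
            new EmployeeRow("Alice", 10, 1),
            new EmployeeRow("Bob", null, 2),
            new EmployeeRow("Carol", 11, 1),
            new EmployeeRow("Dave", 12, 1),
            new EmployeeRow("Eve", 13, 1),
            new EmployeeRow("Frank", 10, 1));

    // Returns the number of SELECTs issued while loading every employee.
    static int countSelects(boolean eager) {
        int selects = 1; // the employee query itself
        if (!eager) {
            return selects; // lazy: associations stay unloaded until something touches them
        }
        Set<Integer> loadedDepartments = new HashSet<>();
        Set<Integer> loadedEmployers = new HashSet<>();
        for (EmployeeRow row : EMPLOYEES) {
            // A department not seen before costs one SELECT, which also brings in its employer (the join).
            if (row.departmentId != null && loadedDepartments.add(row.departmentId)) {
                selects++;
                loadedEmployers.add(row.employerId);
            }
            // An employer not already loaded costs one SELECT of its own.
            if (loadedEmployers.add(row.employerId)) {
                selects++;
            }
        }
        return selects;
    }

    public static void main(String[] args) {
        System.out.println("EAGER selects: " + countSelects(true));  // prints 6
        System.out.println("LAZY selects:  " + countSelects(false)); // prints 1
    }
}
```

With this data the eager count is 6: the employee query, four Department queries and one Employer query, the same shape as the SQL log shown earlier. The lazy count stays at 1 as long as only the names are read.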
It’s about time to change something. There are sophisticated methods which you can use in your projects, like CQRS for example, along with possibilities already existing in JPA, which can help you change the bad habits described in this post.

A few links for dessert: CQRS info; Martin Fowler’s article on CQRS.

Reference: JPA – Should I become a laziness extremist? from our JCG partner Michal Jastak at the Warlock’s Thoughts blog.
Java Code Geeks and all content copyright © 2010-2015, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.