


Programming Language Job Trends Part 3 – August 2014

After a slight delay we finally get to the third part of the programming language job trends. Today we review Erlang, Groovy, Scala, Lisp, and Clojure. If you do not see some of the more popular languages, take a look at Part 1 and Part 2. Lisp is included almost as a baseline, because it has had sustained usage for decades but never enough to get into the mainstream. Go and Haskell are still not included due to the noise in the data and the current lack of demand. Most likely, Go will be included in the next update assuming I can craft a good query for it. If there is an emerging language that you think should be included, please let me know in the comments. To start, we look at the long term trends from Indeed.com:      Much like the previous two parts of this series, there is a definite downward trend for this group of languages. The trend is not nearly as negative as the previous two posts, but it is there. Groovy demand seems to be a bit cyclical, with new peaks every few months, and it still leads this pack. Scala has followed the same general trend as Groovy and keeping a large lead on the rest of the pack. Clojure has stayed fairly flat for the past two years and that allowed it to take a lead over Erlang. Erlang has slowly declined since its peak in early 2011, barely maintaining a lead over Lisp. Lisp is in a very slow decline over the past 4 years. Unlike the previous two parts, the short-term trends from SimplyHired.com provide decent data:  As you can see, SimplyHired is showing similar downward trends to Indeed for Groovy and Scala. However, the Clojure, Erlang and Lisp trends look much flatter. Clojure has been leading Erlang and Lisp since the middle of 2013 and looks to be increasing its lead while the others decline. Here, Lisp is in a flatter trend which lets it overtake Erlang in the past few months. Erlang seems to be in a bit of a lull after a slight rise at the beginning of 2014. Lastly, we look at the relative growth from Indeed.com:  Groovy maintains a very high growth trend, but it is definitely lessening in 2014. Scala is showing very strong growth at just over 10,000% and being somewhat flat overall since late 2011. Much like the other graphs, Clojure growth is outpacing Erlang, sitting above 5000%. Erlang growth is on a negative trend, falling below 500% for the first time since early 2011. Lisp is basically not registering on this graph as it is not really growing, staying barely positive. While most of these languages continue to grow, the trends for Erlang are not a good thing. Steadily decreasing growth for the past 3 years points to a language that will eventually become niche. Given that the overall demand was never that high, future prospects for Erlang are unpleasant. Overall, these trends and the previous two installments make industry growth look flat. Granted, much of this is due to the breadth of languages being used, but even emerging languages are not seeing the same type of increasing growth. If you look at languages like Go and Haskell, the trends are not that much better. Go is definitely growing but Haskell is not. It is possible that both of these languages get included in our next update. Clojure growth is definitely interesting as it seems to be one of the few positive trends in all of the job trends. I would not be surprised if Clojure starts separating itself from the bottom pack before the next update.Reference: Programming Language Job Trends Part 3 – August 2014 from our JCG partner Rob Diana at the Regular Geek blog....

Memory leaks – measuring frequency and severity

This post is part of our open culture – we continue sharing insights from our day-to-day work. This time we take a peek at the very core of our value proposition, namely – looking for the answer to these questions:How often do memory leaks occur in Java applications? How big is a memory leak? How quickly does a memory leak grow?If you stay with me for the next couple of minutes, I will open up the answers one by one, based on the data gathered by Plumbr memory leak detector agents throughout the last ~six months. First and foremost, the analysis is based on 2,180 different applications running with Plumbr Agents. The definition of a “different application” is somewhat tricky and I spare you the mundane details, but we did our best to identify a unique JVM based on the data available. In these 2,180 applications Plumbr found 754 different heap memory leaks. As some applications contained several memory leaks, the number of unique applications where a leak was detected was a bit lower – 682 to be precise. Based on this data, we can conclude that 31% of the Java applications contain a heap memory leak. Take this with a grain of salt – we do admit the fact that the applications Plumbr ends up monitoring are more likely to contain a memory leak than the ones we do not monitor. Now, knowing that you have roughly one in three chances of having a heap memory leak in your application, lets see whether you should be worried about the leaks at all. For this, lets look at two different characteristics we have for these 754 heap memory leaks. Memory leak size When Plumbr finds a memory leak, it runs a complex calculation to determine the retained size of the leak. Or, in more simpler way – Plumbr calculates how big is the particular leak in megabytes. This data is visible in the following chart:From the data we can see that Plumbr detects many leaks at their infancy – for example it has found 187 leaks (25% of total leaks) while the leak was still smaller than 1MB at the time of discovery. In the other extreme, some leaks take longer to detect, so in 31 cases the leak was detected only after it had grown to 1GB. The biggest leaks had managed to escalate to 3GB in size before detection. Another interesting conclusion to draw from the above is that majority of the leaks get caught by Plumbr before the application’s end users feel any impact - 70% of the leaks are still smaller than 100MB at the time Plumbr reports the leak as an incident. Memory leak velocity Now, the fact that an application contains a leak occupying less than 100MB is not something to take action upon. Coupling the size of the leak with the velocity of the leak, the severity of the incident becomes more clear:The information on the above chart can be interpreted this way: for 6% (37 occurrences) of the cases the leak velocity at the time of discovery was between 100 and 500 MB/hour. In the extreme cases, we have either very slow or extremely fast leaks. On 398 occasions (53% of the leaks discovered) the leak was escalating at the pace of 1MB per hour or less. At the other end of the spectrum we had 31 leaks escalating at mind-boggling 1GB/hour or faster. The “record holder” in this regard managed to leak more than 3GB per hour. Coupling velocity information with current leak size and maximum heap available to your application, you can estimate the amount of time the particular application has left before crashing with the OutOfMemoryError. One specific example from last Friday: Plumbr reported an incident where the leak size was 120MB. 
The velocity of the leak was a modest 160MB/day. Linking this information with the current heap usage and the maximum heap available, we could predict that the particular JVM would be dead by Tuesday 2PM. We were off by six hours, which, considering that changing application usage patterns are part of the prediction game, is close enough. Reference: Memory leaks – measuring frequency and severity from our JCG partner Sergei Mihhailov at the Plumbr Blog blog.
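To make the arithmetic behind such a prediction concrete, here is a minimal, hedged sketch of the estimate; the class and method names are my own illustration, not Plumbr's API:

public class OomEstimator {

    // Rough time-to-OutOfMemoryError estimate: how long until the leak fills the
    // heap space that is still free. It ignores usage fluctuations, which is why
    // a real prediction like the one described above can be off by a few hours.
    public static double hoursUntilOom(double maxHeapMb, double currentUsedMb,
                                       double leakVelocityMbPerHour) {
        return (maxHeapMb - currentUsedMb) / leakVelocityMbPerHour;
    }

    public static void main(String[] args) {
        // Illustrative numbers only; 160MB/day is roughly 6.7MB/hour
        double hours = hoursUntilOom(2048, 1200, 160.0 / 24);
        System.out.printf("Estimated time left before OutOfMemoryError: %.0f hours%n", hours);
    }
}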

Garbage Collectors – Serial vs. Parallel vs. CMS vs. G1 (and what’s new in Java 8)

The 4 Java Garbage Collectors – How the Wrong Choice Dramatically Impacts Performance

The year is 2014 and there are two things that still remain a mystery to most developers – garbage collection and understanding the opposite sex. Since I don't know much about the latter, I thought I'd take a whack at the former, especially as this is an area that has seen some major changes and improvements with Java 8, particularly with the removal of the PermGen and some new and exciting optimizations (more on this towards the end). When we speak about garbage collection, the vast majority of us know the concept and employ it in our everyday programming. Even so, there's much about it we don't understand, and that's when things get painful. One of the biggest misconceptions about the JVM is that it has one garbage collector, where in fact it provides four different ones, each with its own unique advantages and disadvantages. The choice of which one to use isn't automatic; it lies on your shoulders, and the differences in throughput and application pauses can be dramatic. What's common to these four garbage collection algorithms is that they are generational, which means they split the managed heap into different segments, using the age-old assumption that most objects in the heap are short lived and should be recycled quickly. As this too is a well-covered area, I'm going to jump directly into the different algorithms, along with their pros and their cons.

1. The Serial Collector The serial collector is the simplest one, and the one you probably won't be using, as it's mainly designed for single-threaded environments (e.g. 32 bit or Windows) and for small heaps. This collector freezes all application threads whenever it's working, which disqualifies it for all intents and purposes from being used in a server environment. How to use it: you can turn it on with the -XX:+UseSerialGC JVM argument.

2. The Parallel / Throughput Collector Next up is the Parallel collector. This is the JVM's default collector. As its name suggests, its biggest advantage is that it uses multiple threads to scan through and compact the heap. The downside to the parallel collector is that it will stop application threads when performing either a minor or full GC. The parallel collector is best suited for apps that can tolerate application pauses and are trying to optimize for lower CPU overhead caused by the collector.

3. The CMS Collector Following up on the parallel collector is the CMS collector ("concurrent-mark-sweep"). This algorithm uses multiple threads ("concurrent") to scan through the heap ("mark") for unused objects that can be recycled ("sweep"). This algorithm will enter "stop the world" (STW) mode in two cases: when initializing the initial marking of roots (objects in the old generation that are reachable from thread entry points or static variables) and when the application has changed the state of the heap while the algorithm was running concurrently, forcing it to go back and do some final touches to make sure it has the right objects marked. The biggest concern when using this collector is encountering promotion failures, which are instances where a race condition occurs between collecting the young and old generations. If the collector needs to promote young objects to the old generation but hasn't had enough time to clear space for them, it will have to do so first, which will result in a full STW collection – the very thing the CMS collector was meant to prevent.
To make sure this doesn't happen, you would either increase the size of the old generation (or the entire heap for that matter) or allocate more background threads to the collector so it can keep up with the rate of object allocation. Another downside to this algorithm in comparison to the parallel collector is that it uses more CPU in order to provide the application with higher levels of continuous throughput, by using multiple threads to perform scanning and collection. For most long-running server applications which are averse to application freezes, that's usually a good trade-off to make. Even so, this algorithm is not on by default. You have to specify -XX:+UseConcMarkSweepGC to actually enable it. If you're willing to allocate more CPU resources to avoid application pauses, this is the collector you'll probably want to use, assuming that your heap is less than 4GB in size. However, if it's greater than 4GB, you'll probably want to use the last algorithm – the G1 Collector.

4. The G1 Collector The Garbage First collector (G1), introduced in JDK 7 update 4, was designed to better support heaps larger than 4GB. The G1 collector utilizes multiple background threads to scan through the heap, which it divides into regions spanning from 1MB to 32MB (depending on the size of your heap). G1 is geared towards scanning those regions that contain the most garbage objects first, giving it its name (Garbage First). This collector is turned on using the -XX:+UseG1GC flag. This strategy reduces the chance of the heap being depleted before the background threads have finished scanning for unused objects, in which case the collector will have to stop the application, resulting in a STW collection. G1 also has the advantage of compacting the heap on the go, something the CMS collector only does during full STW collections. Large heaps have been a fairly contentious area over the past few years, with many developers moving away from the single-JVM-per-machine model to more micro-service, componentized architectures with multiple JVMs per machine. This has been driven by many factors, including the desire to isolate different application parts, simplify deployment and avoid the cost which would usually come with reloading application classes into memory (something which has actually been improved in Java 8). Even so, one of the biggest drivers to do this when it comes to the JVM stems from the desire to avoid those long "stop the world" pauses (which can take many seconds in a large collection) that occur with large heaps. This has also been accelerated by container technologies like Docker that enable you to deploy multiple apps on the same physical machine with relative ease.

Java 8 and the G1 Collector Another beautiful optimization, which shipped with Java 8 update 20 for the G1 collector, is String deduplication. Since strings (and their internal char[] arrays) take up much of our heap, a new optimization has been made that enables the G1 collector to identify strings which are duplicated across your heap and correct them to point to the same internal char[] array, to avoid multiple copies of the same string from residing inefficiently within the heap. You can use the -XX:+UseStringDeduplication JVM argument to try this out.

Java 8 and PermGen One of the biggest changes made in Java 8 was removing the PermGen part of the heap that was traditionally allocated for class metadata, interned strings and static variables.
This would traditionally require developers whose applications load significant amounts of classes (something common with apps using enterprise containers) to optimize and tune for this portion of the heap specifically. This has over the years become the source of many OutOfMemory exceptions, so having the JVM (mostly) take care of it is a very nice addition. Even so, that in itself will probably not reduce the tide of developers decoupling their apps into multiple JVMs. Each of these collectors is configured and tuned differently with a slew of toggles and switches, each with the potential to increase or decrease throughput, all based on the specific behavior of your app. We'll delve into the key strategies of configuring each of these in our next posts. In the meantime, what are the things you're most interested in learning about regarding the differences between the different collectors? Hit me up in the comments section!

Additional reading:
A really great in-depth review of the G1 Collector on InfoQ.
Java Performance – The Definitive Guide. My favorite book on Java performance.
More about String deduplication on the CodeCentric blog.

Reference: Garbage Collectors – Serial vs. Parallel vs. CMS vs. G1 (and what's new in Java 8) from our JCG partner Tal Weiss at the Takipi blog.
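As a quick sanity check of which collector a given set of flags actually selects, the standard GarbageCollectorMXBean API can be queried at runtime; here is a minimal sketch:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ShowCollectors {
    public static void main(String[] args) {
        // Prints the collectors selected for this JVM, e.g. "PS Scavenge"/"PS MarkSweep"
        // for the parallel collector or "G1 Young Generation"/"G1 Old Generation" for G1
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " - collections: " + gc.getCollectionCount()
                    + ", time spent: " + gc.getCollectionTime() + "ms");
        }
    }
}

Running it with, for example, java -XX:+UseG1GC ShowCollectors versus the default flags makes the difference between the selected collectors immediately visible.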

Creating an object stream from a JDBC ResultSet

The introduction of  features Stream API and Lambda in Java 8 enables us to make an elegant conversion from a JDBC ResultSet to a stream of objects just providing a mapping function. Such function could be, of course, a lambda. Basically, the idea is to generate a Stream using a ResultSet as Supplier:               public class ResultSetSupplier implements Supplier<T>{private final ResultSet rs; private final Function<ResultSet, T> mappingFunction;private ResultSetSupplier(ResultSet rs, Function<ResultSet, T> mappingFunction) { this.rs = rs; this.mappingFunction = mappingFunction; }@Override public T get() { try { if (rs.next()) return mappingFunction.apply(rs); } catch (SQLException e) { e.printStackTrace(); } return null; } } Parameter mappingFunction, which might be a lambda expression, is used to build T instances from a ResultSet. Just like ActiveRecord pattern, every row in such ResultSet maps to an instance of T, where columns are attributes of T. Let’s consider class City: public class City{ String city; String country; public City(String city, String country) { this.city = city; this.country = country; } public String getCountry() { return country; } @Override public String toString() { return "City [city=" + city + ", country=" + country + ";]"; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((city == null) ? 0 : city.hashCode()); result = prime * result + ((country == null) ? 0 : country.hashCode()); return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; City other = (City) obj; if (city == null) { if (other.city != null) return false; } else if (!city.equals(other.city)) return false; if (country == null) { if (other.country != null) return false; } else if (!country.equals(other.country)) return false; return true; } } The mapping function for City objects could be a lambda expression like the following: (ResultSet rs) -> { try { return new City(rs.getString("city"), rs.getString("country")); } catch (Exception e) { return null; }} We have assumed database columns are called city and country, respectively. Although both PreparedStatement and ResultSet implement AutoCloseable interface, as a  resultSet must be provided to create the object stream, it does make sense to close such resultSet when the stream is closed as well. A possible approach could be to use a proxy to intercept method invocation on the object stream. Thus, as close() method is invoked on the proxy, it will invoke close() on the provided resultSet. All method invocations will be invoked on the object stream as well, in order to be able to provide all Stream features. That is easy to achieve using a proxy. Let’s have a look. 
We will have a proxy factory and a invocation handler: public class ResultSetStreamInvocationHandler<T> implements InvocationHandler{private Stream<T> stream; // proxy will intercept method calls to such stream private PreparedStatement st; private ResultSet rs;public void setup(PreparedStatement st, Function<ResultSet, T> mappingFunction) throws SQLException{ // PreparedStatement must be already setup in order // to just call executeQuery() this.st = st; rs = st.executeQuery(); stream = Stream.generate(new ResultSetSupplier(rs, mappingFunction)); }@Override public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {if (method == null) throw new RuntimeException("null method null");// implement AutoCloseable for PreparedStatement // as calling close() more than once has no effects if (method.getName().equals("close") && args == null){ // invoked close(), no arguments if (st != null){ st.close(); // closes ResultSet too } }return method.invoke(stream, args); }private class ResultSetSupplier implements Supplier<T>{private final ResultSet rs; private final Function<ResultSet, T> mappingFunction;private ResultSetSupplier(ResultSet rs, Function<ResultSet, T> mappingFunction) { this.rs = rs; this.mappingFunction = mappingFunction; }@Override public T get() { try { if (rs.next()) return mappingFunction.apply(rs); } catch (SQLException e) { e.printStackTrace(); } return null; } }} Please note how invoke is used to intercept method calls. In case close() is called, close() is called on PreparedStatement as well. For every method called, the corresponding method call is invoked in the stream being proxied. And the factory: public class ResultSetStream<T>{@SuppressWarnings("unchecked") public Stream<T> getStream(PreparedStatement st, Function<ResultSet, T> mappingFunction) throws SQLException{ final ResultSetStreamInvocationHandler<T> handler = new ResultSetStreamInvocationHandler<T>(); handler.setup(st, mappingFunction); Stream<T> proxy = (Stream<T>) Proxy.newProxyInstance(getClass().getClassLoader(), new Class<?>[] {Stream.class}, handler); return proxy; } } To put it all together, let’s write a simple test to show usage. Mockito will be used to mock both PreparedStatement and ResultSet to avoid running tests against a real database. public class ResultSetStreamTest {private class City{ String city; String country; public City(String city, String country) { this.city = city; this.country = country; } public String getCountry() { return country; } @Override public String toString() { return "City [city=" + city + ", country=" + country + "]"; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + getOuterType().hashCode(); result = prime * result + ((city == null) ? 0 : city.hashCode()); result = prime * result + ((country == null) ? 
0 : country.hashCode()); return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; City other = (City) obj; if (!getOuterType().equals(other.getOuterType())) return false; if (city == null) { if (other.city != null) return false; } else if (!city.equals(other.city)) return false; if (country == null) { if (other.country != null) return false; } else if (!country.equals(other.country)) return false; return true; } private ResultSetStreamTest getOuterType() { return ResultSetStreamTest.this; } }private String[][] data = new String[][]{ {"Karachi", "Pakistan"}, {"Istanbul", "Turkey"}, {"Hong Kong", "China"}, {"Saint Petersburg", "Russia"}, {"Sydney", "Australia"}, {"Berlin", "Germany"}, {"Madrid", "Spain"} };private int timesCalled; private PreparedStatement mockPST; private ResultSet mockRS;@Before public void setup() throws SQLException{ timesCalled = -1; mockRS = mock(ResultSet.class); mockPST = mock(PreparedStatement.class);when(mockRS.next()).thenAnswer(new Answer<Boolean>() {@Override public Boolean answer(InvocationOnMock invocation) throws Throwable { if (timesCalled++ >= data.length) return false; return true; } });when(mockRS.getString(eq("city"))).thenAnswer(new Answer<String>() {@Override public String answer(InvocationOnMock invocation) throws Throwable { return data[timesCalled][0]; } }); when(mockRS.getString(eq("country"))).thenAnswer(new Answer<String>() {@Override public String answer(InvocationOnMock invocation) throws Throwable { return data[timesCalled][1]; } });when(mockPST.executeQuery()).thenReturn(mockRS); }@Test public void simpleTest() throws SQLException{try (Stream<City> testStream = new ResultSetStream<City>().getStream(mockPST, (ResultSet rs) -> {try { return new City(rs.getString("city"), rs.getString("country")); } catch (Exception e) { return null; }})){Iterator<City> cities = testStream.filter( city -> !city.getCountry().equalsIgnoreCase("China")) .limit(3).iterator();assertTrue(cities.hasNext()); assertEquals(new City("Karachi", "Pakistan"), cities.next());assertTrue(cities.hasNext()); assertEquals(new City("Istanbul", "Turkey"), cities.next());assertTrue(cities.hasNext()); assertEquals(new City("Saint Petersburg", "Russia"), cities.next());assertFalse(cities.hasNext()); }}}Download full source code on Github.Reference: Creating an object stream from a JDBC ResultSet from our JCG partner Sergio Molina at the TODOdev blog....
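For completeness, here is a hedged sketch of how the stream built above might be consumed against a real connection; the JDBC URL, credentials and table name are placeholders, not part of the original post:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Objects;
import java.util.stream.Stream;

public class ResultSetStreamUsage {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "password");
             PreparedStatement ps = conn.prepareStatement("SELECT city, country FROM city");
             // closing the proxied stream closes the PreparedStatement (and its ResultSet), as shown above
             Stream<City> cities = new ResultSetStream<City>().getStream(ps, (ResultSet rs) -> {
                 try {
                     return new City(rs.getString("city"), rs.getString("country"));
                 } catch (SQLException e) {
                     return null;
                 }
             })) {
            // limit() bounds the infinite generated stream; filtering out nulls drops the
            // values the supplier returns once the ResultSet is exhausted
            cities.limit(5)
                  .filter(Objects::nonNull)
                  .forEach(System.out::println);
        }
    }
}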

Simple Aspect Oriented Programming (AOP) using CDI in JavaEE

We write service APIs which cater to certain business logic. There are few cross-cutting concerns that cover all service APIs like Security, Logging, Auditing, Measuring Latencies and so on. This is a repetitive non-business code which can be reused among other methods. One way to reuse is to move these repetitive code into its own methods and invoke them in the service APIs somethings like:             public class MyService{ public ServiceModel service1(){ isAuthorized(); //execute business logic. } }public class MyAnotherService{ public ServiceModel service1(){ isAuthorized(): //execute business logic. } } The above approach will work but not without creating code noise, mixing cross-cutting concerns with the business logic. There is another approach to solve the above requirements which is by using Aspect and this approach is called Aspect Oriented Programming (AOP). There are a different ways you can make use of AOP – by using Spring AOP, JavaEE AOP. In this example I Will try to use AOP using CDI in Java EE applications. To explain this I have picked a very simple example of building a web application to fetch few records from Database and display in the browser. Creating the Data access layer The table structure is: create table people( id INT NOT NULL AUTO_INCREMENT, name varchar(100) NOT NULL, place varchar(100), primary key(id)); Lets create a Model class to hold a person information package demo.model; public class Person{ private String id; private String name; private String place; public String getId(){ return id; } public String setId(String id) { this.id = id;} public String getName(){ return name; } public String setName(String name) { this.name = name;} public String getPlace(){ return place; } public String setPlace(String place) { this.place = place;} } Lets create a Data Access Object which exposes two methods –to fetch the details of all the people to fetch the details of one person of given idpackage demo.dao;import demo.common.DatabaseConnectionManager; import demo.model.Person; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; import java.util.ArrayList; import java.util.List;public class PeopleDAO {public List<Person> getAllPeople() throws SQLException { String SQL = "SELECT * FROM people"; Connection conn = DatabaseConnectionManager.getConnection(); List<Person> people = new ArrayList<>(); try (Statement statement = conn.createStatement(); ResultSet rs = statement.executeQuery(SQL)) { while (rs.next()) { Person person = new Person(); person.setId(rs.getString("id")); person.setName(rs.getString("name")); person.setPlace(rs.getString("place")); people.add(person); } } return people; }public Person getPerson(String id) throws SQLException { String SQL = "SELECT * FROM people WHERE id = ?"; Connection conn = DatabaseConnectionManager.getConnection(); try (PreparedStatement ps = conn.prepareStatement(SQL)) { ps.setString(1, id); try (ResultSet rs = ps.executeQuery()) { if (rs.next()) { Person person = new Person(); person.setId(rs.getString("id")); person.setName(rs.getString("name")); person.setPlace(rs.getString("place")); return person; } } }return null; } } You can use your own approach to get a new Connection. In the above code I have created a static utility that returns me the same connection. 
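The DatabaseConnectionManager utility mentioned above is not listed in the post; a minimal sketch of such a static utility might look like the following (the JDBC URL and credentials are placeholders):

package demo.common;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DatabaseConnectionManager {

    private static Connection connection;

    // Returns one shared connection, creating it lazily. Good enough for a demo;
    // a real application would use a pooled DataSource instead.
    public static synchronized Connection getConnection() throws SQLException {
        if (connection == null || connection.isClosed()) {
            connection = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/demo", "user", "password");
        }
        return connection;
    }
}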
Creating Interceptors Creating Interceptors involves 2 steps:Create Interceptor binding which creates an annotation annotated with @InterceptorBinding that is used to bind the interceptor code and the target code which needs to be intercepted. Create a class annotated with @Interceptor which contains the interceptor code. It would contain methods annotated with @AroundInvoke, different lifecycle annotations, @AroundTimeout and others.Lets create an Interceptor binding by name @LatencyLogger package demo;import java.lang.annotation.Target; import java.lang.annotation.Retention; import static java.lang.annotation.RetentionPolicy.*; import static java.lang.annotation.ElementType.*; import javax.interceptor.InterceptorBinding;@InterceptorBinding @Retention(RUNTIME) @Target({METHOD, TYPE}) public @interface LatencyLogger { } Now we need to create the Interceptor code which is annotated with @Interceptor and also annotated with the Interceptor binding we created above i.e @LatencyLogger: package demo; import java.io.Serializable; import javax.interceptor.AroundInvoke; import javax.interceptor.Interceptor; import javax.interceptor.InvocationContext;@Interceptor @LatencyLogger public class LatencyLoggerInterceptor implements Serializable{ @AroundInvoke public Object computeLatency(InvocationContext invocationCtx) throws Exception{ long startTime = System.currentTimeMillis(); //execute the intercepted method and store the return value Object returnValue = invocationCtx.proceed(); long endTime = System.currentTimeMillis(); System.out.println("Latency of " + invocationCtx.getMethod().getName() +": " + (endTime-startTime)+"ms"); return returnValue; } } There are two interesting things in the above code:use of @AroundInvoke parameter of type InvocationContext passed to the method@AroundInvoke designates the method as an interceptor method. An Interceptor class can have only ONE method annotated with this annotation. When ever a target method is intercepted, its context is passed to the interceptor. Using the InvocationContext one can get the method details, the parameters passed to the method. We need to declare the above Interceptor in the WEB-INF/beans.xml file <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://xmlns.jcp.org/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/beans_1_1.xsd" bean-discovery-mode="all"> <interceptors> <class>demo.LatencyLoggerInterceptor</class> </interceptors> </beans> Creating Service APIs annotated with Interceptors We have already created the Interceptor binding and the interceptor which gets executed. Now lets create the Service APIs and then annotate them with the Interceptor binding /* * To change this license header, choose License Headers in Project Properties. * To change this template file, choose Tools | Templates * and open the template in the editor. */ package demo.service;import demo.LatencyLogger; import demo.dao.PeopleDAO; import demo.model.Person; import java.sql.SQLException; import java.util.List; import javax.inject.Inject;public class PeopleService {@Inject PeopleDAO peopleDAO;@LatencyLogger public List<Person> getAllPeople() throws SQLException { return peopleDAO.getAllPeople(); }@LatencyLogger public Person getPerson(String id) throws SQLException { return peopleDAO.getPerson(id); }} We have annotated the service methods with the Interceptor binding @LatencyLogger. 
The other way would be to annotate at the class level which would then apply the annotation to all the methods of the class. Another thing to notice is the @Inject annotation that injects the instance i.e injects the dependency into the class. Next is to wire up the Controller and View to show the data. The controller is the servlet and view is a plain JSP using JSTL tags. /* * To change this license header, choose License Headers in Project Properties. * To change this template file, choose Tools | Templates * and open the template in the editor. */ package demo;import demo.model.Person; import demo.service.PeopleService; import java.io.IOException; import java.sql.SQLException; import java.util.List; import java.util.logging.Level; import java.util.logging.Logger; import javax.inject.Inject; import javax.servlet.ServletException; import javax.servlet.annotation.WebServlet; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse;@WebServlet(name = "AOPDemo", urlPatterns = {"/AOPDemo"}) public class AOPDemoServlet extends HttpServlet {@Inject PeopleService peopleService;@Override public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { try { List<Person> people = peopleService.getAllPeople(); Person person = peopleService.getPerson("2"); request.setAttribute("people", people); request.setAttribute("person", person); getServletContext().getRequestDispatcher("/index.jsp").forward(request, response); } catch (SQLException ex) { Logger.getLogger(AOPDemoServlet.class.getName()).log(Level.SEVERE, null, ex); } } } The above servlet is available at http://localhost:8080/ /AOPDemo. It fetches the data and redirects to the view to display the same. Note that the Service has also been injected using @Inject annotation. If the dependencies are not injected and instead created using new then the Interceptors will not work. This is an important point which I realised while building this sample. The JSP to render the data would be <%@page contentType="text/html" pageEncoding="UTF-8"%> <%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %> <!DOCTYPE html> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <title>AOP Demo</title> </head> <body> <h1>Hello World!</h1> <table> <tr> <th>Id</th> <th>Name</th> <th>Place</th> </tr> <c:forEach items="${requestScope.people}" var="person"> <tr> <td><c:out value="${person.id}"/></td> <td><c:out value="${person.name}"/></td> <td><c:out value="${person.place}"/></td> </tr> </c:forEach> </table> <br/> Details for person with id=2 <c:out value="Name ${person.name} from ${person.place}" /> </body> </html> With this you would have built a very simple app using Interceptors. Thanks for reading and staying with me till this end. Please share your queries/feedback as comments. And also share this article among your friends!Reference: Simple Aspect Oriented Programming (AOP) using CDI in JavaEE from our JCG partner Mohamed Sanaulla at the Experiences Unlimited blog....
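As a small follow-up to the class-level option mentioned above, here is a hedged sketch of the same PeopleService with the binding applied once at class level instead of on each method:

package demo.service;

import demo.LatencyLogger;
import demo.dao.PeopleDAO;
import demo.model.Person;
import java.sql.SQLException;
import java.util.List;
import javax.inject.Inject;

// The binding at class level intercepts every business method of the bean,
// so the per-method annotations used earlier are no longer needed
@LatencyLogger
public class PeopleService {

    @Inject
    PeopleDAO peopleDAO;

    public List<Person> getAllPeople() throws SQLException {
        return peopleDAO.getAllPeople();
    }

    public Person getPerson(String id) throws SQLException {
        return peopleDAO.getPerson(id);
    }
}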

JPA Hibernate Alternatives. What can I use if JPA or Hibernate is not good enough for my project?

Hello, how are you? Today we will talk about situations that the use of the JPA/Hibernate is not recommended. Which alternatives do we have outside the JPA world? What we will talk about:            JPA/Hibernate problems Solutions to some of the JPA/Hibernate problems Criteria for choosing the frameworks described here Spring JDBC Template MyBatis Sormula sql2o Take a look at: jOOQ and Avaje Is a raw JDBC approach worth it? How can I choose the right framework? Final thoughtsI have created 4 CRUDs in my github using the frameworks mentioned in this post, you will find the URL at the beginning of each page. I am not a radical that thinks that JPA is worthless, but I do believe that we need to choose the right framework for the each situation. If you do not know I wrote a JPA book (in Portuguese only) and I do not think that JPA is the silver bullet that will solve all the problems. JPA/Hibernate problems There are times that JPA can do more harm than good. Below you will see the JPA/Hibernate problems and in the next page you will see some solutions to these problems:Composite Key: This, in my opinion, is the biggest headache of the JPA developers. When we map a composite key we are adding a huge complexity to the project when we need to persist or find a object in the database. When you use composite key several problems will might happen, and some of these problems could be implementation bugs. Legacy Database: A project that has a lot of business rules in the database can be a problem when wee need to invoke StoredProcedures or Functions. Artifact size: The artifact size will increase a lot if you are using the Hibernate implementation. The Hibernate uses a lot of dependencies that will increase the size of the generated jar/war/ear. The artifact size can be a problem if the developer needs to do a deploy in several remote servers with a low Internet band (or a slow upload). Imagine a project that in each new release it is necessary to update 10 customers servers across the country. Problems with slow upload, corrupted file and loss of Internet can happen making the dev/ops team to lose more time. Generated SQL: One of the JPA advantages is the database portability, but to use this portability advantage you need to use the JPQL/HQL language. This advantage can became a disadvantage when the generated query has a poor performance and it does not use the table index that was created to optimize the queries. Complex Query: That are projects that has several queries with a high level of complexity using database resources like: SUM, MAX, MIN, COUNT, HAVING, etc. If you combine those resources the JPA performance might drop and not use the table indexes, or you will not be able to use a specific database resource that could solve this problem. Framework complexity: To create a CRUD with JPA is very simples, but problems will appear when we start to use entities relationships, inheritance, cache, PersistenceUnit manipulation, PersistenceContext with several entities, etc. A development team without a developer with a good JPA experience will lose a lot of time with JPA ‘rules‘. Slow processing and a lot of RAM memory occupied: There are moments that JPA will lose performance at report processing, inserting a lot of entities or problems with a transaction that is opened for a long time.After reading all the problems above you might be thinking: “Is JPA good in doing anything?”. 
JPA has a lot of advantages that will not be detailed here because this is not the post theme, JPA is a tool that is indicated for a lot of situations. Some of the JPA advantages are: database portability, save a lot of the development time, make easier to create queries, cache optimization, a huge community support, etc. In the next page we will see some solutions for the problems detailed above, the solutions could help you to avoid a huge persistence framework refactoring. We will see some tips to fix or to workaround the problems described here. Solutions to some of the JPA/Hibernate problems We need to be careful if we are thinking about removing the JPA of our projects. I am not of the developer type that thinks that we should remove a entire framework before trying to find a solution to the problems. Some times it is better to choose a less intrusive approach. Composite Key Unfortunately there is not a good solution to this problem. If possible, avoid the creation of tables with composite key if it is not required by the business rules. I have seen developers using composite keys when a simple key could be applied, the composite key complexity was added to the project unnecessarily. Legacy Databases The newest JPA version (2.1) has support to StoredProcedures and Functions, with this new resource will be easier to communicate with the database. If a JPA version upgrade is not possible I think that JPA is not the best solution to you. You could use some of the vendor resources, e.g. Hibernate, but you will lose database and implementations portability. Artifact Size An easy solution to this problem would be to change the JPA implementation. Instead of using the Hibernate implementation you could use the Eclipsellink, OpenJPA or the Batoo. A problem might appear if the project is using Hibernate annotation/resources; the implementation change will require some code refactoring. Generated SQL and Complexes Query The solution to these problems would be a resource named NativeQuery. With this resource you could have a simplified query or optimized SQL, but you will sacrifice the database portability. You could put your queries in a file, something like SEARCH_STUDENTS_ORACLE or SEARCH_STUDENTS_MYSQL, and in production environment the correct file would be accessed. The problem of this approach is that the same query must be written for every database. If we need to edit the SEARCH_STUDENTS query, it would be required to edit the oracle and mysql files. If your project is has only one database vendor the NativeQuery resource will not be a problem. The advantage of this hybrid approach (JPQL and NativeQuery in the same project) is the possibility of using the others JPA advantages. Slow Processing and Huge Memory Size This problem can be solved with optimized queries (with NativeQuery), query pagination and small transactions. Avoid using EJB with PersistenceContext Extended, this kind of context will consume more memory and processing of the server. There is also the possibility of getting an entity from database as a “read only” entity, e.g.: entity that will only be used in a report. 
To recover an entity in a "read only" state there is no need to open a transaction; take a look at the code below:
String query = "select uai from Student uai";
EntityManager entityManager = entityManagerFactory.createEntityManager();
TypedQuery<Student> typedQuery = entityManager.createQuery(query, Student.class);
List<Student> resultList = typedQuery.getResultList();
Notice that in the code above there is no open transaction, so all the returned entities will be detached (not monitored by JPA). If you are using EJB, mark your transaction as NOT_SUPPORTED, or you could use @Transactional(readOnly=true).

Complexity I would say that there is only one solution to this problem: to study. It will be necessary to read books, blogs, magazines or any other trustworthy source of JPA material. More study equals fewer doubts about JPA. I am not a developer who believes that JPA is the only and the best solution to every problem, but there are moments when JPA is not the best tool to use. You must be careful when deciding about a persistence framework change; usually a lot of classes are affected and a huge refactoring is needed. Several bugs may be caused by this refactoring. You need to talk with the project managers about this refactoring and list all the positive and negative effects. In the next four pages we will see 4 persistence frameworks that can be used in our projects, but before we see the frameworks I will show how I chose each framework.

Criteria for choosing the frameworks described here Maybe you are thinking: "why is framework X not here?". Below I list the criteria applied for choosing the frameworks displayed here:
Found in more than one source of research: we can find people talking about a framework in a forum, but it is harder to find the same framework appearing in more than one forum. The most quoted frameworks were chosen.
Quoted by different sources: some frameworks found in forums are recommended only by their own committers. Some forums do not allow "self merchandise", but some framework owners still do it.
Updated since 01/05/2013: I searched for frameworks that have received updates in the past year.
Quick Hello World: with some frameworks I could not complete a Hello World in less than 15~20 minutes, and not without errors. For the tutorials found in this post I spent 7 minutes on each framework, counting from the download until the first database insert.
The frameworks displayed here have good methods and are easy to use. To build a realistic CRUD scenario the persistence model has: an attribute with a name different from the column name (socialSecurityNumber -> social_security_number), a date attribute and an ENUM attribute. With these characteristics in a class we will see some problems and how each framework solves them.

Spring JDBC Template One of the most famous frameworks that we can find to access database data is the Spring JDBC Template. The code of this project can be found here: https://github.com/uaihebert/SpringJdbcTemplateCrud The Spring JDBC Template uses native queries like the one sketched below. As the query shows, it uses database-specific syntax (I will be using MySQL). When we use a native SQL query it is possible to use all the database resources in an easy way.
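The original post shows the query and its RowMapper as screenshots that are not reproduced here; the following is a hedged sketch of what they typically look like (the Customer class and the column names are assumptions based on the model described above):

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;

public class CustomerQueryExample {

    private final JdbcTemplate jdbcTemplate;

    public CustomerQueryExample(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public List<Customer> findByName(String name) {
        // The RowMapper is the piece of logic that populates the object for each row
        RowMapper<Customer> rowMapper = (rs, rowNum) -> {
            Customer customer = new Customer();
            customer.setId(rs.getLong("id"));
            customer.setName(rs.getString("name"));
            customer.setSocialSecurityNumber(rs.getString("social_security_number"));
            customer.setBirthDate(rs.getDate("birth_date"));
            return customer;
        };
        // Native SQL with a positional parameter; note the MySQL-style schema names are assumptions
        return jdbcTemplate.query(
                "SELECT id, name, social_security_number, birth_date FROM customer WHERE name = ?",
                rowMapper, name);
    }
}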
We need an instance of the object JDBC Template (used to execute the queries), and to create the JDBC Template object we need to set up a datasource:We can get the datasource now (thanks to the Spring injection) and create our JDBCTemplate:PS.: All the XML code above and the JDBCTemplate instantiation could be replace by Spring injection and with a code bootstrap, just do a little research about the Spring features. One thing that I did not liked is the INSERT statement with ID recover, it is very verbose:With the KeyHolder class we can recover the generated ID in the database, unfortunately we need a huge code to do it. The other CRUD functions are easier to use, like below:Notice that to execute a SQL query it is very simple and results in a populated object, thanks to the RowMapper. The RowMapper is the engine that the JDBC Template uses to make easier to populate a class with data from the database. Take a look at the RowMapper code below:The best news about the RowMapper is that it can be used in any query of the project. The developer that is responsible to write the logic that will populate the class data. To finish this page, take a look below in the database DELETE and the database UPDATE statement:About the Spring JDBC Template we can say:Has a good support: Any search in the Internet will result in several pages with tips and bug fixes. A lot of companies use it: several projects across the world use it Be careful with different databases for the same project: The native SQL can became a problem with your project run with different databases. Several queries will need to be rewritten to adapt all the project databases. Framework Knowledge: It is good to know the Spring basics, how it can be configured and used.To those that does not know the Spring has several modules and in your project it is possible to use only the JDBC Template module. You could keep all the other modules/frameworks of your project and add only the necessary to run the JDBC Template. MyBatis MyBatis (created with the name iBatis) is a very good framework that is used by a lot of developers. Has a lot of functionalities, but we will only see a few in this post. The code of this page can be found in here: https://github.com/uaihebert/MyBatisCrud To run your project with MyBatis you will need to instantiate a Session Factory. It is very easy and the documentation says that this factory can be static:When you run a project with MyBatis you just need to instantiate the Factory one time, that is why it is in a static code. The configuration XML (mybatis.xml) it is very simple and its code can be found below:The Mapper (an attribute inside the XML above) will hold information about the project queries and how to translate the database result into Java objects. It is possible to create a Mapper in XML or Interface. Let us see below the Mapper found in the file crud_query.xml:Notice that the file is easy to understand. The first configuration found is a ResultMap that indicates the query result type, and a result class was configured “uai.model.Customer”. In the class we have a attribute with a different name of the database table column, so we need to add a configuration to the ResultMap. All queries need a ID that will be used by MyBatis session. In the beginning of the file it is possible to see a namespace declared that works as a Java package, this package will wrap all the queries and the ResultMaps found in the XML file. We could also use a Interface+Annotation instead of the XML. 
The Mapper found in the crud_query.xml file could be translated in to a Interface like:Only the Read methods were written in the Interface to make the code smaller, but all the CRUD methods could be written in the Interface. Let us see first how to execute a query found in the XML file:The parsing of the object is automatically and the method is easy to read. To run the query all that is needed is to use the combination “namespace + query id” that we saw in the crud_query.xml code above. If the developer wants to use the Interface approach he could do like below:With the interface query mode we have a clean code and the developer will not need to instantiate the Interface, the session class of the MyBatis will do the work. If you want to update, delete or insert a record in the database the code is very easy:About MyBatis we could say:Excellent Documentation: Every time that I had a doubt I could answer it just by reading its site documentation Flexibility: Allowing XML or Interfaces+Annotations the framework gives a huge flexibility to the developer. Notice that if you choose the Interface approach the database portability will be harder, it is easier to choose which XML to send with the deploy artifact rather than an interface Integration: Has integration with Guice and Spring Dynamic Query: Allows to create queries in Runtime, like the JPA criteria. It is possible to add “IFs” to a query to decide which attribute will be used in the query Transaction: If your project is not using Guice of Spring you will need to manually control the transactionSormula Sormula is a ORM OpenSource framework, very similar to the JPA/Hibernate. The code of the project in this page can be found in here: https://github.com/uaihebert/SormulaCrud Sormula has a class named Database that works like the JPA EntityManagerFactory, the Database class will be like a bridge between the database and your model classes. To execute the SQL actions we will use the Table class that works like the JPA EntityManager, but the Table class is typed. To run Sormula in a code you will need to create a Database instance:To create a Database instance all that we need is a Java Connection. To read data from the database is very easy, like below:You only need to create a Database instance and a Table instance to execute all kind of SQL actions. How can we map a class attribute name different from the database table column name? Take a look below:We can use annotations to do the database mapping in our classes, very close to the JPA style. To update, delete or create data in the database you can do like below:About Sormula we can say that:Has a good documentation Easy to set up It is not found in the maven repository, it will make harder to attach the source code if needed Has a lot of checked exceptions, you will need to do a try/catch for the invoked actionssql2o This framework works with native SQL and makes easier to transform database data into Java objects. The code of the project in this page can be found in here: https://github.com/uaihebert/sql2oCrud sql2o has a Connection class that is very easy to create:Notice that we have a static Sql2o object that will work like a Connection factory. To read the database data we would do something like:Notice that we have a Native SQL written, but we have named parameters. We are not using positional parameters like ‘?1′ but we gave a name to the parameter like ‘:id’. 
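Since the original screenshots are not reproduced here, a hedged sketch of such a named-parameter query with sql2o follows; the Customer class, the connection settings and the column names are assumptions based on the model described earlier:

import org.sql2o.Connection;
import org.sql2o.Sql2o;

public class Sql2oExample {

    private static final Sql2o SQL2O =
            new Sql2o("jdbc:mysql://localhost:3306/demo", "user", "password");

    public static Customer findById(long id) {
        // ':id' is a named parameter; a column alias maps the column to a differently named field
        try (Connection con = SQL2O.open()) {
            return con.createQuery(
                    "SELECT id, name, social_security_number AS socialSecurityNumber "
                    + "FROM customer WHERE id = :id")
                    .addParameter("id", id)
                    .executeAndFetchFirst(Customer.class);
        }
    }
}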
We can say that named parameters has the advantage that we will not get lost in a query with several parameters; when we forget to pass some parameter the error message will tell us the parameter name that is missing. We can inform in the query the name of the column with a different name, there is no need to create a Mapper/RowMapper. With the return type defined in the query we will not need to instantiate manually the object, sql2o will do it for us. If you want to update, delete or insert data in the database you can do like below:It is a “very easy to use” framework. About the sql2o we can say that:Easy to handle scalar query: the returned values of SUM, COUNT functions are easy to handle Named parameters in query: Will make easy to handle SQL with a lot of parameters Binding functions: bind is a function that will automatically populate the database query parameters through a given object, unfortunately it did not work in this project for a problem with the enum. I did not investigate the problem, but I think that it is something easy to handlejOOQ jOOQ it is a framework indicated by a lot of people, the users of this frameworks praise it in a lot of sites/forums. Unfortunately the jOOQ did not work in my PC because my database was too old, and I could not download other database when writing this post (I was in an airplane). I noticed that to use the jOOQ you will need to generated several jOOQ classes based in your model. jOOQ has a good documentation in the site and it details how to generate those classes. jOOQ is free to those that uses a free database like: MySQL, Postgre, etc. The paid jOOQ version is needed to those that uses paid databases like: Oracle, SQL Server, etc.www.jooq.org/Avaje Is a framework quoted in several blogs/forums. It works with the ORM concept and it is easy to execute database CRUD actions. Problems that I found:Not well detailed documentation: its Hello World is not very detailed Configurations: it has a required properties configuration file with a lot of configurations, really boring to those that just want to do a Hello World A Enhancer is needed: enhancement is a method do optimize the class bytecode, but is hard to setup in the beginning and is mandatory to do before the Hello Worldwww.avaje.orgIs a raw JDBC approach worth it? The advantages of JDBC are:Best performance: We will not have any framework between the persistence layer and the database. We can get the best performance with a raw JDBC Control over the SQL: The written SQL is the SQL that will be executed in the database, no framework will edit/update/generate the query SQL Native Resource: We could access all natives database resources without a problem, e.g.: functions, stored procedures, hints, etcThe disadvantages are:Verbose Code: After receiving the database query result we need to instantiate and populate the object manually, invoking all the required “set” methods. This code will get worse if we have classes relationships like one-to-many. It will be very easy to find a while inside another while. Fragile Code: If a database table column changes its name it will be necessary to edit all the project queries that uses this column. Some project uses constants with the column name to help with this task, e.g. Customer.NAME_COLUMN, with this approach the table column name update would be easier. If a column is removed from the database all the project queries would be updated, even if you have a column constants. 
Complex Portability: If your project uses more than one database it would be necessary to have almost all queries written for each vendor. For any update in any query it would be necessary to update every vendor query, this could take a lot the time from the developers.I can see only one factor that would make me choose a raw JDBC approach almost instantly:Performance: If your project need to process thousands of transactions per minutes, need to be scalable and with a low memory usage this is the best choice. Usually median/huge projects has all this high performance requirements. It is also possible to have a hybrid solution to the projects; most of the project repository (DAO) will use a framework, and just a small part of it will use JDBCI do like JDBC a lot, I have worked and I still working with it. I just ask you to not think that JDBC is the silver bullet for every problem. If you know any other advantage/disadvantage that is not listed here, just tell me and I will add here with the credits going to you. How can I choose the right framework? We must be careful if you want to change JPA for other project or if you are just looking for other persistence framework. If the solutions in page 3 are not solving your problems the best solution is to change the persistence framework. What should you considerate before changing the persistence framework?Documentation: is the framework well documented? Is easy to understand how it works and can it answer most of your doubts?Community: has the framework an active community of users? Has a forum?Maintenance/Fix Bugs: Is the framework receiving commits to fix bugs or receiving new features? There are fix releases being created? With which frequency?How hard is to find a developer that knows about this framework? I believe that this is the most important issue to be considered. You could add to your project the best framework in the world but without developers that know how to operate it the framework will be useless. If you need to hire a senior developer how hard would be to find one? If you urgently need to hire someone that knows that unknown framework maybe this could be very difficult.Final thoughts I will say it again: I do not think that JPA could/should be applied to every situation in every project in the world; I do no think that that JPA is useless just because it has disadvantages just like any other framework. I do not want you to be offended if your framework was not listed here, maybe the research words that I used to find persistence frameworks did not lead me to your framework. I hope that this post might help you. If your have any double/question just post it. See you soon!Reference: JPA Hibernate Alternatives. What can I use if JPA or Hibernate is not good enough for my project? from our JCG partner Hebert Coelho at the uaiHebert blog....

Unit Testing – Why not?

For JUnit implementation in our project, we see a great challenge in having them implemented as we are already running behind for Sprint 2 and Sprint 3. The team was provided with a knowledge walkthrough of the JUnit implementation and an example test case. I do not see the team being able to meet the expected JUnit coverage. We need to first get back on track in terms of delivery and can plan to write JUnits as the team gets bandwidth. Please let us know your thoughts.

Really – you are going to give me that shit yet again? Doesn't the behind-ness of your project tell you something? Does it not even cross your mind that if developers were testing their code there might actually be fewer problems, and maybe the project would be on track? That is how I would have responded a week back. But today I am willing to have a reasonable conversation to understand the pain points that make it difficult for the team to do "automated" unit testing (automated being the key word here).

Let's go back to the very core of unit testing and, even before that, self-testing code. Quoting Martin Fowler: These kinds of benefits are often talked about with respect to TestDrivenDevelopment (TDD), but it's useful to separate the concepts of TDD and self-testing code. I think of TDD as a particular practice whose benefits include producing self-testing code. It's a great way to do it, and TDD is a technique I'm a big fan of. But you can also produce self-testing code by writing tests after writing code – although you can't consider your work to be done until you have the tests (and they pass). The important point of self-testing code is that you have the tests, not how you got to them.

Read this paragraph again and again until you have imprinted the two key messages: a) you can't consider your work to be done until you have the tests and they pass, and b) the important point is that you have the tests, not how you got to them. To keep things concrete, a self-test does not have to be anything fancier than the little JUnit check sketched below.
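Purely as an illustration of what "self-testing code" means in practice – the PriceCalculator class, its method and the discount rule are invented for this example – a JUnit 4 self-test is nothing more than:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class PriceCalculatorTest {

    // Tiny invented production class, inlined here only so the example is self-contained.
    static class PriceCalculator {
        double priceAfterDiscount(double amount) {
            // 10% discount for orders of 100 or more
            return amount >= 100 ? amount * 0.9 : amount;
        }
    }

    @Test
    public void appliesTenPercentDiscountForOrdersOfOneHundredOrMore() {
        PriceCalculator calculator = new PriceCalculator();
        // The expected behaviour is pinned down by an executable, repeatable check.
        assertEquals(135.0, calculator.priceAfterDiscount(150.0), 0.001);
    }
}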
Now that you understand this fully (hopefully), let me ask you one simple question: are the developers doing any unit testing at all (again, not necessarily automated)? If the answer is no, then I have a problem and I am outraged. I am a big fan of unit testing. My first experience with it was on my first job, it was with TDD, and yes, it was an adrenaline rush, a dose of dopamine (as David H. says in his podcast). I saw its effectiveness when, at one stage, we decided to throw away an implementation because we realized it would just not scale. We were 6 months into the development, we had a lot of code, and alongside it a solid test suite of 400+ test cases. When we decided we needed to do something fundamentally different, we chose to keep the test suite (we used NUnit back in the day). Over the course of the next 2 months, as we wrote the new implementation, those test cases were our feedback loop. The job of our team of 4 developers was to make all of those 400+ test cases turn green. The day we saw all of them green, we knew we were exactly where we had been (in terms of functional delivery) when we threw the old implementation out. The speed at which we redid the implementation came from the confidence we had in the new code we were writing – we were told almost immediately whether we had done something right or wrong (strictly speaking they were not unit tests by the textbook definition, but that is a topic for later). A few years later, in 2008, we did the same thing again: a 9-month, 25-person implementation was redone, and it took us 3 months with 20 people to catch up and reach the same functional delivery.

In this situation I am reading about a project that is behind schedule and needs to go faster, with confidence, to catch up and meet its delivery timelines. Not doing unit testing (again, I am not even saying "automated") is like someone telling me: Kapil, you are driving to your own wedding and you are late, so now you have to drive faster. But instead of adding a few more airbags and giving you a better set of wheels and better brakes, we are going to take away the one airbag you have today and also replace your wheels with an older set. Now go and drive, or you will not get married. What do you think I would do – drive faster and risk my life, or start driving even slower and hope that my wife-to-be loves me enough to understand that I had no option but to drive slowly? You get the point: I may still get married, but she is going to stay mad at me for a very long time for ruining her perfect day.

I have not yet broken down the causes of the delay on this project (which I still have to do), but I can bet my ridiculous salary that it comes down to two things: a) regressions that developers keep introducing, and b) functionally incomplete code being sent to QA, where defects are found and everything comes back to the developers – in some cases these defects are found even in the client's layer of testing.

In my experience, this is how my ideal day as a developer used to look:

Day 1
- Understand requirements – 1 hour
- Design, implement and unit test – 6 hours
- Test (integration + functional/system) – 1 hour

Day 2
- Repeat Day 1

When I delivered code to QA I hardly got any defects back to fix, and I would continue this cycle day in and day out. The timings would stretch if someone had done the estimates wrong, but the activities always remained the same and the time spent on each of them grew in roughly the same proportion. I have seen developers work like that. More recently, though, this is how a developer's day tends to look:

Day 1
- Implement new story – 12 hours

Day 2
- Fix defects from Day 1 – 4 hours
- Implement new story – 12 hours

You can already see that on Day 2 the developer is paying off the technical debt from Day 1 and struggling to catch up. This endless cycle – let's call it the "cyclone of technical debt" – is unrecoverable. The project will deliver one day, and people will get burnt. Developers blame the estimates, architects blame skills, training and indiscipline. Let's just call it what it is: the team failed. So as a developer I ask you – why would you not want to get that high? Why would you not want to be in a state of confidence where you know that your commit is not going to break something else? Why would you not unit test?

Reference: Unit Testing – Why not? from our JCG partner Kapil Viren Ahuja at the Scratch Pad blog....

How to customize Hibernate dirty checking mechanism

Introduction

In my previous article I described the Hibernate automatic dirty checking mechanism. While you should always prefer it, there might be times when you want to add your own custom dirtiness detection strategy.

Custom dirty checking strategies

Hibernate offers the following customization mechanisms:

- Hibernate Interceptor#findDirty()
- CustomEntityDirtinessStrategy

A manual dirty checking exercise

As an exercise, I'll build a manual dirty checking mechanism to illustrate how easily you can customize the change detection strategy.

Self dirty checking entity

First, I'll define a DirtyAware interface that all manually dirty checked entities will have to implement:

public interface DirtyAware {

    Set<String> getDirtyProperties();

    void clearDirtyProperties();
}

Next I am going to encapsulate the current dirty checking logic in a base class:

public abstract class SelfDirtyCheckingEntity implements DirtyAware {

    private final Map<String, String> setterToPropertyMap = new HashMap<String, String>();

    @Transient
    private Set<String> dirtyProperties = new LinkedHashSet<String>();

    public SelfDirtyCheckingEntity() {
        try {
            BeanInfo beanInfo = Introspector.getBeanInfo(getClass());
            PropertyDescriptor[] descriptors = beanInfo.getPropertyDescriptors();
            for (PropertyDescriptor descriptor : descriptors) {
                Method setter = descriptor.getWriteMethod();
                if (setter != null) {
                    setterToPropertyMap.put(setter.getName(), descriptor.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public Set<String> getDirtyProperties() {
        return dirtyProperties;
    }

    @Override
    public void clearDirtyProperties() {
        dirtyProperties.clear();
    }

    protected void markDirtyProperty() {
        String methodName = Thread.currentThread().getStackTrace()[2].getMethodName();
        dirtyProperties.add(setterToPropertyMap.get(methodName));
    }
}

All manually dirty checked entities will have to extend this base class and explicitly flag their dirty properties through a call to the markDirtyProperty method. The actual self dirty checking entity looks like this:

@Entity
@Table(name = "ORDER_LINE")
public class OrderLine extends SelfDirtyCheckingEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private Long number;

    private String orderedBy;

    private Date orderedOn;

    public Long getId() {
        return id;
    }

    public Long getNumber() {
        return number;
    }

    public void setNumber(Long number) {
        this.number = number;
        markDirtyProperty();
    }

    public String getOrderedBy() {
        return orderedBy;
    }

    public void setOrderedBy(String orderedBy) {
        this.orderedBy = orderedBy;
        markDirtyProperty();
    }

    public Date getOrderedOn() {
        return orderedOn;
    }

    public void setOrderedOn(Date orderedOn) {
        this.orderedOn = orderedOn;
        markDirtyProperty();
    }
}

Whenever a setter gets called, the associated property becomes dirty. For simplicity's sake this exercise doesn't cover the use case of reverting a property to its original value.

The dirty checking test

To test the self dirty checking mechanism I'm going to run the following test case:

@Test
public void testDirtyChecking() {
    doInTransaction(new TransactionCallable<Void>() {
        @Override
        public Void execute(Session session) {
            OrderLine orderLine = new OrderLine();
            session.persist(orderLine);
            session.flush();
            orderLine.setNumber(123L);
            orderLine.setOrderedBy("Vlad");
            orderLine.setOrderedOn(new Date());
            session.flush();
            orderLine.setOrderedBy("Alex");
            return null;
        }
    });
}
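The doInTransaction helper and the TransactionCallable type are used but not listed in the article. Purely as an assumed sketch of what such a utility usually looks like (the real one in the referenced project may differ), it could be written as:

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public abstract class AbstractTransactionalTest {

    // Assumed to be built in a base test class or a @Before method.
    protected SessionFactory sessionFactory;

    public interface TransactionCallable<T> {
        T execute(Session session);
    }

    protected <T> T doInTransaction(TransactionCallable<T> callable) {
        Session session = null;
        Transaction txn = null;
        try {
            session = sessionFactory.openSession();
            txn = session.beginTransaction();
            T result = callable.execute(session);
            txn.commit();
            return result;
        } catch (RuntimeException e) {
            if (txn != null) {
                txn.rollback();
            }
            throw e;
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
}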
The Hibernate Interceptor solution

The Hibernate Interceptor findDirty callback allows us to control the dirty properties discovery process. This method may return:

- null, to delegate the dirty checking to the default Hibernate strategy
- an int[] array containing the indices of the modified properties

Our Hibernate dirty checking interceptor looks like this:

public class DirtyCheckingInterceptor extends EmptyInterceptor {

    @Override
    public int[] findDirty(Object entity, Serializable id, Object[] currentState,
                           Object[] previousState, String[] propertyNames, Type[] types) {
        if (entity instanceof DirtyAware) {
            DirtyAware dirtyAware = (DirtyAware) entity;
            Set<String> dirtyProperties = dirtyAware.getDirtyProperties();
            int[] dirtyPropertiesIndices = new int[dirtyProperties.size()];
            List<String> propertyNamesList = Arrays.asList(propertyNames);
            int i = 0;
            for (String dirtyProperty : dirtyProperties) {
                LOGGER.info("The {} property is dirty", dirtyProperty);
                dirtyPropertiesIndices[i++] = propertyNamesList.indexOf(dirtyProperty);
            }
            dirtyAware.clearDirtyProperties();
            return dirtyPropertiesIndices;
        }
        return super.findDirty(entity, id, currentState, previousState, propertyNames, types);
    }
}

When passing this interceptor to our current SessionFactory configuration we get the following output:

INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The number property is dirty
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedBy property is dirty
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedOn property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:1 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Vlad,2014-08-20 07:35:05.649,1]}
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedBy property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Alex,2014-08-20 07:35:05.649,1]}

The manual dirty checking mechanism has detected the incoming changes and propagated them to the flushing event listener.
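The registration step itself is not shown in the article. As a hedged sketch (the referenced project may wire it differently), two common ways of attaching a Hibernate Interceptor are globally on the Configuration, or per session through the SessionBuilder:

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class InterceptorRegistration {

    // Option 1: register the interceptor globally when building the SessionFactory
    // (assumes a hibernate.cfg.xml is available on the classpath).
    public static SessionFactory buildSessionFactory() {
        Configuration configuration = new Configuration().configure();
        configuration.setInterceptor(new DirtyCheckingInterceptor());
        return configuration.buildSessionFactory();
    }

    // Option 2: attach it to a single session only.
    public static Session openInterceptedSession(SessionFactory sessionFactory) {
        return sessionFactory
                .withOptions()
                .interceptor(new DirtyCheckingInterceptor())
                .openSession();
    }
}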
The lesser-known CustomEntityDirtinessStrategy

The CustomEntityDirtinessStrategy is a recent Hibernate API addition that allows us to provide an application-specific dirty checking mechanism. This interface can be implemented as follows:

public static class EntityDirtinessStrategy implements CustomEntityDirtinessStrategy {

    @Override
    public boolean canDirtyCheck(Object entity, EntityPersister persister, Session session) {
        return entity instanceof DirtyAware;
    }

    @Override
    public boolean isDirty(Object entity, EntityPersister persister, Session session) {
        return !cast(entity).getDirtyProperties().isEmpty();
    }

    @Override
    public void resetDirty(Object entity, EntityPersister persister, Session session) {
        cast(entity).clearDirtyProperties();
    }

    @Override
    public void findDirty(Object entity, EntityPersister persister, Session session,
                          DirtyCheckContext dirtyCheckContext) {
        final DirtyAware dirtyAware = cast(entity);
        dirtyCheckContext.doDirtyChecking(
            new AttributeChecker() {
                @Override
                public boolean isDirty(AttributeInformation attributeInformation) {
                    String propertyName = attributeInformation.getName();
                    boolean dirty = dirtyAware.getDirtyProperties().contains(propertyName);
                    if (dirty) {
                        LOGGER.info("The {} property is dirty", propertyName);
                    }
                    return dirty;
                }
            }
        );
    }

    private DirtyAware cast(Object entity) {
        return DirtyAware.class.cast(entity);
    }
}

To register the CustomEntityDirtinessStrategy implementation we have to set the following Hibernate property:

properties.setProperty("hibernate.entity_dirtiness_strategy", EntityDirtinessStrategy.class.getName());

Running our test yields the following output:

INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The number property is dirty
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedBy property is dirty
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedOn property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:1 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Vlad,2014-08-20 12:51:30.068,1]}
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedBy property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Alex,2014-08-20 12:51:30.068,1]}

Conclusion

Although the default field-level checking or the bytecode instrumentation alternative is sufficient for most applications, there might be times when you want to take control over the change detection process. On a long-term project it is not uncommon to customize certain built-in mechanisms to satisfy exceptional quality-of-service requirements. A framework adoption decision should also take the framework's extensibility and customization support into account.

Code available on GitHub.

Reference: How to customize Hibernate dirty checking mechanism from our JCG partner Vlad Mihalcea at the Vlad Mihalcea's Blog blog....

Distributed Crawling

Around 3 months ago I posted an article explaining our approach and considerations for building a cloud application. Starting with this article, I will gradually share the practical designs we use to solve that challenge. As mentioned before, our final goal is to build a SaaS big data analysis application deployed on AWS servers. To fulfill this goal we need to build distributed crawling, indexing and distributed training systems. The focus of this article is how to build the distributed crawling system. The fancy name for this system is Black Widow.

Requirements

As usual, let's start with the business requirements for the system. Our goal is to build a scalable crawling system that can be deployed on the cloud. The system should be able to function over an unreliable, high-latency network and recover automatically from partial hardware or network failures. For the first release, the system can crawl from 3 kinds of sources: DataSift, the Twitter API and RSS feeds. The data crawled back is called a Comment. The RSS crawlers are meant to read public sources like websites and blogs, free of charge. DataSift and Twitter both provide proprietary APIs to access their streaming services. DataSift charges its users by comment count and by the complexity of the CSDL (Curated Stream Definition Language, their own query language). Twitter, on the other hand, offers the free Twitter Sampler stream.

In order to control cost, we need a mechanism to limit the amount of comments crawled from a commercial source like DataSift. Because DataSift also provides Twitter comments, it is possible for a single comment to arrive from different sources. For the moment we do not try to eliminate this and accept it as data duplication; however, the problem can be avoided manually through user configuration (avoid selecting both Twitter and DataSift Twitter together). As a future extension, the system should be able to link related comments together to form a conversation.

Food for Thought

Centralized Architecture

Our first thought upon getting the requirements was to run the crawling on nodes, which we call Spawns, and let a hub, which we call Black Widow, manage the collaboration among the nodes. This idea was quickly accepted by the team members, as it allows the system to scale well while the hub does only a limited amount of work. Like any other centralized system, Black Widow suffers from the single point of failure problem. To ease this problem, we allow a node to keep functioning independently for a short period after losing its connection to Black Widow. This gives the support team some breathing room to bring up a backup server.

Another bottleneck in the system is data storage. For the volume of data being crawled (easily a few thousand records per second), NoSQL is clearly the choice for storing the crawled comments. We have experience working with Lucene and MongoDB; however, after some research and a few small experiments, we chose Cassandra as the NoSQL database. With those thoughts in mind, we visualized the distributed crawling system with the following prototype:

In the diagram above, Black Widow, the hub, is the only server that has access to the SQL database, which is where we store the crawling configuration. Therefore, all the Spawns, or crawling nodes, are fully stateless. A Spawn simply wakes up, registers itself with Black Widow and performs the assigned jobs. After fetching the comments, the Spawn stores them in the Cassandra cluster and also pushes them to some queues for further processing.
Brainstorming of possible issues

To explain the design to non-technical people, we like to relate the business requirement to a similar problem in real life so that it is easier to understand. The problem we chose is coordinating the efforts of volunteers. Imagine we need to do a lot of preparation work for the upcoming Olympics and decide to recruit volunteers from all around the world to help. We do not know the volunteers, but the volunteers know our email address, so they contact us to register. Only then do we know their email addresses and can send tasks to them by email. We do not want to send one task to two volunteers, nor leave any task unattended, and we want to distribute the tasks evenly so that no volunteer suffers too much. For cost reasons we will not contact them by mobile phone. However, because email is less reliable, when sending out a task we request a confirmation; the task is considered assigned only when the volunteer has replied with a confirmation.

In this example, the volunteers represent the Spawn nodes, while email communication represents the unreliable, high-latency network. Here are some of the problems we need to solve:

1/ Node failure

The best remedy here is to check regularly. If a volunteer stops responding to the regular progress check emails, the task should be reassigned to someone else.

2/ Optimization of task assignment

Some tasks are related, so assigning related tasks to the same person helps reduce the total effort. This applies to our crawling as well, because some crawling configurations have similar search terms, and grouping them together to share a streaming channel helps reduce the final bill. Another concern is fairness, i.e. the ability to distribute the work evenly among volunteers. The simplest strategy we could think of is round robin, with a minor tweak of remembering earlier assignments: if a task is very similar to a task assigned before, it is skipped from the round robin selection and assigned directly to the same volunteer.

3/ The hub is not working

If for some reason our email server goes down and we cannot contact the volunteers any more, it is better to let them stop working on their assigned tasks. The main concern here is cost over-runs and wasted effort. However, stopping work immediately is too hasty, as a temporary infrastructure issue may be the cause of the communication problem. Hence, we need to find a reasonable amount of time for a node to continue functioning after being detached from the hub.

4/ Cost control

The business requirements call for two kinds of cost control: a limit on the total comments crawled per crawler, and a limit on the total comments crawled by all crawlers belonging to the same user. This is where we debated the best way to implement cost control. The per-crawler limit is very straightforward: we simply pass the limit to the Spawn node and it automatically stops the crawler when the limit is reached. The per-user limit is not so straightforward, and we considered two possible approaches. The simpler choice is to send all the crawlers of one user to the same node; then, similarly to the per-crawler case, the Spawn node knows the number of comments collected and stops all the crawlers when the limit is reached. This approach is simple, but it limits the ability to distribute jobs evenly among the nodes.
The alternative approach is to let all the nodes retrieve and update a global counter. This creates a lot of internal network traffic and adds considerable delay to the comment processing time. For now we have chosen the global counter approach; it can be reconsidered if its performance becomes a real concern.

5/ Deploying on the cloud

As with any other cloud application, we cannot put too much trust in the network or the infrastructure. Here is how we make our application conform to the checklist mentioned in the previous article:

- Stateless: our Spawn nodes are stateless, but the hub is not. Therefore, in our design, the nodes do the actual work and the hub only coordinates the effort.
- Idempotence: we implement hashCode and equals for every crawler configuration and store the configurations in a Map or Set. Therefore, a crawler configuration can be sent multiple times without side effects. Moreover, our node selection approach ensures that a job is always sent to the same node.
- Data access objects: we apply the JsonIgnore filter on every model object to make sure no confidential data flies around the network.
- Play safe: we implement a health-check API for each node and for the hub itself, so the first level of support gets notified immediately when anything goes wrong.

6/ Recovery

We try our best to make the system heal itself from partial failures. These are the types of failure we can recover from:

- Hub failure: a node registers itself with the hub when it starts up. From then on the communication is one-way: only the hub sends jobs to the node and polls it for status updates. A node considers itself detached if it gets no contact from the hub for a predefined period. A detached node clears all its job configurations and starts registering itself with the hub again. If the incident was caused by a hub failure, a new hub will fetch the crawling configurations from the database and start distributing jobs again; all existing jobs on the Spawn nodes are cleared when the nodes go into detached mode.
- Node failure: when the hub fails to poll a node, it does a hard reset, removing all working jobs and redistributing them from scratch over the working nodes. This redistribution also helps keep the distribution optimal.
- Job failure: two kinds of failure can happen when the hub sends and polls jobs. If a job fails during polling but the Spawn node is still working well, Black Widow can reassign the job to the same node again; the same can be done if sending the job failed.

Implementation

Data Source and Subscriber

Our initial thought was that each crawler could open its own channel to retrieve data, but this stopped making sense on closer inspection. For RSS, we can scan all the URLs once and find the keywords that may belong to multiple crawlers. Twitter supports up to 200 search terms in a single query, so it is possible to open a single channel that serves multiple crawlers. For DataSift it is quite rare, but due to human mistake or coincidence, it is possible to have crawlers with identical search terms.

This situation led us to split the crawler into two entities: the subscriber and the data source. A subscriber is in charge of consuming the comments, while a data source is in charge of crawling them. With this design, if two crawlers have similar keywords, a single data source is created to serve two subscribers, each processing the comments in its own way.
A data source is created when and only when no similar data source exists. It starts working when the first subscriber subscribes to it and retires when the last subscriber unsubscribes from it. With Black Widow sending similar subscribers to the same node, we minimize the number of data sources created and, indirectly, the crawling cost.

Data Structure

The biggest data structure concern is thread safety. In the Spawn node we must keep all running subscribers and data sources in memory, and there are a few scenarios in which this data is modified or accessed:

- When a subscriber hits its limit, it automatically unsubscribes from its data source, which may lead to the deactivation of the data source.
- When Black Widow sends a new subscriber to a Spawn node.
- When Black Widow sends a request to unsubscribe an existing subscriber.
- The health-check API exposes all running subscribers and data sources.
- Black Widow regularly polls the status of each assigned subscriber.
- The Spawn node regularly checks for and disables orphan subscribers (subscribers that are no longer polled by Black Widow).

Another data structure concern is the idempotence of operations. Any of the operations above can go missing or be duplicated. To handle this, our approach is:

- Implement hashCode and equals for every subscriber and data source.
- Use a Set or Map to store the collections of subscribers and data sources. For records with an identical hash code, a Map replaces the record on a new insertion, whereas a Set skips the new record. Therefore, if we use a Set, we need to ensure new records can replace old ones.
- Use synchronized blocks in the data access code.
- If a Spawn node receives a new subscriber that is similar to an existing one, it compares them and prefers updating the existing subscriber over replacing it. This avoids unsubscribing and re-subscribing identical subscribers, which would interrupt the data source streaming.

Routing

As mentioned before, we need a routing mechanism that serves two purposes:

- Distribute the jobs evenly among Spawn nodes.
- Route similar jobs to the same nodes.

We solved this by generating a unique representation of each query, named uuid. We can then use a simple modulo function to find the node to route to:

int size = activeBwsNodes.size();
int hashCode = uuid.hashCode();
int index = hashCode % size;
assignedNode = activeBwsNodes.get(index);

With this implementation, subscribers with the same uuid are always sent to the same node, and each node has an equal chance of being selected to serve a subscriber. This whole scheme breaks down when the collection of active Spawn nodes changes, so Black Widow must clear all running jobs and reassign them from scratch whenever there is a node change. However, node changes should be quite rare in a production environment. A small consolidated sketch of this idea follows below.
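To consolidate the idempotence and routing ideas above, here is a small illustrative sketch; the SubscriberConfig class and the generic node list are invented for illustration and are not the actual Black Widow classes. Math.floorMod is used because String.hashCode() can be negative, which would make a plain % operator produce a negative index:

import java.util.List;
import java.util.Objects;

public class Router {

    // Hypothetical crawler/subscriber configuration; equality is value-based so that
    // re-sending the same configuration is idempotent when stored in a Set or Map.
    public static final class SubscriberConfig {
        private final String uuid;          // unique representation of the query
        private final String searchTerms;

        public SubscriberConfig(String uuid, String searchTerms) {
            this.uuid = uuid;
            this.searchTerms = searchTerms;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SubscriberConfig)) return false;
            SubscriberConfig other = (SubscriberConfig) o;
            return uuid.equals(other.uuid) && searchTerms.equals(other.searchTerms);
        }

        @Override
        public int hashCode() {
            return Objects.hash(uuid, searchTerms);
        }

        public String getUuid() {
            return uuid;
        }
    }

    // The same uuid always maps to the same node; floorMod keeps the index non-negative.
    public static <N> N selectNode(SubscriberConfig config, List<N> activeNodes) {
        int index = Math.floorMod(config.getUuid().hashCode(), activeNodes.size());
        return activeNodes.get(index);
    }
}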
Handshake

Below is the sequence diagram of the collaboration between Black Widow and a node:

Black Widow does not know about a Spawn node in advance; it waits for the Spawn node to register itself with Black Widow. From then on, Black Widow has the responsibility of polling the node to maintain connectivity. If Black Widow fails to poll a node, it removes the node from its container. The orphaned node will eventually go into detached mode because it is no longer being polled; in this mode, the Spawn node clears its existing jobs and tries to register itself again.

The next diagram shows the subscriber life-cycle:

Similarly, Black Widow has the responsibility of polling the subscribers it has sent to a Spawn node. If a subscriber is no longer being polled by Black Widow, the Spawn node treats the subscriber as an orphan and removes it. This practice eliminates the threat of a Spawn node running an obsolete subscriber. On Black Widow's side, when polling a subscriber fails, it tries to find a new node to assign the job to. If the subscriber's Spawn node is still available, it is likely that the same job will go to the same node again because of the routing mechanism we use.

Monitoring

In the happy scenario, all subscribers are running, Black Widow is polling, and nothing else happens. However, this is not what real life looks like. There will be changes in Black Widow and the Spawn nodes from time to time, triggered by various events. For Black Widow, changes occur under the following circumstances:

- A subscriber hits its limit.
- A new subscriber is found.
- An existing subscriber is disabled by the user.
- Polling a subscriber fails.
- Polling a Spawn node fails.

To handle changes, the Black Widow monitoring tool offers two services: hard reload and soft reload. A hard reload happens on a node change, while a soft reload happens on a subscriber change. The hard reload process takes back all running jobs and redistributes them from scratch over the available nodes. The soft reload process removes obsolete jobs, assigns new jobs and reassigns failed jobs.

Compared to Black Widow, the monitoring on a Spawn node is simpler. Its two main concerns are maintaining connectivity to Black Widow and removing orphan subscribers.

Deployment Strategy

The deployment strategy is straightforward. We need to bring up Black Widow and at least one Spawn node, and the Spawn node must know the URL of Black Widow. From then on, the health-check API tells us the number of subscribers per node. We can integrate the health check with the AWS API to automatically bring up a new Spawn node if the existing nodes are overloaded; the Spawn node image needs to run the Spawn application as a service. Similarly, when the nodes are under-utilized, we can bring down redundant Spawn nodes.

Black Widow needs special treatment because of its importance. If Black Widow fails, we can restart the application. This causes all existing jobs on the Spawn nodes to become orphans, and all the Spawn nodes go into detached mode. Slowly, every node cleans itself up and tries to register again. Under the default configuration, the whole restart process completes within 15 minutes.

Threats and possible improvements

When choosing a centralized architecture, we knew that Black Widow would be the biggest risk to the system. While a Spawn node failure only causes a minor interruption for the affected subscribers, a Black Widow failure eventually leads to a restart of the Spawn nodes, which takes much longer to recover from. Moreover, even though the system can recover from partial failure, there is still an interruption of service during the recovery process. Therefore, if polling requests fail too often due to unstable infrastructure, operations will be greatly hampered.

Scalability is another concern for a centralized architecture. We do not have a concrete figure for the maximum number of Spawn nodes that Black Widow can handle. Theoretically, it should be very high, because Black Widow only does minor processing; most of its effort goes into sending out HTTP requests. It is possible that the network is the main limiting factor for this architecture. That is why we let Black Widow poll the nodes rather than have the nodes poll Black Widow (as others, like Hadoop, do). With this approach, Black Widow can work at its own pace, not under pressure from the Spawn nodes.
One of the first questions we got is whether this is a MapReduce problem, and the answer is no. Each subscriber in our distributed crawling system processes its own comments and does not report results back to Black Widow, which is why we do not use a MapReduce product like Hadoop. Our monitoring is business-logic aware rather than pure infrastructure monitoring, which is why we chose to build it ourselves instead of using tools like ZooKeeper or Akka.

As a future improvement, it would be better to move away from the centralized architecture by having multiple hubs collaborating with each other. This should not be too difficult, given that the only time Black Widow accesses the database is when loading subscribers; we could slice the data and let each Black Widow instance load a portion of it. Another point that leaves me unsatisfied is the global counter check for the per-user limit. Since the check happens for every comment crawled, it greatly increases internal network traffic and limits the scalability of the system. A better strategy would be to divide the quota based on processing speed, with Black Widow regulating and redistributing the quota for each subscriber (across different nodes).

Reference: Distributed Crawling from our JCG partner Tony Nguyen at the Developers Corner blog....

Implementing the ‘Git flow’

Git can be used in a variety of ways, which is cool. Still, when working within a team, it is good to have a consensus on a common, shared approach in order to avoid conflicts. This article quickly explains how we implemented the "git flow" pattern in one of our projects.

Git-flow…

…is a popular strategy which works around the master branch, but in a less "aggressive" way (than the GitHub flow pattern, for instance). You have two main branches:

- master contains the latest production code that has been deployed, versioned with appropriate tags for each release.
- develop gets branched off master and contains the latest code for the next feature being developed. For each new feature there might be lots of feature branches (always branched off the develop branch).

Beside the main branches, there are so-called supporting branches:

- feature branches contain the development state for a single feature of the overall product (i.e. a user story). They are branched off the develop branch.
- hotfix branches are for quick and severe bugfixes. They are usually branched off the master branch, the fix is made on the hotfix branch, and the branch is then merged back into master and develop as well.
- the release branch is for preparing the next release. No new features are developed on this branch; rather, it contains some last fixes (bugfixes as well) and adjustments for going into production.

Production branch oriented

Many people prefer to see master as their development branch and instead have a dedicated branch for the production environment. Such a production-oriented branching strategy has:

- a master branch which contains the actual development code (corresponds to the "develop" branch in the git-flow model)
- a production branch which contains the deployed code.

Supporting branches are:

- feature branches which contain the development of specific features and are branched off master and merged back into master
- hotfix branches (work like in the standard git-flow model)
- the release branch (works like in the standard git-flow model)

Usage

In my opinion tools are great, as they (mostly) give you a productivity boost. Nevertheless you should always understand what they do behind the scenes. This section lists the commands you'll need to manually implement the production-oriented "git flow" pattern shown above. First of all you have to initialize an empty repository and optionally connect it immediately to your remote one. Obviously, feel free to skip this step if you already have one.

$ git init
$ git remote add origin git@.....

Furthermore I'd suggest adding a .gitignore file. You may start from an existing one based on your project type: GitHub .gitignore repository. Then "push" everything up to your remote repo:

$ git push --set-upstream origin master

Create a new feature

From master:

$ git pull
$ git checkout -b userstory/login

Do some commits and then publish the feature branch on the remote repo (unless it is a tiny one of just a couple of hours):

$ git push origin userstory/login

Update a feature from master

Frequently update from origin/master to get the latest changes that have been pushed to the repo by your peers.
$ git fetch origin master
$ git rebase origin/master

Alternatively, check out your master branch and execute:

$ git checkout master
$ git pull
$ git checkout <your-feature-branch>
$ git rebase master

Finish a feature

Merge it back into master:

$ git checkout master
$ git pull
$ git merge --no-ff userstory/login

--no-ff means no fast-forward, to keep track of where certain changes originated.

TIP: In order not to forget the --no-ff flag, you might want to configure it as the default behavior when merging into master by executing the following command:

$ git config branch.master.mergeoptions "--no-ff"

In case of conflicts, resolve them and then push master:

$ git push

and remove the user story branch (locally and remotely):

$ git branch -d userstory/login
$ git push origin :userstory/login

Prepare a release

$ git checkout -b release/0.1.0

Publish to production

$ git checkout production
$ git pull
$ git merge --no-ff release/0.1.0
$ git tag v0.1.0
$ git push --tags origin production

Then delete the release/x.x.x branch.

Create a hotfix

$ git checkout production
$ git checkout -b hotfix/login-does-not-work

After testing it, merge it back into production:

$ git checkout production
$ git merge --no-ff hotfix/login-does-not-work
$ git tag v0.1.1
$ git push --tags

Obviously, also merge those changes back into master:

$ git checkout master
$ git merge --no-ff hotfix/login-does-not-work

And then delete the hotfix branch:

$ git branch -d hotfix/login-does-not-work

Ok… I'm a Jedi… give me some tools

Git flow CLI tool

Git Flow is a git command line extension that facilitates the usage of the "git flow" pattern.

- Download & install
- Git flow cheatsheet

So, if you have mastered the git flow pattern manually, you're ready to go with it.

Haacked's Git Aliases

Phil Haack (former Microsoft employee, now working on GitHub for Windows at GitHub) published an interesting set of 13 git aliases to boost your productivity. You might want to take a look at them: http://haacked.com/archive/2014/07/28/github-flow-aliases/. To install them, simply copy and paste the aliases into your .gitconfig file. You should find it in your user profile directory (~ on Unix systems; C:\users\<yourname>\ on Windows).

Configuring Jenkins

Please refer to my recent blog post "Git flow with Jenkins and GitLab" for further details on how to configure your build environment.

How we use it – our pipeline

We adopted the git flow pattern in one of our projects with a team that was getting in touch with git for the first time (they used TFS before). I introduced them to the Git basics and then they started straight ahead, and surprisingly the switch was really easy. By using git flow we minimized conflicting merges and thus potential problems in the development flow. So how did we use it? The team applied some kind of Scrum (we're new to it, thus "some kind of" :)). We have two-week iterations with an initial planning phase (usually on Thursday morning) and we have the tester on the team (yay!).

At the start of the sprint cycle, our devs take their user stories (on Trello) and create corresponding feature branches following the pattern userstory/<trello-card-#>-userstory-title for user stories, task/<trello-card-#>-title for tasks and bug/<trello-card-#>-title for bugs. They develop on the feature branches and frequently update them with master (see the git flow usage above). If a story/task/bug's implementation takes longer than a day or two, the branch gets pushed to the remote GitLab server (for backup reasons). Each of these pushes gets automatically built and tested by our Jenkins.
Once finished with the implementation, the developer either merges it into master or creates a merge request on GitLab, assigned to another developer for code review. When master gets pushed to GitLab, Jenkins automatically picks it up and publishes it to our dev server instance. Once every night, the master branch gets automatically published to our test server instance so that the tester on our team can continue to test the implemented stories and either mark them as done or reject them within our sprint cycle. Furthermore, a series of automated JMeter tests is executed to verify the correct functioning of our REST API as well as the performance of our endpoints. After the two-week cycle, one of our devs prepares a release (see the commands to execute in the "git flow usage" above) by merging master onto production. This is automatically detected by Jenkins, which – again – publishes to our preproduction server instance, which is also accessible to our customer.

We do not use release branches, as we have not needed them so far. There is no preparatory work to be done, although that might change in the future. That's the flow we came up with after a few iterations and discussions within the team and with our tester. What's your approach? I'd be interested to hear about it in the comments.

Reference: Implementing the 'Git flow' from our JCG partner Juri Strumpflohner at the Juri Strumpflohner's TechBlog blog....