

Java Concurrency Tutorial – Thread-safe designs

After reviewing the main risks of dealing with concurrent programs (like atomicity or visibility), we will go through some class designs that help us prevent the aforementioned bugs. Some of these designs result in the construction of thread-safe objects, allowing us to share them safely between threads. As examples, we will consider immutable and stateless objects. Other designs will prevent different threads from modifying the same data, like thread-local variables. You can see all the source code at github.

1. Immutable objects

Immutable objects have a state (data which represents the object's state), but it is built upon construction, and once the object is instantiated, the state cannot be modified. Although threads may interleave, the object has only one possible state. Since all fields are read-only, not a single thread will be able to change the object's data. For this reason, an immutable object is inherently thread-safe. Product shows an example of an immutable class.
It builds all its data during construction and none of its fields are modifiable:

```java
public final class Product {

    private final String id;
    private final String name;
    private final double price;

    public Product(String id, String name, double price) {
        this.id = id;
        this.name = name;
        this.price = price;
    }

    public String getId() {
        return this.id;
    }

    public String getName() {
        return this.name;
    }

    public double getPrice() {
        return this.price;
    }

    public String toString() {
        return new StringBuilder(this.id).append("-").append(this.name)
            .append(" (").append(this.price).append(")").toString();
    }

    public boolean equals(Object x) {
        if (this == x) return true;
        if (x == null) return false;
        if (this.getClass() != x.getClass()) return false;
        Product that = (Product) x;
        if (!this.id.equals(that.id)) return false;
        if (!this.name.equals(that.name)) return false;
        if (this.price != that.price) return false;
        return true;
    }

    public int hashCode() {
        int hash = 17;
        hash = 31 * hash + this.getId().hashCode();
        hash = 31 * hash + this.getName().hashCode();
        hash = 31 * hash + ((Double) this.getPrice()).hashCode();
        return hash;
    }
}
```

In some cases, it won't be sufficient to make a field final.
For example, the MutableProduct class is not immutable although all its fields are final:

```java
public final class MutableProduct {

    private final String id;
    private final String name;
    private final double price;
    private final List<String> categories = new ArrayList<>();

    public MutableProduct(String id, String name, double price) {
        this.id = id;
        this.name = name;
        this.price = price;
        this.categories.add("A");
        this.categories.add("B");
        this.categories.add("C");
    }

    public String getId() {
        return this.id;
    }

    public String getName() {
        return this.name;
    }

    public double getPrice() {
        return this.price;
    }

    public List<String> getCategories() {
        return this.categories;
    }

    public List<String> getCategoriesUnmodifiable() {
        return Collections.unmodifiableList(categories);
    }

    public String toString() {
        return new StringBuilder(this.id).append("-").append(this.name)
            .append(" (").append(this.price).append(")").toString();
    }
}
```

Why is the above class not immutable? The reason is that we let a reference escape the scope of its class. The field 'categories' is a mutable reference, so after returning it, the client could modify it. In order to show this, consider the following program:

```java
public static void main(String[] args) {
    MutableProduct p = new MutableProduct("1", "a product", 43.00);
    System.out.println("Product categories");
    for (String c : p.getCategories()) System.out.println(c);

    p.getCategories().remove(0);

    System.out.println("\nModified Product categories");
    for (String c : p.getCategories()) System.out.println(c);
}
```

And the console output:

Product categories
A
B
C

Modified Product categories
B
C

Since the categories field is mutable and it escaped the object's scope, the client has modified the categories list. The product, which was supposed to be immutable, has been modified, leading to a new state. If you want to expose the contents of the list, you could use an unmodifiable view of it:

```java
public List<String> getCategoriesUnmodifiable() {
    return Collections.unmodifiableList(categories);
}
```
2. Stateless objects

Stateless objects are similar to immutable objects, but in this case they do not have a state, not even one. When an object is stateless, it does not have to remember any data between invocations. Since there is no state to modify, one thread will not be able to affect the result of another thread invoking the object's operations. For this reason, a stateless class is inherently thread-safe. ProductHandler is an example of this type of object. It contains several operations over Product objects and does not store any data between invocations. The result of an operation does not depend on previous invocations or any stored data:

```java
public class ProductHandler {

    private static final int DISCOUNT = 90;

    public Product applyDiscount(Product p) {
        double finalPrice = p.getPrice() * DISCOUNT / 100;
        return new Product(p.getId(), p.getName(), finalPrice);
    }

    public double sumCart(List<Product> cart) {
        double total = 0.0;
        for (Product p : cart.toArray(new Product[0])) total += p.getPrice();
        return total;
    }
}
```

In its sumCart method, ProductHandler converts the product list to an array, since a for-each loop uses an iterator internally to iterate through the elements. List iterators are not thread-safe and could throw a ConcurrentModificationException if the list is modified during iteration. Depending on your needs, you might choose a different strategy.

3. Thread-local variables

Thread-local variables are variables defined within the scope of a thread. No other thread will see or modify them. The first type is local variables. In the example below, the total variable is stored in the thread's stack:

```java
public double sumCart(List<Product> cart) {
    double total = 0.0;
    for (Product p : cart.toArray(new Product[0])) total += p.getPrice();
    return total;
}
```

Just take into account that if you define a reference instead of a primitive and return it, it will escape its scope. You may not know where the returned reference is stored.
The code that calls the sumCart method could store the returned reference in a static field and allow it to be shared between different threads.

The second type is the ThreadLocal class. This class provides storage that is independent for each thread. Values stored into an instance of ThreadLocal are accessible from any code within the same thread. The ClientRequestId class shows an example of ThreadLocal usage:

```java
public class ClientRequestId {

    private static final ThreadLocal<String> id = new ThreadLocal<String>() {
        @Override
        protected String initialValue() {
            return UUID.randomUUID().toString();
        }
    };

    public static String get() {
        return id.get();
    }
}
```

The ProductHandlerThreadLocal class uses ClientRequestId to return the same generated id within the same thread:

```java
public class ProductHandlerThreadLocal {

    //Same methods as in ProductHandler class

    public String generateOrderId() {
        return ClientRequestId.get();
    }
}
```

If you execute the main method, the console output will show different ids for each thread. As an example:

T1 - 23dccaa2-8f34-43ec-bbfa-01cec5df3258
T2 - 936d0d9d-b507-46c0-a264-4b51ac3f527d
T2 - 936d0d9d-b507-46c0-a264-4b51ac3f527d
T3 - 126b8359-3bcc-46b9-859a-d305aff22c7e
...

If you are going to use ThreadLocal, you should be aware of some of the risks of using it when threads are pooled (like in application servers). You could end up with memory leaks or information leaking between requests. I won't go deeper into this subject, since the post How to shoot yourself in foot with ThreadLocals explains well how this can happen.

4. Using synchronization

Another way of providing thread-safe access to objects is through synchronization. If we synchronize all accesses to a reference, only a single thread will access it at a given time. We will discuss this in further posts.

5. Conclusion

We have seen several techniques that help us build simpler objects that can be shared safely between threads. It is much harder to prevent concurrency bugs if an object can have multiple states.
On the other hand, if an object can have only one state or none, we won't have to worry about different threads accessing it at the same time.

Reference: Java Concurrency Tutorial – Thread-safe designs from our JCG partner Xavier Padro at the Xavier Padró's Blog blog....
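A common mitigation for the pooled-threads risk mentioned in section 3 is to clear the thread-local value as soon as the unit of work finishes. The sketch below is not from the original post; the class and method names are hypothetical, but the remove-in-finally pattern is the standard one:

```java
import java.util.UUID;

// Hypothetical entry point of a pooled worker thread: set the value for
// this request and always remove it in a finally block, so a reused
// thread never sees a stale id from a previous request.
final class RequestScopedId {

    private static final ThreadLocal<String> ID = new ThreadLocal<>();

    static String handleRequest() {
        ID.set(UUID.randomUUID().toString());
        try {
            return doWork();
        } finally {
            ID.remove(); // prevents memory leaks and cross-request leaking
        }
    }

    private static String doWork() {
        return ID.get(); // same id anywhere within this request's thread
    }
}
```

With this in place, each call to handleRequest gets a fresh id even when the same pooled thread serves consecutive requests.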

Integrating jOOQ with PostgreSQL: Partitioning

Introduction

jOOQ is a great framework when you want to work with SQL in Java without having too much ORM in your way. At the same time, it can be integrated into many environments, as it offers support for many database-specific features. One such database-specific feature is partitioning in PostgreSQL. Partitioning in PostgreSQL is mainly used for performance reasons, because it can improve query performance in certain situations. jOOQ has no explicit support for this feature, but it can be integrated quite easily, as we will show you.

This article is brought to you by the Germany based jOOQ integration partner UWS Software Service (UWS). UWS is specialised in custom software development, application modernisation and outsourcing with a distinct focus on the Java Enterprise ecosystem.

Partitioning in PostgreSQL

With the partitioning feature of PostgreSQL you have the possibility of splitting data that would form a huge table into multiple separate tables. Each of the partitions is a normal table which inherits its columns and constraints from a parent table. This so-called table inheritance can be used for "range partitioning" where, for example, the data from one range does not overlap the data from another range in terms of identifiers, dates or other criteria. Like in the following example, you can have partitioning for a table "author" that shares the same foreign key of a table "authorgroup" in all its rows.

```sql
CREATE TABLE author (
    authorgroup_id int,
    LastName varchar(255)
);

CREATE TABLE author_1 (
    CONSTRAINT authorgroup_id_check_1
        CHECK ((authorgroup_id = 1))
) INHERITS (author);

CREATE TABLE author_2 (
    CONSTRAINT authorgroup_id_check_2
        CHECK ((authorgroup_id = 2))
) INHERITS (author);

...
```

As you can see, we set up inheritance and, in order to have a simple example, we just put in one constraint checking that the partitions have the same "authorgroup_id". Basically, this results in the "author" table only containing table and column definitions, but no data.
However, when querying the "author" table, PostgreSQL will really query all the inheriting "author_n" tables, returning a combined result.

A trivial approach to using jOOQ with partitioning

In order to work with the partitioning described above, jOOQ offers several options. You can use the default way, which is to let jOOQ generate one class per table. In order to insert data into multiple tables, you would have to use different classes. This approach is used in the following snippet:

```java
// add
InsertQuery query1 = dsl.insertQuery(AUTHOR_1);
query1.addValue(AUTHOR_1.ID, 1);
query1.addValue(AUTHOR_1.LAST_NAME, "Nowak");
query1.execute();

InsertQuery query2 = dsl.insertQuery(AUTHOR_2);
query2.addValue(AUTHOR_2.ID, 1);
query2.addValue(AUTHOR_2.LAST_NAME, "Nowak");
query2.execute();

// select
Assert.assertTrue(dsl
    .selectFrom(AUTHOR_1)
    .where(AUTHOR_1.LAST_NAME.eq("Nowak"))
    .fetch().size() == 1);

Assert.assertTrue(dsl
    .selectFrom(AUTHOR_2)
    .where(AUTHOR_2.LAST_NAME.eq("Nowak"))
    .fetch().size() == 1);
```

You can see that multiple classes generated by jOOQ need to be used, so depending on how many partitions you have, the generated classes can pollute your codebase. Also, imagine that you eventually need to iterate over partitions, which would be cumbersome to do with this approach. Another approach could be to use jOOQ to build fields and tables using string manipulation, but that is error-prone again and prevents support for generic type safety. Also, consider the case where you want true data separation in terms of multi-tenancy. You can see that there are some considerations to make when working with partitioning. Fortunately, jOOQ offers various ways of working with partitioned tables, and in the following we'll compare approaches, so that you can choose the one most suitable for you.
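Whichever approach you pick, the routing rule implied by the CHECK constraints above is the same: the row's authorgroup_id determines the physical table. As a plain-Java illustration of that rule (no jOOQ types involved; the class name is mine, only the table names come from the SQL example):

```java
// Maps an authorgroup_id to the physical partition table it belongs to,
// mirroring the CHECK constraints of author_1, author_2, ... above.
final class PartitionRouter {

    static String tableFor(int authorgroupId) {
        if (authorgroupId < 1) {
            throw new IllegalArgumentException(
                "no partition for authorgroup_id " + authorgroupId);
        }
        return "author_" + authorgroupId;
    }
}
```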
Using jOOQ with partitioning and multi-tenancy

jOOQ's runtime-schema mapping is often used to realize database environments, such that, for example, during development one database is queried, but when deployed to production the queries go to another database. Multi-tenancy is another recommended use case for runtime-schema mapping, as it allows for strict partitioning and for configuring your application to only use the databases or tables configured in the runtime-schema mapping. Running the same code would thus result in working with different databases or tables depending on the configuration, which allows for true separation of data in terms of multi-tenancy. The following configuration, taken from the jOOQ documentation, is executed when creating the DSLContext, so it can be considered a system-wide setting:

```java
Settings settings = new Settings()
    .withRenderMapping(new RenderMapping()
        .withSchemata(
            new MappedSchema().withInput("DEV")
                .withOutput("MY_BOOK_WORLD")
                .withTables(
                    new MappedTable().withInput("AUTHOR")
                        .withOutput("AUTHOR_1"))));

// Add the settings to the Configuration
DSLContext create = DSL.using(
    connection, SQLDialect.ORACLE, settings);

// Run queries with the "mapped" configuration
create.selectFrom(AUTHOR).fetch();

// results in SQL:
// "SELECT * FROM MY_BOOK_WORLD.AUTHOR_1"
```

Using this approach you can map one table to one partition permanently, e.g. "AUTHOR" to "AUTHOR_1" for environment "DEV". In another environment you could choose to map the "AUTHOR" table to "AUTHOR_2". Runtime-schema mapping only allows you to map to exactly one table on a per-query basis, so you cannot handle the use case where you would want to manipulate more than one table partition. If you would like to have more flexibility, you might want to consider the next approach.
Using jOOQ with partitioning and without multi-tenancy

If you need to handle multiple table partitions without having multi-tenancy, you need a more flexible way of accessing partitions. The following example shows how you can do it in a dynamic and type-safe way, avoiding errors and staying in the same elegant style you are used to from jOOQ:

```java
// add
for (int i = 1; i <= 2; i++) {
    Builder part = forPartition(i);
    InsertQuery query = dsl.insertQuery(part.table(AUTHOR));
    query.addValue(part.field(AUTHOR.ID), 1);
    query.addValue(part.field(AUTHOR.LAST_NAME), "Nowak");
    query.execute();
}

// select
for (int i = 1; i <= 2; i++) {
    Builder part = forPartition(i);
    Assert.assertTrue(dsl
        .selectFrom(part.table(AUTHOR))
        .where(part.field(AUTHOR.LAST_NAME).eq("Nowak"))
        .fetch()
        .size() == 1);
}
```

What you can see above is that the partition numbers are abstracted away, so that you can use the "AUTHOR" table instead of "AUTHOR_1". Thus, your code won't be polluted with many generated classes. Another thing is that the partitioner object is initialized dynamically, so you can use it, for example, in a loop like the one above. It also follows the Builder pattern, so that you can operate on it like you are used to with jOOQ. The code above does exactly the same as the first trivial snippet, but there are multiple benefits, like type-safe and reusable access to partitioned tables.

Integration of jOOQ partitioning without multi-tenancy into a Maven build process (optional)

If you are using Continuous Integration, you can integrate the solution above so that jOOQ does not generate classes for the partitioned tables. This can be achieved using a regular expression that excludes certain table names when generating Java classes.
When using Maven, your integration might look something like this:

```xml
<generator>
  <name>org.jooq.util.DefaultGenerator</name>
  <database>
    <name>org.jooq.util.postgres.PostgresDatabase</name>
    <includes>.*</includes>
    <excludes>.*_[0-9]+</excludes>
    <inputSchema>${db.schema}</inputSchema>
  </database>
  <target>
    <packageName>com.your.company.jooq</packageName>
    <directory>target/generated-sources/jooq</directory>
  </target>
</generator>
```

Then it's just a matter of calling mvn install, and the jOOQ Maven plugin will generate the database schema classes at compile time.

Integrating jOOQ with PostgreSQL: Partitioning

This article described how jOOQ, in combination with the partitioning feature of PostgreSQL, can be used to implement multi-tenancy and improve database performance. PostgreSQL's documentation states that for partitioning "the benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server." Achieving support for partitioning with jOOQ is as easy as adding configuration or a small utility class; jOOQ is then able to support partitioning with or without multi-tenancy and without sacrificing type safety. Apart from Java-level integration, the described solution also smoothly integrates into your build and test process. You may want to look at the sources of the partitioner utility class, which also include a test class, so that you can see the behavior and integration in more detail. Please let us know if you need support for this or other jOOQ integrations within your environment. UWS Software Service (UWS) is an official jOOQ integration partner.

Reference: Integrating jOOQ with PostgreSQL: Partitioning from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog....
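Before wiring the `<excludes>` value into the build, it can be worth sanity-checking the pattern with plain java.util.regex; this snippet is illustrative and not part of the article's sources:

```java
import java.util.regex.Pattern;

// The Maven <excludes> value above, compiled as a plain Java regex:
// table names ending in "_<digits>" (the partitions) are excluded from
// jOOQ code generation, while the parent table itself is kept.
final class ExcludesCheck {

    private static final Pattern PARTITION_TABLES = Pattern.compile(".*_[0-9]+");

    static boolean isExcluded(String tableName) {
        return PARTITION_TABLES.matcher(tableName).matches();
    }
}
```

So "author_1" and "author_42" are excluded, while the parent "author" still gets a generated class.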

Feature Toggles (Feature Switches or Feature Flags) vs Feature Branches

Feature Branches

If you are using branches, you are not doing Continuous Integration / Deployment / Delivery! You might have great code coverage with unit tests, you might be doing TDD, you might have functional and integration tests written in BDD format, and you might run all of them on every commit to the repository. However, if you are using branches, integration is delayed until they are merged, and that means there is no continuous integration.

People tend to like feature branches. They provide the flexibility to decide what to release and when. All there is to do is merge the features that will be released into the main branch and deploy to production. The problem with this approach is the delay. Integration is delayed until the merge. We cannot know whether all those separately developed features work well together until the merge is done and all the unit, functional, integration and manual tests are run. On top of that, there is the pain of the merge itself. Days, weeks or even months of work are suddenly merged into the main branch and from there into all not-yet-to-be-released branches. The idea of Continuous Integration is to detect problems as soon as possible, with minimum delay until problems are found. The most common way to mitigate this problem is to keep feature branches small and short-lived. They can be merged into the main branch after only a day or two. However, in many cases this is simply not possible. A feature might be too big to be developed in only a few days. The business might require us to release it only in conjunction with other features. In other words, feature branches are great for allowing us to decide what to release and when. They provide us flexibility, but they prevent us from doing Continuous Integration. If there were only one branch, we could do continuous integration, but the release problems that were solved with feature branches would come back to haunt us.
Feature Toggles

Feature Toggles (sometimes called Feature Switches or Feature Flags) solve the need to deploy only a selected set of features while keeping only one (main) branch. With them we can do all the work in one branch, have continuous integration taking care of the quality of our code, and use flags to turn features off until they are ready to be released. We can have all the benefits of Continuous Integration together with the flexibility to choose which features will be available and which will be hidden. Moreover, it is a step towards Continuous Deployment. If we have satisfying automated test coverage and can switch features off and on, there is nothing really preventing us from deploying to production every commit that passes all verification. Even if some bug sneaks into production, with Feature Toggles it would be very easy to turn that feature off until it's fixed.

The basic idea is to have a configuration file that defines toggles for features that are pending to be fully released. An alternative to a configuration file can be a database table. The application should use those toggles to decide whether to make some feature available to the users or not. They might even be used to display some functionality to a subset of users based on their role, geographic location or a random sample. There are only a few rules to be followed:

- Use toggles only until they are fully deployed and proven to work. Otherwise, you might end up with "spaghetti code" full of if/else statements containing old toggles that are not in use any more.
- Do not spend too much time testing toggles. It is in most cases enough to confirm that the entry point into some new feature is not visible. That can be, for example, the link to that new feature.
- Do not overuse toggles. Do not use them when there is no need for them. For example, you might be developing a new screen that is accessible through a link on the home page.
If that link is added at the end, there might be no need for a toggle that hides it.

Examples

There are many libraries that provide a solution for Feature Toggles. However, implementing them is so easy that you might choose to do it yourself. Here are a few examples of possible implementations of Feature Toggles. They use AngularJS, but the logic behind them could be applied to any other language or framework.

[JSON with Feature Toggles]

```json
{
  "feature1": { "displayed": false },
  "feature2": { "displayed": true },
  "feature3": { "displayed": false }
}
```

I tend to have more values in my toggles JSON, some of them being disabled, description, allowed_users, etc. The above example is only the bare-bones minimal solution. Next, we should load the JSON into, in this example, the AngularJS scope.

[AngularJS that loads Feature Toggles into the scope]

```javascript
$http.get('/api/v1/data/features').then(function(response) {
    $scope.features = response.data;
});
```

Once Feature Toggles are in the scope, the rest is fairly easy. The following AngularJS example hides the feature when displayed is set to false.

[AngularJS HTML that hides some element]

```html
<div ng-show="features.feature1.displayed">
    <!--Feature HTML-->
</div>
```

In some cases, you might be working on a completely new screen that will replace the old one. In that scenario something like the following might be the solution.

[AngularJS that returns URL depending on Feature Toggle]

```javascript
$scope.getMyUrl = function() {
    if ($scope.features.feature2.displayed) {
        return 'myNewUrl';
    } else {
        return 'myOldUrl';
    }
}
```

Then, from the HTML, it would be something like the example below.

[AngularJS HTML that toggles URL]

```html
<a href="{{getMyUrl()}}">
    This link leads somewhere
</a>
```

In some other cases the change might be on the server. Following REST API best practices, you'd create a new API version and use a Feature Toggle to decide which one to use. The code could be like the following.
[AngularJS that performs a request that depends on a Feature Toggle]

```javascript
var apiUrl;
if ($scope.features.feature2.displayed) {
    apiUrl = '/api/v2/myFancyFeature';
} else {
    apiUrl = '/api/v1/myFancyFeature';
}
$http.get(apiUrl).then(function(response) {
    // Do something with the response
});
```

The same logic applied to the front-end can be applied to the back-end or any other type of application. In most cases it boils down to simple if/else statements.

Reference: Feature Toggles (Feature Switches or Feature Flags) vs Feature Branches from our JCG partner Viktor Farcic at the Technology conversations blog....
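The AngularJS snippets read the toggles client-side; the same lookup on a Java back end can be as small as a map with a safe default. This is a sketch of my own (not taken from the article), with unknown features defaulting to hidden:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal server-side counterpart of the toggles JSON: feature name ->
// "displayed" flag. Unknown features default to hidden, so a missing
// entry can never accidentally expose an unfinished feature.
final class FeatureToggles {

    private final Map<String, Boolean> flags = new HashMap<>();

    FeatureToggles displayed(String feature, boolean on) {
        flags.put(feature, on);
        return this;
    }

    boolean isDisplayed(String feature) {
        return flags.getOrDefault(feature, false);
    }
}
```

In a real application the flags would be loaded from the configuration file or database table mentioned above rather than set in code.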

Monitoring and Filtering Application Log to Mail with log4j

In today's post I'm going to show you how to filter log statements into a warning email. This came out of a necessity to monitor a few critical points of one application I was working on. There are tools that you can use to perform application monitoring. I'm not going into details about those tools, but sometimes it's just easier to have the application send a warning email. I mostly use log4j for my logging requirements. Unfortunately, since there are so many logging frameworks in the Java ecosystem, this post only covers a piece of it. I might do something for the others in the future, but I would like to reinforce an old post of António Gonçalves about standardizing a logging API: I need you for Logging API Spec Lead!. The sample covered here is for log4j, but the github project also contains a log4j2 sample.

Use Case

To give a little more detail, I want to be informed when the application generates errors, but also ignore errors that are already handled by the application itself. For a more concrete example, I had a case where a database insert could generate a constraint violation exception, but this error was handled specifically by the application. Even so, the JDBC driver logs the exception. For this case, I was not interested in getting a notification.

Setting up SMTPAppender

Anyway, looking into log4j, you can create an appender that sends all your log to email; just check SMTPAppender.
It looks like this:

log4j-SMTPAppender

```xml
<appender name="SMTP" class="org.apache.log4j.net.SMTPAppender">
    <errorHandler class="org.apache.log4j.helpers.OnlyOnceErrorHandler"/>

    <param name="Threshold" value="ERROR"/>
    <param name="To" value="someone@somemail.com"/>
    <param name="From" value="someonelse@somemail.com"/>
    <param name="Subject" value="Log Errors"/>
    <param name="SMTPHost" value="smtp.somemail.com"/>
    <param name="SMTPUsername" value="username"/>
    <param name="SMTPPassword" value="password"/>
    <param name="BufferSize" value="1"/>
    <param name="SMTPDebug" value="true"/>

    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d{ABSOLUTE} %-5p [%c{1}:%L] %m%n"/>
    </layout>
</appender>
```

Filtering

Our filtering needs are not available in the standard log4j lib. You need to use log4j-extras, which provides you with ExpressionFilter, supporting the filtering of complex expressions. We are also using StringMatchFilter from the regular log4j lib. Now we can add a triggeringPolicy to the SMTPAppender:

log4j-triggeringPolicy

```xml
<triggeringPolicy class="org.apache.log4j.rolling.FilterBasedTriggeringPolicy">
    <filter class="org.apache.log4j.varia.StringMatchFilter">
        <param name="StringToMatch" value="ERROR01"/>
        <param name="AcceptOnMatch" value="false"/>
    </filter>

    <filter class="org.apache.log4j.filter.ExpressionFilter">
        <param name="expression" value="CLASS LIKE .*Log4jExpressionFilter.*"/>
        <param name="acceptOnMatch" value="false"/>
    </filter>

    <filter class="org.apache.log4j.filter.LevelRangeFilter">
        <param name="levelMin" value="ERROR"/>
        <param name="levelMax" value="FATAL"/>
    </filter>
</triggeringPolicy>
```

This configuration will filter the log to email only the ERROR and FATAL levels which are NOT logged in classes with Log4jExpressionFilter in their name and which DON'T have ERROR01 in the log message. Have a look into LoggingEventFieldResolver to see which other expressions you can use with ExpressionFilter.
You can use EXCEPTION, METHOD and a few others, which are very useful.

Testing

Testing a SMTPAppender is not easy if you rely on real servers. Fortunately, you can use mock-javamail, and you don't even have to worry about polluting an SMTP server. This is also included in the github project.

Resources

You can clone a full working copy from my github repository for log4j and log4j2: Log4j Mail Filter. Since I may modify the code in the future, you can download the original source of this post from release 1.0. Alternatively, clone the repo and check out the tag from release 1.0 with the following command: git checkout 1.0.

Reference: Monitoring and Filtering Application Log to Mail with log4j from our JCG partner Roberto Cortez at the Roberto Cortez Java Blog blog....
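The triggering policy above is declarative XML, which can make its combined effect hard to read. Restated as a plain-Java predicate (an illustration of the semantics only, not log4j API), an event triggers a mail only when all three filters let it through:

```java
// Plain-Java restatement of the triggeringPolicy semantics: mail only
// ERROR/FATAL events whose logger class does not match
// .*Log4jExpressionFilter.* and whose message does not contain "ERROR01".
final class MailTriggerCheck {

    static boolean shouldMail(String level, String loggerClass, String message) {
        boolean severeEnough = level.equals("ERROR") || level.equals("FATAL");
        boolean handledByApp = message.contains("ERROR01");       // StringMatchFilter
        boolean ignoredClass = loggerClass
            .matches(".*Log4jExpressionFilter.*");                // ExpressionFilter
        return severeEnough && !handledByApp && !ignoredClass;    // LevelRangeFilter
    }
}
```

So a handled constraint violation logged with "ERROR01" in the message never triggers a mail, while an unexpected ERROR elsewhere does.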

Parameterized Test Runner in JUnit

We have all written unit tests where a single test covers several possible input-output combinations. Let's look at how it's done by taking a simple Fibonacci series example. The code below computes the Fibonacci series for the number of elements mentioned:

```java
import java.util.ArrayList;
import java.util.List;

public class Fibonacci {

    public List<Integer> getFiboSeries(int numberOfElements) {
        List<Integer> fiboSeries = new ArrayList<>(numberOfElements);
        for (int i = 0; i < numberOfElements; i++) {
            // First 2 elements are 1, 1
            if (i == 0 || i == 1) {
                fiboSeries.add(i, 1);
            } else {
                int firstPrev = fiboSeries.get(i - 2);
                int secondPrev = fiboSeries.get(i - 1);
                int fiboElement = firstPrev + secondPrev;
                fiboSeries.add(i, fiboElement);
            }
        }
        return fiboSeries;
    }
}
```

Let's see the conventional way of testing the above code with multiple input values:

```java
import java.util.Arrays;
import java.util.List;
import org.junit.Test;
import static org.junit.Assert.*;

public class FibonacciTest {

    /**
     * Test of getFiboSeries method, of class Fibonacci.
     */
    @Test
    public void testGetFiboSeries() {
        System.out.println("getFiboSeries");
        int numberOfElements = 5;
        Fibonacci instance = new Fibonacci();
        List<Integer> expResult = Arrays.asList(1, 1, 2, 3, 5);
        List<Integer> result = instance.getFiboSeries(numberOfElements);
        assertEquals(expResult, result);

        numberOfElements = 10;
        expResult = Arrays.asList(1, 1, 2, 3, 5, 8, 13, 21, 34, 55);
        result = instance.getFiboSeries(numberOfElements);
        assertEquals(expResult, result);
    }
}
```

So we have been able to test for 2 inputs. Imagine extending the above for more inputs: unnecessary bloat in the test code. JUnit provides a different runner, called the Parameterized runner, which exposes a static method annotated with @Parameters. This method has to be implemented to return the collection of inputs and expected outputs which will be used to run the test defined in the class.
Let's look at the code which does this:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import static org.junit.Assert.assertEquals;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class ParametrizedFiboTest {

    private final int number;
    private final List<Integer> values;

    public ParametrizedFiboTest(FiboInput input) {
        this.number = input.number;
        this.values = input.values;
    }

    @Parameterized.Parameters
    public static Collection<Object[]> fiboData() {
        return Arrays.asList(new Object[][]{
            {new FiboInput(1, Arrays.asList(1))},
            {new FiboInput(2, Arrays.asList(1, 1))},
            {new FiboInput(3, Arrays.asList(1, 1, 2))},
            {new FiboInput(4, Arrays.asList(1, 1, 2, 3))},
            {new FiboInput(5, Arrays.asList(1, 1, 2, 3, 5))},
            {new FiboInput(6, Arrays.asList(1, 1, 2, 3, 5, 8))}
        });
    }

    @Test
    public void testGetFiboSeries() {
        Fibonacci instance = new Fibonacci();
        List<Integer> result = instance.getFiboSeries(this.number);
        assertEquals(this.values, result);
    }
}

class FiboInput {

    public int number;
    public List<Integer> values;

    public FiboInput(int number, List<Integer> values) {
        this.number = number;
        this.values = values;
    }
}
```

This way we would just need to add a new input and expected output in the fiboData() method to get this working!

Reference: Parameterized Test Runner in JUnit from our JCG partner Mohamed Sanaulla at the Experiences Unlimited blog....

How to successfully attack a software dinosaur?

We have all "enjoyed" working with some software that was purchased because "You can't get fired because you bought…". This software is known for being the industry leader. Not because it is easy to use, easy to integrate, easy to scale, or easy to do anything with; it is often quite the opposite. So why do people buy it? First of all, it is easy to find experts. There are people out there that have been "enjoying" working with this solution for the last 10 years. It is relatively stable and reliable. There is a complete ecosystem around it, with hundreds or thousands of partner solutions. People have just given up on trying to convince their bosses to try something different.

5 steps to disrupt the Dinosaur

Step 1: the basic use cases
The Pareto rule: what are the 80% of the use cases that only reflect 20% of the functionality?

Step 2: the easy & beautiful & horizontally scalable & multi-tenant clone
Make a solution that reflects 80% of these use cases, but make it beautiful and incredibly easy to use. Use the latest horizontally scalable backends, e.g. Cassandra. Build multi-tenancy into the software from day 1.

Step 3: make it open source
Release the "improved clone" as an open source product.

Step 4: the plugin economy
Add a plugin mechanism that allows others to create plugins to fill in the 20% use case gap. Create a marketplace so others can make money with their plugins. Make money by being the broker. Think App Store, but ideally improve the model.

Step 5: the SaaS version
Create a SaaS version and attack the bottom part of the market. Focus on the enterprises that could never afford the original product. Slowly move upwards towards the top segment.

The expected result

You will have a startup or a new business unit that will make money pretty quickly and will soon be the target of a big purchase offer from the Dinosaur or one of its competitors.
You will spend far fewer sleepless nights trying to make money this way than by creating the next Angry Birds, Uber or Facebook clone.

Reference: How to successfully attack a software dinosaur? from our JCG partner Maarten Ectors at the Telruptive blog....

Capacity Planning and the Project Portfolio

I was problem-solving with a potential client the other day. They want to manage their project portfolio. They use Jira, so they think they can see everything everyone is doing. (I’m a little skeptical, but, okay.) They want to know how much the teams can do, so they can plan the portfolio based on team capacity. (Red flag #1) The worst part? They don’t have feature teams. They have component teams: front end, middleware, back end. You might, too. (Red flag #2)

Problem #1: They have a very large program, not a series of unrelated projects. They also have projects.

Problem #2: They want to use capacity planning, instead of flowing work through teams. They are setting themselves up to optimize at the lowest level, instead of optimizing at the highest level of the organization.

If you read Manage Your Project Portfolio: Increase Your Capacity and Finish More Projects, you understand this problem. A program is a strategic collection of projects where the business value of all of the projects together is greater than that of any one project on its own. Each project has value, yes. But all together, as a program, they have much more value. You have to consider the program as a whole.

Don’t Predict the Project Portfolio Based on Capacity

If you are considering doing capacity planning based on the teams’ estimation or previous capacity, don’t do it. First, you can’t possibly know based on previous data. Why? Because the teams are interconnected in interesting ways. When you have component teams, not feature teams, their interdependencies are significant and unpredictable. Your ability to predict the future based on past velocity? Zero. Nada. Zilch. This is legacy thinking from waterfall. Well, you can try to do it this way. But you will be wrong in many dimensions: you will make mistakes because predictions are based on estimates, and estimates are guesses. When you have teams using relative estimation, you have problems.
Your estimates will be off because of the silent interdependencies that arise from component teams. No one can predict these if you have large stories, even if you do awesome program management. The larger the stories, the more your estimates are off. The longer the planning horizon, the more your estimates are off. You will also miss all the great ideas for your project portfolio that arise from innovation you can’t predict in advance. As the teams complete features, and as the product owners see what the teams deliver, both will have innovative ideas. You, the management team, want to be able to capitalize on this feedback.

It’s not that estimates are bad. It’s that estimates are off. The more teams you have, the less your estimates are normalized between teams. Your t-shirt sizes are not my Fibonacci numbers, are not that team’s swarming or mobbing. (It doesn’t matter if you have component teams or feature teams for this to be true.) When you have component teams, you have the additional problem of not knowing how the interdependencies affect your estimates. Your estimates will be off, because no one’s estimates take the interdependencies into account.

You don’t want to normalize estimates among teams. You want to normalize story size. Once you make stories really small, it doesn’t matter what the estimates are. When you make stories really small, the product owners are in charge of the team’s capacity and release dates. Why? Because they are in charge of the backlogs and the roadmaps. The more a program stops trying to estimate at the low level, uses small stories, and manages interdependencies at the team level, the more momentum the program has.

The part where you gather all the projects? Do that part. You need to see all the work. Yes, that part works and helps the program see where it is going.
Use Value for the Project Portfolio

Okay, so you try to estimate the value of the features, epics, or themes in the roadmap of the project portfolio. Maybe you even use the cost of delay, as Jutta and I suggest in Diving for Hidden Treasures: Finding the Real Value in Your Project Portfolio (yes, this book is still in progress). How will you know if you are correct? You don’t. You see the demos the teams provide, and you reassess at a reasonable cadence. What’s reasonable? Not every week or two. Give the teams a chance to make progress. If people are multitasking, reassess no more often than once every two months, or every quarter. They have to get to each project. Hint: stop the multitasking and you get tons more throughput.

Reference: Capacity Planning and the Project Portfolio from our JCG partner Johanna Rothman at the Managing Product Development blog....
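As a side note on the cost-of-delay idea mentioned above: one common way to apply it is CD3 (Cost of Delay Divided by Duration), from Donald Reinertsen's work. The article does not prescribe this exact scheme, and the figures below are made up purely for illustration:

```java
// Cost of Delay Divided by Duration (CD3): a hypothetical scheduling illustration.
// All dollar figures are invented for the example; a higher CD3 means schedule sooner.
class CostOfDelayExample {
    static double cd3(double costOfDelayPerWeek, double durationWeeks) {
        return costOfDelayPerWeek / durationWeeks;
    }

    public static void main(String[] args) {
        double featureA = cd3(10_000, 2); // $10k/week delay cost, 2 weeks of work
        double featureB = cd3(12_000, 6); // $12k/week delay cost, 6 weeks of work
        // Feature B has the bigger absolute cost of delay, but A delivers value
        // faster per week of team time, so A goes first.
        System.out.println(featureA > featureB ? "Schedule A first" : "Schedule B first");
    }
}
```

The point is the same one the article makes about value: a rough, explicit value model you revisit at demos beats precision-theater estimates.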

Infeasible software projects are launched all the time

Infeasible software projects are launched all the time and teams are continually caught up in them, but what is the real source of the problem? There are projects that genuinely need 2 years for which the executives set a 6-month deadline. The project is guaranteed to fail, but is this due to executive ignorance or IT impotence?

There is no schedule risk in an infeasible project, because the deadline will be missed. Schedule risk only exists in the presence of uncertainty (see Schedule Risk is a Red Herring!!!). As you might expect, executives and IT managers share responsibility for infeasible projects that turn into death marches. Learn about the nasty side effects in Death March Calculus.

The primary causes of infeasible projects are:

Rejection of formal estimates
No estimation, or improper estimation methods

Rejecting Formal Estimates

This situation occurs frequently; an example would be the Denver Baggage Handling System (see Case Study). The project was automatically estimated (correctly) to take 2 years; however, executives declared that IT would only have 1 year to deliver. Of course, they failed [1]. The estimate was rejected by the executives because it did not fit their desires. They could not have enjoyed the subsequent software disaster and bad press. When executives ignore formal estimates, they get what they deserve. Formal estimates are ignored because executives believe that through sheer force of will they can set deadlines. If IT managed to get the organization to pay for formal estimation tools, then it is not their problem that the executives refuse to go along with the results.

Improper Estimation Methods

The next situation that occurs frequently is using estimation processes that have low validity. Estimation has been extensively studied and documented by Tom DeMarco, Capers Jones, Ed Yourdon, and others. Improper estimation methods will underestimate a software project every time.
Fast estimates are based on what you can think of; unfortunately, software is not tangible, so what you are aware of is just the tip of the iceberg. None of this prevents executives from demanding fast estimates from development. Even worse, development managers will cave in to ridiculous demands and actually give fast estimates. Poor estimates are guaranteed to lead to infeasible projects (see Who needs Formal Measurement?). Poor estimates are delivered by IT managers who:

Can’t convince executives to use formal tools
Give in to extreme pressure for fast estimates

Infeasible projects that result from poor estimates are a matter of IT impotence.

Conclusion

Both executive ignorance and IT impotence lead to infeasible projects on a regular basis, through poor estimates and rejected estimates; so there is no surprise here. However, infeasible projects are a failure of executives and IT equally, because we are all on the same team. It is not possible for one part of the organization to succeed if the other fails. Possibly a greater share of the problem lies with IT management. After all, whose responsibility is a bad decision: the people who know what the issues are, or the ones who don’t? If a child wants ice cream before they eat dinner, then whose fault is it if you cave in and give them the ice cream? Unfortunately, even after 60 years of developing software projects, IT managers are either as ignorant as the executives or simply have no intestinal fortitude. Even when IT managers convince executives of the importance of estimating tools, the estimates are routinely discarded because they do not meet executive expectations.

Rejection of automated estimates: productivity -16%, quality -22%

Until we can get a generation of IT managers who are prepared to educate executives on the necessity of proper estimation, and to be stubborn about holding to those estimates, we are likely to continue to see an estimated $3 trillion in software project failures every year.
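To see why "what you can think of" estimates systematically undershoot, a formal model makes the non-linearity of software effort explicit. The article does not name a specific model; basic COCOMO (organic mode) is used here purely as an illustration:

```java
// Basic COCOMO, organic mode: effort (person-months) = 2.4 * KLOC^1.05.
// Shown only to illustrate that formal models grow effort super-linearly with
// size; the article itself does not endorse any particular model.
class CocomoSketch {
    static double effortPersonMonths(double kloc) {
        return 2.4 * Math.pow(kloc, 1.05);
    }

    public static void main(String[] args) {
        // A 10x bigger system needs more than 10x the effort - exactly the gap
        // that a linear, off-the-cuff estimate misses.
        System.out.printf("10 KLOC:  %.1f person-months%n", effortPersonMonths(10));
        System.out.printf("100 KLOC: %.1f person-months%n", effortPersonMonths(100));
    }
}
```

Even this simple exponent means a project ten times larger costs more than ten times as much, which intuition (the tip of the iceberg) rarely accounts for.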
End Notes

[1] For inquiring minds, good automated estimation systems have been shown to be within 5% of time and cost on a regular basis. Contact me for additional information.

References

Jones, Capers. Scoring and Evaluating Software Methods, Practices, and Results. 2008.
Jones, Capers. The Economics of Software Quality. 2011.
Kahneman, Daniel. Thinking, Fast and Slow. 2011.

Reference: Infeasible software projects are launched all the time from our JCG partner Dalip Mahal at the Accelerated Development blog....

Don’t waste time on Code Reviews

Less than half of development teams do code reviews, and the teams that do are probably not getting as much out of code reviews as they should. Here’s how to not waste time on code reviews.

Keep it Simple

Many people still think of code reviews as expensive formal code inspection meetings, with lots of prep work required before a room full of reviewers can slowly walk through the code together around a table, with the help of a moderator and a secretary. Lots of hassle and delays and paperwork. But you don’t have to do code reviews this way – and you shouldn’t. Several recent studies show that setting up and holding formal code review meetings adds to development delays and costs without adding value. While it can take weeks to schedule a code review meeting, only 4% of defects are found in the meeting itself – the rest are all found by reviewers looking through code on their own. At shops like Microsoft and Google, developers don’t attend formal code review meetings. Instead, they take advantage of collaborative code review platforms like Gerrit, CodeFlow, Collaborator, ReviewBoard or Crucible, or use e-mail to request reviews asynchronously and to exchange information with reviewers. These lightweight reviews (done properly) are just as effective at finding problems in code as inspections, but much less expensive and much easier to schedule and manage. Which means they are done more often. And these reviews fit much better with iterative, incremental development, providing developers with faster feedback (within a few hours or at most a couple of days, instead of weeks for formal inspections).

Keep the number of reviewers small

Some people believe that if two heads are better than one, then three heads are even better, and four heads even more better, and so on… So why not invite everyone on the team into a code review? Answer: because it is a tragic waste of time and money.
As with any practice, you will quickly reach a point of diminishing returns as you try to get more people to look at the same code. On average, one reviewer will find roughly half of the defects on their own. In fact, in a study at Cisco, developers who double-checked their own work found half of the defects without the help of a reviewer at all! A second reviewer will find half as many new problems as the first reviewer. Beyond this point, you are wasting time and money. One study showed no difference in the number of problems found by teams of 3, 4 or 5 individuals, while another showed that 2 reviewers actually did a better job than 4. This is partly because of overlap and redundancy – more reviewers means more people looking for and finding the same problems (and more people coming up with false-positive findings that the author has to sift through). And as Geoff Crain at Atlassian explains, there is a “social loafing” problem: complacency and a false sense of security set in as you add more reviewers. Because each reviewer knows that somebody else is looking at the same code, they are under less pressure to find problems. This is why at shops like Google and Microsoft, where reviews are done successfully, the median number of reviewers is 2 (although there are times when an author may ask for more input, especially when the reviewers don’t agree with each other). But what’s even more important than getting the right number of reviewers is getting the right people to review your code.

Code Reviews shouldn’t be done by n00bs – but they should be done for n00bs

By reviewing other people’s code, a developer will get exposed to more of the code base and learn some new ideas and tricks. But you can’t rely on new team members learning how the system works, or really understanding the coding conventions and architecture, just by reviewing other developers’ code.
Asking a new team member to review other people’s code is a lousy way to train people, and a lousy way to do code reviews. Research backs up what should be obvious: the effectiveness of code reviews depends heavily on the reviewer’s skill and familiarity with the problem domain and with the code. Like other areas in software development, the differences in review effectiveness can be huge, as much as 10x between the best and worst performers. A study on code reviews at Microsoft found that reviewers from outside of the team, or who were new to the team and didn’t know the code or the problem area, could only do a superficial job of finding formatting issues or simple logic bugs. This means that your best developers, team leads and technical architects will spend a lot of time reviewing code – and they should. You need reviewers who are good at reading code and good at debugging, and who know the language, framework and problem area well. They will do a much better job of finding problems, and can provide much more valuable feedback, including suggestions on how to solve the problem in a simpler or more efficient way, or how to make better use of the language and frameworks. And they can do all of this much faster. If you want new developers to learn about the code, the coding conventions and the architecture, it will be much more effective to pair them up with an experienced team member in pair programming or pair debugging. If you want new, inexperienced developers to do reviews (or if you have no choice), lower your expectations. Get them to review straightforward code changes (which don’t require in-depth reviews), or recognize that you will need to depend a lot more on static analysis tools and another reviewer to find the real problems.

Substance over Style

Reviewing code against coding standards is a sad way for a developer to spend their valuable time.
Fight the religious style wars early: get everyone to use the same coding style templates in their IDEs, and use a tool like Checkstyle to ensure that code is formatted consistently. Free up reviewers to focus on the things that matter: helping developers write better code, code that works correctly and that is easy to maintain.

“I’ve seen quite a few code reviews where someone commented on formatting while missing the fact that there were security issues or data model issues.” – Senior developer at Microsoft, from a study on code review practices

Correctness – make sure that the code works, and look for bugs that might be hard to find in testing:

Functional correctness: does the code do what it is supposed to do? The reviewer needs to know the problem area, the requirements and usually something about this part of the code to be effective at finding functional correctness issues.
Coding errors: low-level coding mistakes like using <= instead of <, off-by-one errors, using the wrong variable (like mixing up lessee and lessor), copy-and-paste errors, leaving debugging code in by accident.
Design mistakes: errors of omission, incorrect assumptions, messing up architectural and design patterns like MVC, abuse of trust.
Safety and defensiveness: data validation, threading and concurrency (time-of-check/time-of-use mistakes, deadlocks and race conditions), error handling and exception handling and other corner cases.
Malicious code: back doors or trap doors, time bombs or logic bombs.
Security: properly enforcing security and privacy controls (authentication, access control, auditing, encryption).

Maintainability:

Clarity: class, method and variable naming, comments, …
Consistency: using common routines or language/library features instead of rolling your own, following established conventions and patterns.
Organization: poor structure, duplicate or unused/dead code.
Approach: areas where the reviewer can see a simpler or cleaner or more efficient implementation.

Where should reviewers spend most of their time?

Research shows that reviewers find far more maintainability issues than defects (a ratio of 75:25) and spend more time on code clarity and understandability problems than correctness issues. There are a few reasons for this. Finding bugs in code is hard. Finding bugs in someone else’s code is even harder. In many cases, reviewers don’t know enough to find material bugs or offer meaningful insight on how to solve problems. Or they don’t have time to do a good job. So they cherry-pick easy code clarity issues like poor naming or formatting inconsistencies. But even experienced and serious reviewers can get caught up in what at first seem to be minor issues about naming or formatting, because they need to understand the code before they can find bugs, and code that is unnecessarily hard to read gets in the way and distracts them from more important issues. This is why programmers at Microsoft will sometimes ask for 2 different reviews: a superficial “code cleanup” review from one reviewer that looks at standards and code clarity and editing issues, followed by a more in-depth review to check correctness after the code has been tidied up.

Use static analysis to make reviews more efficient

Take advantage of static analysis tools upfront to make reviews more efficient. There’s no excuse not to at least use free tools like FindBugs and PMD for Java to catch common coding bugs and inconsistencies, and sloppy or messy code and dead code, before submitting the code to someone else for review. This frees the reviewer from having to look for micro-problems and bad practices, so they can look for higher-level mistakes instead. But remember that static analysis is only a tool to help with code reviews – not a substitute. Static analysis tools can’t find functional correctness problems or design inconsistencies or errors of omission, or help you find a better or simpler way to solve a problem.

Where’s the risk?

We try to review all code changes.
But you can get most of the benefits of code reviews by following the 80:20 rule: focus reviews on high-risk code and high-risk changes.

High-risk code:

Network-facing APIs
Plumbing (framework code, security libraries, …)
Critical business logic and workflows
Command-and-control and root admin functions
Safety-critical or performance-critical (especially real-time) sections
Code that handles private or sensitive data
Old code, code that is complex, code that has been worked on by a lot of different people, code that has had a lot of bugs in the past – error-prone code

High-risk changes:

Code written by a developer who has just joined the team
Big changes
Large-scale refactoring (redesign disguised as refactoring)

Get the most out of code reviews

Code reviews add to the cost of development, and if you don’t do them right they can destroy productivity and alienate the team. But they are also an important way to find bugs and for developers to help each other write better code. So do them right. Don’t waste time on meetings and moderators and paperwork. Do reviews early and often. Keep the feedback loops as tight as possible. Ask everyone to take reviews seriously – developers and reviewers. No rubber-stamping, or letting each other off the hook. Make reviews simple, but not sloppy. Ask the reviewers to focus on what really matters: correctness issues, and things that make the code harder to understand and harder to maintain. Don’t waste time arguing about formatting or style. Make sure that you always review high-risk code and high-risk changes. Get the best people available to do the job – when it comes to reviewers, quality is much more important than quantity. Remember that code reviews are only one part of a quality program. Instead of asking more people to review code, you will get more value by putting time into design reviews or writing better testing tools or better tests.
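As a concrete (hypothetical, not taken from the article) illustration of the low-level correctness bugs a reviewer should prioritize over style nits – the classic <= instead of < boundary mistake:

```java
class BoundaryBugExample {
    // BUG a reviewer should catch: <= walks one index past the end of the
    // array and throws ArrayIndexOutOfBoundsException at runtime.
    static int sumBuggy(int[] values) {
        int sum = 0;
        for (int i = 0; i <= values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Correct loop bound: strictly less than values.length.
    static int sumFixed(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }
}
```

A one-character bug like this is easy to miss in testing if the failing path is rarely exercised, which is exactly why correctness-focused review time pays off where style arguments don't.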
A code review is a terrible thing to waste.Reference: Don’t waste time on Code Reviews from our JCG partner Jim Bird at the Building Real Software blog....

Route 53 Benchmark: The New AWS Geolocation’s Surprising Results

Latency vs. Geolocation: Testing DNS configurations across multiple EC2 regions using AWS Route 53

If you’re using the AWS stack, you’ve probably been through this: deciding which EC2 instances to fire up and which regions to deploy them in is tricky. Some of you might have started multiple EC2 instances behind a load balancer – but that’s almost never enough. Our Aussie friends shouldn’t have to wait for resources coming from Virginia. What we really need is an easy-to-use global solution. This is where Amazon’s Route 53 DNS routing comes in handy. Adding routing policies to your domain will help guarantee that users get the fastest responses, and as we all know, speed == happiness. At the end of July 2014, Amazon announced a new Route 53 routing policy: Geolocation. We are big advocates of latency-based routing, so we wanted to put the new policy to the test.

Results

Since this benchmark’s goal is to make sense of DNS, we looked at the DNS name lookup time. We used curl to get a breakdown of how long each step lasted. Here are the average lookup times from EC2 regions. Full results can be found here.

Insights

The tests show that switching to geolocation adds 75 ms on average – a very high figure, especially if you’re trying to optimize your cluster and the first-user experience.
If we exclude Sao Paulo from the list altogether, we get an impressive 127 ms average lookup time across the other regions for both policies. I checked the numbers twice to make sure it’s not a mirage. Exactly 1 2 7 ms whether you go with geolocation or latency. On our EC2->S3 benchmark it was Sydney that was kicked out; with Route 53 it’s Sao Paulo.
The biggest winner is Europe. It had the lowest latency-based lookup, the lowest geolocation-based lookup, and the lowest difference between latency and geolocation – only 3 ms!
At the bottom of the list, Sao Paulo performed the worst. It came in last in all three criteria: latency, geolocation and difference.
Geolocation lookup in South America took three times longer than the latency lookup.
Zooming into North America, the fastest name lookup in both latency and geolocation was California. The slowest was Virginia, which had the second-biggest difference between latency and geolocation: geolocation in our tests was around 1.5x slower.
Geolocation was faster in Oregon, California and Singapore. Latency was faster in Virginia, Europe, Japan and Brazil.

Setting up the test

EC2 – We deployed a simple Tomcat/Nginx web application to EC2 instances (m3.medium) in all available AWS regions (excluding China’s beta region). The webapp contained several Java Servlets that returned HTTP_OK 200 upon request.

Route 53 – We purchased two domains beforehand: one for latency and one for geolocation. AWS has great docs on how to set up the record sets for latency-based and geolocation-based routing. For latency-based routing we redirected the domain to all regions where we have EC2 running. For geolocation we redirected each continent to the closest region.

Bash – After setting up all instances, we ran this snippet to test the lookup time for the domains. We decided to look at lookup times alone, since the connect time shown by curl was ~1 ms and didn’t change the results.

sudo /etc/init.d/nscd restart ## Restarting the DNS cache on Ubuntu so we won't get cached results
curl --no-sessionid -s -w '\nLookup time:\t%{time_namelookup}\n' -o /dev/null http://takipi-route53-testing-{latency|geo}.com/webapp/speed/normal ## Measuring name lookup time

Conclusion

There was no knock-out winner here. Although latency-based routing proved to be faster overall, there were some cases where geolocation-based routing performed better. The fastest average lookup was latency-based, originating from Europe.
In the end, unless you require country-specific routing, the sweet spot for DNS routing policies is (still) latency-based routing.

Reference: Route 53 Benchmark: The New AWS Geolocation’s Surprising Results from our JCG partner Chen Harel at the Takipi blog....
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.