

Hosting a Maven repository on github (with sources and javadoc)

How do you make a small open-source library available to other developers via Maven? One way is to deploy it on the Maven Central Repository. What I'd like to do instead is deploy it to GitHub, so I can modify it freely. This post will tell you how to do that. The typical way I deploy artifacts to GitHub is to use mvn deploy. Here are the steps:

- Use site-maven-plugin to push the artifacts to GitHub
- Use maven-javadoc-plugin to push the javadoc
- Use maven-source-plugin to push the source
- Configure Maven to use the remote mvn-repo as a Maven repository

Configure maven-deploy-plugin

First, I add the following snippet to tell Maven to deploy artifacts to a temporary staging location inside my target directory:

<distributionManagement> <repository> <id>internal.repo</id> <name>Temporary Staging Repository</name> <url>file://${project.build.directory}/mvn-repo</url> </repository> </distributionManagement> <plugins> <plugin> <artifactId>maven-deploy-plugin</artifactId> <version>2.8.1</version> <configuration> <altDeploymentRepository> internal.repo::default::file://${project.build.directory}/mvn-repo </altDeploymentRepository> </configuration> </plugin> </plugins>

Configure Maven

Then I add my github.com authentication information to ~/.m2/settings.xml so that the github site-maven-plugin can push to GitHub:

<settings> <servers> <server> <id>github</id> <password>OAUTH2TOKEN</password> </server> </servers> </settings>

or

<settings> <servers> <server> <id>github</id> <username>GitHubLogin</username> <password>GitHubPassw0rd</password> </server> </servers> </settings>

Personally, I prefer the first way, because it is safer (the password is never written down explicitly). 
To get the OAUTH2TOKEN for the GitHub project, go to Settings --> Applications --> Generate new token.

Configure the site-maven-plugin

Configure the site-maven-plugin to upload from the temporary location to the mvn-repo branch on GitHub:

<plugin> <groupId>com.github.github</groupId> <artifactId>site-maven-plugin</artifactId> <version>0.9</version> <configuration> <message>Maven artifacts for ${project.version}</message> <noJekyll>true</noJekyll> <outputDirectory>${project.build.directory}/mvn-repo </outputDirectory> <branch>refs/heads/mvn-repo</branch> <includes> <include>**/*</include> </includes> <repositoryName>pengyifan-commons</repositoryName> <repositoryOwner>yfpeng</repositoryOwner> <server>github</server> </configuration> <executions> <execution> <goals> <goal>site</goal> </goals> <phase>deploy</phase> </execution> </executions> </plugin>

When this post was written, there was a bug in version 0.9 of site-maven-plugin. To work around it, git clone the 0.10-SNAPSHOT version and mvn install it manually.

Configure maven-source-plugin

To add the source code package to the mvn-repo, we need to configure the maven-source-plugin. Add the following code to pom.xml:

<plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-source-plugin</artifactId> <version>2.3</version> <executions> <execution> <id>attach-sources</id> <goals> <goal>jar</goal> </goals> </execution> </executions> </plugin>

Configure maven-javadoc-plugin

To add the javadoc package to the mvn-repo, we need to configure the maven-javadoc-plugin. Add the following code to pom.xml:

<plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-javadoc-plugin</artifactId> <executions> <execution> <id>attach-javadocs</id> <goals> <goal>jar</goal> </goals> </execution> </executions> </plugin>

Now run mvn clean deploy. 
I saw maven-deploy-plugin "upload" the files to my local staging repository in the target directory, and then site-maven-plugin commit those files and push them to the server. To verify that all binaries are there, visit GitHub in the browser and select the mvn-repo branch.

Configure Maven to use the remote mvn-repo as a Maven repository

There's one more step we should take, which is to configure any poms to know where our repository is. We can add the following snippet to any project's pom.xml:

<repositories> <repository> <id>PROJECT-NAME-mvn-repo</id> <url>https://raw.github.com/USERNAME/PROJECT-NAME/mvn-repo/</url> <snapshots> <enabled>true</enabled> <updatePolicy>always</updatePolicy> </snapshots> </repository> </repositories>

Reference: Hosting a Maven repository on github (with sources and javadoc) from our JCG partner Yifan Peng at the PGuru blog....
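With the repository declared as above, a consuming project can then pull the library like any other Maven dependency. The coordinates below are illustrative placeholders, not the actual artifact:

```xml
<!-- Illustrative only: substitute your library's real coordinates -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>my-library</artifactId>
  <version>1.0.0</version>
</dependency>
```

Maven should then resolve the artifact (and, since we attached them, the -sources and -javadoc jars as well) from the raw.github.com URL declared in the repositories section.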

Testing mail code in Spring Boot application

Whilst building a Spring Boot application you may find that you need to add a mail configuration. Actually, configuring mail in Spring Boot does not differ much from configuring it in a Boot-less Spring application. But how do you test that mail configuration and submission are working fine? Let's have a look.

I assume that we have a simple Spring Boot application bootstrapped. If not, the easiest way to do it is by using the Spring Initializr.

Adding the javax.mail dependency

We start by adding the javax.mail dependency to build.gradle: compile 'javax.mail:mail:1.4.1'. We will also need Spring Context Support (if not present), which contains the JavaMailSender support class. The dependency is: compile("org.springframework:spring-context-support")

Java-based configuration

Spring Boot favors Java-based configuration. In order to add the mail configuration, we add a MailConfiguration class annotated with @Configuration. The properties are stored in mail.properties (this is not required, though). Property values can be injected directly into beans using the @Value annotation:

@Configuration @PropertySource("classpath:mail.properties") public class MailConfiguration {@Value("${mail.protocol}") private String protocol; @Value("${mail.host}") private String host; @Value("${mail.port}") private int port; @Value("${mail.smtp.auth}") private boolean auth; @Value("${mail.smtp.starttls.enable}") private boolean starttls; @Value("${mail.from}") private String from; @Value("${mail.username}") private String username; @Value("${mail.password}") private String password;@Bean public JavaMailSender javaMailSender() { JavaMailSenderImpl mailSender = new JavaMailSenderImpl(); Properties mailProperties = new Properties(); mailProperties.put("mail.smtp.auth", auth); mailProperties.put("mail.smtp.starttls.enable", starttls); mailSender.setJavaMailProperties(mailProperties); mailSender.setHost(host); mailSender.setPort(port); mailSender.setProtocol(protocol); mailSender.setUsername(username); 
mailSender.setPassword(password); return mailSender; } }

The @PropertySource annotation makes mail.properties available for injection with the @Value annotation. If it is missing, you can expect an exception: java.lang.IllegalArgumentException: Could not resolve placeholder '<name>' in string value "${<name>}". And the mail.properties:

mail.protocol=smtp mail.host=localhost mail.port=25 mail.smtp.auth=false mail.smtp.starttls.enable=false mail.from=me@localhost mail.username= mail.password=

Mail endpoint

In order to be able to send an email from our application, we can create a REST endpoint. We can use Spring's SimpleMailMessage to implement this endpoint quickly. Let's have a look:

@RestController class MailSubmissionController {private final JavaMailSender javaMailSender;@Autowired MailSubmissionController(JavaMailSender javaMailSender) { this.javaMailSender = javaMailSender; }@RequestMapping("/mail") @ResponseStatus(HttpStatus.CREATED) SimpleMailMessage send() { SimpleMailMessage mailMessage = new SimpleMailMessage(); mailMessage.setTo("someone@localhost"); mailMessage.setReplyTo("someone@localhost"); mailMessage.setFrom("someone@localhost"); mailMessage.setSubject("Lorem ipsum"); mailMessage.setText("Lorem ipsum dolor sit amet [...]"); javaMailSender.send(mailMessage); return mailMessage; } }

Running the application

We are now ready to run the application. If you use the CLI, type gradle bootRun, open the browser and navigate to localhost:8080/mail. What you should see is actually an error saying that the mail server connection failed. As expected.

Fake SMTP server

FakeSMTP is a free fake SMTP server with a GUI, written in Java, for testing emails in applications. We will use it to verify that the submission works. Download the application and run it by invoking java -jar fakeSMTP-<version>.jar. After launching FakeSMTP, start the server. Now you can invoke the REST endpoint again and see the result in FakeSMTP! 
But by testing I did not mean manual testing! The application is still useful, but we want to test the mail code automatically.

Unit testing mail code

To be able to test the mail submission automatically, we will use Wiser – a framework/utility for unit testing mail, based on SubEtha SMTP. SubEthaSMTP's simple, low-level API is suitable for writing almost any kind of mail-receiving application. Using Wiser is very simple. Firstly, we need to add a test dependency to build.gradle: testCompile("org.subethamail:subethasmtp:3.1.7"). Secondly, we create an integration test with JUnit, Spring and Wiser:

@RunWith(SpringJUnit4ClassRunner.class) @SpringApplicationConfiguration(classes = Application.class) @WebAppConfiguration public class MailSubmissionControllerTest {private Wiser wiser;@Autowired private WebApplicationContext wac; private MockMvc mockMvc;@Before public void setUp() throws Exception { wiser = new Wiser(); wiser.start(); mockMvc = MockMvcBuilders.webAppContextSetup(wac).build(); }@After public void tearDown() throws Exception { wiser.stop(); }@Test public void send() throws Exception { // act mockMvc.perform(get("/mail")) .andExpect(status().isCreated()); // assert assertReceivedMessage(wiser) .from("someone@localhost") .to("someone@localhost") .withSubject("Lorem ipsum") .withContent("Lorem ipsum dolor sit amet [...]"); } }

The SMTP server is initialized and started in the @Before method and stopped in the @After method. After sending a message, the assertion is made. The assertion needs to be created, as the framework does not provide any. 
As you will notice, we need to operate on the Wiser object, which provides a list of received messages: public class WiserAssertions {private final List<WiserMessage> messages;public static WiserAssertions assertReceivedMessage(Wiser wiser) { return new WiserAssertions(wiser.getMessages()); }private WiserAssertions(List<WiserMessage> messages) { this.messages = messages; }public WiserAssertions from(String from) { findFirstOrElseThrow(m -> m.getEnvelopeSender().equals(from), assertionError("No message from [{0}] found!", from)); return this; }public WiserAssertions to(String to) { findFirstOrElseThrow(m -> m.getEnvelopeReceiver().equals(to), assertionError("No message to [{0}] found!", to)); return this; }public WiserAssertions withSubject(String subject) { Predicate<WiserMessage> predicate = m -> subject.equals(unchecked(getMimeMessage(m)::getSubject)); findFirstOrElseThrow(predicate, assertionError("No message with subject [{0}] found!", subject)); return this; }public WiserAssertions withContent(String content) { findFirstOrElseThrow(m -> { ThrowingSupplier<String> contentAsString = () -> ((String) getMimeMessage(m).getContent()).trim(); return content.equals(unchecked(contentAsString)); }, assertionError("No message with content [{0}] found!", content)); return this; }private void findFirstOrElseThrow(Predicate<WiserMessage> predicate, Supplier<AssertionError> exceptionSupplier) { messages.stream().filter(predicate) .findFirst().orElseThrow(exceptionSupplier); }private MimeMessage getMimeMessage(WiserMessage wiserMessage) { return unchecked(wiserMessage::getMimeMessage); }private static Supplier<AssertionError> assertionError(String errorMessage, String... 
args) { return () -> new AssertionError(MessageFormat.format(errorMessage, args)); }public static <T> T unchecked(ThrowingSupplier<T> supplier) { try { return supplier.get(); } catch (Throwable e) { throw new RuntimeException(e); } }interface ThrowingSupplier<T> { T get() throws Throwable; } }

Summary

With just a couple of lines of code we were able to test mail code automatically. The example presented in this article is not sophisticated, but it shows how easy it is to get started with SubEtha SMTP and Wiser. How do you test your mail code?

Reference: Testing mail code in Spring Boot application from our JCG partner Rafal Borowiec at the Codeleak.pl blog....
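The ThrowingSupplier/unchecked pair at the end of WiserAssertions is a general-purpose trick worth noting on its own: it adapts lambdas that declare checked exceptions (like MimeMessage::getSubject) so they can be called from code that only tolerates unchecked ones. A stripped-down, standalone sketch of the same idea (my illustration, independent of Wiser):

```java
public class UncheckedDemo {

    // A Supplier look-alike that is allowed to throw anything.
    interface ThrowingSupplier<T> {
        T get() throws Throwable;
    }

    // Invokes the supplier, rewrapping any checked throwable as an
    // unchecked RuntimeException (same idea as in WiserAssertions).
    static <T> T unchecked(ThrowingSupplier<T> supplier) {
        try {
            return supplier.get();
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Calls such as getMimeMessage().getSubject() declare checked
        // exceptions; here we simulate one with a lambda.
        ThrowingSupplier<String> risky = () -> "Lorem ipsum";
        // No try/catch needed at the call site:
        System.out.println(unchecked(risky));
    }
}
```

This is why the assertion methods can use method references like wiserMessage::getMimeMessage inside stream pipelines without littering them with try/catch blocks.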

Getters/Setters. Evil. Period.

There is an old debate, started in 2003 by Allen Holub in his famous article Why getter and setter methods are evil, about whether getters/setters are an anti-pattern that should be avoided or something we inevitably need in object-oriented programming. I'll try to add my two cents to this discussion. The gist of the following text is this: getters and setters are a terrible practice, and those who use them can't be excused. Again, to avoid any misunderstanding, I'm not saying that get/set should be avoided when possible. No. I'm saying that you should never have them near your code. Arrogant enough to catch your attention? You've been using that get/set pattern for 15 years and you're a respected Java architect? And you don't want to hear that nonsense from a stranger? Well, I understand your feelings. I felt almost the same when I stumbled upon Object Thinking by David West, the best book about object-oriented programming I've read so far. So please. Calm down and try to understand while I try to explain.

Existing Arguments

There are a few arguments against "accessors" (another name for getters and setters) in an object-oriented world. All of them, I think, are not strong enough. Let's briefly go through them. Ask, Don't Tell: Allen Holub says, "Don't ask for the information you need to do the work; ask the object that has the information to do the work for you". Violated Encapsulation Principle: An object can be torn apart by other objects, since they are able to inject any new data into it through setters. The object simply can't encapsulate its own state safely enough, since anyone can alter it. Exposed Implementation Details: If we can get an object out of another object, we are relying too much on the first object's implementation details. If tomorrow it changes, say, the type of that result, we have to change our code as well. All these justifications are reasonable, but they are missing the main point. 
Fundamental Misbelief Most programmers believe that an object is a data structure with methods. I’m quoting Getters and Setters Are Not Evil, an article by Bozhidar Bozhanov: But the majority of objects for which people generate getters and setters are simple data holders. This misconception is the consequence of a huge misunderstanding! Objects are not “simple data holders”. Objects are not data structures with attached methods. This “data holder” concept came to object-oriented programming from procedural languages, especially C and COBOL. I’ll say it again: an object is not a set of data elements and functions that manipulate them. An object is not a data entity. What is it then? A Ball and A Dog In true object-oriented programming, objects are living creatures, like you and me. They are living organisms, with their own behaviour, properties and a life cycle. Can a living organism have a setter? Can you “set” a ball to a dog? Not really. But that is exactly what the following piece of software is doing: Dog dog = new Dog(); dog.setBall(new Ball()); How does that sound? Can you get a ball from a dog? Well, you probably can, if she ate it and you’re doing surgery. In that case, yes, we can “get” a ball from a dog. This is what I’m talking about: Dog dog = new Dog(); Ball ball = dog.getBall(); Or an even more ridiculous example: Dog dog = new Dog(); dog.setWeight("23kg"); Can you imagine this transaction in the real world? Does it look similar to what you’re writing every day? If yes, then you’re a procedural programmer. Admit it. And this is what David West has to say about it, on page 30 of his book: Step one in the transformation of a successful procedural developer into a successful object developer is a lobotomy. Do you need a lobotomy? Well, I definitely needed one and received it, while reading West’s Object Thinking. Object Thinking Start thinking like an object and you will immediately rename those methods. 
This is what you will probably get: Dog dog = new Dog(); dog.take(new Ball()); Ball ball = dog.give(); Now, we’re treating the dog as a real animal, who can take a ball from us and can give it back, when we ask. Worth mentioning is that the dog can’t give NULL back. Dogs simply don’t know what NULL is! Object thinking immediately eliminates NULL references from your code.Besides that, object thinking will lead to object immutability, like in the “weight of the dog” example. You would re-write that like this instead: Dog dog = new Dog("23kg"); int weight = dog.weight(); The dog is an immutable living organism, which doesn’t allow anyone from the outside to change her weight, or size, or name, etc. She can tell, on request, her weight or name. There is nothing wrong with public methods that demonstrate requests for certain “insides” of an object. But these methods are not “getters” and they should never have the “get” prefix. We’re not “getting” anything from the dog. We’re not getting her name. We’re asking her to tell us her name. See the difference? We’re not talking semantics here, either. We are differentiating the procedural programming mindset from an object-oriented one. In procedural programming, we’re working with data, manipulating them, getting, setting, and deleting when necessary. We’re in charge, and the data is just a passive component. The dog is nothing to us — it’s just a “data holder”. It doesn’t have its own life. We are free to get whatever is necessary from it and set any data into it. This is how C, COBOL, Pascal and many other procedural languages work(ed). On the contrary, in a true object-oriented world, we treat objects like living organisms, with their own date of birth and a moment of death — with their own identity and habits, if you wish. We can ask a dog to give us some piece of data (for example, her weight), and she may return us that information. But we always remember that the dog is an active component. 
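For illustration only (my own sketch, not code from West's book), here is how such a dog might look in Java: the weight is immutable, the method names carry no get/set prefixes, and the dog refuses to give a ball she doesn't have rather than returning NULL:

```java
final class Ball { }

final class Dog {

    private final int weightKg; // fixed at "birth": there is no setter
    private Ball ball;          // whatever the dog is currently holding

    Dog(int weightKg) {
        this.weightKg = weightKg;
    }

    // A request, not a "getter": we ask the dog to tell us her weight.
    int weight() {
        return weightKg;
    }

    // Hand the dog a ball.
    void take(Ball ball) {
        this.ball = ball;
    }

    // Ask the dog for the ball back. She never gives NULL:
    // if she has no ball, she objects instead.
    Ball give() {
        if (ball == null) {
            throw new IllegalStateException("I have no ball to give!");
        }
        Ball given = ball;
        ball = null;
        return given;
    }
}
```

With this class, Dog dog = new Dog(23); dog.take(new Ball()); Ball ball = dog.give(); reads like the examples above, and a second give() fails loudly instead of quietly handing back NULL.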
She decides what will happen after our request. That's why it is conceptually incorrect to have any methods starting with set or get in an object. And it's not about breaking encapsulation, as many people argue. It is about whether you're thinking like an object or you're still writing COBOL in Java syntax.

PS. Yes, you may ask: what about JavaBeans, JPA, JAXB, and the many other Java APIs that rely on the get/set notation? What about Ruby's built-in feature that simplifies the creation of accessors? Well, all of that is our misfortune. It is much easier to stay in the primitive world of procedural COBOL than to truly understand and appreciate the beautiful world of true objects.

PPS. I forgot to say: yes, dependency injection via setters is also a terrible anti-pattern. More about it in one of the next posts!

Related Posts

You may also find these posts interesting:

- Anti-Patterns in OOP
- Avoid String Concatenation
- Objects Should Be Immutable
- Why NULL is Bad?
- OOP Alternative to Utility Classes

Reference: Getters/Setters. Evil. Period. from our JCG partner Yegor Bugayenko at the About Programming blog....

Nightmare on Agile Street 2: Managed Agile

Blow me down, it's happening again… I'm awake. I'm wet, it's a cold sweat. It's the small hours of the morning and the dream is horrid….

I've been sent to Coventry. I'm in a client's office waiting for a meeting to start. The development manager is telling me she has selected me to help them become Agile; she checked me out online and recognises that I am pragmatic. That's why they chose a new tool called Kjsb, it's pragmatic too. Pragmatic. God, does she know how much I hate that word? Pragmatic to me? I recognise that Agile and Waterfall are points on a spectrum and that most organizations, for better or worse, fall somewhere in-between. I recognise that every organisation exists within a context and you need to consider that. And even change the context. But pragmatic? Pragmatic is Satan's way of saying "Heaven doesn't work in the Real World(TM)". The CTO enters and is putting down redlines. He knows all about Agile, but his people… it's his people you see… you can't trust them, they are like children, you can't let them have too much say. They need a strong man to lead them. They had a developer here once who practiced Agile. He did that test driven stuff. He didn't give out dates. He gave Agile a bad name in the company. The PMO will never accept that. Fortunately they have just bought Kjsb. This wonderful tool will fix everything. Kjsb has a feature that translates burn-downs into Gantt charts at the click of a mouse. And back again. The problem is: teams still aren't shipping on schedule. They need predictability. Predictability is the one thing they really need. And flexibility. Flexibility is important. Flexibility and predictability, the two things they really need. And no variation in features. They can't trade features for time. Fixed scope, flexibility and predictability are the three things they need. But… they have unforeseen technical problems – not bugs you understand, but unforeseen technical problems. They really need to be able to deal with those things. 
Technical fixes, fixed scope, flexibility and predictability are the four things they need. Nobody expects… I want to explain queuing theory… a grasp of basic queuing theory is the one thing they need – stick their feet on the ground and cement them to it. One of the teams runs Agile. It is run by the CTO himself and it's good. The other teams… well, they don't really have that much experience. Though the managers are going to get their Scrum certificates real soon now. How, he asks, can we get everyone else to buy in? How can we get the PMO to buy in? How can they make the Product Owners buy in? Mention of the PMO stirs the old guy in the corner, the one who's hiding behind his laptop, the widescreen laptop with the numeric keypad. And mention of the Product Owners causes the Analyst in the other corner – the one hiding behind the ultra-thin laptop – to raise an eyebrow. Now I see they all have laptops out in front of them… and some of them phones too. In between moving their mouths each of them is staring at their screens. I'd better say something. "Well," I start…, "how about we get people from the team who are doing this well to talk about their experience?" Blank looks all round; are they listening? Or doing e-mail? "Could you tell them your own case study?" No – that won't work because that team is so very different from everyone else in the company that nobody will believe it. They are all individuals. Besides, the developers won't be at the buy-in meeting. It's for managers to buy in. Once the managers buy in the developers will be told what to do. …. I try a different approach: "Instead of talking to the PMO one day, and the Product Managers the next day, and the Development Managers the day after… why don't we go vertical and take each development team in turn, with the appropriate project, product and development managers?" No. Managers manage, they are the only ones who need to know. And they are the ones who will be allocating the work with Kjsb. 
"Need to know" – "Allocating work". Did I really just hear those words? Whose version of Agile have they been reading? O my god, these guys are going on a Scrum Master course next week; there is going to be a bun fight. I don't know who I worry about most: these guys or the poor sod who is teaching the class…. "Can I just check," I ask, "each team has a project manager assigned, a product manager, a team lead, and they will soon have a Scrum Master too?" Heads nod. "And… there are several development managers spanning several teams each?" Yes. "So if I'm counting right… each team contains about 4 developers and 1 tester? (Plus a UAT cycle lagging several weeks later)" Yes. "I see…" Am I keeping a straight face? Does my face hide my horror? 3+ managers for every 5 workers? Either this business prints cash or they will soon be bust. …. "Really," says the development manager, "we are talking about change. I have 12 years of change management experience from call centres to financial services, the CTO hand-picked me to lead this change, software development is just the same as any other change". When did Fred Brooks come into the room? In fact, what is he doing in Coventry? He lives in Carolina. Why is he wearing a dog collar? And why is it 1974? He's now standing at the lectern reading from a tattered copy of The Mythical Man-Month. "In many ways," says Brooks, "managing a large computer programming project is like managing any other large undertaking – in more ways than most programmers believe. But in many other ways it is different – in more ways than most professional managers expect." Well, this is a dream, what do I expect? It's 2014 again… "The key is to set the framework," she continues, "establish boundaries so people know what their responsibilities are, then we can empower them". 
Fred has gone; standing at the lectern in the dog collar is Henry Mintzberg – my management hero – reading from another tattered book entitled Managing: "the later term empowerment did not change [manager control], because the term itself indicated that the power remained with the manager. Truly empowered workers, such as doctors in a hospital, even bees in the hive, do not await gifts from their managerial gods; they know what they are there to do and just do it." Empowerment is dis-empowered: the words say one thing but the message given by using those words is the opposite. "What we want is consistency across teams," says the CTO, who now resembles Basil Fawlty. (What happened to "all my teams are different"?) "And a stage gate process," says the PMO man, or is it Terry Jones? "And clear roles and responsibilities," says Cardinal Fang. "Nobody expects the Spanish Inquisition," says Michael Palin – where did he come from? …. "It seems to me," starts the Product Owner, "that we are making a lot more paperwork for ourselves here". O the voice of sanity! "Yes," I begin…, "if you attempt to run both an Agile and a Waterfall process that is what you will have!" Silence. I continue, "Over time, as you see Agile work, as people understand it, I would expect you will move to a more Agile approach in general and be able to reduce the documentation." "No." The PMO seems quite certain of this. "I don't think that will happen here, we need the control and certainty that the waterfall and our stage gates provide. We won't be doing that." Poor Product Owner: if he is lucky he'll be shown the door, if he's unlucky he'll be retained. … "If you want people to buy in," I suggest, "we must let people have their say." The PMO is ready for this: "Yes, we know that, we've already arranged for a survey" and she reads the questions: Q1: "Do you agree our development process needs to change?" Yes or No. 
Q2: "Our organization wishes to remain in control but we want the benefits of Agile. Do you think we should:

- Embrace Marxism in its entirety,
- Mandate Waterfall throughout the organization, or
- Create a Managed Agile process?"

Q3: "Have you seen the features provided by Kjsb?" Yes or No.

O my god, it's a North Korean election. I suggest the questions are a little bit leading. "Well, we don't want people being awkward," chips in the CTO. … We get up to leave. "You know," I say, "when you've had a chance to run this process for a while you will want to inspect it and modify it" – but while I'm saying that I'm thinking "No plan survives contact with the enemy, start small, see what happens." "O we've already done that. This process is the result of doing that. We won't be changing it." … Back in my kitchen, a warm milk in my hand. A bad dream. It was a bad dream. That stuff never happened. How could it? The contradictions are so obvious even a small child could see them. Couldn't they? As I climb the stairs back to bed, a terrible thought: what if it wasn't a nightmare? What if it was real? And what if they call me back for help? Could anyone help such people?

Reference: Nightmare on Agile Street 2: Managed Agile from our JCG partner Allan Kelly at the Agile, Lean, Patterns blog....

Agile is a simple topic

The Agile manifesto is probably one of the best manifestos ever written in software development, if not the best. Simple and elegant. Good vs Bad, 1 2 3 4, done. It is so simple that I am constantly disappointed by the amount of stuff floating around the Internet about what is agile and what is not, how to do agile, Scrum, Kanban, and who knows what will pop up next year claiming to be another king of agile. If I ever tell you we are the purest agile team and we don't have sprints, we don't have stand-up meetings, we don't have story boards, we don't have burn-down charts, we don't have planning poker cards, we don't have any of the buzzwords, most so-called IT consultants will hang me on the spot. Let's face it, being pure isn't about what you have, it is about what you don't! Pure gold has nothing but gold; that's why it is so valuable. We should build our teams on developers, code and business needs: the three pure ingredients of a team. Take any one away and the team is no more.

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. Antoine de Saint-Exupery

The manifesto says exactly that we value processes and tools less, and yet we have seen all kinds of weird superimposed processes and tools everywhere. "Look, we have standups, we have sprints, we have story boards, therefore we are agile". NO, absolutely NOT. You can walk like a duck, quack like a duck, but you are still not a duck. But why the hype anyway? Partly the consulting companies are to blame: they try to sell the buzzwords to the management so that they can make $$$ by simply asking the developers to do what they already know, writing code, but in a different way. But the biggest enemies are the developers themselves, especially the team leaders and managers. Because they are too lazy to get to know the developers (the people), too lazy to learn the code (the working software) and too lazy to analyse the business needs. 
Because "at the end of the day I need to show my developers that I am doing a manager's work", "what is the shortcut?", "look, I just got this Scrum thing from a random blog post, standups 5 mins, no problem. Poker cards, easy. Story boards, no big deal…". "Done, now we are Scrum, now we are agile, and if things fail, it is the developers' problem". Goodbye, there goes a team. So now you question me: "you said agile is simple, why does it look so hard now?"

Any fool can make something complicated. It takes a genius to make it simple. Woody Guthrie

People are born equal; a genius doesn't magically pop up, it takes real hard work to reach that level. Let's go back to the origin, the mighty manifesto. Get rid of all unnecessary processes and tools, and go talk to people. "What is Jimmy's strength? What can we do to make up for Sam's weakness? Are David and Carl a good pair?". Stop typing inside Word or Excel, go read the real code: "What can we do to enhance the clarity of the code, how can we improve the performance without too much sacrifice, what are the alternative ways to extend our software?" Stop coming up with imaginary use cases, go meet the customer: "What are your pain points? What are the 3 most important features that need to be enhanced and delivered? Based on our statistics, we believe that if we build feature X in such a way, the business can grow by Y%; do you think we should do this?" Stop wasting our lives on keeping a useless backlog, go see the 3 biggest opportunities and threats and work on them; rinse and repeat. In fact, that is exactly how evolution brought humans to this stage: "eliminate the immediate threat to ensure short-term survival, and seek the opportunities for long-term growth". As we are all descendants of mother nature, we are incapable of outsmarting her, so learn from her. 
A real process/methodology grows from the team; it is not superimposed onto the team.
A real process/methodology does not have a name, because it is unique to each team.
Grow your own dream team! Thanks for wasting your time reading my rant. Reference: Agile is a simple topic from our JCG partner Dapeng Liu at the Developers Corner blog....

Some quality metrics that helped my teams

I've been asked the question "what are the best metrics to improve software quality?" (or similar) a million times, so this blog post is a selfish time saver: you are probably reading this because you asked me a similar question and I sent you here. Firstly, I am not a fan of metrics, and I consider a good 99% of the recommended software quality metrics pure rubbish. Having said that, there are a few metrics that have helped teams I worked with, and those are the ones I will share. Secondly, metrics should be used to drive change. I believe it is fundamental that the metric tracked is clearly associated with the reason why it is tracked, so that people don't focus on the number but on the benefit that observing the number will drive. Good metric #1: in order to be able to refactor without worrying about breaking what we had already built, we decided to raise the unit test coverage to >95% and measure it. Builds would fail if the metric was not respected. Good metric #2: in order to reduce code complexity, improve readability and make changes easier, we set a limit and measured the maximum size of each method (15 lines) and the cyclomatic complexity (I don't remember the number, but I think it was <10). Builds would fail if the metric was not respected. Good metric #3: in order to continuously deliver low-complexity, easily testable units of work and help with predictability, we started measuring the full cycle time of user stories from inception to production, with the goal of keeping it between 3 and 5 days. When we had user stories that took more than 5 days, we retrospected and examined the reasons. In the 3 cases above, the focus is on the goal; the number is what we think will drive the change and can always be changed. If people don't understand why they write unit tests, they will achieve unit test coverage without guaranteeing the ability to refactor, for example by writing fake tests that don't have assertions.
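To make that last point concrete, here is a hypothetical sketch (class and method names are my own): a "fake" test and a real test both execute the same method, so a coverage tool counts them identically, but only the asserting test protects a refactoring.

```java
public class CoverageExample {

    static int add(int a, int b) {
        return a + b;
    }

    // Fake test: executes add() and earns the same line coverage as the
    // real test, but asserts nothing, so a broken refactoring slips through.
    static void fakeTest() {
        add(2, 3);
    }

    // Real test: fails loudly the moment add() stops behaving correctly.
    static void realTest() {
        if (add(2, 3) != 5) {
            throw new AssertionError("expected add(2, 3) == 5");
        }
    }

    public static void main(String[] args) {
        fakeTest();
        realTest();
        System.out.println("both tests pass, but only one would catch a bug");
    }
}
```

A coverage number alone cannot tell these two tests apart; only a reviewer (or mutation testing) notices the difference.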
We should never decouple the metric from the reason we are measuring something. These are the good metrics, for me. If you want to see some of the bad ones, have a look at this article I wrote some time ago on confrontational metrics and delivery teams that don’t give a damn about their customers. http://mysoftwarequality.wordpress.com/2012/12/27/the-wrath-of-the-mighty-metric/Reference: Some quality metrics that helped my teams from our JCG partner Augusto Evangelisti at the mysoftwarequality blog....

Scala and Java 8 type inference in higher order functions sample

One of the concepts mentioned in Functional Programming in Scala is type inference in higher order functions in Scala, how it fails in certain situations, and a workaround for it. Consider a sample higher order function, purely for demonstration:

def filter[A](list: List[A], p: A => Boolean): List[A] = {
  list.filter(p)
}

Ideally, passing in a list of, say, integers, you would expect the predicate function not to require an explicit type:

val l = List(1, 5, 9, 20, 30)
filter(l, i => i < 10)

Type inference does not work in this specific instance, however; the fix is to specify the type explicitly:

filter(l, (i: Int) => i < 10)

A better fix is to use currying; then the type inference works:

def filter[A](list: List[A])(p: A => Boolean): List[A] = {
  list.filter(p)
}

filter(l)(i => i < 10)
// OR
filter(l)(_ < 10)

I was curious whether Java 8 type inference has this issue and tried a similar sample with a Java 8 lambda expression. The following is an equivalent filter function:

public <A> List<A> filter(List<A> list, Predicate<A> condition) {
    return list.stream().filter(condition).collect(toList());
}

and type inference for the predicate works cleanly:

List<Integer> ints = Arrays.asList(1, 5, 9, 20, 30);
List<Integer> lessThan10 = filter(ints, i -> i < 10);

Another blog entry on a related topic by the author of the Functional Programming in Scala book is available here – http://pchiusano.blogspot.com/2011/05/making-most-of-scalas-extremely-limited.htmlReference: Scala and Java 8 type inference in higher order functions sample from our JCG partner Biju Kunjummen at the all and sundry blog....
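As a footnote, the Java portion above can be assembled into a small self-contained program (the class name and sample values are my own; the filter method is made static for the demo):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import static java.util.stream.Collectors.toList;

public class FilterDemo {

    // Same shape as the article's filter: the Predicate's type parameter
    // is fixed by the list's element type, so the lambda parameter needs
    // no type annotation.
    static <A> List<A> filter(List<A> list, Predicate<A> condition) {
        return list.stream().filter(condition).collect(toList());
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 5, 9, 20, 30);
        List<Integer> lessThan10 = filter(ints, i -> i < 10);
        System.out.println(lessThan10); // prints [1, 5, 9]
    }
}
```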


Autoboxing is clear for all Java developers since Java 1.5. Well, I may be too optimistic; at least all developers are supposed to be OK with autoboxing. After all, there is a good tutorial about it on Oracle's pages. Autoboxing is the mechanism by which the Java compiler automatically generates code that creates an object from a primitive type when one is needed. For example you can write:

Integer a = 42;

and it will automatically generate JVM code that puts the int value 42 into an Integer object. This is so nice of the compiler to do for us that after a while we programmers tend to forget about the complexity behind it, and from time to time we run into a wall. For example, we have double.class and Double.class. Both of them are objects (each being a class, and each class itself being an object, in permgen or simply on the heap in post-permgen versions of the JVM). Both of these objects are of type Class. What is more: since Java 1.5 both of them are of type Class<Double>. If two objects have the same type, they also have to be assignment compatible, don't they? That seems an obvious statement: if you have object O a and object O b, then you can assign a = b.
Looking at the code, however, we may realize we were oblivious rather than the statement being obvious:

public class TypeFun {
    public static void main(String[] args) {
        // public static final Class<Double> TYPE = (Class<Double>) Class.getPrimitiveClass("double");
        System.out.println("Double.TYPE == double.class: " + (Double.TYPE == double.class));
        System.out.println("Double.TYPE == Double.class: " + (Double.TYPE == Double.class));
        System.out.println("double.class.isAssignableFrom(Double.class): " + (double.class.isAssignableFrom(Double.class)));
        System.out.println("Double.class.isAssignableFrom(double.class): " + (Double.class.isAssignableFrom(double.class)));
    }
}

resulting in:

Double.TYPE == double.class: true
Double.TYPE == Double.class: false
double.class.isAssignableFrom(Double.class): false
Double.class.isAssignableFrom(double.class): false

This means that the primitive pair of Double is double.class (not surprising), even though one cannot be assigned from the other. We can look at the source of at least one of them. The source of the class Double is in rt.jar and it is open source. There you can see that:

public static final Class<Double> TYPE = (Class<Double>) Class.getPrimitiveClass("double");

Why does it use that weird Class.getPrimitiveClass("double") instead of double.class? That is the primitive pair of the type Double. The answer is not trivial, and you can dig deep into the details of Java and the JVM. Since double is not a class, there is nothing like double.class in reality. You can still use this literal in Java source code, though, and this is where the Java language, the compiler and the run-time are strongly bound together. The compiler knows that the class Double defines a field named TYPE denoting its primitive type. Whenever the compiler sees double.class in the source code, it generates JVM code referencing Double.TYPE (give it a try and then use javap to decode the generated code!).
For this very reason the developers of the runtime could not write:

public static final Class<Double> TYPE = double.class;

into the source of the class Double. It would compile to the equivalent of:

public static final Class<Double> TYPE = TYPE;

How does autoboxing work, then? The source:

Double b = (double) 1.0;

results in:

0: dconst_1
1: invokestatic  #2  // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
4: astore_1

However, if we swap the two 'd' letters:

double b = (Double) 1.0;

then we get:

0: dconst_1
1: invokestatic  #2  // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
4: invokevirtual #3  // Method java/lang/Double.doubleValue:()D
7: dstore_1

which indeed explains a lot. The classes double.class and Double.class are not assignment compatible; autoboxing solves this. Java 1.4 was a long time ago and we have, luckily, forgotten it. Your homework: reread what happens with autoboxing when you have overloaded methods that take arguments of the class type and of the corresponding primitive type. Reference: Autoboxing from our JCG partner Peter Verhas at the Java Deep blog....
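As a head start on that homework, here is a small sketch of my own showing how overload resolution treats primitive and boxed parameter types: an exact match wins, and widening (int to double) is preferred over autoboxing.

```java
public class OverloadDemo {

    static String f(double d) { return "primitive"; }
    static String f(Double d) { return "boxed"; }

    public static void main(String[] args) {
        double p = 1.0;  // primitive
        Double b = 1.0;  // autoboxed into a Double object

        System.out.println(f(p)); // exact match: "primitive"
        System.out.println(f(b)); // exact match: "boxed"

        // With an int argument neither overload matches exactly. The compiler
        // prefers widening int -> double over boxing; note that boxing an int
        // produces an Integer, which is not a Double anyway.
        System.out.println(f(1)); // "primitive"
    }
}
```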

A beginner’s guide to database locking and the lost update phenomena

Introduction

A database is a highly concurrent system. There's always a chance of update conflicts, such as two concurrent transactions trying to update the same record. If there were only one database transaction at any time, all operations would be executed sequentially. The challenge comes when multiple transactions try to update the same database rows, as we still have to ensure consistent data state transitions. The SQL standard defines three consistency anomalies (phenomena):

- Dirty reads, prevented by the Read Committed, Repeatable Read and Serializable isolation levels
- Non-repeatable reads, prevented by the Repeatable Read and Serializable isolation levels
- Phantom reads, prevented by the Serializable isolation level

A lesser-known phenomenon is the lost update anomaly, and that's what we are going to discuss in this article.

Isolation levels

Most database systems use Read Committed as the default isolation level (MySQL uses Repeatable Read instead). Choosing the isolation level is about finding the right balance between consistency and scalability for our current application requirements. All the following examples are going to be run on PostgreSQL 9.3. Other database systems may behave differently according to their specific ACID implementation. PostgreSQL uses both locks and MVCC (Multiversion Concurrency Control). In MVCC, read and write locks are not conflicting, so reading doesn't block writing and writing doesn't block reading either.
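Before diving into the database examples, it helps to see that, stripped of the database, a lost update is simply an unguarded read-modify-write race. A minimal plain-Java sketch (names and numbers are my own illustration): two users read the same value, each writes back a result computed from their stale copy, and the first write is silently overwritten.

```java
public class LostUpdateDemo {
    static int quantity = 7;

    public static void main(String[] args) {
        // Both users read the same starting value...
        int aliceRead = quantity;
        int bobRead = quantity;

        // ...then each writes back a value computed from their own stale copy.
        quantity = aliceRead - 1; // Alice: 7 -> 6
        quantity = bobRead + 3;   // Bob meant "current + 3", unaware of Alice's write

        // Alice's decrement is gone: the final state is 10, not the expected 9.
        System.out.println("final quantity = " + quantity);
    }
}
```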
Because most applications use the default isolation level, it's very important to understand the Read Committed characteristics:

- Queries only see data committed before the query began, as well as the current transaction's uncommitted changes
- Concurrent changes committed during a query's execution won't be visible to the current query
- UPDATE/DELETE statements use locks to prevent concurrent modifications

If two transactions try to update the same row, the second transaction must wait for the first one to either commit or roll back, and if the first transaction has committed, the second transaction's DML WHERE clause must be reevaluated to see if the match is still relevant. In this example, Bob's UPDATE must wait for Alice's transaction to end (commit/rollback) in order to proceed. Read Committed accommodates more concurrent transactions than other, stricter isolation levels, but less locking leads to better chances of losing updates.

Lost updates

If two transactions are updating different columns of the same row, then there is no conflict. The second update blocks until the first transaction is committed, and the final result reflects both changes. If the two transactions want to change the same columns, the second transaction will overwrite the first one, therefore losing the first transaction's update. So an update is lost when a user overrides the current database state without realizing that someone else changed it between the moment the data was loaded and the moment the update occurs. In this example, Bob is not aware that Alice has just changed the quantity from 7 to 6, so her UPDATE is overwritten by Bob's change.

The typical find-modify-flush ORM strategy

Hibernate (like any other ORM tool) automatically translates entity state transitions to SQL queries. You first load an entity, change it and let the Hibernate flush mechanism synchronize all changes with the database.
public Product incrementLikes(Long id) {
    Product product = entityManager.find(Product.class, id);
    product.incrementLikes();
    return product;
}

public Product setProductQuantity(Long id, Long quantity) {
    Product product = entityManager.find(Product.class, id);
    product.setQuantity(quantity);
    return product;
}

As I've already pointed out, all UPDATE statements acquire write locks, even in Read Committed isolation. The persistence context write-behind policy aims to reduce the lock-holding interval, but the longer the period between the read and the write operations, the greater the chance of running into a lost update situation. Hibernate includes all row columns in an UPDATE statement. This strategy can be changed to include only the dirty properties (through the @DynamicUpdate annotation), but the reference documentation warns us about its effectiveness: "Although these settings can increase performance in some cases, they can actually decrease performance in others." So let's see how Alice and Bob concurrently update the same Product using an ORM framework:

Alice:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     5 |        7
(1 ROW)

Bob:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     5 |        7
(1 ROW)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;

Bob:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (5, 10) WHERE ID = 1;

Alice:
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     6 |        7
(1 ROW)

Bob:
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     5 |       10
(1 ROW)

Alice:
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     5 |       10
(1 ROW)

Again Alice's update is lost without Bob ever knowing he overwrote her changes. We should always prevent data integrity anomalies, so let's see how we can overcome this phenomenon.
Repeatable Read

Using Repeatable Read (as well as Serializable, which offers an even stricter isolation level) can prevent lost updates across concurrent database transactions.

Alice:
store=# BEGIN;
store=# SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     5 |        7
(1 ROW)

Bob:
store=# BEGIN;
store=# SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     5 |        7
(1 ROW)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;

Bob:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (5, 10) WHERE ID = 1;

Alice:
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     6 |        7
(1 ROW)

Bob:
ERROR: could not serialize access due to concurrent update
store=# SELECT * FROM PRODUCT WHERE ID = 1;
ERROR: current transaction is aborted, commands ignored until end of transaction block

This time, Bob couldn't overwrite Alice's changes and his transaction was aborted. In Repeatable Read, a query will see the data snapshot as of the start of the current transaction. Changes committed by other concurrent transactions are not visible to the current transaction. If two transactions attempt to modify the same record, the second transaction will wait for the first one to either commit or roll back. If the first transaction commits, then the second one must be aborted to prevent lost updates.

SELECT FOR UPDATE

Another solution is to use FOR UPDATE with the default Read Committed isolation level.
This locking clause acquires the same write locks as UPDATE and DELETE statements do.

Alice:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     5 |        7
(1 ROW)

Bob:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;
(blocked, waiting for Alice's transaction to end)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 ID | LIKES | QUANTITY
----+-------+----------
  1 |     6 |        7
(1 ROW)

Bob (his SELECT now returns):
 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 10) WHERE ID = 1;
UPDATE 1
store=# COMMIT;
COMMIT
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity
----+-------+----------
  1 |     6 |       10
(1 row)

Bob couldn't proceed with the SELECT statement because Alice had already acquired the write locks on the same row. Bob has to wait for Alice to end her transaction, and when Bob's SELECT is unblocked he automatically sees her changes, so Alice's UPDATE isn't lost. Both transactions should use the FOR UPDATE locking, however. If the first transaction doesn't acquire the write locks, the lost update can still happen.

Alice:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Bob:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;
 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;
(blocked, waiting for Bob's transaction to end)

Bob:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 10) WHERE ID = 1;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity
----+-------+----------
  1 |     6 |       10
(1 row)
store=# COMMIT;

Alice:
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)
store=# COMMIT;

Bob:
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

Alice's UPDATE is blocked until Bob releases the write locks at the end of his current transaction.
But Alice's persistence context is using a stale entity snapshot, so she overwrites Bob's changes, leading to another lost update situation.

Optimistic Locking

My favorite approach is to replace pessimistic locking with an optimistic locking mechanism. Like MVCC, optimistic locking defines a versioning concurrency control model that works without acquiring additional database write locks. The product table will also include a version column that prevents stale data snapshots from overwriting the latest data.

Alice:
store=# BEGIN;
BEGIN
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity | version
----+-------+----------+---------
  1 |     5 |        7 |       2
(1 row)

Bob:
store=# BEGIN;
BEGIN
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity | version
----+-------+----------+---------
  1 |     5 |        7 |       2
(1 row)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY, VERSION) = (6, 7, 3) WHERE (ID, VERSION) = (1, 2);
UPDATE 1

Bob:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY, VERSION) = (5, 10, 3) WHERE (ID, VERSION) = (1, 2);

Alice:
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity | version
----+-------+----------+---------
  1 |     6 |        7 |       3
(1 row)

Bob:
UPDATE 0
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;
 id | likes | quantity | version
----+-------+----------+---------
  1 |     6 |        7 |       3
(1 row)

Every UPDATE takes the load-time version into its WHERE clause, assuming no one has changed the row since it was retrieved from the database. If some other transaction manages to commit a newer entity version, the UPDATE WHERE clause will no longer match any row, and so the lost update is prevented. Hibernate uses the PreparedStatement#executeUpdate result to check the number of updated rows. If no row was matched, it throws a StaleObjectStateException (when using the Hibernate API) or an OptimisticLockException (when using JPA). As with Repeatable Read, the current transaction and the persistence context are aborted, in order to honor atomicity guarantees.

Conclusion

Lost updates can happen unless you plan for preventing such situations.
Other than optimistic locking, all pessimistic locking approaches are effective only in the scope of the same database transaction, when both the SELECT and the UPDATE statements are executed in the same physical transaction. In my next post I will explain why optimistic locking is the only viable solution when using application-level transactions, as is the case for most web applications. Reference: A beginner's guide to database locking and the lost update phenomena from our JCG partner Vlad Mihalcea at the Vlad Mihalcea's Blog blog....

Can Static Analysis replace Code Reviews?

In my last post, I explained how to do code reviews properly. I recommended taking advantage of static analysis tools like Findbugs, PMD, Klocwork or Fortify to check for common mistakes and bad code before passing the code on to a reviewer, to make the reviewer’s job easier and reviews more effective. Some readers asked whether static analysis tools can be used instead of manual code reviews. Manual code reviews add delays and costs to development, while static analysis tools keep getting better, faster, and more accurate. So can you automate code reviews, in the same way that many teams automate functional testing? Do you need to do manual reviews too, or can you rely on technology to do the job for you?   Let’s start by understanding what static analysis bug checking tools are good at, and what they aren’t. What static analysis tools can do – and what they can’t do In this article, Paul Anderson at GrammaTech does a good job of explaining how static analysis bug finding works, the trade-offs between recall (finding all of the real problems), precision (minimizing false positives) and speed, and the practical limitations of using static analysis tools for finding bugs. Static analysis tools are very good at catching certain kinds of mistakes, including memory corruption and buffer overflows (for C/C++), memory leaks, illegal and unsafe operations, null pointers, infinite loops, incomplete code, redundant code and dead code. A static analysis tool knows if you are calling a library incorrectly (as long as it recognizes the function), if you are using the language incorrectly (things that a compiler could find but doesn’t) or inconsistently (indicating that the programmer may have misunderstood something). And static analysis tools can identify code with maintainability problems, code that doesn’t follow good practice or standards, is complex or badly structured and a good candidate for refactoring. 
But these tools can’t tell you when you have got the requirements wrong, or when you have forgotten something or missed something important – because the tool doesn’t know what the code is supposed to do. A tool can find common off-by-one mistakes and some endless loops, but it won’t catch application logic mistakes like sorting in descending order instead of ascending order, or dividing when you meant to multiply, referring to buyer when it should have been seller, or lessee instead of lessor. These are mistakes that aren’t going to be caught in unit testing either, since the same person who wrote the code wrote the tests, and will make the same mistakes. Tools can’t find missing functions or unimplemented features or checks that should have been made but weren’t. They can’t find mistakes or holes in workflows. Or oversights in auditing or logging. Or debugging code left in by accident. Static analysis tools may be able to find some backdoors or trapdoors – simple ones at least. And they might find some concurrency problems – deadlocks, races and mistakes or inconsistencies in locking. But they will miss a lot of them too. Static analysis tools like Findbugs can do security checks for you: unsafe calls and operations, use of weak encryption algorithms and weak random numbers, using hard-coded passwords, and at least some cases of XSS, CSRF, and simple SQL injection. More advanced commercial tools that do inter-procedural and data flow analysis (looking at the sources, sinks and paths between) can find other bugs including injection problems that are difficult and time-consuming to trace by hand. But a tool can’t tell you that you forgot to encrypt an important piece of data, or that you shouldn’t be storing some data in the first place. It can’t find logic bugs in critical security features, if sensitive information could be leaked, when you got an access control check wrong, or if the code could fail open instead of closed. 
And using one static analysis tool on its own to check code may not be enough. Evaluations of static analysis tools, such as NIST's SAMATE project (a series of comparative studies where many tools are run against the same code), show almost no overlap between the problems found by different tools (outside of a few common areas like buffer errors), even when the tools are supposed to be doing the same kinds of checks. This means that to get the most out of static analysis, you will need to run two or more tools against the same code (this is what SonarQube does for you, for example: it integrates its own static analysis results with those of other tools, including popular free tools). If you're paying for commercial tools, this could get very expensive fast.

Tools vs. Manual Reviews

Tools can find cases of bad coding or bad typing – but not bad thinking. These are problems that you will have to find through manual reviews. A 2005 study, Comparing Bug Finding Tools with Reviews and Tests, used open source bug finding tools (including Findbugs and PMD) on 5 different code bases, comparing what the tools found to what was found through code reviews and functional testing. Static analysis tools found only a small subset of the bugs found in manual reviews, although the tools were more consistent – manual reviewers missed a few cases that the tools picked up. Just like manual reviews, the tools found more problems with maintainability than real defects (this is partly because one of the tools evaluated – PMD – focuses on code structure and best practices). Testing (black box – including equivalence and boundary testing – and white box functional testing and unit testing) found fewer bugs than reviews. But different bugs. There was no overlap at all between bugs found in testing and the bugs found by the static analysis tools.
Finding problems that could happen – or do happen Static analysis tools are good at finding problems that “could happen”, but not necessarily problems that “do happen”. Researchers at Colorado State University ran static analysis tools against several releases of different Open Source projects, and compared what the tools found against the changes and fixes that developers actually made over a period of a few years – to see whether the tools could correctly predict the fixes that needed to be made and what code needed to be refactored. The tools reported hundreds of problems in the code, but found very few of the serious problems that developers ended up fixing. One simple tool (Jlint) did not find anything that was actually fixed or cleaned up by developers. Of 112 serious bugs that were fixed in one project, only 3 were also found by static analysis tools. In another project, only 4 of 136 bugs that were actually reported and fixed were found by the tools. Many of the bugs that developers did fix were problems like null pointers and incorrect string operations – problems that static analysis tools should be good at catching, but didn’t. The tools did a much better job of predicting what code should be refactored: developers ended up refactoring and cleaning up more than 70% of the code structure and code clarity issues that the tools reported (PMD, a free code checking tool, was especially good for this). Ericsson evaluated different commercial static analysis tools against large, well-tested, mature applications. On one C application, a commercial tool found 40 defects – nothing that could cause a crash, but still problems that needed to be fixed. On another large C code base, 1% of the tool’s findings turned out to be bugs serious enough to fix. On the third project, they ran 2 commercial tools against an old version of a C system with known memory leaks. One tool found 32 bugs, another 16: only 3 of the bugs were found by both tools. 
Surprisingly, neither tool found the already known memory leaks – all of the bugs found were new ones. And on a Java system with known bugs they tried 3 different tools. None of the tools found any of the known bugs, but one of the tools found 19 new bugs that the team agreed to fix. Ericsson’s experience is that static analysis tools find bugs that are extremely difficult to find otherwise. But it’s rare to find stop-the-world bugs – especially in production code – using static analysis. This is backed up by another study on the use of static analysis (Findbugs) at Google and on the Sun JDK 1.6.0. Using the tool, engineers found a lot of bugs that were real, but not worth the cost of fixing: deliberate errors, masked errors, infeasible situations, code that was already doomed, errors in test code or logging code, errors in old code that was “going away soon” or other relatively unimportant cases. Only around 10% of medium and high priority correctness errors found by the tool were real bugs that absolutely needed to be fixed. The Case for Security So far we’ve mostly looked at static analysis checking for run-time correctness and general code quality, not security. Although security builds on code quality – vulnerabilities are just bugs that hackers look for and exploit – checking code for correctness and clarity isn’t enough for a secure app. A lot of investment in static analysis technology over the past 5-10 years has been in finding security problems in code, such as common problems listed in OWASP’s Top 10 or the SANS/CWE Top 25 Most Dangerous Software Errors. A couple of studies have looked at the effectiveness of static analysis tools compared to manual reviews in finding security vulnerabilities. The first study was on a large application that had 15 known security vulnerabilities found through a structured manual assessment done by security experts. Two different commercial static analysis tools were run across the code. 
The tools together found less than half of the known security bugs – only the simplest ones, the bugs that didn't require a deep understanding of the code or the design. And of course the tools reported thousands of other issues that needed to be reviewed and qualified or thrown away as false positives. These other issues included some run-time correctness problems, null pointers and resource leaks, and code quality findings (dead code, unused variables), but no other real security vulnerabilities beyond those already found by the manual security review. But this assumes that you have a security expert around to review the code. To find security vulnerabilities, a reviewer needs to understand the code (the language and the frameworks), and they also need to understand what kind of security problems to look for. Another study shows how difficult this is. Thirty developers were hired to do independent security code reviews of a small web app (some were security experts, others web developers). They were not allowed to use static analysis tools. The app had 6 known vulnerabilities. 20% of the reviewers did not find any of the known bugs. None of the reviewers found all of the known bugs, although several found a new XSS vulnerability that the researchers hadn't known about. On average, 10 reviewers would have had only an 80% chance of finding all of the security bugs.

And, not Or

Static analysis tools are especially useful for developers working in unsafe languages like C/C++ (where there is a wide choice of tools to find common mistakes) or dynamically typed scripting languages like Javascript or PHP (where unfortunately the tools aren't that good), and for teams starting off learning a new language and framework. Using static analysis is (or should be) a requirement in highly regulated, safety-critical environments like medical devices and avionics.
And until more developers get more training and understand more about how to write secure software, we will all need to lean on static analysis (and dynamic analysis) security testing tools to catch vulnerabilities. But static analysis isn't a substitute for code reviews. Yes, code reviews take extra time and add costs to development, even if you are smart about how you do them – and being smart includes running static analysis checks before you do reviews. If you want to move fast and write good, high-quality and secure code, you still have to do reviews. You can't rely on static analysis alone. Reference: Can Static Analysis replace Code Reviews? from our JCG partner Jim Bird at the Building Real Software blog....
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.