Okay, everybody who touches Java bytecode

The Oracle v. Google decision holds that copying the Structure, Sequence, and Organization of the Java APIs is a copyright violation. And a copyright violation is not just the act of copying; it also applies to all the intermediate parties that have a copy of the work. That's anybody who writes/compiles any JVM language and anyone who has a JAR file on any device they possess… including a Java ME applet on your old Motorola flip phone. In fact, the JVM in all its incarnations is so pervasive that it's likely every adult in every industrialized nation has some JVM running someplace. And every non-Sun/Oracle JAR file has a copy of some or all of the Java API embedded in it, because it's technically necessary to include a shadow of the API in compiled bytecode in order to invoke the API. Let me demonstrate. Here's a perfectly legal Java program that I wrote and own the copyright to:

    public class HelloWorld {
        public static void main(String[] args) {
            int strlen = 0;
            for (int x = 0; x < args.length; x++) {
                strlen += args[x].length();
            }
            System.out.println("Hello, world, you passed in " + args.length + " arguments, " +
                    "total size: " + strlen);
        }
    }

Nothing in there looks infringing. I run the program through the OpenJDK Java compiler, javac, which results in a HelloWorld.class file. According to how the industry has used Java and compilers in general, the resulting bytecode is a derivative work of the source code, and I own the copyright in the source code.
So, let's take a look at the resulting bytecode, disassembled with javap:

    dpp@crown:~/proj/dpp-blog/images$ javap -c HelloWorld
    Compiled from "HelloWorld.java"
    public class HelloWorld {
      public HelloWorld();
        Code:
           0: aload_0
           1: invokespecial #1   // Method java/lang/Object."<init>":()V
           4: return

      public static void main(java.lang.String[]);
        Code:
           0: iconst_0
           1: istore_1
           2: iconst_0
           3: istore_2
           4: iload_2
           5: aload_0
           6: arraylength
           7: if_icmpge     25
          10: iload_1
          11: aload_0
          12: iload_2
          13: aaload
          14: invokevirtual #2   // Method java/lang/String.length:()I
          17: iadd
          18: istore_1
          19: iinc          2, 1
          22: goto          4
          25: getstatic     #3   // Field java/lang/System.out:Ljava/io/PrintStream;
          28: new           #4   // class java/lang/StringBuilder
          31: dup
          32: invokespecial #5   // Method java/lang/StringBuilder."<init>":()V
          35: ldc           #6   // String Hello, world, you passed in
          37: invokevirtual #7   // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
          40: aload_0
          41: arraylength
          42: invokevirtual #8   // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
          45: ldc           #9   // String  arguments,
          47: invokevirtual #7   // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
          50: ldc           #10  // String total size:
          52: invokevirtual #7   // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
          55: iload_1
          56: invokevirtual #8   // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
          59: invokevirtual #11  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
          62: invokevirtual #12  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
          65: return
    }

Oh my… look, some of the Java APIs snuck right into the code. In fact, the JVM requires the call site (the place where code is called) to include information about the API that's being called in order for the JVM to figure out the method to be called: not just the method name, but also the parameter types passed in and the expected return type.
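To see just how mechanically those API references are embedded, here is a small, self-contained sketch (not part of the original article) that walks a class file's constant pool and prints every class it references. The parsing follows the published JVM class file format; the class name ClassRefDumper is invented for this illustration. Run against any compiled class, including itself, it reveals names like java/lang/Object baked into the bytecode.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ClassRefDumper {

    /** Returns the names of all classes referenced in a class file's constant pool. */
    static Set<String> referencedClasses(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version

        int poolCount = in.readUnsignedShort();
        Map<Integer, String> utf8 = new HashMap<>();          // index -> UTF-8 string
        Map<Integer, Integer> classEntries = new HashMap<>(); // index -> name index

        for (int i = 1; i < poolCount; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:  // CONSTANT_Utf8: u2 length + modified UTF-8 bytes
                    utf8.put(i, in.readUTF());
                    break;
                case 7:  // CONSTANT_Class: points at a Utf8 entry holding the class name
                    classEntries.put(i, in.readUnsignedShort());
                    break;
                case 8: case 16: case 19: case 20: // String, MethodType, Module, Package
                    in.readUnsignedShort();
                    break;
                case 15: // MethodHandle: reference kind + reference index
                    in.readUnsignedByte();
                    in.readUnsignedShort();
                    break;
                case 3: case 4: // Integer, Float
                    in.readInt();
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18:
                    // Fieldref, Methodref, InterfaceMethodref, NameAndType,
                    // Dynamic, InvokeDynamic: two u2 indices each
                    in.readUnsignedShort();
                    in.readUnsignedShort();
                    break;
                case 5: case 6: // Long, Double occupy two constant-pool slots
                    in.readLong();
                    i++;
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }

        Set<String> names = new TreeSet<>();
        for (int nameIndex : classEntries.values()) {
            names.add(utf8.get(nameIndex));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Parse this very class's own bytecode: even it references Java API classes.
        try (InputStream self = ClassRefDumper.class.getResourceAsStream("ClassRefDumper.class")) {
            for (String name : referencedClasses(self)) {
                System.out.println(name);
            }
        }
    }
}
```

Every class compiled by javac will list at least its superclass, so java/lang/Object shows up for any input, which is exactly the article's point about API shadows in compiled code.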
So each and every compiled JAR contains some part of the Java API embedded in it. Each and every compiled JAR file is a copyright violation under the Oracle decision. "But," you say, "the JAR file doesn't contain all of the disputed API." First, how much is enough? The Oracle court explicitly rejected the argument that the APIs were a small part of the overall work of the Java base classes, and it found percentage arguments unpersuasive. Second, for repositories like Maven Central that house tens of thousands of JAR files, substantially all of the Java APIs are copied into the collective works housed in those JAR files. What to do? If I were hosting a ton of JAR files, I'd be on the phone to my lawyers trying to figure out what to do. Yeah, maybe there's an inducement argument, because Oracle distributes javac and is therefore inducing me to copy the Java APIs. But still, it's a technical violation of the Oracle court's decision. If I were the Apache Software Foundation or the Free Software Foundation, I'd be filing an ex parte motion this morning to get a stay of the Oracle decision, because it means that what we've been thinking is our software, which we can license on our open terms, in fact contains Oracle copyrighted code, and we would have to suspend all of our JVM-related open source projects. Oh, and I should point out that if Oracle claims that the APIs copied into the JAR files are not covered by copyright, then all Google has to do is pull all the JAR files from Maven Central, find all the Java API references in those JAR files, and use that information to declare an API for Android. That's about 10 man-days of effort, at most.

Reference: Okay, everybody who touches Java bytecode from our JCG partner David Pollak at the DPP's Blog blog....

Programming for Change

It has become cliché to say that the only constant in life is change, and most people accept it as a given. However, we often don't take it to heart when we code. We prototype something together using "Magic Number" hard-coded values, we use a new library by making calls directly into its functionality, we cut-and-paste a function that does almost what we need, changing one or two lines to get it to work, and so on. All those decisions seem harmless and allow us to get our prototype working more quickly, but they invariably come back to bite us (and if not us directly, the poor guy who has to maintain our code).

There are practices we can take better advantage of to help us avoid a lot of these situations. You are probably aware of most (if not all) of them, but being reminded about them every once in a while helps us all remember to use them, even if it slows us down a bit. Note: this article is based on a presentation I originally prepared for a computer conference back in the early 2000s. However, based on some of my recent experiences, it is still quite relevant, so I thought I would resurrect it here.

Avoiding "Magic Numbers"

Most modern languages provide constructs for easily dealing with "Magic Numbers." In Java, there are enums to hold collections of constant data. And now that we can extend them to add our own methods, they are even more powerful. If you have a constant that is pretty standalone, make it a static. I usually put it in the interface of the class that needs it defined. For those things that are constant across an activation of your program, but perhaps not across all activations (database access parameters are a classic example of this), property files are best. This allows you to take advantage of the constant-ness of the values in your code, and yet swap them out depending on the environment you are running in. Another good tool for managing these pseudo-constants is dependency injection.
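The constant-handling options above, enums for related constant data, statics for standalone values, and property files for per-environment values, might be sketched roughly like this. An illustration only: ConfigExample, Planet, and the property names are invented for the example, not taken from the article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ConfigExample {

    // A standalone constant: static, named, defined once instead of a magic "3".
    static final int MAX_RETRIES = 3;

    // Constant data that belongs together: an enum, extended with its own method.
    enum Planet {
        MERCURY(3.303e+23), EARTH(5.976e+24);

        private final double mass; // in kilograms

        Planet(double mass) { this.mass = mass; }

        double mass() { return mass; }
    }

    // Pseudo-constants that vary per environment (e.g. database access
    // parameters): load them from a property file instead of hard-coding.
    static Properties loadDbConfig(InputStream source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(MAX_RETRIES);
        System.out.println(Planet.EARTH.mass());
    }
}
```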
By programming to the interface, we can swap out the implementation as desired, changing the "constants" as needed.

Avoiding Duplicated Code

This is one of those situations that we all mean to avoid, but often fall into when push comes to shove. One of my colleagues espouses the attitude that you should never type the same code twice. If you have two classes that have a similar method, should they have a common superclass? Our tools have improved much over the decades, and creating that superclass (if it doesn't already exist) has become almost trivial, so why not do it? Oftentimes you need similar functionality across many different types of classes. Before you copy that method, think about what it is trying to accomplish. Would it make sense to have that capability encapsulated in a utility class? If so, take some time to explore the utilities that are already available to you. I recently ran into a situation where some older code was doing a somewhat specialized string comparison (multiple strings with unusual null handling). The code was long and somewhat convoluted (not to mention very specific to the problem domain). By making use of the null-aware functions in org.apache.commons.lang.StringUtils, we were able to reduce the lines of code by an order of magnitude. Sometimes the similar functionality needs to interact with the surrounding code more than a simple utility method will allow for. This may be a good candidate for the Strategy pattern. Utilizing this pattern allows you to use common code across many different classes, without having to worry about a complex or convoluted class hierarchy.

Avoiding Library Dependency

As more and more open source tools become available, the likelihood that your team will want to change which library they depend upon goes up. As different libraries leap-frog one another, you want a clean way to take advantage of the new features offered.
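Before going further, the Strategy pattern mentioned in the duplicated-code discussion can be sketched in a few lines. The names here (NameFormatter, Greeter, and the formatters) are invented for the illustration, not from the article:

```java
import java.util.Locale;

// The strategy: shared code interacts with varying behavior through this interface.
interface NameFormatter {
    String format(String first, String last);
}

class FormalFormatter implements NameFormatter {
    public String format(String first, String last) {
        return last.toUpperCase(Locale.ROOT) + ", " + first;
    }
}

class CasualFormatter implements NameFormatter {
    public String format(String first, String last) {
        return first + " " + last;
    }
}

// The common code lives in one place; behavior varies by the injected strategy,
// with no class hierarchy needed on the Greeter side.
class Greeter {
    private final NameFormatter formatter;

    Greeter(NameFormatter formatter) { this.formatter = formatter; }

    String greet(String first, String last) {
        return "Hello, " + formatter.format(first, last) + "!";
    }
}

public class StrategyDemo {
    public static void main(String[] args) {
        System.out.println(new Greeter(new FormalFormatter()).greet("Ada", "Lovelace"));
        System.out.println(new Greeter(new CasualFormatter()).greet("Ada", "Lovelace"));
    }
}
```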
If they all programmed to the same API, you could handle that with dependency injection, as mentioned above. However, that is rarely the case between competing projects in relatively new problem spaces. So what's the solution? Create your own interface. You can start out small, incorporating only the functionality you need. You can even "improve" the functionality from your point of view, combining several library calls into a single interface call. This can simplify a more complex and configurable interface into something more suited to your application's use. One recent client was using three different libraries for accessing Excel(tm) worksheets from Java. They didn't really need all three, but they had coded directly to each as they used it. The problem arose when they wanted to use a new feature from one library in a project that had started with a different library. If they had coded to a common interface, they could have made the change quite simply. None of these suggestions is revolutionary, but remembering to follow them from the beginning of a project will make everyone's life easier in the long run.

Reference: Programming for Change from our JCG partner Keyhole Software at the Keyhole Software blog....
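The "create your own interface" advice might look roughly like this. A sketch only: SpreadsheetReader is an invented name, and the in-memory implementation stands in for an adapter over a real spreadsheet library.

```java
import java.util.List;

// Your own interface: only the operations your application actually needs.
interface SpreadsheetReader {
    List<String> readRow(int rowIndex);
    int rowCount();
}

// One adapter per third-party library; swapping libraries means writing another
// adapter, not touching application code. Backed here by an in-memory fake so
// the sketch stays self-contained.
class InMemorySpreadsheetReader implements SpreadsheetReader {
    private final List<List<String>> rows;

    InMemorySpreadsheetReader(List<List<String>> rows) { this.rows = rows; }

    public List<String> readRow(int rowIndex) { return rows.get(rowIndex); }

    public int rowCount() { return rows.size(); }
}

public class FacadeDemo {
    public static void main(String[] args) {
        SpreadsheetReader reader = new InMemorySpreadsheetReader(
                List.of(List.of("name", "age"), List.of("Alice", "42")));
        System.out.println(reader.rowCount());        // 2
        System.out.println(reader.readRow(1).get(0)); // Alice
    }
}
```

Application code depends only on SpreadsheetReader, so moving to a new library is a matter of adding one more adapter class.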

GIT Pull Requests Using GitHub

Old Habits

We've been working with git for more than a year. The SCM was migrated from SVN, with all its history. Our habits were migrated as well. Our flow is (was) fairly simple: the master branch is where we deploy our code from. When working on a feature, we create a feature branch. Several people can work on this branch. Some create a private local branch. Some don't. Code review is done one-on-one: one member asks another to join and walks through the code.

Introducing Pull Requests to the Team

Recently, with the help of a teammate, I introduced the Pull Request concept to the team. It takes some time to grasp the methodology and see the benefits. However, I have already started seeing improvements in collaboration, code quality and coding behaviors.

Benefits

Better collaboration
When a person makes a change and calls for a pull request, the entire team can see the change. Everyone can comment and give remarks, and changes are discussed before they are merged to the main branch.

Code ownership
Everyone knows about the change and anyone can check and comment. The result is that each one can "own" the code. It helps each team member to participate in coding and reviewing any piece of code.

Branch organization
There's an extra revision of the code before it is merged. Branches can be (IMHO should be) deleted after merging the feature. The git history (the log) is clearer. (This one is totally dependent on the quality of comments.)

Improved code quality
I see that it improves the code quality even before the code review. People don't want to introduce bad code when they know that everyone can watch it.

Better code review
We've been doing extensive code review since the beginning of the project. However, as I explained above, we did it one-on-one, where usually the writer explained the code to the reviewer. In my perspective, by doing that, we miss the advantages of code review. The quality of the code review decreases when the writer explains the material to the reviewer.
Using pull requests, if the reviewer does not understand something, it may mean that the code is not clean enough. So: more remarks and comments, and thus better code.

Mentoring
When a senior does a code review for a junior one-on-one, nobody else sees it. It's more difficult for the senior to showcase the expectations of how the code should look and how code review should be performed. (There are of course other ways of passing this on, like code dojos, and pair-programming, although that is also one-on-one.) By commenting in the pull request, the team can see what's important and how to review. Everyone benefits from the reviews of other team members.

Improved git usage habits
When someone collaborates with the whole team, he/she will probably write better git comments. The commits will be smaller and more frequent, as no one wants to read huge amounts of diff rows, and no one wants to "upset" the team. Using pull requests forces the usage of branches, which improves the git history.

Objections

Others may call this section "disadvantages." But the way I see it, it's more a set of complaints along the lines of "Why do we need this? We were good with how things were till now."

"I get too many emails already"
Well, this is true. Using pull requests, we started getting many more emails, which is annoying. There's too much noise; I might not notice important emails. The answer to that is simple: if you are part of this feature, then this mail is important, because it mentions code changes in parts that you are working on. If you want to stop receiving emails for a particular pull request, you can ask to mute it.

"If we start emailing, we'll stop talking to each other"
I disagree with this statement. It will probably reduce the one-on-one review talks, but in my (short) experience, it has improved our verbal discussions. The verbal discussion comes after the reviewer has watched the code change. If a reviewer did not understand something, only then will she approach the developer.
The one-on-one discussions are much more efficient and to the point.

"Ahh! I need to think of better commit comments. Now I have more to think about"
This is good, isn't it? By using pull requests, each team member needs to improve the way comments are written in the commits. It will also improve git habits, in terms of smaller commits in shorter time.

"It's harder to understand. I prefer that the other developer explain his intentions to me"
Don't we miss important advantages of code review by getting a walkthrough from the writer? I mean, if I need an explanation of what the code does, then we had better fix that code. So, if it's hard to understand, I can write my comments until it improves.

How?

In this section I will briefly explain the way we chose to use pull requests. The screenshots are taken from GitHub, although Bitbucket supports pull requests as well.

Branching From the "main" Branch
I did not write 'master' intentionally. Let's say that I work on some feature in a branch called FEATURE_A (for me, this is the main branch). This branch was created from master. Let's say that I need to implement some kind of sub-feature in FEATURE_A. Example (extremely simple): add toString to the class Person. Then I will create a branch (locally, out of FEATURE_A):

    # On branch FEATURE_A, after pull from remote do:
    # git checkout -b <branch-name-with-good-description>
    git checkout -b FEATURE_A_add_toString_Person

    # In order to push it to remote (GitHub), run this:
    # git push -u origin <branch-name-with-good-description>
    git push -u origin FEATURE_A_add_toString_Person
    # Pushing the branch can be done later

Doing a Pull Request
After some work on the branch, and pushing it to GitHub, I can ask for a pull request. There are a few ways of doing it. The one I find "coolest" is using a button/link in GitHub for calling a pull request. When entering GitHub's repository on the web, it shows a clickable notation for the last branch that I pushed to.
After sending the pull request, all team members will receive an email. You can also assign a specific person to the pull request if you want him/her to do the actual code review.

Changing the Branch for the diff
By default, GitHub will ask to do the pull request against the master branch. As explained above, sometimes (usually?) we'll want to diff/merge against some feature branch and not master. In the pull request dialog, you can select which branch you want to compare your working branch against.

Code Review and Discussion
Any pushed code will be added to the pull request. Any team member can add a comment at the bottom of the discussion. And, a really nice option: add a comment on a specific line of code.

Merging and Deleting the Branch
After the discussion and more pushed code, everyone is satisfied and the code can be merged. GitHub will tell you whether your working branch can be merged to the main (diff) branch for that pull request. Sometimes the branches can't be automatically merged. In that case, we'll do a merge locally, fix conflicts (if any) and then push again. We try to remember to do this often, so usually GitHub will tell us that the branches can be automatically merged.

After the pull request is merged, it is automatically closed. If you are finished, you can delete the branch.

Who's Responsible?
People asked: Who should merge? Who should delete the branch? We found that it is most sensible that the person who initiated the pull request merges and deletes. The merge happens only after the reviewer has given the OK.

Helpful git Commands
Here's a list of helpful git commands we use.
    # Automatically merge from one branch (from remote) to another
    # On branch BRANCH_A, merge any pushed change from BRANCH_B
    git pull origin BRANCH_B

    # Show remote branches
    git remote show origin

    # Verify which local branches, set to upstream, can be deleted
    git remote prune origin --dry-run

    # Actually remove all stale branches
    git remote prune origin

    # Delete the local branch
    git branch -d <branch-name>

Resources
https://help.github.com/articles/using-pull-requests
https://www.atlassian.com/git/workflows#!pull-request

Enjoy…

Reference: GIT Pull Requests Using GitHub from our JCG partner Eyal Golan at the Learning and Improving as a Craftsman Developer blog....

Writing Clean Tests – Naming Matters

It is pretty hard to figure out a good definition for clean code, because every one of us has our own definition for the word clean. However, there is one definition which seems to be universal: clean code is easy to read. This might come as a surprise to some of you, but I think that this definition applies to test code as well. It is in our best interests to make our tests as readable as possible, because:

- If our tests are easy to read, it is easy to understand how our code works.
- If our tests are easy to read, it is easy to find the problem if a test fails (without using a debugger).

It isn't hard to write clean tests, but it takes a lot of practice, and that is why so many developers are struggling with it. I have struggled with this too, and that is why I decided to share my findings with you. This is the second part of my tutorial which describes how we can write clean tests. This time we will learn what kind of an effect naming has on the readability of our tests. We will also learn rules which help us to transform our test cases into executable specifications.

The Devil Is in the Details

It is relatively easy to write tests which seem clean. However, if we want to go the extra mile and change our tests into an executable specification, we have to pay extra attention to the naming of test classes, test methods, test class fields, and local variables. Let's find out what this means.

Naming Test Classes

When we think about the different test classes which we create in a typical project, we notice that these classes can be divided into two groups:

- The first group contains tests which test the methods of a single class. These tests can be either unit tests or integration tests written for our repositories.
- The second group contains integration tests which ensure that a single feature is working properly.

A good name identifies the tested class or feature.
In other words, we should name our test classes by following these rules:

- If the test class belongs to the first group, we should name it by using this formula: [The name of the tested class]Test. For example, if we are writing tests for the RepositoryUserService class, the name of our test class should be RepositoryUserServiceTest. The benefit of this approach is that if a test fails, this rule helps us figure out which class is broken without reading the test code.
- If the test class belongs to the second group, we should name it by using this formula: [The name of the tested feature]Test. For example, if we were writing tests for the registration feature, the name of our test class should be RegistrationTest. The idea behind this rule is that if a test fails, this naming convention helps us to figure out what feature is broken without reading the test code.

Naming Test Methods

I am a big fan of the naming convention introduced by Roy Osherove. Its idea is to describe the tested method (or feature), the expected input or state, and the expected behavior in the name of a test method. In other words, if we follow this naming convention, we should name our test methods as follows:

- If we write tests for a single class, we should name our test methods by using this formula: [the name of the tested method]_[expected input / tested state]_[expected behavior]. For example, if we write a unit test for a registerNewUserAccount() method which throws an exception when the given email address is already associated with an existing user account, we should name our test method registerNewUserAccount_ExistingEmailAddressGiven_ShouldThrowException().
- If we write tests for a single feature, we should name our test methods by using this formula: [the name of the tested feature]_[expected input / tested state]_[expected behavior].
For example, if we write an integration test which tests that an error message is shown when a user tries to create a new user account by using an email address which is already associated with an existing user account, we should name our test method registerNewUserAccount_ExistingEmailAddressGiven_ShouldShowErrorMessage().

This naming convention ensures that:

- The name of a test method describes a specific business or technical requirement.
- The name of a test method describes the expected input (or state) and the expected result for that input (state).

In other words, if we follow this naming convention, we can answer the following questions without reading the code of our test methods:

- What are the features of our application?
- What is the expected behavior of a feature or method when it receives an input X?

Also, if a test fails, we have a pretty good idea of what is wrong before we read the source code of the failing test. Pretty cool, huh?

Naming Test Class Fields

A test class can have the following fields:

- Fields which contain test doubles such as mocks or stubs.
- A field which contains a reference to the tested object.
- Fields which contain the other objects (testing utilities) which are used in our test cases.

We should name these fields by using the same rules which we use when we name the fields found in the application code. In other words, the name of each field should describe the "purpose" of the object which is stored in that field. This rule sounds pretty "simple" (naming is always hard), and it has been easy for me to follow this rule when I name the tested class and the other classes which are used in my tests. For example, if I have to add a TodoCrudService field to my test class, I use the name crudService. When I have added fields which contain test doubles to my test class, I have typically added the type of the test double to the end of the field name.
For example, if I have added a TodoCrudService mock to my test class, I have used the name crudServiceMock. It sounds like a good idea, but I have come to the conclusion that it is a mistake. It is not a major problem, but the thing is that a field name should describe the "purpose" of the field, not its type. Thus, we should not add the type of the test double to the field name.

Naming Local Variables

When we name the local variables used in our test methods, we should follow the same principles used when we name the variables found in our application code. In my opinion, the most important rules are:

- Describe the meaning of the variable. A good rule of thumb is that the variable name must describe the content of the variable.
- Don't use shortened names which aren't obvious to everyone. Shortened names reduce readability, and often you don't gain anything by using them.
- Don't use generic names such as dto, modelObject, or data.
- Be consistent. Follow the naming conventions of the used programming language. If your project has its own naming conventions, you should honor them as well.

Enough with theory. Let's put these lessons into practice.

Putting Theory into Practice

Let's take a look at a modified unit test (I made it worse) which is found in the example application of my Spring Social tutorial. This unit test is written to test the registerNewUserAccount() method of the RepositoryUserService class, and it verifies that this method works correctly when a new user account is created by using a social sign-in provider and a unique email address.
The source code of our test class looks as follows:

    import org.junit.Before;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.mockito.Mock;
    import org.mockito.invocation.InvocationOnMock;
    import org.mockito.runners.MockitoJUnitRunner;
    import org.mockito.stubbing.Answer;
    import org.springframework.security.crypto.password.PasswordEncoder;

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNull;
    import static org.mockito.Matchers.isA;
    import static org.mockito.Mockito.times;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.verifyNoMoreInteractions;
    import static org.mockito.Mockito.verifyZeroInteractions;
    import static org.mockito.Mockito.when;

    @RunWith(MockitoJUnitRunner.class)
    public class RepositoryUserServiceTest {

        private RepositoryUserService service;

        @Mock
        private PasswordEncoder passwordEncoderMock;

        @Mock
        private UserRepository repositoryMock;

        @Before
        public void setUp() {
            service = new RepositoryUserService(passwordEncoderMock, repositoryMock);
        }

        @Test
        public void registerNewUserAccountByUsingSocialSignIn() throws DuplicateEmailException {
            RegistrationForm form = new RegistrationForm();
            form.setEmail("john.smith@gmail.com");
            form.setFirstName("John");
            form.setLastName("Smith");
            form.setSignInProvider(SocialMediaService.TWITTER);

            when(repositoryMock.findByEmail("john.smith@gmail.com")).thenReturn(null);
            when(repositoryMock.save(isA(User.class))).thenAnswer(new Answer<User>() {
                @Override
                public User answer(InvocationOnMock invocation) throws Throwable {
                    Object[] arguments = invocation.getArguments();
                    return (User) arguments[0];
                }
            });

            User modelObject = service.registerNewUserAccount(form);

            assertEquals("john.smith@gmail.com", modelObject.getEmail());
            assertEquals("John", modelObject.getFirstName());
            assertEquals("Smith", modelObject.getLastName());
            assertEquals(SocialMediaService.TWITTER, modelObject.getSignInProvider());
            assertEquals(Role.ROLE_USER, modelObject.getRole());
            assertNull(modelObject.getPassword());

            verify(repositoryMock, times(1)).findByEmail("john.smith@gmail.com");
            verify(repositoryMock, times(1)).save(modelObject);
            verifyNoMoreInteractions(repositoryMock);
            verifyZeroInteractions(passwordEncoderMock);
        }
    }

This unit test has quite a few problems:

- The field names are pretty generic, and they describe the types of the test doubles.
- The name of the test method is "pretty good", but it doesn't describe the given input or the expected behavior.
- The variable names used in the test method are awful.

We can improve the readability of this unit test by making the following changes to it:

- Change the name of the RepositoryUserService field to registrationService (the name of the service class is a bit bad, but let's ignore that).
- Remove the word 'mock' from the field names of the PasswordEncoder and UserRepository fields.
- Change the name of the test method to registerNewUserAccount_SocialSignInAndUniqueEmail_ShouldCreateNewUserAccountAndSetSignInProvider().
- Change the name of the form variable to registration.
- Change the name of the modelObject variable to createdUserAccount.

The source code of our "modified" unit test looks as follows:

    import org.junit.Before;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.mockito.Mock;
    import org.mockito.invocation.InvocationOnMock;
    import org.mockito.runners.MockitoJUnitRunner;
    import org.mockito.stubbing.Answer;
    import org.springframework.security.crypto.password.PasswordEncoder;

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNull;
    import static org.mockito.Matchers.isA;
    import static org.mockito.Mockito.times;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.verifyNoMoreInteractions;
    import static org.mockito.Mockito.verifyZeroInteractions;
    import static org.mockito.Mockito.when;

    @RunWith(MockitoJUnitRunner.class)
    public class RepositoryUserServiceTest {

        private RepositoryUserService registrationService;

        @Mock
        private PasswordEncoder passwordEncoder;

        @Mock
        private UserRepository repository;

        @Before
        public void setUp() {
            registrationService = new RepositoryUserService(passwordEncoder, repository);
        }

        @Test
        public void registerNewUserAccount_SocialSignInAndUniqueEmail_ShouldCreateNewUserAccountAndSetSignInProvider()
                throws DuplicateEmailException {
            RegistrationForm registration = new RegistrationForm();
            registration.setEmail("john.smith@gmail.com");
            registration.setFirstName("John");
            registration.setLastName("Smith");
            registration.setSignInProvider(SocialMediaService.TWITTER);

            when(repository.findByEmail("john.smith@gmail.com")).thenReturn(null);
            when(repository.save(isA(User.class))).thenAnswer(new Answer<User>() {
                @Override
                public User answer(InvocationOnMock invocation) throws Throwable {
                    Object[] arguments = invocation.getArguments();
                    return (User) arguments[0];
                }
            });

            User createdUserAccount = registrationService.registerNewUserAccount(registration);

            assertEquals("john.smith@gmail.com", createdUserAccount.getEmail());
            assertEquals("John", createdUserAccount.getFirstName());
            assertEquals("Smith", createdUserAccount.getLastName());
            assertEquals(SocialMediaService.TWITTER, createdUserAccount.getSignInProvider());
            assertEquals(Role.ROLE_USER, createdUserAccount.getRole());
            assertNull(createdUserAccount.getPassword());

            verify(repository, times(1)).findByEmail("john.smith@gmail.com");
            verify(repository, times(1)).save(createdUserAccount);
            verifyNoMoreInteractions(repository);
            verifyZeroInteractions(passwordEncoder);
        }
    }

It is clear that this test case still has some problems, but I think that our changes improved its readability. I think that the most dramatic improvements are:

- The name of the test method describes the expected behavior of the tested method when a new user account is created by using a social sign-in provider and a unique email address. The only way we could get this information from the "old" test case was to read the source code of the test method. This is obviously a lot slower than reading just the method name. In other words, giving good names to test methods saves time and helps us to get a quick overview of the requirements of the tested method or feature.
- The other changes transformed a generic CRUD test into a "use case". The "new" test method describes clearly what steps this use case has, and what the registerNewUserAccount() method returns when it receives a registration which is made by using a social sign-in provider and has a unique email address. In my opinion, the "old" test case failed to do this.

I am not entirely happy with the name of the RegistrationForm object, but it is definitely better than the original name.

Summary

We have now learned that naming can have a huge positive effect on the readability of our test cases. We have also learned a few basic rules which help us to transform our test cases into executable specifications. However, our test case still has some problems.
These problems are:

The code which creates new RegistrationForm objects simply sets the property values of the created object. We can make this code better by using test data builders.
The standard JUnit assertions, which verify that the information of the returned User object is correct, are not very readable. Another problem is that they only check that the property values of the returned User object are correct. We can improve this code by turning assertions into a domain-specific language.

I will describe both techniques in the future. In the meantime, I would love to hear what kind of naming conventions you use. Reference: Writing Clean Tests – Naming Matters from our JCG partner Petri Kainulainen at the Petri Kainulainen blog....
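As a taste of the test data builder technique mentioned above, here is a minimal sketch. The field names come from the test in this post, but the builder class itself (and the pared-down form and enum stand-ins) are my own illustration, not Petri's actual code:

```java
// RegistrationFormBuilder.java — a test data builder that replaces the
// setter noise in the test with one readable, chainable expression.
// The enum and form class are simplified stand-ins for the classes
// used in the post.
public class RegistrationFormBuilder {

    enum SocialMediaService { TWITTER, FACEBOOK }

    static class RegistrationForm {
        String email, firstName, lastName;
        SocialMediaService signInProvider;
    }

    private final RegistrationForm form = new RegistrationForm();

    RegistrationFormBuilder email(String email)         { form.email = email; return this; }
    RegistrationFormBuilder firstName(String firstName) { form.firstName = firstName; return this; }
    RegistrationFormBuilder lastName(String lastName)   { form.lastName = lastName; return this; }
    RegistrationFormBuilder signInProvider(SocialMediaService provider) {
        form.signInProvider = provider;
        return this;
    }

    RegistrationForm build() { return form; }

    public static void main(String[] args) {
        // The arrange section of the test collapses to one expression:
        RegistrationForm registration = new RegistrationFormBuilder()
                .email("john.smith@gmail.com")
                .firstName("John")
                .lastName("Smith")
                .signInProvider(SocialMediaService.TWITTER)
                .build();
        System.out.println(registration.email + " via " + registration.signInProvider);
    }
}
```

The payoff is that each test names only the properties it cares about, and defaults for the rest can live in one place.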

The Index You’ve Added is Useless. Why?

Recently, at the office:

Bob: I’ve looked into that slow query you told me about yesterday, Alice. I’ve added the indexes you wanted. Everything should be fine now.
Alice: Thanks Bob. I’ll quickly check … Nope Bob, still slow, it didn’t seem to work.
Bob: You’re right Alice! It looks like Oracle isn’t picking up the index for your query, even if I add an /*+INDEX(...)*/ hint. I don’t know what went wrong!?

And so, the story continues. Alice is frustrated because her feature doesn’t ship on time, Bob is frustrated because he thinks that Oracle doesn’t work right. True story!
Bob Forgot about Oracle and NULL
Poor Bob forgot (or didn’t know) that Oracle doesn’t put NULL values in “ordinary” indexes. Think about it this way:

CREATE TABLE person (
  id            NUMBER(38)   NOT NULL PRIMARY KEY,
  first_name    VARCHAR2(50) NOT NULL,
  last_name     VARCHAR2(50) NOT NULL,
  date_of_birth DATE         NULL
);

CREATE INDEX i_person_dob ON person(date_of_birth);

Now, Bob thinks that his index solves all problems, because he verified whether the index worked using the following query:

SELECT *
FROM person
WHERE date_of_birth > DATE '1980-01-01';

(of course, you generally shouldn’t SELECT *) And the execution plan looked alright:

----------------------------------------------------
| Id  | Operation                   | Name         |
----------------------------------------------------
|   0 | SELECT STATEMENT            |              |
|   1 |  TABLE ACCESS BY INDEX ROWID| PERSON       |
|*  2 |   INDEX RANGE SCAN          | I_PERSON_DOB |
----------------------------------------------------

This is because Bob’s predicate doesn’t rely on NULL being part of the I_PERSON_DOB index. Unfortunately, Alice’s query looked more like this (simplified version):

SELECT 1
FROM dual
WHERE DATE '1980-01-01' NOT IN (
  SELECT date_of_birth FROM person
);

So, essentially, Alice’s query checked if anyone had their date of birth at a given date. Her execution plan looked like this:

-------------------------------------
| Id  | Operation          | Name   |
-------------------------------------
|   0 | SELECT STATEMENT   |        |
|*  1 |  FILTER            |        |
|   2 |   FAST DUAL        |        |
|*  3 |   TABLE ACCESS FULL| PERSON |
-------------------------------------

As you can see, her query made a TABLE ACCESS FULL operation, bypassing the index. Why? It’s simple:

Oracle doesn’t put NULL values in indexes.
NOT IN (a, b, NULL, c, d) always yields NULL.

Whether or not our DATE '1980-01-01' value is in the index, we’ll still have to check the whole table to see whether a single NULL value is contained in the date_of_birth column. Because, if there were a NULL value, the NOT IN predicate in Alice’s query would never yield TRUE or FALSE, but NULL.
Alice can solve this issue with NOT EXISTS
Alice can solve it easily herself, by replacing NOT IN with NOT EXISTS, a predicate that doesn’t suffer from SQL’s peculiar three-valued boolean logic:

SELECT 1
FROM dual
WHERE NOT EXISTS (
  SELECT 1
  FROM person
  WHERE date_of_birth = DATE '1980-01-01'
);

This new query now again yields an optimal plan:

------------------------------------------
| Id  | Operation         | Name         |
------------------------------------------
|   0 | SELECT STATEMENT  |              |
|*  1 |  FILTER           |              |
|   2 |   FAST DUAL       |              |
|*  3 |   INDEX RANGE SCAN| I_PERSON_DOB |
------------------------------------------

But the problem still exists, because what can happen, will happen, and Alice will have to remember this issue for every single query she writes.
Bob should just set the column to NOT NULL
The best solution, however, is to simply set the column to NOT NULL:

ALTER TABLE person MODIFY date_of_birth DATE NOT NULL;

With this constraint, the NOT IN query is exactly equivalent to the NOT EXISTS query, and Bob and Alice can be friends again.
Takeaway: How to find “bad” columns?
It’s easy. The following useful query lists all indexes that contain at least one nullable column.
SELECT
  i.table_name,
  i.index_name,
  LISTAGG(
    LPAD(i.column_position, 2) || ': ' ||
    RPAD(i.column_name, 30) || ' ' ||
    DECODE(t.nullable, 'Y', '(NULL)', '(NOT NULL)'),
    ', '
  ) WITHIN GROUP (ORDER BY i.column_position)
    AS "NULLABLE columns in indexes"
FROM user_ind_columns i
JOIN user_tab_cols t
ON (t.table_name, t.column_name) = ((i.table_name, i.column_name))
WHERE EXISTS (
  SELECT 1
  FROM user_tab_cols t
  WHERE (t.table_name, t.column_name, t.nullable) = ((i.table_name, i.column_name, 'Y'))
)
GROUP BY i.table_name, i.index_name
ORDER BY i.index_name ASC;

When run against Bob and Alice’s schema, the above query yields:

TABLE_NAME | INDEX_NAME   | NULLABLE columns in indexes
-----------+--------------+----------------------------
PERSON     | I_PERSON_DOB | 1: DATE_OF_BIRTH (NULL)

Use this query on your own schema now, and go through the results, carefully evaluating whether you really need to keep that column nullable. In 50% of the cases, you don’t. By adding a NOT NULL constraint, you can tremendously speed up your application!  Reference: The Index You’ve Added is Useless. Why? from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog....
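The NOT IN trap above comes from SQL's three-valued logic, which is easier to see spelled out in code. A rough Java sketch (the helper methods are my own illustration, not Oracle's behavior verbatim; a Java `Boolean` of `null` plays the role of SQL's UNKNOWN, and a WHERE clause keeps a row only when the predicate is TRUE):

```java
import java.util.Arrays;
import java.util.List;

// ThreeValuedNotIn.java — why "x NOT IN (subquery)" can never match when
// the subquery returns a NULL.
public class ThreeValuedNotIn {

    // SQL equality: comparing anything with NULL yields UNKNOWN (null here).
    static Boolean eq(Integer a, Integer b) {
        if (a == null || b == null) return null;
        return a.equals(b);
    }

    // x NOT IN (v1, v2, ...) is NOT (x = v1 OR x = v2 OR ...).
    static Boolean notIn(Integer x, List<Integer> values) {
        Boolean anyMatch = Boolean.FALSE;
        for (Integer v : values) {
            Boolean e = eq(x, v);
            if (Boolean.TRUE.equals(e)) { anyMatch = Boolean.TRUE; break; } // TRUE OR anything = TRUE
            if (e == null) anyMatch = null;                                // FALSE OR UNKNOWN = UNKNOWN
        }
        if (anyMatch == null) return null;                                 // NOT UNKNOWN = UNKNOWN
        return !anyMatch;
    }

    public static void main(String[] args) {
        // Without NULLs, NOT IN behaves as expected:
        System.out.println(notIn(5, Arrays.asList(1, 2, 3)));    // true: row is kept
        // One NULL date_of_birth in the subquery and the predicate
        // becomes UNKNOWN, so the row is never kept:
        System.out.println(notIn(5, Arrays.asList(1, null, 3))); // null: row is filtered out
    }
}
```

This is exactly why the database must scan every row: a single NULL anywhere in the column taints the whole NOT IN predicate.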

Cheating on the N Queens benchmark

Many Solver distributions include an N Queens example, in which n queens need to be placed on an n*n chessboard, with no attack opportunities. So when you’re looking for the fastest Solver, it’s tempting to use the N Queens example as a benchmark to compare those solvers. That’s a tragic mistake, because the N Queens problem is solvable in polynomial time, which means there’s a way to cheat. That being said, OptaPlanner solves the 1 000 000 queens problem in less than 3 seconds. Here’s a log to prove it (with time spent in milliseconds):

INFO Opened: data/nqueens/unsolved/10000queens.xml
INFO Solving ended: time spent (23), best score (0), ...
INFO Opened: data/nqueens/unsolved/100000queens.xml
INFO Solving ended: time spent (159), best score (0), ...
INFO Opened: data/nqueens/unsolved/1000000queens.xml
INFO Solving ended: time spent (2981), best score (0), ...

How to cheat on the N Queens problem
The N Queens problem is neither NP-complete nor NP-hard. That is math speak for stating that there’s a perfect algorithm to solve this problem: the Explicit Solutions algorithm.
Implemented with a CustomSolverPhaseCommand in OptaPlanner, it looks like this:

public class CheatingNQueensPhaseCommand implements CustomSolverPhaseCommand {

    public void changeWorkingSolution(ScoreDirector scoreDirector) {
        NQueens nQueens = (NQueens) scoreDirector.getWorkingSolution();
        int n = nQueens.getN();
        List<Queen> queenList = nQueens.getQueenList();
        List<Row> rowList = nQueens.getRowList();

        if (n % 2 == 1) {
            Queen a = queenList.get(n - 1);
            scoreDirector.beforeVariableChanged(a, "row");
            a.setRow(rowList.get(n - 1));
            scoreDirector.afterVariableChanged(a, "row");
            n--;
        }
        int halfN = n / 2;
        if (n % 6 != 2) {
            for (int i = 0; i < halfN; i++) {
                Queen a = queenList.get(i);
                scoreDirector.beforeVariableChanged(a, "row");
                a.setRow(rowList.get((2 * i) + 1));
                scoreDirector.afterVariableChanged(a, "row");

                Queen b = queenList.get(halfN + i);
                scoreDirector.beforeVariableChanged(b, "row");
                b.setRow(rowList.get(2 * i));
                scoreDirector.afterVariableChanged(b, "row");
            }
        } else {
            for (int i = 0; i < halfN; i++) {
                Queen a = queenList.get(i);
                scoreDirector.beforeVariableChanged(a, "row");
                a.setRow(rowList.get((halfN + (2 * i) - 1) % n));
                scoreDirector.afterVariableChanged(a, "row");

                Queen b = queenList.get(n - i - 1);
                scoreDirector.beforeVariableChanged(b, "row");
                b.setRow(rowList.get(n - 1 - ((halfN + (2 * i) - 1) % n)));
                scoreDirector.afterVariableChanged(b, "row");
            }
        }
    }

}

Now, one could argue that this implementation doesn’t use any of OptaPlanner’s algorithms (such as the Construction Heuristics or Local Search). But it’s straightforward to mimic this approach in a Construction Heuristic (or even a Local Search). So, in a benchmark, any Solver which simulates that approach the most is guaranteed to win when scaling out. Why doesn’t that work for other planning problems? This algorithm is perfect for N Queens, so why don’t we use a perfect algorithm on other planning problems? Well, simply because there are none!
Most planning problems, such as vehicle routing, employee rostering, cloud optimization, bin packing, …, are proven to be NP-complete (or NP-hard). This means that these problems are in essence the same: a perfect algorithm for one would work for all of them. But no human has ever found such an algorithm (and most experts believe no such algorithm exists). Note: There are a few notable exceptions of planning problems that are neither NP-complete nor NP-hard. For example, finding the shortest distance between 2 points can be solved in polynomial time with A*-Search. But their scope is narrow: finding the shortest distance to visit n points (TSP), on the other hand, is not solvable in polynomial time. Because N Queens differs intrinsically from real planning problems, it is a terrible use case for benchmarking. Conclusion Benchmarks on the N Queens problem are meaningless. Instead, benchmark implementations of a realistic competition. A realistic competition is an official, independent competition:

that clearly defines a real-world use case
with real-world constraints
with multiple, real-world datasets
that expects reproducible results within a specific time limit on specific hardware
that has had serious participation from the academic and/or enterprise Operations Research community

OptaPlanner‘s examples implement several cases of realistic competitions.Reference: Cheating on the N Queens benchmark from our JCG partner Geoffrey De Smet at the OptaPlanner blog....
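For reference, the explicit-solutions construction used in the CheatingNQueensPhaseCommand above can be sketched without any OptaPlanner machinery at all, as plain Java (class and helper names are mine; rows[col] holds the row of the queen in column col, and the formula assumes n >= 4, since n = 2 and n = 3 have no solution):

```java
// ExplicitNQueens.java — constructive O(n) placement for the N Queens
// problem, mirroring the row-assignment formula from the OptaPlanner
// custom phase, plus an O(n^2) validity check.
public class ExplicitNQueens {

    public static int[] solve(int n) {
        int[] rows = new int[n];
        int m = n;
        if (m % 2 == 1) {          // odd board: pin one queen in the corner
            rows[m - 1] = m - 1;
            m--;
        }
        int half = m / 2;
        if (m % 6 != 2) {          // the "easy" even case
            for (int i = 0; i < half; i++) {
                rows[i] = 2 * i + 1;
                rows[half + i] = 2 * i;
            }
        } else {                   // m ≡ 2 (mod 6): shifted placement
            for (int i = 0; i < half; i++) {
                rows[i] = (half + 2 * i - 1) % m;
                rows[m - i - 1] = m - 1 - ((half + 2 * i - 1) % m);
            }
        }
        return rows;
    }

    // No two queens may share a row or a diagonal
    // (columns are distinct by construction).
    public static boolean isValid(int[] rows) {
        for (int a = 0; a < rows.length; a++) {
            for (int b = a + 1; b < rows.length; b++) {
                if (rows[a] == rows[b]
                        || Math.abs(rows[a] - rows[b]) == b - a) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 5, 6, 7, 8, 9, 14, 100, 1001}) {
            if (!isValid(solve(n))) {
                throw new AssertionError("invalid placement for n=" + n);
            }
        }
        System.out.println("all placements valid");
    }
}
```

Any benchmark contestant that happens to mimic this formula "wins" instantly, which is exactly why N Queens timings say nothing about a solver's quality on NP-hard problems.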

Test coverage using testing tools and methods

Overview: To define “test coverage” we have to discuss the topics stated below:

The purpose of test coverage.
Different types of test coverage criteria.
Test coverage metrics.
Unit test coverage.
Test coverage tools.
Advantages and disadvantages of test coverage.

The purpose of test coverage
Test coverage is a measure used in software testing. It describes the degree to which the source code of an application has been tested. Because it looks at the code directly, it falls under white-box testing. Today, test coverage plays an extensive role in software engineering, since modern design methods depend on programming languages. Test coverage methods are among the first methods invented for systematic software testing.

Different types of test coverage criteria
To calculate what proportion of the code has been covered by a test suite, one or more coverage criteria are used. These criteria are generally defined as rules or requirements. The most important basic criteria are as follows:

Function coverage – has every function or procedure in the program been called?
Statement coverage – has every statement in the program been executed?
Branch coverage – has every branch of every control structure been executed?
Condition coverage – has every Boolean sub-expression evaluated both to true and to false?

Condition/decision coverage
This is a combination of function coverage and branch coverage, and is sometimes also called decision coverage. This criterion requires that every point of entry and exit in the program has been invoked at least once, and that every decision in the program has taken all possible outcomes at least once. In this context, a decision is a Boolean expression composed of conditions and zero or more Boolean operators. This definition is not the same as branch coverage, although some use the term decision coverage as a synonym for branch coverage. Condition/decision coverage requires that both decision and condition coverage be satisfied. For safety-critical software, however, it is often required that modified condition/decision coverage be fulfilled. This criterion extends condition/decision coverage with the requirement that each condition must independently affect the decision.

Multiple condition coverage
This criterion requires that all combinations of conditions inside each decision are tested.

Parameter value coverage
Parameter value coverage requires that, for a method taking parameters, all the common values of those parameters are considered. The idea is that all common possible values for a parameter are tested.

Unit test coverage
Unit tests tell us whether the source code behaved as expected, and test coverage tells us what remains to be tested. Most programmers understand this method and agree on its value proposition, and frequently target full coverage. Though full coverage is an excellent goal, full coverage of the wrong sort may lead to trouble. A typical software development effort measures coverage in terms of the number of statements or branches tested. Yet even with full statement or branch coverage, critical errors may still be present in the logic of the source code, leaving both programmers and managers with a false sense of security. Statement and branch coverage are good for discovering obvious problems in unexecuted source code, but they frequently overlook errors related to both decision structures and decision interactions. Path coverage, on the other hand, is a stronger and more comprehensive procedure that helps disclose defects early on.

Test coverage metrics
There are six kinds of test coverage metrics, described below (they are phrased in terms of simulating a hardware design, where coverage is reported by the simulation tool).

Line coverage
Line coverage checks which lines of source code were executed during the simulation. The report shows the number of logical lines of source code present in a given file, the number of logical lines that were executed during the simulation, and a percentage indicating the proportion of lines executed. If verbose mode is chosen for a file, the report also shows the lines of logic that were not executed during the simulation run.

Toggle coverage
Toggle coverage checks whether, during simulation, each bit of a wire or register changed from a value of zero to one and back from one to zero. A bit is considered fully covered when it toggles back and forth at least once. This metric does not tell the end user that every value of a multi-bit vector was seen.

Memory coverage
Memory coverage checks a number of issues regarding memories or multidimensional arrays used in the design, including the following: whether every bit of each addressable memory element toggled from 0 to 1; whether every bit of each addressable memory element toggled from 1 to 0; whether each addressable memory element was written; and whether each addressable memory element was read. Memories or multidimensional arrays may have two kinds of dimensions, packed and unpacked. Packed dimensions are specified to the left of an array declaration, while unpacked dimensions are specified to the right of an array declaration.

Combinational logic coverage
This type of coverage checks the values an expression evaluated to during the course of the simulation. Combinational logic coverage is extremely valuable in determining logical combinations of signals that were not tried during simulation, revealing potential gaps in verification.

Finite state machine coverage
This sort of coverage metric checks whether it was possible to reach each of the states and cross every possible path through a given state machine. Finite state machine coverage reports two types of coverage details: state coverage, which checks that every state of the state machine was hit during simulation, and state transition coverage, which checks the transitions between all states reached in simulation.

Assertion coverage
This type of coverage metric checks whether all of the possible coverage points of the assertions included in the design were hit. The tool can find all Open Verification Library assertion instances present in the design. Most Open Verification Library assertion modules contain one or more built-in coverage points. When the end user has specified a particular assertion instance to check for coverage purposes, the tool simulates this assertion module, keeping track of which coverage points in the assertion have been hit during simulation and which have not. This permits the end user to check for more complex coverage situations in the design. At present, only Open Verification Library version 1.6 and newer is supported; earlier versions may not work and are not intended to be supported.

Test coverage tools
Presently the developer community is in a comparatively good position regarding the availability of high-quality test coverage tools. We are going to take a glance at how they work; there are plenty of tools out there, the majority commercial, but several are free or even open source. To start, we look at how test coverage measurements are normally implemented.

Implementation methods
The implementations may be classified into two distinct types:

Instrumentation – This method manipulates the source code by inserting coverage code at strategic locations. In detail, instrumentation comes in two flavors: class instrumentation and source instrumentation. Not unexpectedly, the difference is that class instrumentation injects the coverage code directly into compiled .class files, whereas source instrumentation creates an intermediate version of the sources which is then compiled into the final, coverage-enabled .class files. Nearly all test coverage tools have chosen one of these two instrumentation methods.

Custom JVM – Another option to inserting coverage code into the bytecode is to shift that responsibility into the JVM itself. Test coverage analysis could be performed by having the virtual machine keep count of which parts of the loaded classes are executed. In practice, though, there aren’t any accepted tools taking this approach.

Advantages and disadvantages of test coverage
Advantages:

It helps create extra test cases to increase coverage.
It helps in finding areas of an application not exercised by a set of test cases.
It helps determine a quantitative measure of test coverage, which indirectly measures the quality of the software application.

Disadvantages:

One problem of test coverage measurement is that it measures coverage of what has been written; it cannot say anything about the parts of the application that have not been written. If a specified function has not been implemented, or a function was omitted from the specification, then structure-based techniques cannot say anything about it: they only look at a structure which is already present.

Conclusion
Employing any kind of test coverage method is a step in the right direction, yet it is easy to misinterpret the results. Though statement and branch coverage metrics are simple to calculate and achieve, both may leave critical faults undiscovered, giving programmers and managers a false sense of security. Basis path coverage provides a more robust and complete way of discovering these overlooked faults without aggressively growing the number of tests required.  Reference: Test coverage using testing tools and methods from our JCG partner Kaushik Pal at the TechAlpine – The Technology world blog....
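The false sense of security that statement coverage can give is easy to make concrete. In this small Java example (my own illustration, not from the article), a single test input executes every statement, yet the untested branch hides a fatal defect:

```java
// CoverageGap.java — 100% statement coverage can still miss a fatal branch.
public class CoverageGap {

    // Returns 100 / divisor, where divisor stays 0 unless adjust is true.
    static int risky(boolean adjust) {
        int divisor = 0;
        if (adjust) {
            divisor = 4;
        }
        return 100 / divisor;   // divides by zero when adjust == false
    }

    public static void main(String[] args) {
        // A single test with adjust == true executes every statement
        // (statement coverage: 100%), but only one of the two branches
        // (branch coverage: 50%):
        System.out.println(risky(true));
        // The unexecuted 'false' branch hides an ArithmeticException:
        try {
            risky(false);
            System.out.println("no failure");
        } catch (ArithmeticException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

A tool reporting statement coverage would call `risky` fully covered after the first call; only a branch (or path) criterion flags the missing `adjust == false` case.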

The Low Quality of Scientific Code

Recently I’ve been trying to get a bit into music theory, machine learning, computational linguistics, so I ended up looking at libraries and tools written by the scientific community – examples include the Stanford Core NLP library, GATE, Weka, jMusic, and several more. The general feeling is that scientific libraries have mostly bad code. I will not point fingers, but there are too many freshman mistakes – not considering thread-safety, cryptic, ugly and/or stringly-typed APIs, lack of type-safety, poorly named variables and methods, choosing bad/slow serialization formats, writing debug messages to System.err (or out), lack of documentation, lack of tests.   Thus using these libraries becomes time-consuming and error-prone. Every 10 minutes you see some horribly written code that you don’t have the time to fix. And it’s not just one or two things that you would report in a normal open-source project – it’s an overall low level of quality. On the other hand these libraries have a lot of value, because the low-level algorithms will take even more time and especially know-how to implement, so just reusing them is obviously the right approach. Some libraries are even original research and so you just can’t write them yourself without spending 3 years on a PhD thesis. I cannot but mention Heartbleed here – OpenSSL is written by scientific people, and much has been written about how even OpenSSL does not meet modern software engineering standards. But that’s only the surface. Scientists in general can’t write good code. They write code simply to achieve their immediate goal, and then either throw it away, or keep using it for themselves. They are not software engineers, and they don’t seem to be concerned with code quality, code coverage, API design. Not to mention scientific infrastructure, deployment on multiple servers, managing environment. These things are rarely done properly in the scientific community.
And that’s not only in computer science and related fields like computational linguistics – it’s everywhere, because every science now requires at least computer simulations. Biology, bioinformatics, astronomy, physics, chemistry, medicine, etc – almost every scientist has to write code. And they aren’t good at it. And that’s OK – we are software engineers and we dedicate our time and effort to these things; they are scientists, and they have vast knowledge in their domain. Scientists use programming the way software engineers use public transport – just as a means to get to what they have to do. And scientists should not be distracted from their domain by becoming software engineers. But the problem is still there. Not only are there bad libraries, but the code scientists write may yield wrong results, work slowly, or regularly crash, which directly slows down or even invisibly hampers their work. For the libraries, we, software engineers, can contribute, or companies using them can dedicate an engineer to improving the library. Refactor, clean up, document, test. The authors of the libraries will be more than glad to have someone prettify their hairy code. The other problem is tougher – science needs funding for dedicated software engineers, and they prefer to use that funding for actual scientists. And maybe that’s a better investment, maybe not. I can say for myself that I’ll be glad to join a research team and help with the software part, while at the same time gaining knowledge in the field. And that would be fascinating, and way more exciting than writing boring business software. Unfortunately that doesn’t happen too often now (I tried once, a couple of years ago, and got rejected because I lacked formal education in biology). Maybe software engineers can help in the world of science. But money is a factor.  Reference: The Low Quality of Scientific Code from our JCG partner Bozhidar Bozhanov at the Bozho’s tech blog blog....

Lawyers and Developers, not so different

I have been developing software professionally since 1978. I went to law school (BU Law ’91). I think that computer programming technology and the law are really, really similar. At the end of the day, both law and computing are about wrapping abstractions around very complex interactions such that the rules are comprehensible and the outcomes are predictable. At the end of the day, both law and computing are about giving individuals the ability to reason about the behavior of systems (people, groups, computers) based on a wide variety of inputs that cannot all be conceived of when the systems/laws are initially developed. Both the law and computer systems have ways of dealing with new, unexpected inputs: judges/common law and system updates. Both US/UK law and computing have externally mandated requirements (legislature and language designer) and evolved requirements (common law and libraries/frameworks.) Both law and computing have words for things that a person skilled in an area should deeply understand, yet the terms used are rather simple. In law, they’re called “terms of art” and in computing they’re called “design patterns.” Both law and computing have practitioners that spend many, many years understanding the state of the art and often influencing the state of the art, with the requirement that they stay up to date in the state of the art in their area. Most practitioners of law and computing ultimately have very little influence on the overall direction of their field. Names like Hand and Brandeis and Wadler and Hopper are known and revered in each of our worlds because they are the few who really made material differences. Both law and computing necessarily have to evolve in ways that practitioners can keep up with. Even “trivial” changes like the 1986 Tax Reform or Microsoft’s .Net have taken years for practitioners in law and computing to fully understand and reconcile.
So, when a lawyer or judge says, “well, just go make a new language,” ask that person when the next UCC will be happening. When a lawyer or judge says, “there are lots of options for building software for a new phone,” ask them what popular phone is built on a system that has less than 10 years of popular programming support. Hint: none. Apple’s was built on OS X, which was built on NextStep, which was built on BSD Unix. The iPhone APIs are substantially similar to the NextStep APIs which were released in the 1980s. Apple had a 10,000+ strong developer network to draw from for iOS development. Windows Phone was built on Windows APIs that date back to the 1990s. Even Blackberry and Nokia used C, UNIX-style APIs, and popular windowing toolkits. Just as a new approach to the law (e.g., the UCC which “merely” unified the standard business practices across states) requires many years, millions of dollars of effort, and a ton of training and learning and knowledge sharing, so, too, does a new approach to computing. That is why there are very few “new” languages. That is why most of the available languages derive from each other. Just like Judge Blackstone is very alive in common law, Backus and McCarthy are very much alive in every line of code we write. Just as the law is a 500 year chain of precedents leading to where we are today… punctuated by legislation… computer languages, systems, and APIs are a 60-80 year chain of design decisions and evolutions that lead us to where we are now. No computing system is done in a vacuum, just as no legal case is done in a vacuum. Learned Hand is perched on the shoulder of every sitting judge in the US, just as Backus is perched on the shoulder of every Java and C and Ruby and Python programmer. Just as every legal case is necessarily and decidedly a derivative work of the prior art in the law, almost every computer language and library and API is a derivative work of what came before in computing. We are not so different.
Let’s try to communicate to the law folks that computing art and systems have evolved much like the art of the law. It’s not so simple to make sweeping changes. In fact, it’s very, very costly.  Reference: Lawyers and Developers, not so different from our JCG partner David Pollak at the DPP’s Blog blog....

Oracle v. Google, My Sweet Lord

He’s not So Fine I wrote about how to mitigate the disaster that is the appeals court’s decision in Oracle v. Google. Today, I’m going to cover a few more topics. Sometimes, copyright law can have really bad side effects. It’s supposed to help content creators make money from their content and that’s awesome. But in the George Harrison case, he lost a copyright suit because he “subconsciously copied” another song. If this is the standard that the court is going to apply to all API design, the computer industry is screwed… luckily starting with Oracle.   We build on the shoulders of giants Most computer software is a derivative work of some other computer software. We take someone else’s design, make some changes, and it becomes our design. It is considered good practice to use the Gang of Four’s design patterns to write software. There’s even a book about it. More accurately, there’s a cottage industry of books about the design patterns. Sadly, after the court’s decision, one cannot safely use the Structure, Sequence, and Organization of Design Patterns from a copyrighted work. Why not? Well, copyright is attached to the book (I know this because I recently signed over my copyright in the second edition of Beginning Scala to Apress). So copying the design pattern, the very essence of structure, sequence, and organization, is a copyright violation. We can’t use the structure, sequence, and organization we read about in copyrighted works anymore unless the copyright holder in the book explicitly grants permission. Sigh. In fact, there is very little in computing that isn’t a derivative work at the design/API level from somebody else. Scala’s Actors derive from Erlang’s Actors, which derive from Scheme. Collections libraries… mostly derive from Smalltalk. Network libraries… mostly from BSD sockets. Layers on top of Posix (who has the copyright in the SSO of Posix?) The list goes on.
Oracle is the first to be screwed Under the 9th Circuit, copyright attaches to the Structure, Sequence, and Organization of APIs. Structure, Sequence, and Organization is actually a test for non-literal copying in software copyright cases. So, everything in Java and in MySQL’s wire API and everything in most of Oracle’s ERP systems are APIs that derive from other copyrighted works. If I were an IP litigator, I’d be warming up my pitch deck with quotes from the JCP discussing how a particular Java API should be similar to a .Net API and going to pitch Microsoft on an API infringement suit. If I were an IP litigator, I’d be warming up my pitch deck with quotes from Oracle’s ERP marketing materials about how Oracle’s APIs are similar to SAP’s to make it easier to migrate from SAP to Oracle and going to pitch SAP on beating the mule snot out of Oracle. And IBM… IBM and Google have a collectively huge brain trust of folks that were the originators of the structure, sequence, and organization of most computing libraries, APIs, and paradigms. If SSO of systems, not just software, are subject to copyright, then IBM and Google have about 75% of the brains that developed what we all use today. Once we toss in the subconscious copying of the structure, sequence, and organization of the APIs we’ve seen and used over the years, there will be a bonanza for the lawyers and the software industry will be screwed… at least Oracle has the most to lose in this situation… My Sweet Lord. Sun made a ton of money off Java Only an idiot would try to monetize a language. It’s clear from Eiffel and GemStone/Smalltalk and more recently Typesafe/Scala (note that Scala is now mostly absent from Typesafe’s home page) that one cannot make money from selling a language. What Java did for Sun was make it possible in the 90s and early 2000s for Sun to sell very expensive hardware. How? Back in the 90s, Windows boxes were inexpensive and Sun servers were very expensive. 
Back in the 90s, you needed at least 10 developers to put together the simplest interactive web site. In order for Sun to sell very expensive (and worth it) servers, they needed a way for developers to write software for those servers. But it would have been cost-prohibitive to put 10 $10K Sun workstations on 10 developers’ desks. Java’s “write once, run anywhere” promise allowed developers to write and test web sites on their Windows machines, then deploy to Sun’s servers and get nearly identical behavior.

Sun’s product was expensive, high-margin servers. Sun enabled the use of its servers with Java because developers could develop on inexpensive machines and deploy to expensive machines. This is no different from Google making search “free” and selling advertising, or broadcast networks making their content “free” and covering the cost with advertising.

Underlying Oracle’s suit is the premise that because Sun/Oracle put a lot of money into Java, there should be some direct way to monetize that effort. But Oracle’s perspective is a pure software perspective, and, as a matter of history, one cannot monetize a language. One can monetize things around the language… and Oracle has WebLogic and many other things around Java-the-language that monetize the language. Even IBM has (or at least used to have) a competing implementation of Java that IBM subsidizes, because it gives IBM a language that runs the same on a PC and a mainframe.

Sun survived, and for a time thrived, because of Java. Sun would have been Silicon Graphics but for Java. Instead, Oracle acquired Sun for $5B+. This may not have been the best deal for Oracle, but Java was certainly a key driver in Sun’s outcome, and the cost of Java development at Sun was certainly paid many times over in hardware sales and the exit to Oracle.

What Google should do

Google should petition for an en banc re-hearing of the case.
Why?

  • The court erred by going to the Structure, Sequence, and Organization analysis when it could have found that copyright attached to the source code of the Java APIs. Google copied the source code (this is in the record), so the Structure, Sequence, and Organization analysis was unnecessary.
  • The SSO analysis conflicts with the outcome in Sony, because simple machine-based reverse engineering seems to magically remove copyright even though the SSO of the resulting APIs is identical.
  • The court’s use of the term “Original Work” differs from the accepted industry norm relating to APIs. As a matter of fact, Sun employees typed the names and developed the specific hierarchy of the Java APIs, but the vast majority of the APIs derived their Structure, Sequence, and Organization from other works. Google did not dispute that Sun/Oracle typed the source code that resulted in the Java APIs. But if APIs are subject to SSO analysis, then Google’s lawyers erred in failing to dispute that the Java APIs were an original work, because they are not an original work under an SSO analysis.

If the EFF or Google’s lawyers want me to take a look at any briefs or otherwise help with the issue, I’ll be happy to. There may be a conflict, but we can discuss the potential conflict privately.

Reference: Oracle v. Google, My Sweet Lord from our JCG partner David Pollak at the DPP’s Blog blog.
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.