


Hibernate Identity, Sequence and Table (Sequence) generator

Introduction In my previous post I talked about different database identifier strategies. This post will compare the most common surrogate primary key strategies: IDENTITY, SEQUENCE and TABLE (SEQUENCE). IDENTITY The IDENTITY type (included in the SQL:2003 standard) is supported by: SQL Server, MySQL (AUTO_INCREMENT), DB2 and HSQLDB. The IDENTITY generator allows an integer/bigint column to be auto-incremented on demand. The increment process happens outside of the currently running transaction, so a rollback may end up discarding already assigned values (value gaps may happen). The increment process is very efficient since it uses a lightweight database-internal locking mechanism as opposed to the more heavyweight transactional coarse-grained locks. The only drawback is that we can’t know the newly assigned value prior to executing the INSERT statement. This restriction hinders the “transactional write-behind” flushing strategy adopted by Hibernate. For this reason Hibernate disables the JDBC batch support for entities using the IDENTITY generator. 
For the following examples we’ll enable Session Factory JDBC batching: properties.put("hibernate.order_inserts", "true"); properties.put("hibernate.order_updates", "true"); properties.put("hibernate.jdbc.batch_size", "2"); Let’s define an Entity using the IDENTITY generation strategy: @Entity(name = "identityIdentifier") public static class IdentityIdentifier {@Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Long id; } Persisting 5 entities: doInTransaction(new TransactionCallable<Void>() { @Override public Void execute(Session session) { for (int i = 0; i < 5; i++) { session.persist(new IdentityIdentifier()); } session.flush(); return null; } }); will execute one query after the other (there is no JDBC batching involved): Query:{[insert into identityIdentifier (id) values (default)][]} Query:{[insert into identityIdentifier (id) values (default)][]} Query:{[insert into identityIdentifier (id) values (default)][]} Query:{[insert into identityIdentifier (id) values (default)][]} Query:{[insert into identityIdentifier (id) values (default)][]} Aside from disabling JDBC batching, the IDENTITY generator strategy doesn’t work with the Table per concrete class inheritance model, because there could be multiple subclass entities having the same identifier, and a base class query would end up retrieving entities with the same identifier (even if belonging to different types). SEQUENCE The SEQUENCE generator (defined in the SQL:2003 standard) is supported by: Oracle, SQL Server, PostgreSQL, DB2 and HSQLDB. A SEQUENCE is a database object that generates incremental integers on each successive request. 
SEQUENCES are much more flexible than IDENTITY columns because: a SEQUENCE is table free and the same sequence can be assigned to multiple columns or tables; a SEQUENCE may preallocate values to improve performance; a SEQUENCE may define an incremental step, allowing us to benefit from a “pooled” HiLo algorithm; a SEQUENCE doesn’t restrict Hibernate JDBC batching; a SEQUENCE doesn’t restrict Hibernate inheritance models. Let’s define an Entity using the SEQUENCE generation strategy: @Entity(name = "sequenceIdentifier") public static class SequenceIdentifier { @Id @GenericGenerator(name = "sequence", strategy = "sequence", parameters = { @org.hibernate.annotations.Parameter(name = "sequenceName", value = "sequence"), @org.hibernate.annotations.Parameter(name = "allocationSize", value = "1"), }) @GeneratedValue(generator = "sequence", strategy=GenerationType.SEQUENCE) private Long id; } I used the “sequence” generator because I didn’t want Hibernate to choose a SequenceHiLoGenerator or a SequenceStyleGenerator on our behalf. Adding 5 entities: doInTransaction(new TransactionCallable<Void>() { @Override public Void execute(Session session) { for (int i = 0; i < 5; i++) { session.persist(new SequenceIdentifier()); } session.flush(); return null; } }); generates the following queries: Query:{[call next value for hibernate_sequence][]} Query:{[call next value for hibernate_sequence][]} Query:{[call next value for hibernate_sequence][]} Query:{[call next value for hibernate_sequence][]} Query:{[call next value for hibernate_sequence][]} Query:{[insert into sequenceIdentifier (id) values (?)][1]} {[insert into sequenceIdentifier (id) values (?)][2]} Query:{[insert into sequenceIdentifier (id) values (?)][3]} {[insert into sequenceIdentifier (id) values (?)][4]} Query:{[insert into sequenceIdentifier (id) values (?)][5]} This time the inserts are batched, but we now have 5 sequence calls prior to inserting the entities. This can be optimized by using a HiLo algorithm. 
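As a sketch of that optimization (the entity and sequence names here are illustrative, not taken from the original example), Hibernate’s enhanced-sequence generator can be combined with the pooled optimizer so that a single database round-trip reserves a whole block of identifiers:

```java
@Entity(name = "pooledSequenceIdentifier")
public static class PooledSequenceIdentifier {

    @Id
    @GenericGenerator(name = "pooledSequence",
        strategy = "enhanced-sequence",
        parameters = {
            @org.hibernate.annotations.Parameter(name = "sequence_name", value = "pooled_sequence"),
            // one sequence call now reserves 5 consecutive identifiers
            @org.hibernate.annotations.Parameter(name = "increment_size", value = "5"),
            @org.hibernate.annotations.Parameter(name = "optimizer", value = "pooled")
        })
    @GeneratedValue(generator = "pooledSequence", strategy = GenerationType.SEQUENCE)
    private Long id;
}
```

With increment_size set to 5, persisting the 5 entities above should need only one sequence call instead of five, while still allowing JDBC batching.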
TABLE (SEQUENCE) There is another database-independent alternative for generating sequences. One or multiple tables can be used to hold the identifier sequence counter. But it means trading write performance for database portability. While IDENTITY and SEQUENCES are transaction-less, using a database table mandates ACID semantics for synchronizing multiple concurrent id generation requests. This is made possible by using row-level locking, which comes at a higher cost than the IDENTITY or SEQUENCE generators. The sequence must be calculated in a separate database transaction and this requires the IsolationDelegate mechanism, which has support for both local (JDBC) and global (JTA) transactions. For local transactions, it must open a new JDBC connection, therefore putting more pressure on the current connection pooling mechanism. For global transactions, it requires suspending the currently running transaction. After the sequence value is generated, the actual transaction has to be resumed. This process has its own cost, so the overall application performance might be affected. Let’s define an Entity using the TABLE generation strategy: @Entity(name = "tableIdentifier") public static class TableSequenceIdentifier {@Id @GenericGenerator(name = "table", strategy = "enhanced-table", parameters = { @org.hibernate.annotations.Parameter(name = "table_name", value = "sequence_table") }) @GeneratedValue(generator = "table", strategy=GenerationType.TABLE) private Long id; } I used the newer “enhanced-table” generator, because the legacy “table” generator has been deprecated. Adding 5 entities: doInTransaction(new TransactionCallable<Void>() { @Override public Void execute(Session session) { for (int i = 0; i < 5; i++) { session.persist(new TableSequenceIdentifier()); } session.flush(); return null; } }); generates the following queries: Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? 
for update][default]} Query:{[insert into sequence_table (sequence_name, next_val) values (?,?)][default,1]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][2,1,default]} Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][3,2,default]} Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][4,3,default]} Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][5,4,default]} Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} Query:{[update sequence_table set next_val=? where next_val=? and sequence_name=?][6,5,default]} Query:{[insert into tableIdentifier (id) values (?)][1]} {[insert into tableIdentifier (id) values (?)][2]} Query:{[insert into tableIdentifier (id) values (?)][3]} {[insert into tableIdentifier (id) values (?)][4]} Query:{[insert into tableIdentifier (id) values (?)][5]} The table generator allows JDBC batching, but it resorts to SELECT FOR UPDATE queries. Row-level locking is definitely less efficient than using a native IDENTITY or SEQUENCE. So, based on your application requirements, you have multiple options to choose from. There isn’t one single winning strategy; each one has both advantages and disadvantages. Code available on GitHub. Reference: Hibernate Identity, Sequence and Table (Sequence) generator from our JCG partner Vlad Mihalcea at Vlad Mihalcea’s blog....

How to implement a custom password strength indicator in JSF

Verifying password strength using JavaScript is a common task. In this post, I will show how to add a password strength indicator to a JSF-based web application. The password component in PrimeFaces already has a feedback indicator of the password strength, but it has two major shortcomings: the feedback indicator is not responsive (fixed width, not mobile friendly, etc.) and the rules for the password strength verification are hard coded in JavaScript, so no customization is possible. What we need is a good looking, easily customizable and responsive password strength indicator / meter. Fortunately, PrimeFaces has another component – the progress bar – which we can use for our purpose. This is not a misuse. The end result is actually impressive. Let’s start with XHTML. First, define a quite normal password field.   <p:password id="passwort" value="#{mybean.password}" label="Password" required="true" autocomplete="off"/>Second, define a progress bar with displayOnly=”true” and some messages for the password strength (weak, medium, strong). <div style="white-space:nowrap;"> <h:outputText value="Password strength "/> <h:outputText id="pwdWeak" value="weak" style="display:none" styleClass="bold weakMsg"/> <h:outputText id="pwdMedium" value="medium" style="display:none" styleClass="bold mediumMsg"/> <h:outputText id="pwdStrong" value="strong" style="display:none" styleClass="bold strongMsg"/> </div> <p:progressBar id="pwdStrength" value="0" styleClass="pwdStrength" displayOnly="true"/> Let’s go to the JavaScript part. We need a script block (placed somewhere after the p:progressBar) where we invoke a custom JS function setupPasswordStrength(). <script type="text/javascript"> $(document).ready(function () { setupPasswordStrength("passwort", "pwdStrength"); }); </script> The JS function has two arguments: the Id of the password field and the Id of the progress bar. In the function, we will register a callback for the namespaced keyup event. 
In the callback, we will check the current input value by means of regular expressions. We would like to apply the following rules (rules are up to you): password length is less than 8 characters or doesn’t contain at least one digit ==> weak password; password length is equal or greater than 8 characters, contains at least one digit, but doesn’t have at least one lower and one upper case letter OR one special char ==> medium password; password length is equal or greater than 8 characters, contains at least one digit AND has at least one lower and one upper case letter OR one special char ==> strong password. These are good rules I have often seen across the internet. Let me show the JS function. function setupPasswordStrength(pwdid, pbarid) { // reg. exp. for a weak password var weak = XRegExp("^(?=.*\\d{1,}).{8,}$"); // reg. exp. for a strong password var strong = XRegExp("^(?=.*[a-z])(?=.*[A-Z]).+|(?=.*[!,%,&,@,#,$,^,*,?,_,~,\\-]).+$");var $this = $("#" + pwdid); var pbar = $("#" + pbarid).find(".ui-progressbar-value");// visualize on keyup $this.off('keyup.' + pwdid).on('keyup.' + pwdid, function(e) { visualizePasswordStrength($(this).val(), pbar, weak, strong); });// fix chrome issue with autofill fields setTimeout(function(){$this.triggerHandler('keyup.' 
+ pwdid);}, 150); }function visualizePasswordStrength(pwd, pbar, weak, strong) { var pparent = pbar.parent().parent().parent(); var weakMsg = pparent.find(".weakMsg"); var mediumMsg = pparent.find(".mediumMsg"); var strongMsg = pparent.find(".strongMsg");if (pwd == null || pwd.length < 1) { pbar.removeClass("weak medium strong"); weakMsg.hide(); mediumMsg.hide(); strongMsg.hide(); return; }if (!weak.test(pwd)) { // weak pbar.removeClass("medium strong").addClass("weak"); mediumMsg.hide(); strongMsg.hide(); weakMsg.show(); return; }if (!strong.test(pwd)) { // medium pbar.removeClass("weak strong").addClass("medium"); weakMsg.hide(); strongMsg.hide(); mediumMsg.show(); return; }// strong pbar.removeClass("weak medium").addClass("strong"); weakMsg.hide(); mediumMsg.hide(); strongMsg.show(); } In the function visualizePasswordStrength(), we remove and add style classes to the progress bar dependent on the password strength (when user is typing his password). They are: .weak { background-color: #F88E7D !important; border: 1px solid #F95D24 !important; width: 33.33% !important; }.medium { background-color: #FEE379 !important; border: 1px solid #EDB605 !important; width: 66.66% !important; }.strong { background-color: #81FF6C !important; border: 1px solid #05E428 !important; width: 101% !important; } The weak indicator reserves one-third of the progress bar’s length. The medium and strong indicators reserve respectively two-thirds and all available space. 
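Since client-side checks can always be bypassed, the same three rules should also be enforced on the server. A minimal sketch in plain Java (the class and method names are hypothetical, and the special-character set mirrors the one used in the JS function):

```java
import java.util.regex.Pattern;

public class PasswordStrength {

    // at least 8 characters and at least one digit (below this: weak)
    private static final Pattern BASELINE = Pattern.compile("^(?=.*\\d).{8,}$");

    // lower AND upper case letters, OR at least one special character
    private static final Pattern EXTRA = Pattern.compile(
            "^(?=.*[a-z])(?=.*[A-Z]).+$|^(?=.*[!%&@#$^*?_~-]).+$");

    public static String strengthOf(String pwd) {
        if (pwd == null || !BASELINE.matcher(pwd).matches()) {
            return "weak";
        }
        return EXTRA.matcher(pwd).matches() ? "strong" : "medium";
    }

    public static void main(String[] args) {
        System.out.println(strengthOf("abc1"));     // weak: shorter than 8 characters
        System.out.println(strengthOf("abcdefg1")); // medium: no upper case and no special char
        System.out.println(strengthOf("Abcdefg1")); // strong: lower + upper case + digit
    }
}
```

Keeping the rules in one server-side method also makes them easy to unit test and to keep in sync with the client-side messages.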
The styling of the progress bar looks as follows (note that the style class must match the styleClass="pwdStrength" defined in the XHTML): .pwdStrength.ui-progressbar { -moz-border-radius: 6px; -webkit-border-radius: 6px; border-radius: 6px; margin-top: 8px; height: 18px !important; border: solid 1px #c2c2c2 !important; }.pwdStrength.ui-progressbar .ui-progressbar-value { display: block !important; margin-left: -2px !important; -moz-border-radius: 6px !important; -webkit-border-radius: 6px !important; border-radius: 6px !important; }Reference: How to implement a custom password strength indicator in JSF from our JCG partner Oleg Varaksin at the Thoughts on software development blog....

Turning recursive file system traversal into Stream

When I was learning programming, back in the days of Turbo Pascal, I managed to list files in a directory using the FindFirst, FindNext and FindClose functions. First I came up with a procedure printing the contents of a given directory. You can imagine how proud I was to discover I could actually call that procedure from itself to traverse the file system recursively. Well, I didn’t know the term recursion back then, but it worked. Similar code in Java would look something like this:           public void printFilesRecursively(final File folder) { for (final File entry : listFilesIn(folder)) { if (entry.isDirectory()) { printFilesRecursively(entry); } else { System.out.println(entry.getAbsolutePath()); } } }private File[] listFilesIn(File folder) { final File[] files = folder.listFiles(); return files != null ? files : new File[]{}; }Didn’t know File.listFiles() can return null, did ya? That’s how it signals I/O errors, as if IOException never existed. But that’s not the point. System.out.println() is rarely what we need, thus this method is neither reusable nor composable. It is probably the best counterexample of the Open/Closed principle. I can imagine several use cases for recursive traversal of a file system: getting a complete list of all files for display purposes; looking for all files matching a given pattern/property (also check out File.list(FilenameFilter)); searching for one particular file; processing every single file, e.g. sending it over the network. Every use case above has a unique set of challenges. For example we don’t want to build a list of all files because it would take a significant amount of time and memory before we could start processing it. We would like to process files as they are discovered and lazily – by pipe-lining computation (but without the clumsy visitor pattern). Also we want to short-circuit searching to avoid unnecessary I/O. 
Luckily in Java 8 some of these issues can be addressed with streams: final File home = new File(FileUtils.getUserDirectoryPath()); final Stream<Path> files = Files.list(home.toPath()); files.forEach(System.out::println);Remember that Files.list(Path) (new in Java 8) does not look into subdirectories – we’ll fix that later. The most important lesson here is: Files.list() returns a Stream<Path> – a value that we can pass around, compose, map, filter, etc. It’s extremely flexible, e.g. it’s fairly simple to count how many files I have in a directory per extension: import org.apache.commons.io.FilenameUtils;//...final File home = new File(FileUtils.getUserDirectoryPath()); final Stream<Path> files = Files.list(home.toPath()); final Map<String, List<Path>> byExtension = files .filter(path -> !path.toFile().isDirectory()) .collect(groupingBy(path -> getExt(path)));byExtension. forEach((extension, matchingFiles) -> System.out.println( extension + "\t" + matchingFiles.size()));//...private String getExt(Path path) { return FilenameUtils.getExtension(path.toString()).toLowerCase(); }OK, just another API, you might say. But it becomes really interesting once we need to go deeper, recursively traversing subdirectories. One amazing feature of streams is that you can combine them with each other in various ways. Old Scala saying “flatMap that shit” is applicable here as well, check out this recursive Java 8 code: //WARNING: doesn't compile, yet:private static Stream<Path> filesInDir(Path dir) { return Files.list(dir) .flatMap(path -> path.toFile().isDirectory() ? filesInDir(path) : singletonList(path).stream()); }Stream<Path> lazily produced by filesInDir() contains all files within directory including subdirectories. You can use it as any other stream by calling map(), filter(), anyMatch(), findFirst(), etc. 
But how does it really work? flatMap() is similar to map(), but while map() is a straightforward 1:1 transformation, flatMap() allows replacing a single entry in the input Stream with multiple entries. If we had used map(), we would have ended up with Stream<Stream<Path>> (or maybe Stream<List<Path>>). But flatMap() flattens this structure, in a way exploding the inner entries. Let’s see a simple example. Imagine Files.list() returned two files and one directory. For files, flatMap() receives a one-element stream with that file. We can’t simply return that file, we have to wrap it, but essentially this is a no-operation. It gets way more interesting for a directory. In that case we call filesInDir() recursively. As a result we get a stream of the contents of that directory, which we inject into our outer stream. The code above is short, sweet and… doesn’t compile. These pesky checked exceptions again. Here is a fixed version, wrapping checked exceptions for sanity: public static Stream<Path> filesInDir(Path dir) { return listFiles(dir) .flatMap(path -> path.toFile().isDirectory() ? filesInDir(path) : singletonList(path).stream()); }private static Stream<Path> listFiles(Path dir) { try { return Files.list(dir); } catch (IOException e) { throw Throwables.propagate(e); } }Unfortunately this quite elegant code is not lazy enough. flatMap() evaluates eagerly, thus it always traverses all subdirectories, even if we barely ask for the first file. You can try my tiny LazySeq library that tries to provide an even lazier abstraction, similar to streams in Scala or lazy-seq in Clojure. But even the standard JDK 8 solution might be really helpful and simplify your code significantly.Reference: Turning recursive file system traversal into Stream from our JCG partner Tomasz Nurkiewicz at the Java and neighbourhood blog....
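As a footnote to that standard JDK 8 solution: Files.walk() already produces exactly such a recursive Stream<Path> out of the box, traversing subdirectories without any hand-written flatMap(). A minimal self-contained sketch (the directory and file names are just for the demo):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WalkDemo {

    // Files.walk() traverses the whole tree below root, depth-first
    public static List<String> listFilesRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)        // skip the directory entries themselves
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // build a tiny throw-away tree: root/a.txt and root/sub/b.txt
        Path root = Files.createTempDirectory("walk-demo");
        Files.createFile(root.resolve("a.txt"));
        Path sub = Files.createDirectory(root.resolve("sub"));
        Files.createFile(sub.resolve("b.txt"));

        System.out.println(listFilesRecursively(root)); // [a.txt, b.txt]
    }
}
```

The stream is wrapped in try-with-resources because Files.walk() holds open directory handles until the stream is closed.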

10 things you can do as a developer to make your app secure: #9 Start with Requirements

To build a secure system, you should start thinking about security from the beginning. Legal and Compliance Constraints First, make sure that everyone on the team understands the legal and compliance requirements and constraints for the system. Regulations will drive many of the security controls in your system, including authentication, access control, data confidentiality and integrity (and encryption), and auditing, as well as system availability and reliability.   Agile teams in particular should not depend only on their Product Owner to understand and communicate these requirements. Compliance restrictions can impose important design constraints which may not be clear from a business perspective, as well as assurance requirements that dictate how you need to build and test and deliver the system, and what evidence you need to show that you have done a responsible job. As developers you should try to understand what all of this means to you as early as possible. As a place to start, Microsoft has a useful and simple guide (Regulatory Compliance Demystified: An Introduction to Compliance for Developers) that explains common business regulations including SOX, HIPAA, and PCI-DSS and what they mean to developers. Tracking Confidential Data The fundamental concern in most regulatory frameworks is controlling and protecting data. Make sure that everyone understands what data is private/confidential/sensitive and therefore needs to be protected. Identify and track this data throughout the system. Who owns the data? What is the chain of custody? Where is the data coming from? Can the source be trusted? Where is the data going to? Can you trust the destination to protect the data? Where is the data stored or displayed? Does it have to be stored or displayed? Who is authorized to create it, see it, change it, and do these actions need to be tracked and reviewed? 
The answers to these questions will drive requirements for data validation, data integrity, access control, encryption, and auditing and logging controls in the system. Application Security Controls Think through the basic functional application security controls: authentication, access control, auditing – all of which we’ve covered earlier in this series of posts. Where do these controls need to be added? What security stories need to be written? How will these controls be tested? Business Logic Can Be Abused Security also needs to be considered in business logic, especially multi-step application workflows dealing with money or other valuable items, or that handle private or sensitive information, or command and control functions. Features like online shopping carts, online banking account transactions, user password recovery, bidding in online auctions, online trading and root admin functions are all potential targets for attack. The user stories or use cases for these features should include exceptions and failure scenarios (what happens if a step or check fails or times out, or if the user tries to cancel or repeat or bypass a step?) and requirements derived from “abuse cases” or “misuse cases”. Abuse cases explore how the application’s checks and controls could be subverted by attackers or how the functions could be gamed, looking for common business logic errors including time-of-check/time-of-use and other race conditions and timing issues, insufficient entropy in keys or addresses, information leaks, failure to prevent brute forcing, failure to enforce workflow sequencing and approvals, and basic mistakes in input data validation and error/exception handling and limits checking. This isn’t defence-against-the-dark-arts black hat magic, but getting this stuff wrong can be extremely damaging. 
For some interesting examples of how bad guys can exploit small and often just plain stupid mistakes in application logic, read Jeremiah Grossman’s classic paper “Seven Business Logic Flaws that put your Website at Risk”. Make time to walk through important abuse cases when you’re writing up stories or functional requirements, and make sure to review this code carefully and include extra manual testing (especially exploratory testing) as well as pen testing of these features to catch serious business logic problems. We’re close to the finish line. The final post in this series is coming up: Design and Architect Security In.Reference: 10 things you can do as a developer to make your app secure: #9 Start with Requirements from our JCG partner Jim Bird at the Building Real Software blog....

Writing Tests for Data Access Code – Unit Tests Are Waste

A few years ago I was one of those developers who write unit tests for their data access code. I was testing everything in isolation, and I was pretty pleased with myself. I honestly thought that I was doing a good job. Oh boy, was I wrong! This blog post describes why we shouldn’t write unit tests for our data access code and explains why we should replace unit tests with integration tests. Let’s get started. Unit Tests Answer the Wrong Question We write tests for our data access code because we want to know that it works as expected. In other words, we want to find the answers to these questions: Is the correct data stored in the used database? Does our database query return the correct data? Can unit tests help us find the answers we seek? Well, one of the most fundamental rules of unit testing is that unit tests shouldn’t use external systems such as a database. This rule isn’t a good fit for the situation at hand because the responsibility of storing correct information and returning correct query results is divided between our data access code and the used database. For example, when our application executes a single database query, the responsibility is divided as follows: the data access code is responsible for creating the executed database query; the database is responsible for executing the database query and returning the query results back to the data access code. The thing is that if we isolate our data access code from the database, we can test that our data access code creates the “correct” query, but we cannot ensure that the created query returns the correct query results. That is why unit tests cannot help us find the answers we seek. A Cautionary Tale: Mocks Are Part of the Problem There was a time when I wrote unit tests for my data access code. At the time I had two rules: Every piece of code must be tested in isolation. 
Let’s use mocks. I was working in a project which used Spring Data JPA, and dynamic queries were built by using JPA criteria queries. If you aren’t familiar with Spring Data JPA, you might want to read the fourth part of my Spring Data JPA tutorial which explains how you can create JPA criteria queries with Spring Data JPA. Anyway, I created a specification builder class which builds Specification<Person> objects. After I had created a Specification<Person> object, I passed it forward to my Spring Data JPA repository which executed the query and returned the query results. The source code of the specification builder class looks as follows: import org.springframework.data.jpa.domain.Specification; import javax.persistence.criteria.CriteriaBuilder; import javax.persistence.criteria.CriteriaQuery; import javax.persistence.criteria.Predicate; import javax.persistence.criteria.Root; public class PersonSpecifications { public static Specification<Person> lastNameIsLike(final String searchTerm) { return new Specification<Person>() { @Override public Predicate toPredicate(Root<Person> personRoot, CriteriaQuery<?> query, CriteriaBuilder cb) { String likePattern = getLikePattern(searchTerm); return cb.like(cb.lower(personRoot.<String>get(Person_.lastName)), likePattern); } private String getLikePattern(final String searchTerm) { return searchTerm.toLowerCase() + "%"; } }; } } Let’s take a look at the test code which “verifies” that the specification builder class creates “the correct” query. Remember that I wrote this test class by following my own rules, which means that the result should be great. 
The source code of the PersonSpecificationsTest class looks as follows: import org.junit.Before; import org.junit.Test; import org.springframework.data.jpa.domain.Specification; import javax.persistence.criteria.*; import static junit.framework.Assert.assertEquals; import static org.mockito.Mockito.*; public class PersonSpecificationsTest { private static final String SEARCH_TERM = "Foo"; private static final String SEARCH_TERM_LIKE_PATTERN = "foo%"; private CriteriaBuilder criteriaBuilderMock; private CriteriaQuery criteriaQueryMock; private Root<Person> personRootMock; @Before public void setUp() { criteriaBuilderMock = mock(CriteriaBuilder.class); criteriaQueryMock = mock(CriteriaQuery.class); personRootMock = mock(Root.class); } @Test public void lastNameIsLike() { Path lastNamePathMock = mock(Path.class); when(personRootMock.get(Person_.lastName)).thenReturn(lastNamePathMock); Expression lastNameToLowerExpressionMock = mock(Expression.class); when(criteriaBuilderMock.lower(lastNamePathMock)).thenReturn(lastNameToLowerExpressionMock); Predicate lastNameIsLikePredicateMock = mock(Predicate.class); when(criteriaBuilderMock.like(lastNameToLowerExpressionMock, SEARCH_TERM_LIKE_PATTERN)).thenReturn(lastNameIsLikePredicateMock); Specification<Person> actual = PersonSpecifications.lastNameIsLike(SEARCH_TERM); Predicate actualPredicate = actual.toPredicate(personRootMock, criteriaQueryMock, criteriaBuilderMock); verify(personRootMock, times(1)).get(Person_.lastName); verifyNoMoreInteractions(personRootMock); verify(criteriaBuilderMock, times(1)).lower(lastNamePathMock); verify(criteriaBuilderMock, times(1)).like(lastNameToLowerExpressionMock, SEARCH_TERM_LIKE_PATTERN); verifyNoMoreInteractions(criteriaBuilderMock); verifyZeroInteractions(criteriaQueryMock, lastNamePathMock, lastNameIsLikePredicateMock); assertEquals(lastNameIsLikePredicateMock, actualPredicate); } } Does this make any sense? NO! 
I have to admit that this test is a piece of shit which has no value to anyone, and it should be deleted as soon as possible. This test has three major problems: it doesn’t help us to ensure that the database query returns the correct results; it is hard to read and, to make matters worse, it describes how the query is built but doesn’t describe what it should return; tests like this are hard to write and maintain. The truth is that this unit test is a textbook example of a test that should have never been written. It has no value to us, but we still have to maintain it. Thus, it is waste! And yet, this is what happens if we write unit tests for our data access code. We end up with a test suite which doesn’t test the right things. Data Access Tests Done Right I am a big fan of unit testing but there are situations when it is not the best tool for the job. This is one of those situations. Data access code has a very strong relationship with the used data storage. That relationship is so strong that the data access code itself isn’t useful without the data storage. That is why it makes no sense to isolate our data access code from the used data storage. The solution to this problem is simple. If we want to write comprehensive tests for our data access code, we must test our data access code together with the used data storage. This means that we have to forget unit tests and start writing integration tests. We must understand that only integration tests can verify that our data access code creates the correct database queries, and that our database returns the correct query results. If you want to know how you can write integration tests for Spring powered repositories, you should read my blog post titled Spring Data JPA Tutorial: Integration Testing. It describes how you can write integration tests for Spring Data JPA repositories. However, you can use the same technique when you are writing integration tests for any repository which uses a relational database. 
For example, the integration tests written to test the example application of my Using jOOQ with Spring tutorial use the technique described in that blog post. Summary This blog post has taught us two things: we learned that unit tests cannot help us verify that our data access code is working properly, because we cannot ensure that the correct data is inserted into our data storage or that our queries return the correct results; we learned that we should test our data access code by using integration tests, because the relationship between our data access code and the used data storage is so tight that it makes no sense to separate them. There is only one question left: Are you still writing unit tests for your data access code?Reference: Writing Tests for Data Access Code – Unit Tests Are Waste from our JCG partner Petri Kainulainen at the Petri Kainulainen blog....

Lessons Learned in Automated Testing

I want to discuss some takeaways from my role as a Quality Assurance (QA) Software Developer. My experiences in QA were two-fold: I started as a Software Engineer responsible for QA on a Scrum team, and I later had a role as a QA Engineer responsible for implementing automated testing infrastructure. In the latter position, I made sure teams had no issues that blocked them from writing automated tests. These two roles gave me a lot of insight into the challenges that teams face when testing code with automation. Below I share a few of the common challenges Agile teams face when trying to write automated tests and ways that I believe these challenges can be approached.

Challenge 0 – No time. When deadlines loomed in a sprint and the burn-down chart stayed flat, one of the first things to suffer was the quality and completeness of the automated tests. This problem was further exacerbated when product owners and the Scrum Master accepted incomplete automation because doing more automated tests would cause a User Story to be carried over.

Solution 0 – Budget time for automated testing. A team must include the writing and implementation of automated tests in their development schedules. It is also critical that automation is not seen as something that can be dumped halfway through a sprint when schedules get tight. Every stakeholder must be committed to utilizing automated tests and see them as an essential part of feature completeness.

Challenge 1 – Who is going to fix the broken tests? The great part about having a lot of code coverage with automated tests is that when changes are made, the automated tests will fail. However, when teams add new features that break tests, there can be frustration around who is responsible for fixing the automated tests.

Solution 1 – It is your job to fix your tests and your job not to break others’ tests. 
In a perfect world, your team’s automated tests are well-written and specific enough that a change in another team’s code should not break your tests. But even well-written tests cannot always be insulated from changes, because a test could simply have a dependency on a specification that has changed. A frequent issue I came across was that programmers would become frustrated when they felt that “Team X” was always breaking their tests. I think the best way to resolve these situations is through better communication between and among teams, as well as more responsibility on behalf of individual teams to act when their changes cause cascading issues.

Challenge 2 – Noise in the test environments. When an automated test run occurs with tests constantly (or randomly) failing, developers quickly learn to ignore the results. Understandably, it can be time-consuming to look at every failure if you have many tests and test runs. When developers make a habit of ignoring noisy test failures, they can often miss actual bugs in the code.

Solution 2 – Get rid of the noise. There are two common problems that need to be addressed when removing noise. The first is to assess and solve infrastructure issues that cause failures (when the failures are not the tests’ fault). As part of a test infrastructure team, I had to resolve a number of situations where tests would interact with each other poorly or tests would make incorrect assumptions about the state of the test environment. Secondly, it is important to address the behavior of a team when they have failing tests. Teams should address a failing test with a “stop the line” mentality. The easiest way for a product owner to enforce this desired behavior is by not accepting User Stories from a team while they have failing tests. While this is often seen as a harsh policy, I believe it is an effective way to reinforce good behavior in response to failing tests.

Challenge 3 – This test case is impossible to automate. 
I would often hear that a particular functional area of the code base was impossible to automate.

Solution 3 – Fix the code. Rarely is it true that a feature is impossible to automate, but sometimes implementing automated tests is inordinately difficult. As much as I crusade for complete automation coverage, I also understand the realities of software development. For example, in some cases, the time it takes to test a particular feature can outweigh (by a large margin) the value of those tests. I do think it is important that teams look carefully at areas of code that seem too difficult to write automated tests for and ask the following questions: Are there any tools or testing-infrastructure changes that could help automate tests? Can I refactor the code to make automated testing easier? In the future, how can I avoid writing code that is so difficult to automate?

Challenge 4 – The User Interface (UI) tests are always broken. As I addressed in Challenge 2, constantly failing tests can cause apathy to grow among developers. In my experience, the tests that fail most frequently are automated UI tests, which are commonly fragile and often not specific enough to qualify as good automated tests. Moreover, many developers lack expertise with UI automation tools.

Solution 4 – More training and sparing use of automated UI tests. The Agile test pyramid tells us that automated UI testing should be used sparingly. Limiting automated UI tests is the simplest way to avoid the frustration they cause. This requires that teams be disciplined in other areas of automated testing so that very little testing of the UI is required. It is also important that developers are well trained in the creation of automated UI tests; otherwise they may choose to avoid creating them even when they are appropriate. 
In Conclusion I hope these lessons provide some insights into your current issues with automated testing and that they help provide direction as your teams increase their use of automated testing. In the comments, please share the most common challenges you have faced with automated testing and how you solved them.

Reference: Lessons Learned in Automated Testing from our JCG partner Josh Robinson at the Keyhole Software blog....

How To Test Your Tests

When we write tests, we focus on the scenario we want to test, and then write that test. Pretty simple, right? That’s how our minds work. We can’t focus on many things at the same time. TDD acknowledges that and its incremental nature is built around it. TDD or not, when we have a passing test, we should do an evaluation. Start with this table:

Validity: Does it test a valid scenario? Is this scenario always valid?
Readability: Of course I understand the test now, but will someone else understand the test a year from now?
Speed: How quickly does it run? Will it slow down an entire suite?
Accuracy: When it fails, can I easily tell whether the problem is in the code, or do I need to debug?
Differentiation: How is this case different from its siblings? Can I understand that just by looking at the tests?
Maintenance: How much work will I need to do around this test when requirements change? How fragile is it?
Footprint: Does the test clean up after itself? Or does it leave files, registry handles, threads, or a memory blob that can affect other tests?
Robustness: How easy is it to break this test? What kind of variation are we permitting, and is that variation allowed?
Deterministic: Does this test have dependencies (the computer clock, CPU, files, data) that can alter its result based on when or where it runs?
Isolation: Does the test rely on a state that was not specified explicitly in it? If not, will the implicit state always be true?

If something’s not up to your standards (I’m assuming you’re a high-standards professional), fix it. Now, I hear you asking: do all this for every test? Let’s put it this way: if the test fails the evaluation, there’s going to be work later to fix it. When would you rather do it – now, when the test is fresh in your head, or later, when you have to dive into the code again, after not having seen it for 6 months, instead of working on the new exciting feature you want to work on? It’s testing economics 101. 
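As a concrete illustration of the “Deterministic” item in the checklist above: a test that reads the system clock directly can pass during the day and fail at night. Below is a minimal sketch (the class and method names are mine, not from the post) of injecting `java.time.Clock` so that time becomes an explicit, controllable dependency:

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class GreetingService {

    private final Clock clock;

    // The clock is injected instead of read via System.currentTimeMillis(),
    // so tests can pin the time and stay deterministic.
    public GreetingService(Clock clock) {
        this.clock = clock;
    }

    public boolean isMorning() {
        return ZonedDateTime.now(clock).getHour() < 12;
    }

    public static void main(String[] args) {
        // Production code would pass Clock.systemUTC(); a test passes a fixed
        // clock, so the result is the same no matter when the test runs.
        Clock nineAm = Clock.fixed(Instant.parse("2014-07-01T09:00:00Z"), ZoneOffset.UTC);
        System.out.println(new GreetingService(nineAm).isMorning()); // true
    }
}
```

With the fixed clock, the test’s verdict no longer depends on when or where it runs, which is exactly the property the checklist asks for.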
Do it now.

Reference: How To Test Your Tests from our JCG partner Gil Zilberfeld at the Geek Out of Water blog....

Do Software Developers Really Need Degrees?

When I first started out my career as a software developer, I didn’t have a degree. I took my first real job when I was on summer break from my first year of college. By the time the summer was up and it was time to enroll back in school, I found that the salary I was making from that summer job was about what I had expected to make when I graduated college—only I didn’t have any debt at this point—so, I dropped out and kept the job. But, did I make the right choice? Do you really need a university degree to be a computer programmer?

The difference between education and school

Just because you have a college degree doesn’t mean you have learned anything. That is the main problem I have with most traditional education programs today. School has become much more about getting a degree – a piece of paper – than it has about actually learning something of value. To some extent, I am preaching to the choir. If you have a degree that you worked hard for and paid a large amount of money for, you are more inclined to believe that piece of paper has more value than it really does. If you don’t have a degree, you are probably more inclined to believe that degrees are worthless and completely unnecessary—even though you may secretly wish you had one. So, whatever side you fall on, I am going to ask you to momentarily suspend your beliefs — well, biases really — and consider that both views are not exactly correct, that there is a middle ground somewhere in between the two viewpoints where a degree isn’t necessarily worthless and it isn’t necessarily valuable either. You see, the issue is not really whether or not a particular degree has any value. The degree itself represents nothing but a cost paid and time committed. A degree can be acquired by many different methods, none of which guarantee any real learning has taken place. If you’ve ever taken a college course, you know that it is more than possible to pass that course without actually learning much at all. 
Now, don’t get me wrong, I’m not saying that you can’t learn anything in college. I’m not saying that every degree that is handed out is a fraud. I’m simply saying that the degree itself does not prove much; there is a difference between going to school and completing a degree program and actually learning something. Learning is not just memorizing facts. True learning is about understanding. You can memorize your multiplication tables and not understand what they mean. With that knowledge, you can multiply any two numbers that you have memorized the answer for, but you would lack the ability to multiply any numbers that you don’t already have a memorized answer for. If you understand multiplication, even without knowing any multiplication tables, you can figure out how to work out the answer to any multiplication problem — even if it takes you a while.

You can be highly educated without a degree

Traditional education systems are not the only way to learn things. You don’t have to go to school and get a degree in order to become educated. Fifty years ago, this probably wasn’t the case — although I can’t say for sure, since I wasn’t alive back then. Fifty years ago we didn’t have information at our fingertips. We didn’t have all the resources we have today that make education, on just about any topic, so accessible. A computer science degree is merely a collection of formalized curriculum. It is not magic. There is no reason a person couldn’t save the money and much of the time required to get a computer science degree from an educational institution by learning the exact same information on their own. Professors are not gifted beings who impart knowledge and wisdom on students simply by being in the same room with them. Sure, it may be easier to obtain an education by having someone spoon-feed it to you, but you do not need a teacher to learn. You can become your own teacher. 
In fact, today there are a large number of online resources where you can get the equivalent of a degree, for free – or at least very cheaply:

- Coursera
- edX
- Khan Academy
- MIT Open Courseware
- Udemy
- Pluralsight (I have courses here)

Even if you have a degree, self-education is something you shouldn’t ignore—especially when it’s practically free. You can also find many great computer science textbooks online. For example, one of the best is Structure and Interpretation of Computer Programs – 2nd Edition (MIT Electrical Engineering and Computer Science).

So, is there any real benefit to having a degree?

My answer may surprise you, but, yes, right now I think there is. I told you that I had forgone continuing my education in order to keep my job, but what I didn’t tell you is that I went back and got my degree later. Now, I didn’t go back to college and quit my job, but I did think there was enough value in having an actual computer science degree that I decided to enroll in an online degree program and get my degree while keeping my job. Why did I go back and get my degree? Well, it had nothing to do with education. By that point, I knew that anything I wanted or needed to learn, I could learn myself. I didn’t really need a degree. I already had a good paying job and plenty of work experience. But, I realized that there would be a significant number of opportunities that I might be missing out on if I didn’t go through the formal process of getting that piece of paper. The reality of the situation is that even though you and I may both know that degrees don’t necessarily mean anything, not everyone holds the same opinion. You may be able to do your job and you may know your craft better than someone who has a degree, but sometimes that piece of paper is going to make the difference between getting a job or not, and it is going to have an influence on how high you can rise in a corporate environment. We can’t simply go by our own values and expect the world to go along with them. 
We have to realize that some people are going to place a high value on having a degree—whether you actually learned anything while getting one or not. But, at the same time, I believe you can get by perfectly well without one – you’ll just have fewer opportunities – a few more doors that are closed to you. For a software developer, the most important thing is the ability to write code. If you can demonstrate that ability, most employers will hire you—at least it has been my experience that this is the case. I have the unique situation of being on both sides of the fence. I’ve tried to get jobs when I didn’t have a degree and I’ve tried to get jobs when I did have a degree. I’ve found that in both cases, the degree was not nearly as important as being able to prove that I could actually write good code and solve problems. So, I know it isn’t necessary to have a degree, but it doesn’t hurt either.

What should you do if you are starting out?

If I were starting out today, here is what I would do: I would plan to get my degree as cheaply as possible and to either work the whole time or, better yet, create my own product or company during that time. I’d try to get my first two years of school at a community college where the tuition is extremely cheap. During that time, I’d try to gain actual work experience either at a real job or by developing my own software. Once the two-year degree was complete, I’d enroll in a university, hopefully getting scholarships that would pay for most of my tuition. I would also avoid taking on any student debt. I would make sure that I was making enough money outside of school to be able to afford the tuition. I realize this isn’t always possible, but I’d try to minimize that debt as much as possible. What you absolutely don’t want to do is to start working four years later than you could be and have a huge debt to go with it. 
Chances are, the small amount of extra salary your degree might afford you will not make up for the sacrifice of losing four years of work experience and pay and going deeply into debt. Don’t make that mistake. The other route I’d consider is to get your education entirely online, ignoring traditional school completely. Tuition prices are constantly rising and the value of a traditional degree is constantly decreasing – especially in the field of software development. If you go this route, you need to have quite a bit of self-motivation and self-discipline. You need to be willing to create your own education plan and to start building your own software that will prove you know what you are doing. The biggest problem you’ll face without a degree is getting that first job. It is difficult to get a job with no experience, but it is even more difficult when you don’t have a degree. What you need is a portfolio of work that shows that you can actually write code and develop software. I’d even recommend creating your own company and creating at least one software product that you sell through that company. You can put that experience down on your resume and essentially create your own first job. (A mobile app is a great product for a beginning developer to create.)

What if you are already an experienced developer? Should you go back and get your degree now?

It really depends on your goals. If you are planning on climbing the corporate ladder, then yes. In a corporate environment, you are very likely to hit a premature glass ceiling if you don’t have a degree. That is just how the corporate world works. Plus, many corporations will help pay for your degree, so why not take advantage of that? If you just want to be a software developer and write code, then perhaps not. It might not be worth the investment, unless you can do it very cheaply—and even then the time investment might not be worth it. 
You really have to weigh how much extra you think you’ll be able to earn against how much the degree will cost you. You might be better off educating yourself to improve your skills than going back to school for a traditional degree.

Reference: Do Software Developers Really Need Degrees? from our JCG partner John Sonmez at the Making the Complex Simple blog....

Mapping your Entities to DTO’s Using Java 8 Lambda expressions

We all face the cluttered overhead code that appears when we need to convert our DTOs to entities (Hibernate entities, etc.) and back. In this example I’ll demonstrate how much shorter the code gets with Java 8. Let’s create the target DTO:

public class ActiveUserListDTO {

    public ActiveUserListDTO() {
    }

    public ActiveUserListDTO(UserEntity userEntity) {
        this.username = userEntity.getUsername();
        ...
    }
}

A simple find method to retrieve all entities using the Spring Data JPA API: userRepository.findAll();

Problem: the findAll() method signature (like many others) returns java.lang.Iterable<T>:

java.lang.Iterable<T> findAll(java.lang.Iterable<ID> iterable)

We can’t make a Stream out of a java.lang.Iterable (streams work on collections; every Collection is an Iterable, but not every Iterable is necessarily a Collection). So how do we get a Stream object in order to harness the power of Java 8 lambdas? Let’s use the StreamSupport class to convert the Iterable into a Stream:

Stream<UserEntity> userEntityStream = StreamSupport.stream(userRepository.findAll().spliterator(), false);

Great. Now we’ve got a Stream in our hands, which is the key to Java 8 lambdas! What’s left is to map and collect:

List<ActiveUserListDTO> activeUserListDTOs = userEntityStream.map(ActiveUserListDTO::new).collect(Collectors.toList());

I am using a Java 8 method reference, thereby instantiating (and mapping) each entity into a DTO. So let’s make one short line out of everything:

List<ActiveUserListDTO> activeUserListDTOs = StreamSupport.stream(userRepository.findAll().spliterator(), false).map(ActiveUserListDTO::new).collect(Collectors.toList());

That’s neat!! Idan. 
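The whole pipeline can be put together as a self-contained sketch. The entity and DTO below are simplified stand-ins for the ones in the post — no Spring Data repository is needed to show the Iterable-to-Stream bridge itself:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class EntityToDtoMapping {

    // Simplified stand-ins for the entity and DTO from the post.
    public static class UserEntity {
        private final String username;
        public UserEntity(String username) { this.username = username; }
        public String getUsername() { return username; }
    }

    public static class ActiveUserListDTO {
        private final String username;
        public ActiveUserListDTO(UserEntity entity) { this.username = entity.getUsername(); }
        public String getUsername() { return username; }
    }

    // The pattern from the post: bridge Iterable -> Stream, then map each
    // entity to a DTO with a constructor reference and collect into a List.
    public static List<ActiveUserListDTO> toDtos(Iterable<UserEntity> entities) {
        return StreamSupport.stream(entities.spliterator(), false)
                .map(ActiveUserListDTO::new)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Simulates what userRepository.findAll() would return.
        Iterable<UserEntity> entities =
                Arrays.asList(new UserEntity("alice"), new UserEntity("bob"));
        List<ActiveUserListDTO> dtos = toDtos(entities);
        System.out.println(dtos.size() + " DTOs, first user: " + dtos.get(0).getUsername());
    }
}
```

The same toDtos shape works for any Iterable, which is exactly what the repository’s findAll() hands back.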
Related Articles:

- Auditing infrastructure for your app using Spring AOP, Custom annotations and Reflection
- AmazonSQS and Spring for messaging queue
- Authentication and Authorization service as an open source solution
- Invoking Async method call using Future object
- Using Spring Integration

Reference: Mapping your Entities to DTO’s Using Java 8 Lambda expressions from our JCG partner Idan Fridman at the IdanFridman.com blog....

Use Cases for Elasticsearch: Document Store

I’ll be giving an introductory talk about Elasticsearch twice in July, first at Developer Week Nürnberg, then at Java Forum Stuttgart. I am showing some of the features of Elasticsearch by looking at certain use cases. To prepare for the talks I will try to describe each of the use cases in a blog post as well. When it comes to Elasticsearch, the first thing to look at is often the search part. But in this post I would like to start with its capabilities as a distributed document store.

Getting Started

Before we start we need to install Elasticsearch, which fortunately is very easy. You can just download the archive, unpack it and use a script to start it. As it is a Java-based application, you of course need to have a Java runtime installed.

# download archive
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.zip
# zip is for windows and linux
unzip elasticsearch-1.2.1.zip
# on windows: elasticsearch.bat
elasticsearch-1.2.1/bin/elasticsearch

Elasticsearch can be talked to using HTTP and JSON. When looking around at examples you will often see curl being used because it is widely available. (See this post on querying Elasticsearch using plugins for alternatives.) To see if it is up and running you can issue a GET request on port 9200: curl -XGET http://localhost:9200. If everything is set up correctly, Elasticsearch will respond with something like this:

{
  "status" : 200,
  "name" : "Hawkeye",
  "version" : {
    "number" : "1.2.1",
    "build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
    "build_timestamp" : "2014-06-03T15:02:52Z",
    "build_snapshot" : false,
    "lucene_version" : "4.8"
  },
  "tagline" : "You Know, for Search"
}

Storing Documents

When I say document this means two things. First, Elasticsearch stores JSON documents and even uses JSON internally a lot. This is an example of a simple document that describes talks for conferences. 
{ "title" : "Anwendungsfälle für Elasticsearch", "speaker" : "Florian Hopf", "date" : "2014-07-17T15:35:00.000Z", "tags" : ["Java", "Lucene"], "conference" : { "name" : "Java Forum Stuttgart", "city" : "Stuttgart" } } There are fields and values, arrays and nested documents. Each of those features is supported by Elasticsearch. Besides the JSON documents that are used for storing data in Elasticsearch, document refers to the underlying library Lucene, that is used to persist the data and handles data as documents consisting of fields. So this is a perfect match: Elasticsearch uses JSON, which is very popular and supported from lots of technologies. But the underlying data structures also use documents. When indexing a document we can issue a post request to a certain URL. The body of the request contains the document to be stored, the file we are passing contains the content we have seen above. curl -XPOST http://localhost:9200/conferences/talk/ --data-binary @talk-example-jfs.json When started Elasticsearch listens on port 9200 by default. For storing information we need to provide some additional information in the URL. The first segment after the port is the index name. An index name is a logical grouping of documents. If you want to compare it to the relational world this can be thought of as the database. The next segment that needs to be provided is the type. A type can describe the structure of the doucments that are stored in it. You can again compare this to the relational world, this could be a table, but that is only slightly correct. Documents of any kind can be stored in Elasticsearch, that is why it is often called schema free. We will look at this behaviour in the next post where you will see that schema free isn’t the most appropriate term for it. For now it is enough to know that you can store documents with completely different structure in Elasticsearch. This also means you can evolve your documents and add new fields as appropriate. 
Note that neither the index nor the type needs to exist when you start indexing documents. They will be created automatically, one of the many features that makes it so easy to get started with Elasticsearch. When you are storing a document in Elasticsearch it will automatically generate an id for you that is also returned in the result.

{
  "_index":"conferences",
  "_type":"talk",
  "_id":"GqjY7l8sTxa3jLaFx67_aw",
  "_version":1,
  "created":true
}

In case you want to determine the id yourself, you can also use a PUT on the same URL we have seen above plus the id. I don’t want to get into trouble by calling this RESTful, but did you notice that Elasticsearch makes good use of the HTTP verbs? Either way you stored the document, you can always retrieve it by specifying the index, type and id.

curl -XGET http://localhost:9200/conferences/talk/GqjY7l8sTxa3jLaFx67_aw?pretty=true

which will respond with something like this:

{
  "_index" : "conferences",
  [...]
  "_source":{
    "title" : "Anwendungsfälle für Elasticsearch",
    "speaker" : "Florian Hopf",
    "date" : "2014-07-17T15:35:00.000Z",
    "tags" : ["Java", "Lucene"],
    "conference" : {
      "name" : "Java Forum Stuttgart",
      "city" : "Stuttgart"
    }
  }
}

You can see that the source in the response contains exactly the document we have indexed before.

Distributed Storage

So far we have seen how Elasticsearch stores and retrieves documents, and we have learned that you can evolve the schema of your documents. The huge benefit we haven’t touched on so far is that it is distributed. Each index can be split into several shards that can then be distributed across several machines. To see the distributed nature in action, fortunately, we don’t need several machines. First, let’s see the state of our currently running instance in the plugin elasticsearch-kopf (see this post for details on how to install and use it): On the left you can see that there is one machine running. The row on top shows that it contains our index conferences. 
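Under the hood, each document lands on one of those shards by hashing its routing value (the document id, unless you choose your own routing) modulo the number of shards. The sketch below uses plain String.hashCode() as a stand-in for Elasticsearch’s internal hash function, but it illustrates the scheme and why the shard count is fixed once an index is created:

```java
public class ShardRoutingSketch {

    // Stand-in for Elasticsearch's routing formula: shard = hash(routing) % shards.
    // Elasticsearch uses its own hash function internally; String.hashCode()
    // here is only for illustration.
    public static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        String[] ids = {"GqjY7l8sTxa3jLaFx67_aw", "doc-2", "doc-3"};
        for (String id : ids) {
            // The same id usually maps to a different shard once the shard
            // count changes; resizing an index in place would therefore leave
            // existing documents on the wrong shards.
            System.out.println(id + ": 5 shards -> " + shardFor(id, 5)
                    + ", 6 shards -> " + shardFor(id, 6));
        }
    }
}
```

Because lookups by id use the same formula, changing the number of shards would make previously indexed documents unfindable, which is why the count has to be chosen at index-creation time.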
Even though we didn’t explicitly tell it to, Elasticsearch created 5 shards for our index, all of which are currently on the instance we started. As each shard is a Lucene index in itself, even if you are running your index on one instance, the documents you are storing are already distributed across several Lucene indexes. We can now use the same installation to start another node. After a short time we should see the new instance in the dashboard as well. As the new node joins the cluster (which by default happens automatically), Elasticsearch will automatically copy shards to the new node. This is because by default it not only uses 5 shards but also 1 replica, which is a copy of a shard. Replicas are always placed on different nodes than their shards and are used for distributing the load and for fault tolerance. If one of the nodes crashes, the data is still available on the other node. Now, if we start another node, something else will happen: Elasticsearch will rebalance the shards. It will copy and move shards to the new node so that the shards are distributed evenly across the machines. Once defined when creating an index, the number of shards can’t be changed. That’s why you normally overallocate (create more shards than you need right now) or, if your data allows it, you can create time-based indices. Just be aware that sharding comes with some cost, so think carefully about what you need. Designing your distribution setup can still be difficult, even though Elasticsearch does a lot for you out of the box.

Conclusion

In this post we have seen how easy it is to store and retrieve documents using Elasticsearch. JSON and HTTP are technologies that are available in lots of programming environments. The schema of your documents can be evolved as your requirements change. Elasticsearch distributes the data by default and lets you scale across several machines, so it is suited well even for very large data sets. 
Though using Elasticsearch as a document store is a real use case, it is hard to find users who use it only that way. Nobody retrieves documents only by id, as we have seen in this post, but everybody uses the rich query facilities we will look at next week. Nevertheless, you can read about how HipChat uses Elasticsearch to store billions of messages and how Engagor uses Elasticsearch here and here. Both of them are using Elasticsearch as their primary storage. Though it sounds more drastic than it probably is: if you are considering using Elasticsearch as your primary storage, you should also read this analysis of Elasticsearch’s behaviour in case of network partitions. Next week we will look at using Elasticsearch for something obvious: search.

Reference: Use Cases for Elasticsearch: Document Store from our JCG partner Florian Hopf at the Dev Time blog....
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.