
What's New Here?


Improve your Feedbackloop with Continuous Testing

Have you ever thought about what the most valuable thing in software development is for you? I'm not talking about things that are valuable to you personally, but for the success of the development itself. Well, I have thought about it, and for me it is feedback – in any form. It is so important because it enables steering. Development practices are made for the purpose of better feedback: TDD, Continuous Integration and iterations, to name only a few. Many of the agile methods, and XP in particular, are basically about better feedback.

It begins with customer interaction. I need as much feedback as possible, as frequently as possible. If I don't get feedback, I'm likely to get on the wrong track, resulting in a product that is not going to be used, because it's not what the customer needed. The more feedback I get, the better the outcome will be. If I get feedback rarely, I am unable to steer. I'm forced to make assumptions, which are likely just obstacles. The quicker I get feedback, the faster I can deliver value to my customer. Feedback allows me to steer.

Feedback is just as important for programming. I want it early and often. If I write hundreds of lines of code without running them, they will most likely result in a very long and painful debugging session and a lot of changes. I don't want that, so I take baby steps. They make me go safer, faster and happier. There are two phases that define my programming feedback loop:

- The phase where I write code. Let's call it alpha, like so: 'α'.
- The phase where I evaluate my code and eventually fix errors. Let's call it beta, like so: 'β'.

You could also see those phases as modes. It is important to understand that these phases have nothing to do with the alpha/beta stages of a software release cycle; I just invented them to describe my programming feedback loop. In the following graphics you'll notice that the lines get shorter and shorter from example to example, which is intentional and should point out how I got faster using new strategies.
When I first started coding, I did not write any tests. I wrote lots of code before I tried it out manually. Obviously it didn't work when I first ran it. I ended up in rather long α-phases, where I just wrote code, and also long β-phases, where I evaluated (got my feedback) and fixed things. I was very slow back then. I soon started with an interpreted language, which was very cool because I could run the scripts immediately. No need to compile or anything; just write and run. It shortened my feedback loop, and I became faster overall. Sooner or later I eventually started TDD. Regardless of the language I was using, interpreted or not, it again shortened my feedback loop and made me go faster. The loop was shortened to a single 'unit', which is obviously smaller than anything manually executable. It allowed me to evaluate small behaviours long before the program was even runnable. It is important to understand that the α-phase in the following graphic contains both writing tests and implementation. The β-phase is much shorter, since unit tests run very fast. I thought this was just great and that it could not get any better. Wrong I was! Later, I tried something that made me go like this: "What the…?" you might ask. No, I did not break space-time. The thing I tried was Continuous Testing, which basically means that I do TDD, but I don't run my tests by pressing a button and then waiting. I just have them run all the time in the background, automatically. Every time I change my code, my tests immediately run and show me "OK" or "NOT OK" on a small icon on my screen. Since the tests only take about a second to run, this feedback is instant. And since my IDE saves my files on change automatically, I do not even have to press Ctrl+S. I just code… and as I code my files get saved… and as the files get saved my tests get run… fluently, immediately. This is HUGE. I am now progressing without disruption.
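To make "unit-sized" feedback concrete, here is a minimal sketch of the kind of fast, isolated test a continuous-testing tool re-runs on every save. All names (PriceCalculatorTest, applyDiscount) are invented for illustration, and a real setup would use JUnit rather than a hand-rolled assertion:

```java
// Hypothetical example of a tiny, fast unit test of the kind a
// continuous-testing tool like Infinitest re-runs on every save.
public class PriceCalculatorTest {

    // The production code under test: a small, pure unit.
    static int applyDiscount(int cents, int percent) {
        return cents - (cents * percent / 100);
    }

    public static void main(String[] args) {
        // Each baby step adds one assertion; the whole suite runs in
        // milliseconds, so the feedback is effectively instant.
        assertEquals(90, applyDiscount(100, 10));
        assertEquals(100, applyDiscount(100, 0));
        System.out.println("OK");
    }

    static void assertEquals(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }
}
```

Because the unit is pure and touches no I/O, the whole feedback cycle stays well under a second, which is what makes running it in the background on every change practical.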
I completely broke out of the phases, or 'modes' if you want to call them that. I love it. I have used Infinitest for this; it is a Continuous Testing plugin for IntelliJ/Eclipse. If you are doing JavaScript, I can also recommend Grunt for Continuous Testing.

Reference: Improve your Feedbackloop with Continuous Testing from our JCG partner Gregor Riegler at the Be a better Developer blog....

Java 8 Friday: More Functional Relational Transformation

In the past, we've been providing you with a new article every Friday about what's new in Java 8. It has been a very exciting blog series, but we would like to focus again more on our core content, which is Java and SQL. We will still be blogging occasionally about Java 8, but no longer every Friday (as some of you have already noticed). In this last, short post of the Java 8 Friday series, we'd like to reiterate the fact that we believe the future belongs to functional relational data transformation (as opposed to ORM). We've spent about 20 years now using the object-oriented software development paradigm, and many of us have been very dogmatic about it. In the last 10 years, however, a "new" paradigm has started to gain increasing traction in programming communities: functional programming. Functional programming is not that new, though. Lisp was a very early functional programming language, and XSLT and SQL are also somewhat functional (and declarative!). As big fans of SQL's functional (and declarative!) nature, we're quite excited that we now have sophisticated tools in Java to transform tabular data that has been extracted from SQL databases: Streams!

SQL ResultSets are very similar to Streams

As we've pointed out before, JDBC ResultSets and Java 8 Streams are quite similar. This is even more true when you're using jOOQ, which replaces the JDBC ResultSet with an org.jooq.Result, which extends java.util.List and thus automatically inherits all Streams functionality.
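As a self-contained illustration of that Streams functionality (plain JDK only, no database; the author/title rows below are invented), the groupingBy / mapping collector pattern can be applied to any in-memory list of rows:

```java
import java.util.*;
import static java.util.stream.Collectors.*;

public class GroupingDemo {

    // Each row is { author, title }, standing in for a fetched ResultSet row.
    public static Map<String, List<String>> booksByAuthor(List<String[]> rows) {
        return rows.stream()
                   .collect(groupingBy(
                       r -> r[0],                      // the grouping key: the author
                       LinkedHashMap::new,             // preserve the incoming order
                       mapping(r -> r[1], toList()))); // the values: book titles
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] { "George Orwell", "1984" },
            new String[] { "George Orwell", "Animal Farm" },
            new String[] { "Paulo Coelho", "O Alquimista" });
        System.out.println(booksByAuthor(rows));
        // prints {George Orwell=[1984, Animal Farm], Paulo Coelho=[O Alquimista]}
    }
}
```

The three-argument groupingBy overload used here (key extractor, map factory, downstream collector) is exactly the shape of the jOOQ-based collector in the query that follows.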
Consider the following query that allows fetching a one-to-many relationship between BOOK and AUTHOR records:

Map<Record2<String, String>, List<Record2<Integer, String>>> booksByAuthor =

// This work is performed in the database
// --------------------------------------
ctx.select(
       BOOK.ID,
       BOOK.TITLE,
       AUTHOR.FIRST_NAME,
       AUTHOR.LAST_NAME
   )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()

// This work is performed in Java memory
// -------------------------------------
   .stream()

// Group BOOKs by AUTHOR
   .collect(groupingBy(

       // This is the grouping key
       r -> r.into(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME),

       // This is the target data structure
       LinkedHashMap::new,

       // This is the value to be produced for each
       // group: A list of BOOK
       mapping(
           r -> r.into(BOOK.ID, BOOK.TITLE),
           toList()
       )
   ));

The fluency of the Java 8 Streams API is very idiomatic to someone who is used to writing SQL with jOOQ. Obviously, you can also use something other than jOOQ, e.g. Spring's JdbcTemplate, or Apache Commons DbUtils, or just wrap the JDBC ResultSet in an Iterator. What's very nice about this approach, compared to ORM, is the fact that there is no magic happening at all. Every piece of mapping logic is explicit and, thanks to Java generics, fully typesafe. The type of the booksByAuthor output is complex and a bit hard to read / write in this example, but it is also fully descriptive and useful.

The same functional transformation with POJOs

If you aren't too happy with using jOOQ's Record2 tuple types, no problem. You can specify your own data transfer objects like so:

class Book {
    public int id;
    public String title;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

static class Author {
    public String firstName;
    public String lastName;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

With the above DTOs, you can now leverage jOOQ's built-in POJO mapping to transform the jOOQ records into your own domain classes:

Map<Author, List<Book>> booksByAuthor =
ctx.select(
       BOOK.ID,
       BOOK.TITLE,
       AUTHOR.FIRST_NAME,
       AUTHOR.LAST_NAME
   )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()
   .stream()
   .collect(groupingBy(

       // This is the grouping key
       r -> r.into(Author.class),
       LinkedHashMap::new,

       // This is the grouping value list
       mapping(
           r -> r.into(Book.class),
           toList()
       )
   ));

Explicitness vs. implicitness

At Data Geekery, we believe that a new time has started for Java developers. A time where Annotatiomania™ (finally!) ends and people stop assuming all that implicit behaviour through annotation magic. ORMs depend on a huge amount of specification to explain how each annotation interacts with each other annotation. It is hard to reverse-engineer (or debug!) the kind of not-so-well-understood annotation language that JPA has brought to us. On the flip side, SQL is pretty well understood. Tables are an easy-to-handle data structure, and if you need to transform those tables into something more object-oriented or more hierarchically structured, you can simply apply functions to those tables and group values yourself! By grouping those values explicitly, you stay in full control of your mapping, just as with jOOQ you stay in full control of your SQL. This is why we believe that in the next 5 years, ORMs will lose relevance and people will start embracing explicit, stateless and magicless data transformation techniques again, using Java 8 Streams.

Reference: Java 8 Friday: More Functional Relational Transformation from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog....

Fridays Fun

Reference: Fridays Fun from our JCG partner Bohdan Bandrivskyy at the Java User Group of Lviv blog....

Use Cases for Elasticsearch: Full Text Search

In the last post of this series on use cases for Elasticsearch we looked at the features Elasticsearch provides for storing even large amounts of documents. In this post we will look at another of its core features: search. I am building on some of the information in the previous post, so if you haven't read it you should do so now. As we have seen, we can use Elasticsearch to store JSON documents that can even be distributed across several machines. Indexes are used to group documents, and each document is stored using a certain type. Shards are used to distribute parts of an index across several nodes, and replicas are copies of shards that are used for distributing load as well as for fault tolerance.

Full Text Search

Everybody uses full text search. The amount of information has just become too much to access it using navigation and categories alone. Google is the most prominent example, offering instant keyword search across a huge amount of information. Looking at what Google does, we can already see some common features of full text search. Users only provide keywords and expect the search engine to provide good results. Relevancy of documents is expected to be good, and users want the results they are looking for on the first page. How relevant a document is can be influenced by different factors, like how often the queried term exists in a document. Besides getting the best results, the user wants to be supported during the search process. Features like suggestions and highlighting in the result excerpt can help with this. Another area where search is important is e-commerce, with Amazon being one of the dominant players. The interface looks similar to the Google one. The user can enter keywords that are then searched for. But there are also slight differences: the suggestions Amazon provides are more advanced, also hinting at categories a term might be found in. Also the result display is different, consisting of a more structured view.
The structure of the documents being searched is also used for determining the facets on the left that can be used to filter the current result based on certain criteria, e.g. all results that cost between 10 and 20 €. Finally, relevance might mean something completely different when it comes to something like an online store. Often the order of the result listing is influenced by the vendor, or the user can sort the results by criteria like price or release date. Though neither Google nor Amazon is using Elasticsearch, you can use it to build similar solutions.

Searching in Elasticsearch

As with everything else, Elasticsearch can be searched using HTTP. In the most simple case you can append the _search endpoint to the url and add a parameter:

curl -XGET "http://localhost:9200/conferences/talk/_search?q=elasticsearch&pretty=true"

Elasticsearch will then respond with the results, ordered by relevancy:

{
  "took" : 81,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.067124054,
    "hits" : [ {
      "_index" : "conferences",
      "_type" : "talk",
      "_id" : "iqxb7rDoTj64aiJg55KEvA",
      "_score" : 0.067124054,
      "_source" : {
        "title" : "Anwendungsfälle für Elasticsearch",
        "speaker" : "Florian Hopf",
        "date" : "2014-07-17T15:35:00.000Z",
        "tags" : ["Java", "Lucene"],
        "conference" : {
          "name" : "Java Forum Stuttgart",
          "city" : "Stuttgart"
        }
      }
    } ]
  }
}

Though we have searched on a certain type here, you can also search multiple types or multiple indices. Adding a parameter is easy, but search requests can become more complex. We might request highlighting or filter the documents according to certain criteria. Instead of using parameters for everything, Elasticsearch offers the so-called Query DSL, a search API that is passed in the body of the request and is expressed using JSON. The following query could be the result of a user trying to search for elasticsearch but mistyping parts of it.
The results are filtered so that only talks for conferences in the city of Stuttgart are returned:

curl -XPOST "http://localhost:9200/conferences/_search" -d'
{
    "query": {
        "match": {
            "title": {
                "query": "elasticsaerch",
                "fuzziness": 2
            }
        }
    },
    "filter": {
        "term": {
            "conference.city": "stuttgart"
        }
    }
}'

This time we are querying all documents of all types in the index conferences. The query object requests one of the common queries, a match query on the title field of the document. The query attribute contains the search term that would be passed in by the user. The fuzziness attribute requests that we should also find documents that contain terms similar to the term requested. This will take care of the misspelled term and also return results containing elasticsearch. The filter object requests that all results should be filtered according to the city of the conference. Filters should be used whenever possible, as they can be cached and do not calculate the relevancy, which makes them faster.

Normalizing Text

As search is used everywhere, users also have certain expectations of how it should work. Instead of issuing exact keyword matches, they might use terms that are only similar to the ones that are in the document. For example a user might be querying for the term Anwendungsfall, which is the singular of the contained term Anwendungsfälle, meaning use cases in German:

curl -XGET "http://localhost:9200/conferences/talk/_search?q=title:anwendungsfall&pretty=true"
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

No results. We could try to solve this using the fuzzy search we have seen above, but there is a better way. We can normalize the text during indexing so that both keywords point to the same term in the document.
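As an aside on the fuzziness attribute used above: it is essentially edit distance. The misspelled "elasticsaerch" is two single-character substitutions away from "elasticsearch", which is why a fuzziness of 2 still matches (Elasticsearch can additionally count a transposition as a single edit, which this plain version does not). A minimal sketch of the classic Levenshtein computation, for illustration only:

```java
public class Levenshtein {

    // Classic dynamic-programming Levenshtein distance:
    // the minimum number of single-character insertions,
    // deletions and substitutions turning a into b.
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                    Math.min(d[i - 1][j] + 1,      // deletion
                             d[i][j - 1] + 1),     // insertion
                    d[i - 1][j - 1] + cost);       // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // The misspelled query term is within the requested fuzziness of 2.
        System.out.println(distance("elasticsaerch", "elasticsearch")); // 2
    }
}
```

Lucene does not compute this table per document, of course; it compiles the fuzzy term into an automaton and intersects it with the term dictionary, but the notion of distance is the same.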
Lucene, the library that search and storage in Elasticsearch are implemented with, provides the underlying data structure for search: the inverted index. Terms are mapped to the documents they are contained in. A process called analyzing is used to split the incoming text and add, remove or modify terms. On the left we can see two documents that are indexed; on the right we can see the inverted index that maps terms to the documents they are contained in. During the analyzing process the content of the documents is split and transformed in an application-specific way so it can be put in the index. Here the text is first split on whitespace or punctuation. Then all the characters are lowercased. In a final step, language-dependent stemming is employed that tries to find the base form of terms. This is what transforms our Anwendungsfälle to Anwendungsfall. What kind of logic is executed during analyzing depends on the data of your application. The analyzing process is one of the main factors for determining the quality of your search, and you can spend quite some time with it. For more details you might want to look at my post on the absolute basics of indexing data. In Elasticsearch, how fields are analyzed is determined by the mapping of the type. Last week we saw that we can index documents of different structure in Elasticsearch, but as we can see now, Elasticsearch is not exactly schema free. The analyzing process for a certain field is determined once and cannot be changed easily. You can add additional fields, but you normally don't change how existing fields are stored. If you don't supply a mapping, Elasticsearch will do some educated guessing for the documents you are indexing. It will look at any new field it sees during indexing and do what it thinks is best. In the case of our title it uses the StandardAnalyzer, because it is a string. Elasticsearch does not know what language our string is in, so it doesn't do any stemming, which is a good default.
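The analyzing steps and the inverted index described above can be sketched in a few lines of plain Java. This is a toy model for illustration only (real Lucene analyzers add stemming, stop words and much more), and the class and document contents are invented:

```java
import java.util.*;

public class InvertedIndex {

    // term -> ids of the documents containing it
    private final Map<String, Set<Integer>> index = new HashMap<>();

    public void add(int docId, String text) {
        // "Analyzing": split on anything that is not a letter, then lowercase.
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token.toLowerCase(Locale.ROOT),
                                      t -> new TreeSet<>()).add(docId);
            }
        }
    }

    public Set<Integer> search(String term) {
        // The query term goes through the same normalization as the documents.
        return index.getOrDefault(term.toLowerCase(Locale.ROOT),
                                  Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "Anwendungsfälle für Elasticsearch");
        idx.add(2, "Search with Elasticsearch");
        System.out.println(idx.search("elasticsearch")); // [1, 2]
    }
}
```

The key point the toy makes visible: a query only matches if the query term and the indexed term normalize to the same string, which is exactly why a stemming analyzer is needed to make Anwendungsfall find Anwendungsfälle.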
To tell Elasticsearch to use the GermanAnalyzer instead, we need to add a custom mapping. We first delete the index and create it again:

curl -XDELETE "http://localhost:9200/conferences/"

curl -XPUT "http://localhost:9200/conferences/"

We can then use the PUT mapping API to pass in the mapping for our type:

curl -XPUT "http://localhost:9200/conferences/talk/_mapping" -d'
{
    "properties": {
        "tags": {
            "type": "string",
            "index": "not_analyzed"
        },
        "title": {
            "type": "string",
            "analyzer": "german"
        }
    }
}'

We have only supplied a custom mapping for two fields. The rest of the fields will again be guessed by Elasticsearch. When creating a production app you will most likely map all of your fields in advance, but the ones that are not that relevant can also be mapped automatically. Now, if we index the document again and search for the singular, the document will be found.

Advanced Search

Besides the features we have seen here, Elasticsearch provides a lot more. You can automatically gather facets for the results using aggregations, which we will look at in a later post. The suggesters can be used to perform autosuggestion for the user, terms can be highlighted, results can be sorted according to fields, and you get pagination with each request. As Elasticsearch builds on Lucene, all the goodies for building an advanced search application are available.

Conclusion

Search is a core part of Elasticsearch that can be combined with its distributed storage capabilities. You can use the Query DSL to build expressive queries. Analyzing is a core part of search and can be influenced by adding a custom mapping for a type. Lucene and Elasticsearch provide lots of advanced features for adding search to your application. Of course there are lots of users building on Elasticsearch because of its search features and its distributed nature.
GitHub uses it to let users search the repositories, Stack Overflow indexes all of its questions and answers in Elasticsearch, and SoundCloud offers search in the metadata of the songs. In the next post we will look at another aspect of Elasticsearch: using it to index geodata, which lets you filter and sort results by position and distance.

Reference: Use Cases for Elasticsearch: Full Text Search from our JCG partner Florian Hopf at the Dev Time blog....

Server vs Client Side Rendering (AngularJS vs Server Side MVC)

There's a lot of discussion related to server vs client side application rendering. While there is no "one choice fits all" solution, I'll try to argue in favor of the client side (specifically AngularJS) from different points of view. The first of them is architecture.

Architecture

A well done architecture has a clearly defined separation of concerns (SoC). In most cases the minimal high level configuration is:

- Data storage
- Services
- API
- Presentation

Each of those layers should have only minimal knowledge of the one above. Services need to know where to store data, the API needs to know what services to call, and the presentation layer can communicate with the rest only through the API. The important thing to note here is that knowledge about the layers below should be non-existent. For example, the API should not know who or what will consume it. It should have no knowledge of the presentation layer. A lot more could be said about each of those layers, and the situation in the "real world" is much more complicated than this. However, for the rest of the article the important takeaway is that the presentation layer communicates with the server through the API which, in turn, does not know anything about the "world outside". This separation is becoming more important with the ever-increasing variety of machines and devices (laptop, mobile, tablet, desktop). The back-end should only provide business logic and data.

Skills

Taking developers' skills into account is an important aspect of the architecture. If, for example, developers are used to working in Java, one should not design a system based on C# unless there is a clear advantage to the change. That does not mean that skills should not be extended with new languages (who said Scala and Clojure?). It's just that the productivity of the team must be taken into account, and knowledge of programming languages is an important element. No matter the existing knowledge, there are some skills required by the type of the application.
For example, if the application will have a Web site as the presentation layer, HTML, CSS and JavaScript knowledge is a must. There are frameworks that can be used to avoid the need for that knowledge (i.e. Vaadin). However, usage of those frameworks only postpones the inevitable: acceptance that HTML, CSS and JS are, one way or another, the languages the browser understands. The question is only whether you'll adopt them directly or use something else (i.e. Java or C#) to write them for you. The latter case might give an impression of being able to do things faster but, in many situations, there is a price to pay later when the time comes to do something those "translators" do not support. The server side is easier to cope with. There are more choices, and there are good (if not great) solutions for every skill set. We might argue whether Scala is better than Java, but both provide results that are good enough, and the decision to learn a new language is much harder to make (even though I think that a developer should continuously extend his knowledge by trying new languages and frameworks). One can code the back-end in Java, Scala, C#, Clojure, JavaScript/NodeJS, etc. We don't have that luxury in browsers. Adobe Flash is dead, Silverlight never lifted off. Tools like Vaadin that were designed primarily to alleviate the pain that JavaScript was causing are losing ground due to the continuous improvements we're seeing in HTML and JavaScript. The "browser" world is changing rapidly, and the situation is quite different from what it was not so long ago. Welcome to the world of HTML5. Something similar can be said for the development of mobile devices. There is no one language that fits all. We cannot develop iPhone applications in Java. While HTML can be the solution in some cases, in others one needs to go for "native" development. The only constant is that, no matter whether it's Web, mobile, desktop or Google Glass, they should all communicate with the rest of the system using an API.
The point I'm trying to make is that there must be a balance between adopting the languages needed to do the work and switching to a new language with every new project. Some languages are a must and some are good (but not mandatory) to have. When working with the Web, HTML, CSS and JavaScript are a must.

Server vs client side rendering

Since we established that, in the case of Web sites (who said applications?), HTML with CSS and JavaScript is a must and tools that try to create it for us are "evil", the question remains who renders the HTML. For most of the history of browsers, we were used to rendering HTML on the server and sending it to the browser. There were strong reasons for that. Front-end technologies and frameworks were young and immature, browsers had serious compatibility issues and, generally speaking, working with JavaScript was painful. That picture is not valid any more. Google showed us that in many cases the browser is as good as the desktop. jQuery revolutionized the way we work with JavaScript by letting us manipulate the DOM in a relatively easy way. Many other JS frameworks and libraries were released. However, until recently there was no substitute for the good old model-view-controller (MVC) pattern. Server rendering is a must for all but small sites. Or is it? AngularJS changed the way we perceive MVC (actually it's model-view-whatever, but let's not get sidetracked). It can be done in the client without sacrificing productivity. I'd argue that, in many cases, productivity increases with AngularJS. There are other client side MVCs like BackboneJS and EmberJS. However, as far as my experience goes, nothing beats AngularJS. AngularJS is not without its problems, though. Let's go through the pros and cons of client vs server-side page rendering. By client side I'm assuming AngularJS. For this comparison, the server side can be anything (Java, C#, etc).
AngularJS cons

- Page rendering is slower, since the browser needs to do the extra work of DOM manipulation, watching for changes in bound data, doing additional REST requests to the server, etc. The first time the application is opened, it needs to download all the JavaScript files. Depending on the complexity of the application, this might or might not be a problem. Modern computers are perfectly capable of taking on the extra work. Mobile devices are more powerful than older computers. In most cases, clients will not notice the increase in the work the browser needs to do.
- Compatibility with older browsers is hard to accomplish. One would need to render alternative pages on the server. The weight of this argument depends on whether you care for (very) old browsers. The main culprit is Internet Explorer. Version 8 works (somehow) if additional directives are applied. Earlier versions are not supported. Future versions of AngularJS will drop support for Internet Explorer 8. It's up to you to decide whether support for IE8 and earlier is important. If it is, alternative pages need to be served, and that will result in additional development time. Depending on the complexity of the application, the same problem might exist in non-AngularJS development.
- Search Engine Optimization (SEO) is probably the biggest issue. At the moment, the most common technique for mitigating this problem is to pre-render pages on the server. It's a relatively simple process that requires a small amount of code that will work for any screen. More information can be found in How do I create an HTML snapshot? and Prerender.io. In May 2014 the article Understanding web pages better appeared, giving us good news about Google being able to execute JavaScript, thus solving (or being on the way to solving) SEO problems for sites relying heavily on JS.

AngularJS pros

- Server performance, if done well (clever usage of JSON, client-side caching, etc), increases. The amount of traffic between the client and the server is reduced.
The server itself does not need to create the page before sending it to the client. It only needs to serve static files and respond to API calls with JSON. The traffic and server workload are reduced.

- AngularJS is designed with testing needs in mind. Together with dependency injection, mocking objects, services and functions, it is very easy to write tests (easier than in most other cases I have worked with). Both unit and end-to-end tests can be written and run fast.
- As suggested in the architecture section, the front-end is (almost) completely decoupled from the back-end. AngularJS needs to have knowledge of the REST API. The server still needs to deliver static files (HTML, CSS and JavaScript) and to pre-render screens when crawlers are visiting. However, both jobs do not need any internal knowledge of the rest of the system and can be done on the same or a completely different server. A simple NodeJS HTTP server can serve the purpose. This decoupling allows us to develop the back-end and front-end independently of each other. With client side rendering, the browser is an API consumer in the same way an Android, iPhone or desktop application would be.
- Knowledge of server-side programming languages is not needed. No matter the approach one takes (server or client rendering), HTML/CSS/JavaScript is required. Not mixing the server side into this picture makes the lives of front-end developers much easier.
- Google's support for Angular is a plus. Having someone like Google behind it makes it more likely that its support and future improvements will continue at full speed.
- Once used to the AngularJS way of working, development speed increases. The amount of code can be greatly reduced. Eliminating the need to recompile the back-end code allows us to see changes to the front-end almost immediately.

Summary

This view of client vs server-side rendering should be taken with caution. There is no "one fits all" solution.
Depending on the needs and solutions employed, many of the pros and cons listed above are not valid or can be applied to server-side rendering as well. Server-side rendering is in many cases chosen in order to avoid the dive into HTML, CSS and JavaScript. It makes developers who are used to working with server-side programming languages (Java, C#, etc) more comfortable, thinking that there's no need to learn "browser" languages. Also, in many cases it produces (often unintentional) coupling with the back-end code. Both situations should be avoided. I'm not arguing that server-side rendering inevitably leads to those situations, but that it makes them more likely. It's a brave new world out there. Client-side programming is quite different from what it was before. There are many reasons to at least try it out. Whatever the decision, it should be taken with enough information, which can be obtained only through practical experience. Try it out and don't give up on the first obstacle (there will be many). If you choose not to take this route, make it an informed decision. Client side MVCs like AngularJS are far from perfect. They are relatively young and have a long way to go. Many improvements will come soon, and I'm convinced that the future of the Web is going in that direction.

Reference: Server vs Client Side Rendering (AngularJS vs Server Side MVC) from our JCG partner Viktor Farcic at the Technology conversations blog....

Applied Big Data : The Freakonomics of Healthcare

I went with a less provocative title this time because my last blog post (http://brianoneill.blogspot.com/2014/04/big-data-fixes-obamacare.html) evidently incited political flame wars. In this post, I hope to avoid that by detailing exactly how Big Data can help our healthcare system in a nonpartisan way. First, let's decompose the problem a bit.

Economics

Our healthcare system is still (mostly) based on capitalism: more patients + more visits = more money. Within such a system, it is not in the best interest of healthcare providers to have healthy patients. Admittedly, this is a pessimistic view, and doctors and providers are not always prioritizing financial gain. Minimally, however, at a macro scale there exists a conflict of interest for some segment of the market, because not all healthcare providers profit entirely from preventative care.

Behavior

Right now, with a few exceptions, everyone pays the same for healthcare. Things are changing, but broadly speaking, there are no financial incentives to make healthy choices. We are responsible for only a fraction of the medical expenses we incur. That means everyone covered by my payer (the entity behind the curtain that actually foots the bills) is helping pay for the medical expenses I may rack up as a result of my Friday night pizza and beer binges.

Government

Finally, the government is trying. They are trying really hard. Through transparency, reporting, and compliance, they have the correct intentions and ideas to bend the cost curve of healthcare. But the government is the government, and large enterprises are large enterprises. And honestly, getting visibility into the disparate systems of any single large enterprise is difficult (ask any CIO). Imagine trying to gain visibility into thousands of enterprises, all at once. It's daunting: schematic disparities, messy data, ETL galore. Again, this is a pessimistic view, and there are remedies in the works.
Things like high-deductible plans are making people more aware of their expenses. Payers are trying to transition away from fee-for-service models (http://en.m.wikipedia.org/wiki/Fee-for-service). But what do these remedies need to be effective? You guessed it. Data. Mounds of it. If you are a payer and want to reward the doctors that are keeping their patients healthy (and out of the doctors' offices!), how would you find them? If you are a patient and want to know who provides the most effective treatments at the cheapest prices, where would you look? If you are the government and want to know how much pharmaceutical companies are spending on doctors, or which pharmacies are allowing fraudulent prescriptions, what systems would you need to integrate? Hopefully now you are motivated. This is a big data problem. What's worse is that it is a messy data problem. At HMS, it's taken us more than three years and significant blood, sweat and tears to put together a platform that deals with the big and messy mound o' data. The technologies had to mature, along with the people and processes. And finally, on sunny days, I can see a light at the end of the tunnel for US healthcare. If you are on the same mission, please don't hesitate to reach out. Ironically, I'm posting this from a hospital bed as I recover from the bite of a brown recluse spider. I guess there are certain things that big data can't prevent! Reference: Applied Big Data : The Freakonomics of Healthcare from our JCG partner Brian ONeill at the Brian ONeill's Blog blog....

Developer Promotion – practical guide & statistics about the Java ecosystem

Before we launched our recent promotion for JavaOne tickets with Java Code Geeks, we sought out guides/tutorials about running such a promotion; unfortunately we didn't find anything useful, so we had to improvise and work by instinct. This post will also cover some statistical results about the Java community, so if that is the reason you are reading, skip down for some interesting takeaways. Running a promotion: tips & tricks. The idea started when one of our talks got accepted to JavaOne; suddenly we had two extra tickets (we got 3 tickets with the booth), so we could give away two of them. From our experience we came to the conclusion that most developers don't know about Codename One yet, and that is probably the most important thing we need to improve in our outreach. So here are a few conclusions we came to, in no particular order:
  • We needed a partner to handle the giveaway – this removed the bias from us. We have an incentive to give the tickets to our best customers, so we can't really be fair. That's why we picked Java Code Geeks.
  • We picked Java Code Geeks as a partner – we needed a partner who is focused on Java (check), who has a big enough audience (check), and whose audience isn't already familiar with our product (check). This is probably the most important step; most of our contestants came from Java Code Geeks, as you can see from the chart to the right. Most of the people who entered our contest didn't know about Codename One in advance and were happy to find out about us.
  • We created incentives to share on social media – most of the people who read Java Code Geeks are probably up to date with the latest technology, but their friends (and friends of friends) might not be. The true value of the promotion is the social share, which is why we made the contest require sharing. 
Initially we wanted the person with the most retweets/shares to "win", but Byron from Java Code Geeks objected to this approach (in hindsight I think he was right to object), claiming that it would rule out people who don't have many friends. It would also have promoted cheating, which is a huge problem.
  • We created very detailed rules for the competition – yes, most people didn't read them (as is obvious from some of the submissions we got), but there are now rules that the raffle itself should follow.
  • We used Google Docs to collect details and signups, which allowed us to gather a lot of structured data. We included an entirely optional survey, and 99% of submitters answered the questions within it despite the optionality!
  • We promoted the giveaway in our respective newsletters, which is usually the most effective way to get people to sign up. However, a single retweet by @java is probably more valuable than an 18k-strong mailing list.
  • We used Facebook, Twitter and AdSense ads – all charged us and delivered literally nothing. Facebook charged for clicks that Google Analytics didn't show. Normally I would chalk this up as a Google Analytics omission; however, we saw a lot of clicks on the Facebook side and no submissions to the contest form… This seems to indicate that Facebook is charging for fraudulent clicks!
Results. We asked developers which operating system they were using and were pretty shocked by the results. We see Macs all over JavaOne and all the conferences I go to, but it seems that Macs are in the minority compared to Linux! The IDE of choice brought a pretty predictable result, although much closer than we would have thought; as the number of answers grew, the distance between the IDEs grew as well, with NetBeans surpassing IDEA and Eclipse growing in strength. However, the race between the IDEs is still very real, and despite Eclipse's lead its position seems very assailable in the market. Our decision to support all 3 IDEs seems like an excellent one in this light.  
Another shocker is the Java platform of choice. I always get the sense that all developers working with Java have migrated to Java EE. This doesn't seem to be entirely the case; there are plenty of developers still working with Java SE. I'm actually surprised there are still 10% of developers working on Java ME. What is your preferred Java news source? Here we excluded Java Code Geeks, since we assumed unreasonable bias in the results (according to other stats it would probably occupy well over 50%). Notice that while some outlets (e.g. DZone) are clearly dominant, there is no true leader. When targeting the Java community we literally need to be everywhere. Do Java developers use other languages besides Java? Honestly, we should have asked what for. JavaScript/CoffeeScript took the lead, but do developers use it for HTML or do they use server-side JavaScript, e.g. Node.js? The main motivation behind asking this question was potential support for other languages in Codename One, especially JVM languages, e.g. JRuby, Scala etc. We probably should have phrased a few different questions to get more refined answers. Final thoughts. I hope you found these results useful and illuminating. For the next contest/survey we run, we have picked up the following lessons:
  • Better limits on the goals of each question – e.g. with the languages besides Java we didn't quite get an answer. I would probably ask: are you interested in other languages? Would you prefer another language?
  • We would not waste a cent on advertising.
  • Partner selection is critical – Java Code Geeks delivered! As pleased as we were with the work done by the guys at Java Code Geeks, it was crucial for us to come prepared. We phrased the contest terms and defined most of the conditions/questions. 
Your partner can be very helpful, but only you can define your own expectations and results. Reference: Developer Promotion – practical guide & statistics about the Java ecosystem from our JCG partner Shai Almog at the Codename One blog....

Neo4j: LOAD CSV – Processing hidden arrays in your CSV documents

I was recently asked how to process an ‘array’ of values inside a column in a CSV file using Neo4j’s LOAD CSV tool and although I initially thought this wouldn’t be possible as every cell is treated as a String, Michael showed me a way of working around this which I thought was pretty neat. Let’s say we have a CSV file representing people and their friends. It might look like this:         name,friends "Mark","Michael,Peter" "Michael","Peter,Kenny" "Kenny","Anders,Michael" And what we want is to have the following nodes:Mark Michael Peter Kenny AndersAnd the following friends relationships:Mark -> Michael Mark -> Peter Michael -> Peter Michael -> Kenny Kenny -> Anders Kenny -> MichaelWe’ll start by loading the CSV file and returning each row: $ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row RETURN row; +------------------------------------------------+ | row | +------------------------------------------------+ | {name -> "Mark", friends -> "Michael,Peter"} | | {name -> "Michael", friends -> "Peter,Kenny"} | | {name -> "Kenny", friends -> "Anders,Michael"} | +------------------------------------------------+ 3 rows As expected the ‘friends’ column is being treated as a String which means we can use the split function to get an array of people that we want to be friends with: $ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row RETURN row, split(row.friends, ",") AS friends; +-----------------------------------------------------------------------+ | row | friends | +-----------------------------------------------------------------------+ | {name -> "Mark", friends -> "Michael,Peter"} | ["Michael","Peter"] | | {name -> "Michael", friends -> "Peter,Kenny"} | ["Peter","Kenny"] | | {name -> "Kenny", friends -> "Anders,Michael"} | ["Anders","Michael"] | +-----------------------------------------------------------------------+ 3 rows Now that we’ve got them as an array we can use UNWIND to get pairs of friends 
that we want to create: $ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row WITH row, split(row.friends, ",") AS friends UNWIND friends AS friend RETURN row.name, friend; +-----------------------+ | row.name | friend | +-----------------------+ | "Mark" | "Michael" | | "Mark" | "Peter" | | "Michael" | "Peter" | | "Michael" | "Kenny" | | "Kenny" | "Anders" | | "Kenny" | "Michael" | +-----------------------+ 6 rows And now we’ll introduce some MERGE statements to create the appropriate nodes and relationships: $ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row WITH row, split(row.friends, ",") AS friends UNWIND friends AS friend MERGE (p1:Person {name: row.name}) MERGE (p2:Person {name: friend}) MERGE (p1)-[:FRIENDS_WITH]->(p2); +-------------------+ | No data returned. | +-------------------+ Nodes created: 5 Relationships created: 6 Properties set: 5 Labels added: 5 373 ms And now if we query the database to get back all the nodes + relationships… $ match (p1:Person)-[r]->(p2) RETURN p1,r, p2; +------------------------------------------------------------------------+ | p1 | r | p2 | +------------------------------------------------------------------------+ | Node[0]{name:"Mark"} | :FRIENDS_WITH[0]{} | Node[1]{name:"Michael"} | | Node[0]{name:"Mark"} | :FRIENDS_WITH[1]{} | Node[2]{name:"Peter"} | | Node[1]{name:"Michael"} | :FRIENDS_WITH[2]{} | Node[2]{name:"Peter"} | | Node[1]{name:"Michael"} | :FRIENDS_WITH[3]{} | Node[3]{name:"Kenny"} | | Node[3]{name:"Kenny"} | :FRIENDS_WITH[4]{} | Node[4]{name:"Anders"} | | Node[3]{name:"Kenny"} | :FRIENDS_WITH[5]{} | Node[1]{name:"Michael"} | +------------------------------------------------------------------------+ 6 rows …you’ll see that we have everything. 
If instead of a comma-separated list of people we have a literal array in the cell… name,friends "Mark", "[Michael,Peter]" "Michael", "[Peter,Kenny]" "Kenny", "[Anders,Michael]" …we’d need to tweak the part of the query which extracts our friends to strip off the first and last characters: $ load csv with headers from "file:/Users/markneedham/Desktop/friendsa.csv" AS row RETURN row, split(substring(row.friends, 1, length(row.friends) -2), ",") AS friends; +-------------------------------------------------------------------------+ | row | friends | +-------------------------------------------------------------------------+ | {name -> "Mark", friends -> "[Michael,Peter]"} | ["Michael","Peter"] | | {name -> "Michael", friends -> "[Peter,Kenny]"} | ["Peter","Kenny"] | | {name -> "Kenny", friends -> "[Anders,Michael]"} | ["Anders","Michael"] | +-------------------------------------------------------------------------+ 3 rows And then if we put the whole query together we end up with this: $ load csv with headers from "file:/Users/markneedham/Desktop/friendsa.csv" AS row WITH row, split(substring(row.friends, 1, length(row.friends) -2), ",") AS friends UNWIND friends AS friend MERGE (p1:Person {name: row.name}) MERGE (p2:Person {name: friend}) MERGE (p1)-[:FRIENDS_WITH]->(p2); +-------------------+ | No data returned. | +-------------------+ Nodes created: 5 Relationships created: 6 Properties set: 5 Labels added: 5 Reference: Neo4j: LOAD CSV – Processing hidden arrays in your CSV documents from our JCG partner Mark Needham at the Mark Needham Blog blog....
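As a footnote to the queries above, the string handling is easy to sanity-check outside of Cypher. Here is my own rough Java sketch (not from the original post) mirroring what split(row.friends, ",") and the substring trick do to a friends cell:

```java
import java.util.Arrays;
import java.util.List;

public class FriendsColumn {

    // Mirrors Cypher's split(row.friends, ",") on a plain comma-separated cell.
    static List<String> splitFriends(String cell) {
        return Arrays.asList(cell.split(","));
    }

    // Mirrors split(substring(row.friends, 1, length(row.friends) - 2), ","):
    // the substring drops the leading '[' and trailing ']' before splitting.
    static List<String> splitBracketedFriends(String cell) {
        String inner = cell.substring(1, cell.length() - 1); // strip '[' and ']'
        return Arrays.asList(inner.split(","));
    }

    public static void main(String[] args) {
        System.out.println(splitFriends("Michael,Peter"));             // [Michael, Peter]
        System.out.println(splitBracketedFriends("[Anders,Michael]")); // [Anders, Michael]
    }
}
```

Either way you end up with the same array of names that UNWIND then turns into one row per friend.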

Is TDD Dead Or Alive?

This blog post is for those of you who are unaware that there is a major debate happening in contemporary software development right now, today. People have been wondering about the value of Test-Driven Development (TDD) for a long while, but it was not until David Heinemeier Hansson of 37 Signals posted a blog article on St. George's Day (23rd April 2014, a significant day for me personally) called "TDD is Dead. Long live testing" that an unstoppable momentum of discussion began around what is now a heated and controversial topic. So I encourage you to read his article first, if you are an experienced software engineer, and then watch the videos. After the article was posted to the Internet, Heinemeier Hansson, Martin Fowler and Kent Beck followed up with a series of Google Hangout recordings, in May and June, each about 30 minutes long. The last editions (parts V and VII), which ran a full hour, were very revealing, because they were question-and-answer sessions from the community and therefore extended the discussion to a wider audience. If you have 3 hours of free time, instead of watching another Hollywood movie or re-runs of World Cup football, then it is well worth your time to listen to (and/or watch) the entire series to understand the debate and the different points of view. As for my own opinion on TDD, it has a place in my toolbox like another hammer or screwdriver, and it should be treated as such. To be a TDD practitioner requires skill and discipline, and the knowledge to recognise that it is not appropriate in all situations. I have some sympathy with Heinemeier Hansson's frustrated view: "Most people cannot just leave good ideas the f**k alone". I witnessed a certain zealotry in a couple of job interviews a few years ago, when interviewers used the technique like an officer using a truncheon to beat somebody with. If you didn't write code in what they viewed as the correct way for the client, you were rejected. 
This raised suspicions in my head about the variations in testing styles across different organisations and sectors. In the end, I suspect that you do not have to write code test-first all the time, even with the plethora of unit testing frameworks out there such as JUnit, ScalaTest and Scala Specs. It is more important to have a ground level: there must be tests in the application and system software that exercise the customer's acceptance requirements and produce a Minimal Viable Product. This is definitely the road to follow. PS: The first video in the series, "Is TDD Dead?", starts here. Reference: Is TDD Dead Or Alive? from our JCG partner Peter Pilgrim at the Peter Pilgrim's blog blog....
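For readers new to the debate, a tiny invented example (mine, not from the videos) of the test-first rhythm everyone is arguing about: the assertion is notionally written first, against a class that doesn't exist yet ("red"), and then the simplest implementation that passes it is added ("green"):

```java
// Invented red-green sketch: the check in main() predates PriceCalculator;
// running it against the missing class is the "red" step, and the one-line
// implementation below is the simplest "green" step before refactoring.
public class PriceCalculatorTest {

    static class PriceCalculator {
        // The simplest implementation that satisfies the test below.
        int totalInCents(int unitPriceInCents, int quantity) {
            return unitPriceInCents * quantity;
        }
    }

    public static void main(String[] args) {
        PriceCalculator calc = new PriceCalculator();
        if (calc.totalInCents(250, 4) != 1000) {
            throw new AssertionError("expected 1000");
        }
        System.out.println("green"); // the bar is green; now refactor
    }
}
```

Whether this loop must drive all code, or is just one tool among many, is exactly what the hangout series debates.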

10 things you can do as a developer to make your app secure: #10 Design Security In

There's more to secure design and architecture than properly implementing Authentication, Access Control and Logging strategies, and choosing (and properly using) a good framework. You need to consider and deal with security threats and risks at many different points in your design. Adam Shostack's new book on Threat Modeling explores how to do this in detail, with lots of exercises and examples on how to look for and plug security holes in software design, and how to think about design risks. But some important basic ideas in secure design will take you far:

Know your Tools
When deciding on the language(s) and technology stack for the system, make sure that you understand the security constraints and risks that your choices will dictate. If you're using a new language, take time to learn how to write code properly and safely in that language. If you're programming in Java, C, C++ or Perl, check out CERT's secure coding guidelines for those languages. If you're writing code on iOS, read Apple's Secure Coding Guide. For .NET, review OWASP's .NET Security project. Look for static analysis tools like FindBugs and PMD for Java, JSHint for JavaScript, OCLint for C/C++ and Objective-C, Brakeman for Ruby, RIPS for PHP, Microsoft's static analysis tools for .NET, or commercial tools that will help catch common security bugs and logic bugs in coding or Continuous Integration. And make sure that you (or ops) understand how to lock down or harden the O/S and how to safely configure your container and database (or NoSQL data) manager.

Tiering and Trust
Tiering or layering, and trust in design, are closely tied together. You must understand and verify trust assumptions at the boundaries of each layer in the architecture, between systems, and between components in design, in order to decide what security controls need to be enforced at these boundaries: authentication, access control, data validation and encoding, encryption, and logging. 
Understand when data or control crosses a trust boundary: to or from code that is outside of your direct control. This could be an outside system, a browser, mobile or other type of client, another layer of the architecture, or another component or service. Thinking about trust is much simpler and more concrete than thinking about threats, and easier to test and verify. Just ask some simple questions: Where is the data coming from? How can you be sure? Can you trust this data – has it been validated and safely encoded? Can you trust the code on the other side to protect the integrity and confidentiality of the data that you pass to it? Do you know what happens if an exception or error occurs – could you lose data or data integrity, or leak data? Does the code fail open or fail closed? Before you make changes to the design, make sure that you understand these assumptions and that the assumptions are correct.

The Application Attack Surface
Finally, it's important to understand and manage the system's Attack Surface: all of the ways that attackers can get in, or get data out, of the system; all of the resources that are exposed to attackers. APIs, files, sockets, forms, fields, URLs, parameters, cookies. And the security plumbing that protects these parts of the system. Your goal should be to keep the Attack Surface as small as possible, but this is much easier said than done: each new feature and new integration point expands the Attack Surface. Try to understand the risks that you are introducing, and how serious they are. Are you creating a brand new network-facing API, designing a new business workflow that deals with money or confidential data, changing your access control model, or swapping out an important part of your platform architecture? Or are you just adding yet another CRUD admin form, or one more field to an existing form or file? 
In each case you are changing the Attack Surface, but the risks will be very different, and so will the way that you need to manage those risks. For small, well-understood changes the risks are usually negligible – just keep coding. If the risks are high enough, you'll need to do some abuse case analysis or threat modeling, or make time for a code review or pen testing. And of course, once a feature or option or interface is no longer needed, remove it and delete the code. This will reduce the system's Attack Surface, as well as simplify your maintenance and testing work. That's it. We're done. These are the 10 things you can do as a developer to make your app secure: from thinking about security in architectural layering and technology choices, to including security in requirements, taking advantage of other people's code by using frameworks and libraries carefully, making sure that you implement basic security controls and features like Authentication and Access Control properly, protecting data privacy, logging with security in mind, and dealing with input data and stopping injection attacks, especially SQL injection. This isn't an exhaustive list. But understanding and dealing with these important issues in application security – including security when you think about requirements, design, coding and testing, and knowing more about your tools and using them properly – is work that all developers can do, and it will take you a long, long way towards making your system secure and reliable. Reference: 10 things you can do as a developer to make your app secure: #10 Design Security In from our JCG partner Jim Bird at the Building Real Software blog....
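To make the trust-boundary questions above ("Where is the data coming from? Has it been validated?") concrete, here is a small illustrative Java sketch; the field name and whitelist rule are my own invention, not from the article. It validates untrusted input before it crosses into trusted code, and fails closed on anything unexpected:

```java
// Invented sketch: validating untrusted input at a trust boundary before
// it reaches code that assumes the data is safe. The "account id" field
// and its whitelist rule are illustrative assumptions.
import java.util.regex.Pattern;

public class AccountRequestValidator {

    // Whitelist pattern: account IDs are 6-12 digits, nothing else.
    private static final Pattern ACCOUNT_ID = Pattern.compile("\\d{6,12}");

    // Returns the validated account id, or throws if the value coming from
    // outside the trust boundary doesn't match the whitelist (fail closed).
    public static String requireValidAccountId(String untrusted) {
        if (untrusted == null || !ACCOUNT_ID.matcher(untrusted).matches()) {
            throw new IllegalArgumentException("invalid account id");
        }
        return untrusted;
    }

    public static void main(String[] args) {
        System.out.println(requireValidAccountId("123456")); // accepted
        try {
            requireValidAccountId("123'; DROP TABLE accounts;--");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // fail closed on suspicious input
        }
    }
}
```

The point is the shape, not the rule: validation happens once, at the boundary, and everything past it can trust the value.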
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
