
INTEL Perceptual Computing – RealSense Challenge 2014

Perceptual computing technology is redefining the boundaries of human-computer interaction. Intel invites you to claim your share of history by designing new, leading-edge perceptual computing apps. RealSense Challenge 2014 is a new contest in which developers are challenged to design such apps. At the heart of this competition, the new Intel RealSense 3D camera and SDK let you interact with a computer through hand/finger tracking, facial analysis, speech recognition, background subtraction and augmented reality.

The competition has two phases: Ideation and Development. The Ideation phase is open until the end of September. All you are asked to do is submit your ideas (as an individual or as a team) and try to be among the 1300 participants who will be invited to turn their ideas into working demos. Everyone participating in the Development phase will be loaned the Intel 3D camera and the RealSense SDK for C/C++ development.

There are also two tracks for this challenge. The Pioneer track is open to all developers from around the world, whereas the Ambassador track is only open to developers who submitted a demo to the Intel Perceptual Computing Challenge 2013 or to its Ultimate Coder Challenge. Up to 1000 Pioneers and 300 Ambassadors will be chosen to move forward to the Development phase. Both contest tracks will accept entries from participants in the following innovation categories:

Gaming + Play
Learning Entertainment
Interact naturally
Collaboration/Creation
Open innovation

There are $1 million in cash prizes to be shared by the Pioneer and Ambassador groups. Each track competes independently, though.
Pioneer track prizes:

GRAND PRIZE (1), $25,000: One overall winner chosen from the first place winners of each category will win an additional $25,000 cash prize.
FIRST PLACE (5), $25,000: The top scoring demo in each category will win a $25,000 cash prize.
SECOND PLACE (10), $10,000: Two demos from each of the 5 categories will receive a $10,000 cash prize.
EARLY SUBMISSION (50), $1,000: The top scoring demos, submitted prior to the Early submission deadline, across all 5 categories will each receive a cash prize of $1,000.
HASWELL NUC (250): The top 250 scoring demos from Phase 1, across all 5 categories, will receive a Haswell NUC device valued at nearly $600.

Ambassador track prizes:

GRAND PRIZE (1), $50,000: One overall winner chosen from the first place winners of each category will win an additional $50,000 cash prize.
FIRST PLACE (5), $50,000: The top scoring demo in each category will win a $50,000 cash prize.
SECOND PLACE (10), $20,000: Two demos from each of the 5 categories will receive a $20,000 cash prize.
EARLY SUBMISSION (30), $1,000: The top scoring demos, submitted prior to the Early submission deadline, across all 5 categories will each receive a cash prize of $1,000.
HASWELL NUC (50): The top 50 scoring demos from Phase 1, across all 5 categories, will receive a Haswell NUC device valued at nearly $600.

If you are a Pioneer, you can sign up today and have until October 1, 2014 to submit your idea. If you are an Ambassador, you can simply go to the challenge page and sign in with the email address used for the 2013 competition. More info and registration on the RealSense Challenge 2014 page.

On the side, Intel is organizing two webinars for developers to get inspired and to learn more about natural user interfaces and RealSense technology:

Webinar 1: Learn about gesture recognition technology, on the technical side – August 13th 2014, 1pm Eastern
Webinar 2: Wide variety of usages for natural user interfaces – August 20th 2014, 1pm Eastern...

What DSLs are not for

Domain-specific languages are special programming languages. Each fits some special “domain” and makes the business code simpler. Using a DSL, a business-level problem can be implemented at a higher level, so the resulting code is simpler, is created faster and presumably contains fewer errors. Some DSLs in some areas even make it possible for domain experts with limited programming experience to develop business functionality. There are many great books on DSLs, Martin Fowler’s being one of the best on the topic, if not the best.

Many times the decision to use a DSL is made to shorten release cycles. Mature software in a rapidly changing business domain may change frequently, but many times the change is small. If it requires a change to the code, then the whole release cycle has to be repeated: the code is modified and unit tested, a release candidate is created, QA tests the new version, and finally the release is ready weeks after the new business need arose.

The obvious approach is to embed some DSL into the application and to develop, in this DSL, the business functions that are likely to change in the future. The “script” written in the DSL may not be part of the real release, and therefore the change can go through the system faster. Developers have less trivial coding to do, which developers usually do not like anyway, and the business is happy to get the modified functions faster. Right?

WRONG! But perhaps not obviously so at first. The DSL functions fine, the new behavior is delivered faster and there is no problem. Some time later, however, there comes a new feature that cannot be implemented in the DSL and needs a change to the application code. Why not extend the DSL and implement the new functionality in the new version of the DSL? This approach is very tempting, but it is very dangerous.

DSLs are like alcohol. They can have a purpose and can do good. A glass of quality wine after a nice summer evening supper should do no harm. Too much of it regularly will ruin your life. A DSL that has too many features may be dangerous. Some may use it for good, but there is a possibility of abuse. The release process was examined and engineered when the DSL was introduced, but it may not be reviewed as the DSL becomes more and more powerful, and suddenly you may face a situation where new features are developed into the software outside the release cycle. At some point the release process, and the most crucial part of it, quality assurance, may be ruined.

A DSL should be simple. Modification of the application scripting should also follow some release management.

There may not be release management at all. I have heard of software projects where the software was released to the public without any significant testing. If there was an error, the users complained about it and a new release came out an hour later, fixing one bug and creating a new one. No problem if the business can stand that. The actual software was a Facebook-like application where new features were more important to the users than uninterrupted use.

Other applications, in telecom or banking, should be tested a bit more rigorously. Regulation may even demand that all releases be archived. In that case scripting outside the release cycle is out of the question.

And there may be something in the middle. Some parts, some features of the application may need strict release management, while others may not demand that. Some parts can be scripted using some DSL, while other core functions need strong QA and release management. Some features may mix both: scripted and still part of the cycle.

The important message is: application scripting in a DSL does not ease release management and/or QA. If the release management cycle can be relaxed for some part of the application’s features, a DSL may be a tool to aid that, but a DSL is never the reason.

Reference: What DSLs are not for from our JCG partner Peter Verhas at the Java Deep blog....

Using IntelliJ bookmarks

This is a quick post about IntelliJ’s nice bookmark feature. IntelliJ gives you the option to bookmark single lines of code. After a line has been bookmarked, you can use various ways to jump directly back to this line. So it can be a good idea to bookmark code locations you often work with.

To create a new bookmark you only have to press F11 inside the code editor. Bookmarked lines show a small checkmark next to the line number. Bookmarks can be removed by selecting the bookmarked line and pressing F11 again.

To see all bookmarks you can press Shift – F11. This opens a small popup window which shows a list of all bookmarks you have created. Note that this window can be completely controlled by using the keyboard:

With Up / Down you can browse the list of bookmarks
With Enter you jump to the selected bookmark
Esc closes the window
A bookmark can be moved up or down using Alt – Up / Alt – Down

Note that you can also add a mnemonic identifier to a bookmark. You do this by selecting a line and pressing Ctrl – F11. This opens a small menu in which you can choose a mnemonic identifier (which is a character or a number). You can choose an identifier by clicking on one of the menu buttons or by simply pressing the corresponding key on your keyboard. Bookmark mnemonics are also shown next to the line number. For example, 1 might be chosen as the mnemonic.

Mnemonics give you the option to move even quicker between bookmarks. You can directly jump to a mnemonic bookmark by opening the bookmark popup (Shift – F11) and pressing the mnemonic key (1 in this example).

For numerical bookmarks even more shortcuts are available. You can toggle a numeric mnemonic on a selected line by pressing Ctrl – Shift – <number>. If you want to jump to a numeric mnemonic you use the Ctrl – <number> shortcut. For example, Ctrl – 5 brings you directly to the mnemonic bookmark 5.

Note that bookmarks are also shown in the Favorites view. So if you like clicking with your mouse, this is for you!

Reference: Using IntelliJ bookmarks from our JCG partner Michael Scharhag at the mscharhag, Programming and Stuff blog....

A beginner’s guide to JPA/Hibernate flush strategies

Introduction

In my previous post I introduced the entity state transitions of the object-relational mapping paradigm. All managed entity state transitions are translated to associated database statements when the current Persistence Context gets flushed. Hibernate’s flush behavior is not always as obvious as one might think.

Write-behind

Hibernate tries to defer the Persistence Context flushing up until the last possible moment. This strategy has been traditionally known as transactional write-behind. The write-behind is more related to Hibernate flushing than to any logical or physical transaction. During a transaction, the flush may occur multiple times. The flushed changes are visible only to the current database transaction. Until the current transaction is committed, no change is visible to other concurrent transactions.

The persistence context, also known as the first level cache, acts as a buffer between the current entity state transitions and the database. In caching theory, the write-behind synchronization requires that all changes happen against the cache, whose responsibility is to eventually synchronize with the backing store.

Reducing lock contention

Every DML statement runs inside a database transaction. Based on the current database transaction isolation level, locks (shared or explicit) may be acquired for the currently selected/modified table rows. Reducing the lock holding time lowers the deadlock probability and, according to scalability theory, increases throughput. Locks always introduce serial executions, and according to Amdahl’s law, the maximum speedup is inversely proportional to the serial part of the currently executing program.

Even at the READ_COMMITTED isolation level, UPDATE and DELETE statements acquire locks. This behavior prevents other concurrent transactions from reading uncommitted changes or modifying the rows in question. So, deferring locking statements (UPDATE/DELETE) may increase performance, but we must make sure that data consistency is not affected in any way.

Batching

Postponing the entity state transition synchronization has another major advantage. Since all changes are flushed at once, Hibernate may benefit from the JDBC batching optimization. Batching improves performance by grouping multiple DML statements into a single operation, therefore reducing database round-trips.

Read-your-own-writes consistency

Since queries always run against the database (unless the second level query cache is hit), we need to make sure that all pending changes are synchronized before the query starts running. Therefore, both JPA and Hibernate define a flush-before-query synchronization strategy.

From JPA to Hibernate flushing strategies

JPA FlushModeType   Hibernate FlushMode   Hibernate implementation details
AUTO                AUTO                  The Session is sometimes flushed before query execution.
COMMIT              COMMIT                The Session is only flushed prior to a transaction commit.
-                   ALWAYS                The Session is always flushed before query execution.
-                   MANUAL                The Session can only be manually flushed.
-                   NEVER                 Deprecated. Use MANUAL instead. This was the original name given to manual flushing, but it was misleading users into thinking that the Session won’t ever be flushed.

Current Flush scope

The Persistence Context defines a default flush mode that can be overridden upon Hibernate Session creation. Queries can also take a flush strategy, therefore overruling the current Persistence Context flush mode.

Scope                 Hibernate         JPA
Persistence Context   Session           EntityManager
Query                 Query, Criteria   Query, TypedQuery

Stay tuned

In my next post, you’ll find out that Hibernate FlushMode.AUTO breaks data consistency for SQL queries and you’ll see how you can overcome this shortcoming.

Reference: A beginner’s guide to JPA/Hibernate flush strategies from our JCG partner Vlad Mihalcea at the Vlad Mihalcea’s Blog blog....
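As a rough illustration of the write-behind, batching and flush-before-query ideas from the flush strategies post above, here is a toy model in plain Java. It is not Hibernate code: the class WriteBehindSketch, its FlushMode enum and the in-memory map standing in for a database table are all hypothetical names made up for this sketch. One deliberate simplification: in AUTO mode the sketch flushes before every query, whereas the table above notes that Hibernate only sometimes does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a write-behind buffer in front of an in-memory "database". */
public class WriteBehindSketch {

    /** Hypothetical enum that loosely mirrors three of the flush modes discussed above. */
    public enum FlushMode { AUTO, COMMIT, MANUAL }

    private final Map<Long, String> database = new LinkedHashMap<>(); // stands in for a table
    private final List<Runnable> pending = new ArrayList<>();         // deferred "DML statements"
    private final FlushMode flushMode;
    private int roundTrips;                                           // simulated database round-trips

    public WriteBehindSketch(FlushMode flushMode) {
        this.flushMode = flushMode;
    }

    /** Buffers the write instead of executing it immediately (write-behind). */
    public void persist(long id, String value) {
        pending.add(() -> database.put(id, value));
    }

    /** Executes all buffered writes as one simulated batch. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        pending.forEach(Runnable::run);
        pending.clear();
        roundTrips++;
    }

    /** A "query": in AUTO mode, pending changes are flushed first (read-your-own-writes). */
    public int count() {
        if (flushMode == FlushMode.AUTO) {
            flush();
        }
        return database.size();
    }

    /** Commit flushes unless flushing is manual. */
    public void commit() {
        if (flushMode != FlushMode.MANUAL) {
            flush();
        }
    }

    public int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        WriteBehindSketch ctx = new WriteBehindSketch(FlushMode.AUTO);
        ctx.persist(1L, "first");
        ctx.persist(2L, "second");
        System.out.println("round trips before query: " + ctx.roundTrips());
        System.out.println("rows seen by query: " + ctx.count());
        System.out.println("round trips after query: " + ctx.roundTrips());
    }
}
```

With FlushMode.COMMIT the same query would not see the two buffered rows until commit() is called, which is the trade-off between deferring statements and read-your-own-writes consistency that the post describes.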

Feature Toggles are one of the worst kinds of Technical Debt

Feature flags or config flags, aka feature toggles, aka flippers, are an important part of DevOps practices like dark launching (releasing features immediately and incrementally), A/B testing, and branching in code or branching by abstraction (so that development teams can all work together directly on the code mainline instead of creating separate feature branches).

Feature toggles can be simple Boolean switches or complex decision trees with multiple different paths. Martin Fowler differentiates between release toggles (which are used by development and ops to temporarily hide incomplete or risky features from all or part of the user base) and business toggles to control what features are available to different users (which may have a longer – even permanent – life). He suggests that these different kinds of flags should be managed separately, in different configuration files for example. But the basic idea is the same: to build conditional branches into mainline code in order to make logic available only to some users or to skip or hide logic at run-time, including code that isn’t complete (the case for branching by abstraction).

Using run-time flags like this isn’t a new idea, and it was certainly not invented at Flickr or Facebook. Using flags and conditional statements to offer different experiences to different users or to turn on code incrementally is something that many people have been practicing for a long time. And doing this in mainline code to avoid branching is in many ways a step back to the way that people built software 20+ years ago when we didn’t have reliable and easy to use code management systems.

Advantages and Problems of Feature Flags

Still, there are advantages to developers working this way, making merge problems go away, and eliminating the costs of maintaining and supporting long-lived branches.
And carefully using feature flags can help you to reduce deployment risk through canary releases or other incremental release strategies, where you make the new code active for only some users or customers, or only on some systems, and closely check before releasing progressively to the rest of the user base – and turn off the new code if you run into problems. All of this makes it easier to get new code out faster for testing and feedback.

But using feature flags creates new problems of its own.

The plumbing and scaffolding logic to support branching in code becomes a nasty form of technical debt, from the moment each feature switch is introduced. Feature flags make the code more fragile and brittle, harder to test, harder to understand and maintain, harder to support, and less secure.

Feature Flags need to be Short Lived

Abhishek Tiwari does a good job of explaining feature toggles and how they should be used. He makes it clear that they should only be a temporary deployment/release management tool, and describes a disciplined lifecycle that all feature toggles need to follow, from when they are created by development, then turned on by operations, updated if any problems or feedback come up, and finally retired and removed when no longer needed.

“Feature toggles require a robust engineering process, solid technical design and a mature toggle life-cycle management. Without these 3 key considerations, use of feature toggles can be counter-productive. Remember the main purpose of toggles is to perform release with minimum risk, once release is complete toggles need to be removed.”

Feature Flags are Technical Debt – as soon as you add them

Like other sources of technical debt, feature flags are cheap and easy to add in the short term. But the longer that they are left in the code, the more that they will end up costing you.

Release toggles are supposed to make it easier and safer to push code out. You can push code out only to a limited number of users to start, reducing the impact of problems, or dark launch features incrementally, carefully assessing added performance costs as you turn on some of the logic behind the scenes, or run functions in parallel. And you can roll back quickly by turning off features or optional behaviour if something goes wrong or if the system comes under too much load.

But as you add options, it can get harder to support and debug the system: keeping track of which flags are in which state in production and test can make it harder to understand and duplicate problems.

And there are dangers in releasing code that is not completely implemented, especially if you are following branching by abstraction and checking in work-in-progress code protected by a feature flag. If the scaffolding code isn’t implemented correctly you could accidentally expose some of this code at run-time with unpredictable results.

“…visible or not, you are still deploying code into production that you know for a fact to be buggy, untested, incomplete and quite possibly incompatible with your live data. Your if statements and configuration settings are themselves code which is subject to bugs – and furthermore can only be tested in production. They are also a lot of effort to maintain, making it all too easy to fat-finger something. Accidental exposure is a massive risk that could all too easily result in security vulnerabilities, data corruption or loss of trade secrets. Your features may not be as isolated from each other as you thought you were, and you may end up deploying bugs to your production environment” – James McKay

The support dangers of using – or misusing – feature flags were illustrated by a recent high-profile business failure at a major financial institution. The team used feature flags to contain operational risk when they introduced a new application feature.
Unfortunately, they re-purposed a flag which was used by old code (code left in the system even though it hadn’t been used in years). Due to some operational mistakes in deployment, not all of the servers were successfully updated with the new code, and when the flag was turned on, old code and new code started to run on different computers at the same time, doing completely different things with wildly inconsistent and, ultimately, business-ending results. By the time that the team figured out what was going wrong, the company had lost millions of dollars.

As more flags get added, testing of the application becomes harder and more expensive, and can lead to an explosion of combinations: If a is on and b is off and c is on and d is off then… what is supposed to happen? Fowler says that you only need to test the combinations which should reasonably be expected to happen in production, but this demands that everyone involved clearly understand what options could and should be used together – as more flags get added, this gets harder to understand and verify.

And other testing needs to be done to make sure that switches can be turned on and off safely at run-time, that features are completely and safely encapsulated by the flag settings, and that behaviour doesn’t leak out by accident (especially if you are branching in code and releasing work-in-progress code). You also need to test to make sure that the structural changes to introduce the feature toggle do not introduce any regressions, all adding to testing costs and risks.

More feature flags also make it harder to understand how and where to make fixes or changes, especially when you are dealing with long-lived flags and nested options.

And using feature switches can make the system less secure, especially if you are hiding access to features in the UI.
Adding a feature can make the attack surface of the application bigger, and hiding features at the UI level (for dark launching) won’t hide these features from bad guys.

Use Feature Flags with Caution

Feature flags are a convenient and flexible way to manage code, and can help you to get changes and fixes out to production more quickly. But if you are going to use flags, do so responsibly:

Minimize your use of feature flags for release management, and make the implementation as simple as possible. Martin Fowler explains that it is important to keep conditional logic to a minimum, limiting it to the UI and to entry points in the system. He also emphasises that:

“Release toggles are a useful technique and lots of teams use them. However they should be your last choice when you’re dealing with putting features into production. Your first choice should be to break the feature down so you can safely introduce parts of the feature into the product. The advantages of doing this are the same ones as any strategy based on small, frequent releases. You reduce the risk of things going wrong and you get valuable feedback on how users actually use the feature that will improve the enhancements you make later.”

Review flags often. Make sure that you know which flags are on, which are supposed to be on, and when features are going to be removed. Create dashboards (so that everyone can easily see the configuration) and health checks – run-time assertions – to make sure that important flags are on or off as appropriate.

Once a feature is part of mainline, be ruthless about getting it out of the code base as soon as it isn’t used or needed any more. This means carefully cleaning up the feature flags and all of the code involved, and testing again to make sure that you didn’t break anything when you did this. Don’t leave code in the mainline just in case you might need it again some day. You can always go back and retrieve it from version control if you need to.

Recognize and account for the costs of using feature flags, especially long-lived business logic branching in code.

Feature toggles start off simple and easy. They provide you with new options to get changes out faster, and can help reduce the risk of deployment in the short term. But the costs and risks of relying on them too much can add up, especially over the longer term.

Reference: Feature Toggles are one of the worst kinds of Technical Debt from our JCG partner Jim Bird at the Building Real Software blog....
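Two points from the feature toggles post above lend themselves to a small sketch: the advice to add health checks (run-time assertions that important flags are in their expected state), and the explosion of combinations as flags are added. The Java below is only one possible shape for this. The class FlagHealthCheck, its methods and the flag names are hypothetical, and a real system would read flag states from its own configuration store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Minimal sketch: a flag registry with a run-time check of expected flag states. */
public class FlagHealthCheck {

    private final Map<String, Boolean> actual = new TreeMap<>();   // what is configured right now
    private final Map<String, Boolean> expected = new TreeMap<>(); // what operations believes should be set

    public void set(String flag, boolean on) {
        actual.put(flag, on);
    }

    public void expect(String flag, boolean on) {
        expected.put(flag, on);
    }

    /** Unknown flags are treated as off, so half-deployed code stays hidden by default. */
    public boolean isOn(String flag) {
        return actual.getOrDefault(flag, false);
    }

    /** Returns one message per mismatch; an empty list means the configuration is healthy. */
    public List<String> check() {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : expected.entrySet()) {
            boolean on = isOn(entry.getKey());
            if (on != entry.getValue()) {
                problems.add(entry.getKey() + " is " + (on ? "on" : "off")
                        + ", expected " + (entry.getValue() ? "on" : "off"));
            }
        }
        return problems;
    }

    /** Number of on/off combinations for n independent Boolean flags: 2^n. */
    public static long combinations(int flagCount) {
        if (flagCount < 0 || flagCount > 62) {
            throw new IllegalArgumentException("flagCount must be between 0 and 62");
        }
        return 1L << flagCount;
    }

    public static void main(String[] args) {
        FlagHealthCheck flags = new FlagHealthCheck();
        flags.set("new-checkout", true);      // hypothetical flag names
        flags.expect("new-checkout", false);
        flags.expect("legacy-export", false);
        System.out.println(flags.check());
        System.out.println(combinations(4) + " combinations for flags a, b, c and d");
    }
}
```

Running the check at startup and on a schedule is one way to notice a flag that was turned on, or never turned off, before it causes the kind of mismatch between servers described in the post.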

Test Attribute #8 – Truthiness

I want to thank Stephen Colbert for coining a word I can use in my title. Without him, all this would still be possible, had I not given up looking for a better word after a few minutes.

Tests are about trust. We expect them to be reliable. Reliable tests tell us everything is ok when they pass, and that something is wrong when they fail. The problem is that life is not black and white, and tests are not just green and red. Tests can give false positive (fail when they shouldn’t) or false negative (pass when they shouldn’t) results. We’ve encountered the false positive ones before – these are the fragile, dependent tests.

The ones that pass, instead of failing, are the problematic ones. They hide the real picture from us, and erode our trust, not just in those tests, but also in others. After all, when we find a problematic test, who can say the others we wrote are not problematic as well? Truthiness (how much we feel the tests are reliable) comes into play.

Dependency Injection Example

Or rather, injecting an example of a dependency. Let’s say we have a service (or 3rd party library) our tested code uses. It’s slow and communication is unreliable. All the things that give services a bad name. Our natural tendency is to mock the service in the test. By mocking the service, we can test our code in isolation. So, in our case, our tested Hotel class uses a Service:

    public class Hotel
    {
        public string GetServiceName(Service service)
        {
            var result = service.GetName();
            return "Name: " + result;
        }
    }

To know if the method works correctly, we’ll write this test:

    [TestMethod]
    public void GetServiceName_RoomService_NameIsRoom()
    {
        // The fake always answers "Room"; it never fails or disconnects.
        var fakeService = A.Fake<Service>();
        A.CallTo(() => fakeService.GetName()).Returns("Room");

        var hotel = new Hotel();
        Assert.AreEqual("Name: Room", hotel.GetServiceName(fakeService));
    }

And everything is groovy. Until, in production, the service gets disconnected and throws an exception. And our test says “B-b-b-but, I’m still passing!”.

The Truthiness Is Out There

Mocking is an example of obscuring the real behavior with prescriptive tests, but it’s just an example. It can happen when we test a few cases, but don’t cover others. Here’s one of my favorite examples. What’s the hidden test case here?

    public int Increment()
    {
        return counter++;
    }

Tests are code examples. They work to the extent of our imagination of “what can go wrong?” Like overflow, in the last case.

Much like differentiation, truthiness cannot be examined by itself. The example works, but it hides a case we need another test for. We need to look at the collection of test cases, and see if we covered everything. The solution doesn’t have to be a test of the same type. We can have a unit test for the service happy path, and an end-to-end test to cover the disconnection case. Of course, if you can think of other cases in the first place, why not unit test them?

So to level up your truthiness:

Ideate. Before writing the tests (and, if you’re doing TDD, the code), write a list of test cases. On a notebook, a whiteboard, or my favorite: empty tests.
Reflect. Often, when we write a few tests, new test cases come to mind. Having a visual image of the code can help think of other cases.
Beware the mock. We use mocks to prescribe dependency behavior in specific cases. Every mock you make can be a potential failure point, so think about other cases to mock.
Review. Do it in pairs. Four eyes are better than two.

Aim for higher truthiness. Higher trust in your tests will help you sleep better.

Reference: Test Attribute #8 – Truthiness from our JCG partner Gil Zilberfeld at the Geek Out of Water blog....
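The two hidden cases from the truthiness post above (a service that throws, and a counter that overflows) can be made concrete. The following is a Java analogue of the C# snippets, written as a sketch rather than a translation of the author’s code: the hand-rolled lambda fakes stand in for the mocking library, and the names HiddenTestCases, incrementExact and value are invented for illustration.

```java
/** Java analogue of the Hotel and Increment examples; all names are illustrative. */
public class HiddenTestCases {

    /** The slow, unreliable dependency. A single method, so a lambda can fake it. */
    public interface Service {
        String getName();
    }

    public static class Hotel {
        public String getServiceName(Service service) {
            return "Name: " + service.getName();
        }
    }

    public static class Counter {
        private int counter;

        public Counter(int start) {
            counter = start;
        }

        /** Same shape as the original example: silently wraps around at Integer.MAX_VALUE. */
        public int increment() {
            return counter++;
        }

        /** One possible alternative: fail loudly on overflow instead of wrapping. */
        public int incrementExact() {
            int current = counter;
            counter = Math.addExact(counter, 1); // throws ArithmeticException on overflow
            return current;
        }

        public int value() {
            return counter;
        }
    }

    public static void main(String[] args) {
        Hotel hotel = new Hotel();

        // Happy path: the case the original test covers.
        Service room = () -> "Room";
        System.out.println(hotel.getServiceName(room));

        // Hidden case 1: the service is disconnected and throws.
        Service disconnected = () -> {
            throw new IllegalStateException("service disconnected");
        };
        try {
            hotel.getServiceName(disconnected);
        } catch (IllegalStateException e) {
            System.out.println("Hotel let the failure through: " + e.getMessage());
        }

        // Hidden case 2: incrementing at the maximum value wraps to the minimum.
        Counter counter = new Counter(Integer.MAX_VALUE);
        counter.increment();
        System.out.println("wrapped to minimum: " + (counter.value() == Integer.MIN_VALUE));
    }
}
```

Whether Hotel should propagate the exception, translate it or return a fallback is a design decision the happy-path test never forces anyone to make, which is the point of listing the cases before writing the tests.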

Introducing Hadoop Development Tools

A few days back, Apache Hadoop Development Tools, a.k.a. HDT, was released. The project aims at bringing plugins to Eclipse to simplify development on the Hadoop platform. This blog post aims to provide an overview of a few great features of HDT.

Single Endpoint

The project can act as a single endpoint for your HDFS, ZooKeeper and MR cluster. You can connect to your HDFS/ZooKeeper instance and browse or add more data. You can submit jobs to the MR cluster and see the status of all the running jobs.

Map Reduce Project/Templates

There is support for creating a Hadoop project. Just point it to the location of Hadoop, and it will pull in all the required libs and generate an Eclipse project. That’s not all: you can also generate a Mapper/Reducer/Partitioner/Driver based on the org.apache.hadoop.mapreduce API.

Multiple Version Support

Currently the project supports two versions of the Hadoop platform, viz. 1.1 and 2.2. The project is based on the Eclipse plugin architecture and can possibly support other versions like 0.23, CDH4 etc. in upcoming releases.

Eclipse Support

The project works with Eclipse 3.6 and above. It has been tested on Indigo and Juno, and can work on Kepler as well.

The project aims to simplify the Hadoop platform for developers. It is still young and will require support from the community to flourish. To learn more or to get involved with the project, check the project page or the mailing lists.

Reference: Introducing Hadoop Development Tools from our JCG partner Rahul Sharma at the The road so far… blog....

Getting A List of Available Cryptographic Algorithms

How do you learn what cryptographic algorithms are available to you? The Java spec names several required ciphers, digests, etc., but a provider often offers more than that. Fortunately this is easy to learn what’s available on our system.           public class ListAlgorithms { public static void main(String[] args) { // Security.addProvider(new // org.bouncycastle.jce.provider.BouncyCastleProvider());// get a list of services and their respective providers. final Map<String, List<Provider>> services = new TreeMap<>();for (Provider provider : Security.getProviders()) { for (Provider.Service service : provider.getServices()) { if (services.containsKey(service.getType())) { final List<Provider> providers = services.get(service .getType()); if (!providers.contains(provider)) { providers.add(provider); } } else { final List<Provider> providers = new ArrayList<>(); providers.add(provider); services.put(service.getType(), providers); } } }// now get a list of algorithms and their respective providers for (String type : services.keySet()) { final Map<String, List<Provider>> algs = new TreeMap<>(); for (Provider provider : Security.getProviders()) { for (Provider.Service service : provider.getServices()) { if (service.getType().equals(type)) { final String algorithm = service.getAlgorithm(); if (algs.containsKey(algorithm)) { final List<Provider> providers = algs .get(algorithm); if (!providers.contains(provider)) { providers.add(provider); } } else { final List<Provider> providers = new ArrayList<>(); providers.add(provider); algs.put(algorithm, providers); } } } }// write the results to standard out. System.out.printf("%20s : %s\n", "", type); for (String algorithm : algs.keySet()) { System.out.printf("%-20s : %s\n", algorithm, Arrays.toString(algs.get(algorithm).toArray())); } System.out.println(); } } } The system administrator can override the standard crypto libraries. 
In practice it’s safest to always load your own crypto library and either register it manually, as above, or better yet pass it as an optional parameter when creating new objects. Algorithms There are a few dozen standard algorithms. The ones we’re most likely to be interested in are: Symmetric CipherKeyGenerator – creates symmetric key SecretKeyFactor – converts between symmetric keys and raw bytes Cipher – encryption cipher AlgorithmParameters – algorithm parameters AlgorithmParameterGernerator – algorithm parametersAsymmetric CipherKeyPairGenerator – creates public/private keys KeyFactor – converts between keypairs and raw bytes Cipher – encryption cipher Signature – digital signatures AlgorithmParameters – algorithm parameters AlgorithmParameterGernerator – algorithm parametersDigestsMessageDigest – digest (MD5, SHA1, etc.) Mac – HMAC. Like a message digest but requires an encryption key as well so it can’t be forged by attackerCertificates and KeyStoresKeyStore – JKS, PKCS, etc. CertStore – like keystore but only stores certs. CertificateFactory – converts between digital certificates and raw bytes.It is critical to remember that most algorithms are provided for backward compatibility and should not be used for in greenfield development. As I write this the generally accepted advice is:Use a variant of AES. Only use AES-ECB if you know with absolute certainty that you will never encrypt more than one blocksize (16 bytes) of data. Always use a good random IV even if you’re using AES-CBC. Do not use the same IV or an easily predicted one. Do not use less than 2048 bits in an asymmetric key. Use SHA-256 or better. MD-5 is considered broken, SHA-1 will be considered broken in the near future. Use PBKDF2WithHmacSHA1 to create AES key from passwords/passphrases. (See also Creating Password-Based Encryption Keys.)Some people might want to use one of the other AES-candidate ciphers (e.g., twofish). 
These ciphers are probably safe but you might run into problems if you’re sharing files with other parties since they’re not in the required cipher suite.

Beware US Export Restrictions

Finally it’s important to remember that the standard Java distribution is crippled due to US export restrictions. You can get full functionality by installing a standard US-only file on your system but it’s hard if not impossible for developers to verify this has been done. In practice many if not most people use a third-party cryptographic library like BouncyCastle. Many inexperienced developers forget about this and unintentionally use crippled functionality.

Reference: Getting A List of Available Cryptographic Algorithms from our JCG partner Bear Giles at the Invariant Properties blog.
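The advice in this article (PBKDF2WithHmacSHA1 for key derivation, AES-CBC with a fresh random IV) can be sketched in a few lines of plain JCE code. This is a minimal illustration rather than a vetted implementation: the class name AesSketch, the iteration count of 10000, the 128-bit key size and the choice to prepend the IV to the ciphertext are all assumptions made for the example. The sketch also calls Cipher.getMaxAllowedKeyLength, which is one way to see at runtime what key size the installed policy files permit.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical helper class, for illustration only.
final class AesSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 16; // AES block size

    /** Largest AES key size (in bits) allowed by the installed policy files. */
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Derives a 128-bit AES key from a passphrase (iteration count is an assumed value). */
    static SecretKey deriveKey(char[] passphrase, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(passphrase, salt, 10000, 128);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
            return new SecretKeySpec(factory.generateSecret(spec).getEncoded(), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts with AES-CBC and returns IV || ciphertext; the IV is random per call. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plaintext);
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the IV from the first 16 bytes and decrypts the remainder. */
    static byte[] decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(message, 0, IV_BYTES));
            return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Max AES key bits allowed: " + maxAesKeyBits());
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        SecretKey key = deriveKey("correct horse".toCharArray(), salt);
        byte[] message = encrypt(key, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, message), StandardCharsets.UTF_8));
    }
}
```

A 128-bit key is used here because it works even when the restricted policy files are installed. To request a specific provider, the getInstance calls also accept a Provider as a second argument.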

A new proximity query for Lucene, using automatons

The simplest Apache Lucene query, TermQuery, matches any document that contains the specified term, regardless of where the term occurs inside each document. Using BooleanQuery you can combine multiple TermQuery clauses, with full control over which terms are optional (SHOULD) and which are required (MUST) or required not to be present (MUST_NOT), but still the matching ignores the relative positions of each term inside the document. Sometimes you do care about the positions of the terms, and for such cases Lucene has various so-called proximity queries.

The simplest proximity query is PhraseQuery, to match a specific sequence of tokens such as “Barack Obama”. Seen as a graph, a PhraseQuery is a simple linear chain.

By default the phrase must precisely match, but if you set a non-zero slop factor, a document can still match even when the tokens are not exactly in sequence, as long as the edit distance is within the specified slop. For example, “Barack Obama” with a slop factor of 1 will also match a document containing “Barack Hussein Obama” or “Barack H. Obama”. Seen as a graph, this query now has multiple paths, including an any (*) transition to match an arbitrary token. (Note: while the graph cannot properly express it, this query would also match a document that had the tokens Barack and Obama on top of one another, at the same position, which is a little bit strange!)

In general, proximity queries are more costly on both CPU and IO resources, since they must load, decode and visit another dimension (positions) for each potential document hit. That said, for exact (no slop) matches, using common-grams, shingles and ngrams to index additional “proximity terms” in the index can provide enormous performance improvements in some cases, at the expense of an increase in index size.

MultiPhraseQuery is another proximity query.
It generalizes PhraseQuery by allowing more than one token at each position; for example, a single query can match any document containing either domain name system or domain name service. MultiPhraseQuery also accepts a slop factor to allow for non-precise matches.

Finally, span queries (e.g. SpanNearQuery, SpanFirstQuery) go even further, allowing you to build up a complex compound query based on positions where each clause matched. What makes them unique is that you can arbitrarily nest them. For example, you could first build a SpanNearQuery matching Barack Obama with slop=1, then another one matching George Bush, and then make another SpanNearQuery, containing both of those as sub-clauses, matching if they appear within 10 terms of one another.

Introducing TermAutomatonQuery

As of Lucene 4.10 there will be a new proximity query to further generalize on MultiPhraseQuery and the span queries: it allows you to directly build an arbitrary automaton expressing how the terms must occur in sequence, including any transitions to handle slop.

This is a very expert query, allowing you fine control over exactly what sequence of tokens constitutes a match. You build the automaton state-by-state and transition-by-transition, including explicitly adding any transitions (sorry, no QueryParser support yet, patches welcome!). Once that’s done, the query determinizes the automaton and then uses the same infrastructure (e.g. CompiledAutomaton) that queries like FuzzyQuery use for fast term matching, but applied to term positions instead of term bytes. The query is naively scored like a phrase query, which may not be ideal in some cases.

In addition to this new query there is also a simple utility class, TokenStreamToTermAutomatonQuery, that provides loss-less translation of any graph TokenStream into the equivalent TermAutomatonQuery.
This is powerful because it means even arbitrary token stream graphs will be correctly represented at search time, preserving the PositionLengthAttribute that some tokenizers now set.

While this means you can finally correctly apply arbitrary token stream graph synonyms at query-time, because the index still does not store PositionLengthAttribute, index-time synonyms are still not fully correct. That said, it would be simple to build a TokenFilter that writes the position length into a payload, and then to extend the new TermAutomatonQuery to read from the payload and apply that length during matching (patches welcome!).

The query is likely quite slow, because it assumes every term is optional; in many cases it would be easy to determine required terms (e.g. Obama in the above example) and optimize such cases. In the case where the query was derived from a token stream, so that it has no cycles and does not use any transitions, it may be faster to enumerate all phrases accepted by the automaton (Lucene already has the getFiniteStrings API to do this for any automaton) and construct a boolean query from those phrase queries. This would match the same set of documents, also correctly preserving PositionLengthAttribute, but would assign different scores.

The code is very new and there are surely some exciting bugs! But it should be a nice start for any application that needs precise control over where terms occur inside documents.

Reference: A new proximity query for Lucene, using automatons from our JCG partner Michael McCandless at the Changing Bits blog.
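To make the idea of an automaton over token positions more concrete, here is a toy sketch in plain Java. It does not use any Lucene API and is not how TermAutomatonQuery is implemented: the class TokenAutomatonSketch and all of its methods are hypothetical names chosen for this illustration, it simulates the automaton directly instead of determinizing it, and it ignores scoring. The automaton it builds has the rough shape "Barack, then an optional arbitrary token, then Obama", a simplified stand-in for the slop example discussed in the article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy token automaton; illustrative only, not Lucene code.
final class TokenAutomatonSketch {

    private static final class Transition {
        final int from;
        final int to;
        final String term; // null means "any token"

        Transition(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }

        boolean accepts(String token) {
            return term == null || term.equals(token);
        }
    }

    private final List<Transition> transitions = new ArrayList<>();
    private final Set<Integer> acceptStates = new HashSet<>();
    private int stateCount = 0;

    /** Creates a new state; the first state created (0) is the initial state. */
    int createState() {
        return stateCount++;
    }

    void addTransition(int from, int to, String term) {
        transitions.add(new Transition(from, to, term));
    }

    void addAnyTransition(int from, int to) {
        transitions.add(new Transition(from, to, null));
    }

    void setAccept(int state) {
        acceptStates.add(state);
    }

    /** True if some run of consecutive tokens leads from state 0 to an accept state. */
    boolean matches(List<String> tokens) {
        for (int start = 0; start < tokens.size(); start++) {
            Set<Integer> current = new HashSet<>();
            current.add(0);
            for (int pos = start; pos < tokens.size() && !current.isEmpty(); pos++) {
                Set<Integer> next = new HashSet<>();
                for (Transition t : transitions) {
                    if (current.contains(t.from) && t.accepts(tokens.get(pos))) {
                        next.add(t.to);
                    }
                }
                current = next;
                for (int state : current) {
                    if (acceptStates.contains(state)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    /** Builds: barack, then at most one arbitrary token, then obama. */
    static TokenAutomatonSketch barackObama() {
        TokenAutomatonSketch a = new TokenAutomatonSketch();
        int init = a.createState();
        int afterBarack = a.createState();
        int afterAny = a.createState();
        int done = a.createState();
        a.addTransition(init, afterBarack, "barack");
        a.addTransition(afterBarack, done, "obama");
        a.addAnyTransition(afterBarack, afterAny);
        a.addTransition(afterAny, done, "obama");
        a.setAccept(done);
        return a;
    }

    public static void main(String[] args) {
        TokenAutomatonSketch query = barackObama();
        System.out.println(query.matches(Arrays.asList("barack", "hussein", "obama")));
        System.out.println(query.matches(Arrays.asList("barack", "h", "w", "obama")));
    }
}
```

Tracking a set of live states per start position keeps the sketch short. A real implementation would work on positions read from the index rather than on an in-memory token list.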

Spring Batch as Wildfly Module

For a long time, the Java EE specification was lacking a Batch Processing API. Today, batch processing is essential for enterprise applications. This was finally fixed with the JSR-352 Batch Applications for the Java Platform, now available in Java EE 7. The JSR-352 got its inspiration from its Spring Batch counterpart. Both cover the same concepts, although the resulting APIs are a bit different.

Since the Spring team also collaborated in the JSR-352, it was only a matter of time for them to provide an implementation based on Spring Batch. The latest major version of Spring Batch (version 3) now supports the JSR-352.

I’ve been a Spring Batch user for many years and I’ve always enjoyed that the technology had an interesting set of built-in readers and writers. These allowed you to perform the most common operations required by batch processing. Do you need to read data from a database? You could use JdbcCursorItemReader. How about writing data in a fixed format? Use FlatFileItemWriter, and so on.

Unfortunately, JSR-352 implementations do not offer as many readers and writers as Spring Batch. We have to remember that JSR-352 is very recent and hasn’t had time to catch up. jBeret, the Wildfly implementation for JSR-352, already provides a few custom readers and writers.

What’s the point?

I was hoping that with the latest release, all the readers and writers from the original Spring Batch would be available as well. This is not the case yet, since it would take a lot of work, but there are plans to make them available in future versions. This would allow us to migrate native Spring Batch applications into JSR-352. We still have the issue of the implementation vendor lock-in, but it may be interesting in some cases.

Motivation

I’m one of the main contributors of tests for the JSR-352 specification in the Java EE Samples. I wanted to find out if the tests I’ve implemented have the same behaviour using the Spring Batch implementation. How can we do that?
Code

I think this exercise is not only interesting because of the original motivation, but it’s also useful to learn about modules and class loading on Wildfly. First we need to decide how we are going to deploy the needed Spring Batch dependencies. We could deploy them directly with the application, or use a Wildfly module. Modules have the advantage of being bundled directly into the application server and of being reusable by all deployed applications.

Adding Wildfly Module with Maven

With a bit of work it’s possible to add the module automatically with the Wildfly Maven Plugin and the CLI (command line). Let’s start by creating two files that hold the CLI commands we need to create and remove the module:

wildfly-add-spring-batch.cli

    # Connect to Wildfly instance
    connect

    # Create Spring Batch Module
    # If the module already exists, Wildfly will output a message saying that the module already exists and the script exits.
    module add \
        --name=org.springframework.batch \
        --dependencies=javax.api,javaee.api \
        --resources=${wildfly.module.classpath}

The module --name is important. We’re going to need it to reference the module in our application. The --resources is a pain, since you need to indicate a full classpath to all the required module dependencies, but we’re generating the paths in the next few steps.

wildfly-remove-spring-batch.cli

    # Connect to Wildfly instance
    connect

    # Remove Spring Batch Module
    module remove --name=org.springframework.batch

Note wildfly.module.classpath. This property will hold the complete classpath for the required Spring Batch dependencies.
We can generate it with the Maven Dependency plugin:

pom-maven-dependency-plugin.xml

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>${version.plugin.dependency}</version>
        <executions>
            <execution>
                <phase>generate-sources</phase>
                <goals>
                    <goal>build-classpath</goal>
                </goals>
                <configuration>
                    <outputProperty>wildfly.module.classpath</outputProperty>
                    <pathSeparator>:</pathSeparator>
                    <excludeGroupIds>javax</excludeGroupIds>
                    <excludeScope>test</excludeScope>
                    <includeScope>provided</includeScope>
                </configuration>
            </execution>
        </executions>
    </plugin>

This is going to pick all dependencies (including transitive), exclude javax (since they are already present in Wildfly) and exclude test scope dependencies. We need the following dependencies for Spring Batch:

pom-dependencies.xml

    <!-- Needed for Wildfly module -->
    <dependency>
        <groupId>org.springframework.batch</groupId>
        <artifactId>spring-batch-core</artifactId>
        <version>3.0.0.RELEASE</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-jdbc</artifactId>
        <version>4.0.5.RELEASE</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>commons-dbcp</groupId>
        <artifactId>commons-dbcp</artifactId>
        <version>1.4</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.hsqldb</groupId>
        <artifactId>hsqldb</artifactId>
        <version>2.3.2</version>
        <scope>provided</scope>
    </dependency>

Now, we need to replace the property in the file.
Let’s use the Maven Resources plugin:

pom-maven-resources-plugin.xml

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-resources-plugin</artifactId>
        <version>${version.plugin.resources}</version>
        <executions>
            <execution>
                <id>copy-resources</id>
                <phase>process-resources</phase>
                <goals>
                    <goal>copy-resources</goal>
                </goals>
                <configuration>
                    <outputDirectory>${basedir}/target/scripts</outputDirectory>
                    <resources>
                        <resource>
                            <directory>src/main/resources/scripts</directory>
                            <filtering>true</filtering>
                        </resource>
                    </resources>
                </configuration>
            </execution>
        </executions>
    </plugin>

This will filter the configured files and replace the property wildfly.module.classpath with the value we generated previously. This is a classpath pointing to the dependencies in your local Maven repository. Now with the Wildfly Maven Plugin we can execute this script (you need to have Wildfly running):

pom-maven-wildfly-plugin.xml

    <plugin>
        <groupId>org.wildfly.plugins</groupId>
        <artifactId>wildfly-maven-plugin</artifactId>
        <version>${version.plugin.wildfly}</version>
        <configuration>
            <skip>false</skip>
            <executeCommands>
                <batch>false</batch>
                <scripts>
                    <!--suppress MavenModelInspection -->
                    <script>target/scripts/${cli.file}</script>
                </scripts>
            </executeCommands>
        </configuration>
    </plugin>

And these profiles:

pom-profiles.xml

    <profiles>
        <profile>
            <id>install-spring-batch</id>
            <properties>
                <cli.file>wildfly-add-spring-batch.cli</cli.file>
            </properties>
        </profile>

        <profile>
            <id>remove-spring-batch</id>
            <properties>
                <cli.file>wildfly-remove-spring-batch.cli</cli.file>
            </properties>
        </profile>
    </profiles>

(The full pom.xml is part of the GitHub repository mentioned in the Resources section.)

We can add the module by executing:

    mvn process-resources wildfly:execute-commands -P install-spring-batch

Or remove the module by executing:

    mvn wildfly:execute-commands -P remove-spring-batch

This strategy works for any module that you want to create in Wildfly. Think about adding a JDBC driver.
You usually use a module to add it into the server, but all the documentation I’ve found about this describes a manual process. Automating it like this works great for CI builds, so you can have everything you need to set up your environment.

Use Spring-Batch

Ok, I have my module there, but how can I instruct Wildfly to use it instead of jBeret? We need to add the following file to the META-INF folder of our application:

jboss-deployment-structure.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <jboss-deployment-structure>
        <deployment>
            <exclusions>
                <module name="org.wildfly.jberet"/>
                <module name="org.jberet.jberet-core"/>
            </exclusions>

            <dependencies>
                <module name="org.springframework.batch" services="import" meta-inf="import"/>
            </dependencies>
        </deployment>
    </jboss-deployment-structure>

Since the JSR-352 uses a Service Loader to load the implementation, the only possible outcome would be to load the service specified in the org.springframework.batch module. Your batch code will now run with the Spring Batch implementation.

Testing

The GitHub repository code has Arquillian sample tests that demonstrate the behaviour. Check the Resources section below.

Resources

You can clone a full working copy from my GitHub repository. You can find instructions there to deploy it.

Wildfly – Spring Batch

Since I may modify the code in the future, you can download the original source of this post from the release 1.0. Alternatively, clone the repo and check out the tag from release 1.0 with the following command: git checkout 1.0.

Future

I still need to apply this to the Java EE Samples. It’s on my TODO list.

Reference: Spring Batch as Wildfly Module from our JCG partner Roberto Cortez at the Roberto Cortez Java Blog blog.
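The Service Loader mechanism mentioned in this article can be illustrated with java.util.ServiceLoader from the JDK. The sketch below is only an analogy: BatchEngine is a hypothetical interface standing in for the batch API entry point, and ServiceLookupSketch is an invented class name. A provider becomes visible when a jar on the class path contains a META-INF/services file named after the interface, which is presumably why excluding the jBeret modules and importing services from the Spring Batch module leaves a single candidate.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.ServiceLoader;

// Illustrative only; the names below are hypothetical and not part of JSR-352.
final class ServiceLookupSketch {

    /** Hypothetical service interface standing in for a batch implementation. */
    interface BatchEngine {
        String name();
    }

    /** Returns the first provider of the given service type that ServiceLoader can find, if any. */
    static <T> Optional<T> firstProvider(Class<T> serviceType) {
        Iterator<T> providers = ServiceLoader.load(serviceType).iterator();
        return providers.hasNext() ? Optional.of(providers.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        // No META-INF/services entry exists for BatchEngine in this sketch,
        // so the lookup is expected to come back empty.
        Optional<BatchEngine> engine = firstProvider(BatchEngine.class);
        System.out.println(engine.map(BatchEngine::name).orElse("no provider registered"));
    }
}
```

With exactly one provider visible to the deployment, a lookup of this kind has only one possible result, which matches the behaviour described in the article.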
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.