

Java Code Geeks and ZK are giving away FREE ZK Charts Licenses (worth over $2700)!

Struggling with the creation of fancy charts for your Java based UI? Then we have something especially for you! We are partnering with ZK, creator of cool Java tools, and we are running a contest giving away FREE perpetual licenses for the kick-ass ZK Charts library. ZK Charts is an interactive charting library for your Java Web Applications. ZK Charts is built upon and works with the ZK Framework, one of the most established and recognizable Java Web Frameworks on the market. Since its first release in 2005, ZK has amassed well over 1.5 million downloads and is deployed in a large number of Fortune 500 companies, including Barclays, eBay, Roche, Deutsche Bank, Sony and Audi! ZK Charts provides a comprehensive Java API for controlling your charts from the server side, with the communication between client and server taken care of transparently by the library. Additionally, ZK Charts is powered by models, which makes creating and updating charts intuitive and effortless for developers. ZK’s model-based system is used throughout our components to provide the best development experience available to all our users.

ZK Charts comes with an extensive chart set, including but not limited to: line charts, area charts, column & bar charts, pie charts, scatter charts, bubble charts, master-detail charts, angular gauge charts, dual axes gauge charts, spider web charts, polar charts, waterfall charts, funnel charts and many more, along with the ability to combine them.

Enter the contest now to win your very own FREE ZK Charts Perpetual License. There will be a total of 3 winners! In addition, we will send you free tips and the latest news from the Java community to master your technical knowledge (you can unsubscribe at any time). To increase your chances of winning, don’t forget to refer as many of your friends as possible! You will get 3 more entries for every friend you refer, which means 3 times more chances! Make sure to use your lucky URL to spread the word!
You can share it on your social media channels, or even mention it in a blog post if you are a blogger! Good luck and may the force be with you!

UPDATE: The giveaway has ended! Here is the list of the lucky winners (emails hidden for privacy):
ph…on@gmail.com
ja…nn@gmail.com
se…as@gmail.com
We would like to thank you all for participating in this giveaway. Till next time, keep up the good work!

3 Simple Guidelines to Rule Development, Design and Traceability

(Article guest authored together with John Hurlocker, Senior Middleware Consultant at Red Hat in North America)

In this tips and tricks article we present some background and guidelines for the design cycle you encounter when working with rules projects. This article is not the only standard, nor an all-encompassing source on how every rules and events project will evolve over time. What it does provide are the basics as we have encountered them in many projects in real-life organizations. They are meant to give you some confidence as you embark on your very own rules and events adventure within the world of JBoss BRMS. We will discuss part of the requirements phase of rules development, touch on a few of the design choices you will encounter, and elaborate on the options available for including requirements traceability within the projects.

1. Requirements

A rule author will analyze project requirements to determine the number of rules that need to be created. The rule author will also work with the requirements team so that the team can answer any questions that arise. Analyzing rules requirements is the phase where the following questions are examined:
Are there any WHEN or THEN conditions that are unclear when reviewing the requirements?
Are some of these rules data validations?
Can multiple requirements be combined into one rule?
By spending some pre-development time examining and validating the project requirements, you will be able to narrow the scope of the work to be done in your development cycles. These questions have been dealt with in previous articles in this tips and tricks series.

2. Design

In the design phase, an enterprise rule administrator will need to work with the organization and ask questions such as the following:
Will the organization need to host a central rules repository, or would that not be beneficial?
Who owns these rules and is responsible for updating and releasing new versions?
Are there common rules that can be reused between groups?
A central repository is one JBoss BRMS server available for the entire organization to author, store, and build rules. It promotes rule reuse and is easier to manage and maintain than multiple repositories deployed across an organization. If a set of rules is going to be shared with other groups, then one of the groups will need to take ownership and be responsible for updating and releasing new versions.

The rule author will need to work with the application team(s) to determine which rule format or formats will be used and which tool will be used to author rules. Some of the questions to be dealt with are:
Should rules be developed in the BRMS Dashboard or through JBoss Developer Studio (JBDS)?
What are your rule authors more comfortable with?
Who will maintain the rules in the future: Java developers or business analysts?
Do the requirements work better in one format than another, e.g. a web-based data table, a guided business rule, or a DSL?
What type of testing is required: JUnit tests and/or BRMS test scenarios?
These topics have been laid out in previous articles; please refer to them for a deeper discussion.

3. Traceability

Once the rules and events are being implemented, it is vital to attach some form of requirements traceability that links each rule to its originating requirements. With JBoss BRMS, rule authors can set metadata on the rules for traceability to requirement(s), for instance:
Associated requirements can be set on rules in the description section.
Associated requirements can also be set as an external link in the rule metadata.
Reports can be generated by pulling metadata information from the repository.
In a future article we will dig deeper into how you can use metadata fields within your rules implementation to trace your requirements, and how to extract this information to generate documentation around these requirements.

Reference: 3 Simple Guidelines to Rule Development, Design and Traceability from our JCG partner Eric Schabell at Eric Schabell’s blog.
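The article leaves the reporting step for a future post. As a rough, self-contained illustration of the idea, the following Java sketch inverts per-rule metadata into a requirement-to-rules matrix. It deliberately does not use the BRMS API. The RuleMeta class, the metadata key "requirement", and the sample rule names and REQ-xxx IDs are all hypothetical placeholders for whatever your repository export actually provides.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class TraceabilityReport {

    /** Hypothetical view of one rule: its name plus the metadata attached to it. */
    static final class RuleMeta {
        final String ruleName;
        final Map<String, String> metadata;

        RuleMeta(String ruleName, Map<String, String> metadata) {
            this.ruleName = ruleName;
            this.metadata = metadata;
        }
    }

    /**
     * Inverts rule metadata into requirement ID -> names of the rules that implement it.
     * Assumes (hypothetically) that requirement IDs are stored comma-separated under "requirement".
     */
    static SortedMap<String, SortedSet<String>> byRequirement(List<RuleMeta> rules) {
        SortedMap<String, SortedSet<String>> matrix = new TreeMap<>();
        for (RuleMeta rule : rules) {
            String ids = rule.metadata.get("requirement");
            if (ids == null || ids.isBlank()) {
                // Rules without any requirement link are grouped under a marker key so they stand out.
                matrix.computeIfAbsent("<untraced>", k -> new TreeSet<>()).add(rule.ruleName);
                continue;
            }
            for (String id : ids.split(",")) {
                matrix.computeIfAbsent(id.trim(), k -> new TreeSet<>()).add(rule.ruleName);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<RuleMeta> rules = List.of(
                new RuleMeta("Validate customer age", Map.of("requirement", "REQ-101")),
                new RuleMeta("Apply loyalty discount", Map.of("requirement", "REQ-102, REQ-103")),
                new RuleMeta("Reject empty order", Map.of()));
        // One line per requirement, e.g. "REQ-101 -> [Validate customer age]"
        byRequirement(rules).forEach((req, names) -> System.out.println(req + " -> " + names));
    }
}
```

Inverting the mapping this way makes two kinds of gaps easy to spot: requirements that no rule references, and rules (the "<untraced>" bucket) that reference no requirement.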

Hibernate application-level repeatable reads

Introduction

In my previous post I described how application-level transactions offer a suitable concurrency control mechanism for long conversations.

All entities are loaded within the context of a Hibernate Session, which acts as a transactional write-behind cache. A Hibernate persistence context can hold one and only one reference to a given entity. The first-level cache guarantees session-level repeatable reads.

If the conversation spans multiple requests, we can have application-level repeatable reads. Long conversations are inherently stateful, so we can opt for detached objects or long persistence contexts. But application-level repeatable reads require an application-level concurrency control strategy, such as optimistic locking.

The catch

But this behavior may prove unexpected at times. If your Hibernate Session has already loaded a given entity, then any successive entity query (JPQL/HQL) is going to return the very same object reference, disregarding the current database snapshot. In other words, the first-level cache prevents an entity query from overwriting an already loaded entity.
To prove this behavior, I came up with the following test case:

    final ExecutorService executorService = Executors.newSingleThreadExecutor();

    doInTransaction(new TransactionCallable<Void>() {
        @Override
        public Void execute(Session session) {
            Product product = new Product();
            product.setId(1L);
            product.setQuantity(7L);
            session.persist(product);
            return null;
        }
    });

    doInTransaction(new TransactionCallable<Void>() {
        @Override
        public Void execute(Session session) {
            final Product product = (Product) session.get(Product.class, 1L);
            try {
                executorService.submit(new Callable<Void>() {
                    @Override
                    public Void call() throws Exception {
                        return doInTransaction(new TransactionCallable<Void>() {
                            @Override
                            public Void execute(Session _session) {
                                Product otherThreadProduct = (Product) _session.get(Product.class, 1L);
                                assertNotSame(product, otherThreadProduct);
                                otherThreadProduct.setQuantity(6L);
                                return null;
                            }
                        });
                    }
                }).get();
                Product reloadedProduct = (Product) session.createQuery("from Product").uniqueResult();
                assertEquals(7L, reloadedProduct.getQuantity());
                assertEquals(6L, ((Number) session
                    .createSQLQuery("select quantity from Product where id = :id")
                    .setParameter("id", product.getId())
                    .uniqueResult()).longValue());
            } catch (Exception e) {
                fail(e.getMessage());
            }
            return null;
        }
    });

This test case clearly illustrates the difference between entity queries and SQL projections. SQL query projections always load the latest database state. Entity query results, by contrast, are managed by the first-level cache, which ensures session-level repeatable reads.

Workaround 1: If your use case demands reloading the latest database entity state, you can simply refresh the entity in question (Session.refresh()).

Workaround 2: If you want an entity to be disassociated from the Hibernate first-level cache, you can evict it (Session.evict()), so that the next entity query can use the latest database entity value.

Beyond prejudice

Hibernate is a means, not a goal.
A data access layer requires both reads and writes, and neither plain old JDBC nor Hibernate is a one-size-fits-all solution. A data knowledge stack is much more appropriate for getting the most out of your read queries and your write DML statements.

While native SQL remains the de facto relational data reading technique, Hibernate excels at writing data. Hibernate is a persistence framework, and you should never forget that. Loading entities makes sense if you plan on propagating changes back to the database. You don’t need to load entities to display read-only views; an SQL projection is a much better alternative in this case.

Session-level repeatable reads prevent lost updates in concurrent write scenarios, so there’s a good reason why entities don’t get refreshed automatically. Maybe we’ve chosen to manually flush dirty properties, and an automated entity refresh might overwrite synchronized pending changes.

Designing the data access patterns is not a trivial task, and a solid integration testing foundation is worth investing in. To avoid any unknown behaviors, I strongly advise you to validate all automatically generated SQL statements to prove their effectiveness and efficiency.

Code available on GitHub.

Reference: Hibernate application-level repeatable reads from our JCG partner Vlad Mihalcea at Vlad Mihalcea’s Blog.
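The first-level cache behavior described in this post can be made concrete without a database by using a toy identity map in plain Java. This is not Hibernate code. ToySession, ToyProduct and the map that stands in for the Product table are all hypothetical. Within the toy, find(), selectQuantity(), refresh() and evict() only mimic, in that order, an entity query, an SQL projection, Session.refresh() and Session.evict().

```java
import java.util.HashMap;
import java.util.Map;

public class FirstLevelCacheDemo {

    /** Hypothetical stand-in for the Product entity. */
    static final class ToyProduct {
        final long id;
        long quantity;

        ToyProduct(long id, long quantity) {
            this.id = id;
            this.quantity = quantity;
        }
    }

    /** Toy persistence context: at most one managed instance per id (an identity map). */
    static final class ToySession {
        private final Map<Long, Long> database;                       // id -> quantity column
        private final Map<Long, ToyProduct> identityMap = new HashMap<>();

        ToySession(Map<Long, Long> database) {
            this.database = database;
        }

        /** Like an entity query: returns the managed instance if present, ignoring newer row data. */
        ToyProduct find(long id) {
            return identityMap.computeIfAbsent(id, key -> new ToyProduct(key, database.get(key)));
        }

        /** Like an SQL projection: always reads the current column value. */
        long selectQuantity(long id) {
            return database.get(id);
        }

        /** Like Session.refresh(): overwrites the managed instance with the current row. */
        void refresh(ToyProduct product) {
            product.quantity = database.get(product.id);
        }

        /** Like Session.evict(): detaches the instance so the next find() rebuilds it from the row. */
        void evict(ToyProduct product) {
            identityMap.remove(product.id);
        }
    }

    public static void main(String[] args) {
        Map<Long, Long> database = new HashMap<>();
        database.put(1L, 7L);

        ToySession session = new ToySession(database);
        ToyProduct product = session.find(1L);

        database.put(1L, 6L); // simulates another transaction committing a new quantity

        System.out.println(session.find(1L) == product); // true: same reference returned
        System.out.println(session.find(1L).quantity);   // 7: session-level repeatable read
        System.out.println(session.selectQuantity(1L));  // 6: projection sees the latest state

        session.refresh(product);
        System.out.println(product.quantity);             // 6 after the refresh
    }
}
```

The identity map explains both the "catch" and the workarounds: find() never rebuilds an instance it already manages, so only an explicit refresh() or evict() lets the newer row data through.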

Explore Git Internals with the JGit API

Have you ever wondered how a commit and its content are stored in Git? Well, I have, and I had some spare time over the last rainy weekend, so I did a little research. Because I feel more at home with Java than with Bash, I used JGit and a couple of learning tests to explore the Git internals of commits. Here are my findings:

Git – an Object Database

At its core, Git is a simple content-addressable data store. This means that you can insert any kind of content into it, and it will return a key that you can use to retrieve the data again later. In the case of Git, the key is the 20-byte SHA-1 hash that is computed from the content. The content is also referred to as an object in Git terminology, and consequently the data store is also called an object database.

Let’s see how JGit can be used to store and retrieve content.

Blobs

In JGit, the ObjectInserter is used to store content in the object database. It can be seen as the rough equivalent of git hash-object in Git. Its insert() method writes an object to the data store, whereas its idFor() methods only compute the SHA-1 hash of the given bytes. Hence the code to store a string looks like this:

    ObjectInserter objectInserter = repository.newObjectInserter();
    byte[] bytes = "Hello World!".getBytes( "utf-8" );
    ObjectId blobId = objectInserter.insert( Constants.OBJ_BLOB, bytes );
    objectInserter.flush();

All code examples assume that the repository variable points to an empty repository that was created outside of the snippet.

The first parameter denotes the type of the object to be inserted, a blob type in this case. There are further object types, as we will learn later. The blob type is used to store arbitrary content.

The payload must be given in the second parameter, as a byte array in this case. An overloaded method that accepts an InputStream is also available.

And finally, the ObjectInserter needs to be flushed to make the changes visible to others accessing the repository.
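Git computes the object ID over a short header followed by the raw content. The header is the type name, a space, the content length in decimal, and a NUL byte. The following plain-JDK sketch (no JGit involved) reproduces that computation for a blob, which makes the meaning of the returned ID tangible. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitObjectHash {

    /** SHA-1 over "<type> <length>\0" followed by the raw content, rendered as lowercase hex. */
    static String objectId(String type, byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update((type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "Hello World!".getBytes(StandardCharsets.UTF_8);
        System.out.println(objectId("blob", content));
    }
}
```

For the 12 bytes of "Hello World!" (no trailing newline), this should print the same c57eff55ebc0c54973903af5f72bac72762cf4f4 that git cat-file resolves for the blob written by JGit. The same function with type "tree" or "commit" covers the other object types.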
The insert() method returns the SHA-1 hash that is computed from the type, the content length and the content bytes. In JGit, a SHA-1 hash is represented by the ObjectId class, an immutable data structure that can be converted to and from bytes, ints, and strings.

Now you can use the returned blobId to read the content back, and thus confirm that the above code actually wrote it.

    ObjectReader objectReader = repository.newObjectReader();
    ObjectLoader objectLoader = objectReader.open( blobId );
    int type = objectLoader.getType(); // Constants.OBJ_BLOB
    byte[] bytes = objectLoader.getBytes();
    String helloWorld = new String( bytes, "utf-8" ); // Hello World!

The ObjectReader’s open() method returns an ObjectLoader that can be used to access the object identified by the given object ID. With the help of an ObjectLoader you can get an object’s type, its size and, of course, its content as a byte array or stream.

To verify that the object written by JGit is compatible with native Git, you can retrieve its content with git cat-file.

    $ git cat-file -p c57eff55ebc0c54973903af5f72bac72762cf4f4
    Hello World!
    $ git cat-file -t c57eff55ebc0c54973903af5f72bac72762cf4f4
    blob

If you look inside the .git/objects directory of the repository, you’ll find a directory named ‘c5’ with a file named ‘7eff55ebc0c54973903af5f72bac72762cf4f4’ in it. This is how the content is stored initially: as a single file per object, named with the SHA-1 hash of the content. The subdirectory is named with the first two characters of the SHA-1, and the filename consists of the remaining characters.

Now that you can store the content of a file, the next step is to store its name. You will probably also want to store more than one file, since a commit usually consists of a group of files. To hold this kind of information, Git uses so-called tree objects.

Tree Objects

A tree object can be seen as a simplified file system structure that contains information about files and directories.
It contains any number of tree entries. Each entry has a path name and a file mode. It points either to the content of a file (a blob object) or, if it represents a directory, to another (sub)tree object. The pointer is, of course, the SHA-1 hash of that blob or tree object.

To start with, you can create a tree that holds a single entry for a file named ‘hello-world.txt’, pointing to the ‘Hello World!’ content stored above.

    TreeFormatter treeFormatter = new TreeFormatter();
    treeFormatter.append( "hello-world.txt", FileMode.REGULAR_FILE, blobId );
    ObjectId treeId = objectInserter.insert( treeFormatter );
    objectInserter.flush();

The TreeFormatter is used here to construct an in-memory tree object. Each call to append() adds an entry with the given path name, mode, and the ID under which its content is stored.

Fundamentally, you are free to choose any path name. However, Git expects each entry name to be a single path segment, relative to the tree that contains it and without any ‘/’. Files in subdirectories need subtrees of their own, as described further on.

The file mode used here indicates a normal file. Other modes are EXECUTABLE_FILE, which means it’s an executable file, and SYMLINK, which specifies a symbolic link. For directory entries, the file mode is always TREE.

Again, you will need an ObjectInserter. One of its overloaded insert() methods accepts a TreeFormatter and writes it to the object database.

You can now use a TreeWalk to retrieve and examine the tree object:

    TreeWalk treeWalk = new TreeWalk( repository );
    treeWalk.addTree( treeId );
    treeWalk.next();
    String filename = treeWalk.getPathString(); // hello-world.txt

Actually, a TreeWalk is meant to iterate over the added trees and their subtrees. But since we know that there is exactly one entry, a single call to next() is sufficient.
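The binary layout that TreeFormatter produces is also small enough to reproduce by hand. The following plain-JDK sketch serializes each entry as its octal mode (written without a leading zero, so directories are "40000"), a space, the name, a NUL byte and the 20 raw SHA-1 bytes. It then hashes the result with a "tree <length>" header. The sketch assumes the caller passes entries already in Git's sort order and handles only a single level. The Entry class is a hypothetical helper, not part of JGit.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class GitTreeHash {

    /** One tree entry: octal mode as Git stores it, entry name, and the 40-char hex ID it points to. */
    static final class Entry {
        final String mode;
        final String name;
        final String objectId;

        Entry(String mode, String name, String objectId) {
            this.mode = mode;
            this.name = name;
            this.objectId = objectId;
        }
    }

    /** Writes "<mode> <name>\0" plus the 20 raw ID bytes for each entry, in the order given. */
    static byte[] serialize(List<Entry> entries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Entry entry : entries) {
            byte[] head = (entry.mode + " " + entry.name + "\0").getBytes(StandardCharsets.UTF_8);
            out.write(head, 0, head.length);
            for (int i = 0; i < 40; i += 2) {
                out.write(Integer.parseInt(entry.objectId.substring(i, i + 2), 16)); // hex pair -> raw byte
            }
        }
        return out.toByteArray();
    }

    /** Same header-plus-content SHA-1 that Git uses for every object type. */
    static String objectId(String type, byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update((type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("100644", "hello-world.txt", "c57eff55ebc0c54973903af5f72bac72762cf4f4"));
        System.out.println(objectId("tree", serialize(entries)));
    }
}
```

For the single hello-world.txt entry, this should yield 44d52a975c793e5a4115e315b8d89369e2919e51, the ID under which git cat-file shows this tree. An empty entry list gives 4b825dc642cb6eb9a060e54bf8d69288fbee4904, Git's well-known empty-tree ID.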
If you look at the tree object just written with native Git, you will see the following:

    $ git cat-file -p 44d52a975c793e5a4115e315b8d89369e2919e51
    100644 blob c57eff55ebc0c54973903af5f72bac72762cf4f4 hello-world.txt

Now that you have the necessary ingredients for a commit, let’s create the commit object itself.

Commit Objects

A commit object references the files that constitute the commit (through the tree object), along with some metadata. In detail, a commit consists of:
a pointer to the tree object
pointers to zero or more parent commits (more on that later)
a commit message
an author and a committer
Since a commit object is just another object in the object database, it is also sealed with the SHA-1 hash that was computed over its content.

To form a commit object, JGit offers the CommitBuilder utility class.

    CommitBuilder commitBuilder = new CommitBuilder();
    commitBuilder.setTreeId( treeId );
    commitBuilder.setMessage( "My first commit!" );
    PersonIdent person = new PersonIdent( "me", "me@example.com" );
    commitBuilder.setAuthor( person );
    commitBuilder.setCommitter( person );
    ObjectInserter objectInserter = repository.newObjectInserter();
    ObjectId commitId = objectInserter.insert( commitBuilder );
    objectInserter.flush();

Using it is straightforward: it has setter methods for all the attributes of a commit.

The author and committer are represented by the PersonIdent class, which holds the name, email, timestamp and time zone. The constructor used here applies the given name and email and takes the current time and time zone.

The rest should be familiar already: an ObjectInserter writes the commit object and returns the commit ID.
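A commit object is plain text. It holds a "tree" line, one "parent" line per parent (none for a root commit, two or more for a merge), "author" and "committer" lines, a blank line, and then the message. The JDK-only sketch below assembles such a body and hashes it. For brevity it uses the same identity for author and committer. Whether a newline follows the message depends on what the caller supplied, so the sketch makes no attempt to reproduce the exact commit ID that cat-file shows. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class GitCommitObject {

    /** Renders a commit body: tree, zero or more parents, author, committer, blank line, message. */
    static String body(String treeId, List<String> parentIds, String ident, String message) {
        StringBuilder text = new StringBuilder();
        text.append("tree ").append(treeId).append('\n');
        for (String parentId : parentIds) {
            text.append("parent ").append(parentId).append('\n');
        }
        text.append("author ").append(ident).append('\n');
        text.append("committer ").append(ident).append('\n');
        text.append('\n').append(message);
        return text.toString();
    }

    /** Header-plus-content SHA-1, identical to the one used for blobs and trees. */
    static String objectId(String type, byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update((type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Identity format: name <email> seconds-since-epoch timezone-offset
        String ident = "me <me@example.com> 1412872859 +0200";
        String root = body("4b825dc642cb6eb9a060e54bf8d69288fbee4904", List.of(), ident, "My first commit!");
        System.out.println(root);
        System.out.println(objectId("commit", root.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because the parent IDs are part of the hashed text, a commit's ID seals its entire history. Changing any ancestor changes every descendant's ID.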
To retrieve the commit object from the repository, you can again use the ObjectReader:

    ObjectReader objectReader = repository.newObjectReader();
    ObjectLoader objectLoader = objectReader.open( commitId );
    RevCommit commit = RevCommit.parse( objectLoader.getBytes() );

The resulting RevCommit represents a commit with the same attributes that were specified in the CommitBuilder.

And once again, to double-check, here is the output of git cat-file:

    $ git cat-file -p 783341299c95ddda51e6b2393c16deaf0c92d5a0
    tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    author me <me@example.com> 1412872859 +0200
    committer me <me@example.com> 1412872859 +0200

    My first commit!

Parents

The chain of parents forms the history of a Git repository and models a directed acyclic graph. This means that the commits ‘follow’ one direction.

A commit can have zero or more parents. The first commit in a repository does not have a parent (aka root commit). The second commit in turn has the first as its parent, and so on.

It is perfectly legal to create more than one root commit. If you use git checkout --orphan new_branch, a new orphan branch will be created and switched to. The first commit made on this branch will have no parents and will form the root of a new history that is disconnected from all other commits.

If you start branching and eventually merge the divergent lines of changes, this usually results in a merge commit. Such a commit has the head commits of the divergent branches as its parents.

To construct a commit with a parent, you need to specify the ID of the parent commit in the CommitBuilder.

    commitBuilder.setParentId( parentId );

A RevCommit, which represents a commit within the repository, can also be queried about its parents. Its getParents() and getParent(int) methods return all parents or the nth parent, respectively, as RevCommits. Be warned, however, that although the methods return RevCommits, these are not fully resolved.
While their ID attribute is set, all other attributes (fullMessage, author, committer, etc.) are not. Thus, an attempt to call parent.getFullMessage(), for example, will throw a NullPointerException. To actually use the parent commit, you need to do one of two things. Either retrieve a full RevCommit by means of the ObjectReader, as outlined above, or use a RevWalk to load and parse the commit header:

    RevWalk revWalk = new RevWalk( repository );
    revWalk.parseHeaders( parentCommit );

All in all, remember to treat the returned parent commits as if they were ObjectIds rather than RevCommits.

More on Tree Objects

If you want to store files in sub-directories, you need to construct the sub-trees yourself. Say you want to store the content of a file ‘file.txt’ in folder ‘folder’.

First, create and store a TreeFormatter for the subtree, the one that has an entry for the file:

    TreeFormatter subtreeFormatter = new TreeFormatter();
    subtreeFormatter.append( "file.txt", FileMode.REGULAR_FILE, blobId );
    ObjectId subtreeId = objectInserter.insert( subtreeFormatter );

Then create and store a TreeFormatter with an entry that denotes the folder and points to the subtree just created.

    TreeFormatter treeFormatter = new TreeFormatter();
    treeFormatter.append( "folder", FileMode.TREE, subtreeId );
    ObjectId treeId = objectInserter.insert( treeFormatter );

The file mode of the entry is TREE, to indicate a directory, and its ID points to the subtree that holds the file entry. The returned treeId is the one that would be passed to the CommitBuilder.

Git requires a certain sort order for entries in tree objects. The ‘Git Data Formats’ document that I found here states that:

Tree entries are sorted by the byte sequence that comprises the entry name. However, for the purposes of the sort comparison, entries for tree objects are compared as if the entry name byte sequence has a trailing ASCII ‘/’ (0x2f).

To read the contents of the tree object, you can again use a TreeWalk.
This time, however, you need to tell it to recurse into subtrees if you wish to visit all entries. Also, don’t forget to set postOrderTraversal to true if you wish to see entries that point to a tree; they would be skipped otherwise.

The whole TreeWalk loop will look like this in the end:

    TreeWalk treeWalk = new TreeWalk( repository );
    treeWalk.addTree( treeId );
    treeWalk.setRecursive( true );
    treeWalk.setPostOrderTraversal( true );
    while( treeWalk.next() ) {
      int fileMode = Integer.parseInt( treeWalk.getFileMode( 0 ).toString() );
      String objectId = treeWalk.getObjectId( 0 ).name();
      String path = treeWalk.getPathString();
      System.out.println( String.format( "%06d %s %s", fileMode, objectId, path ) );
    }

…and will lead to this output:

    100644 6b584e8ece562ebffc15d38808cd6b98fc3d97ea folder/file.txt
    040000 541550ddcf8a29bcd80b0800a142a7d47890cfd6 folder

Although I don’t find the API very intuitive, it gets the job done and reveals all the details of the tree object.

Concluding Git Internals

There is no doubt that, for common use cases, the high-level AddCommand and CommitCommand are the recommended way to commit files to the repository. Still, I found it worthwhile to dig into the deeper levels of JGit and Git, and I hope you did, too. And in the (admittedly less common) case that you need to commit files to a repository without a working directory and/or index, the information provided here might help.

If you would like to try out the examples listed here for yourself, I recommend setting up JGit with access to its sources and JavaDoc, so that you have meaningful context information, content assist, debug sources, etc.

The complete source code is hosted here: https://gist.github.com/rherrmann/02d8d4fe81bb60d9049e

For brevity, the samples shown here omit the code to release allocated resources. Please refer to the complete source code to get all the details.

Reference: Explore Git Internals with the JGit API from our JCG partner Rudiger Herrmann at the Code Affine blog.
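The tree entry sort rule quoted in the article is easy to get wrong when assembling trees by hand, because a subtree sorts as if its name ended in ‘/’. The plain-JDK comparator below is a minimal sketch of that quoted rule: it compares names byte-wise as unsigned values, and appends ‘/’ to subtree names first. The Entry class and the sample names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GitTreeOrder {

    /** Hypothetical tree entry: a name and whether it denotes a subtree (directory). */
    static final class Entry {
        final String name;
        final boolean isTree;

        Entry(String name, boolean isTree) {
            this.name = name;
            this.isTree = isTree;
        }
    }

    /** Sort key: the name's bytes, with a trailing '/' (0x2f) appended for subtrees. */
    static byte[] sortKey(Entry entry) {
        return (entry.isTree ? entry.name + "/" : entry.name).getBytes(StandardCharsets.UTF_8);
    }

    /** Unsigned, byte-wise lexicographic comparison; a proper prefix sorts first. */
    static int compareBytes(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length);
        for (int i = 0; i < length; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static final Comparator<Entry> GIT_ORDER = (x, y) -> compareBytes(sortKey(x), sortKey(y));

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>(List.of(
                new Entry("foo", true), new Entry("foo.txt", false), new Entry("foo-bar", false)));
        entries.sort(GIT_ORDER);
        for (Entry entry : entries) {
            System.out.println(entry.name + (entry.isTree ? "/" : ""));
        }
    }
}
```

This prints foo-bar, foo.txt and then foo/, because '-' (0x2d) and '.' (0x2e) both sort before '/' (0x2f). A naive comparison of the names alone would put the directory "foo" first, since it is a prefix of the other two.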

Installing Drupal on FreeBSD

Drupal ports have been available on FreeBSD for quite a long time, and binary packages can be installed very quickly. However, manual setup is required to connect Drupal to the database and have Apache serve the Drupal website. In this post I’ll describe the setup procedure of Drupal 7 on FreeBSD 10.0. The process will not be very different with other versions of Drupal or FreeBSD.

Installing Drupal

Drupal ports are available on FreeBSD and, in fact, multiple versions are available:

    # pkg search drupal
    drupal6-6.31
    [...snip...]
    drupal7-7.31
    [...snip...]

Unless there is a compelling reason not to do so, install the latest one:

    # pkg install drupal7

To successfully run Drupal, you need:
The Apache HTTP Server.
PHP.
A supported database server (PostgreSQL or MySQL).
The Drupal port, however, does not enforce these dependencies, so you have to satisfy them manually.

Installing the Apache HTTP Server

Unless there is a compelling reason not to do so, install the latest available Apache port (apache24 at the time of writing):

    # pkg install apache24

Once the port is installed, enable the corresponding service by adding the following line to /etc/rc.conf:

    apache24_enable="YES"

Installing the Database

Drupal supports both PostgreSQL and MySQL, but the Drupal port installs neither by default, although it does install the MySQL client utilities. In this post MySQL will be used, but if you prefer PostgreSQL, just skip this section and read this article instead.
Since the Drupal port defines the MYSQL option by default, installing the binary package with pkg also pulls in a MySQL client port. At the time of writing, I got this one:

    mysql55-client-5.5.40

As a consequence, you have to install the matching mysqlXX-server port:

    # pkg install mysql55-server-5.5.40

If you try to install a different version (at the time of writing mysql56 is available), you may be asked to remove Drupal itself. The reason is the inter-dependencies between the MySQL client and server packages.

Once MySQL is installed, enable the corresponding service by adding the following line to /etc/rc.conf:

    mysql_enable="YES"

Installing PHP

The installation of PHP is taken care of by the Drupal port. However, the PHP module for the Apache HTTP Server is not installed and must be installed manually. Make sure you install the PHP module that corresponds to the PHP version installed by the Drupal port. At the time of writing, the following modules are available:

    # pkg search mod_php
    mod_php5-5.4.33_1,1
    mod_php55-5.5.17_1
    mod_php56-5.6.1

Since the Drupal port depends on php5, mod_php5-5.4.33_1,1 should be installed:

    # pkg install mod_php5-5.4.33_1,1

The port takes care of modifying the Apache HTTP Server configuration file so that the PHP module is loaded. If you did not install the packages in the order suggested by this post, you may have lost that piece of configuration. In any case, make sure a line similar to the following is present in /usr/local/etc/apache24/httpd.conf:

    LoadModule php5_module        libexec/apache24/libphp5.so

Installing drush

drush is an optional package offering an excellent command line interface for many Drupal-related management tasks. drush could even be used to install Drupal. This post does not cover that, since I prefer relying on a port tested specifically on FreeBSD.
However, once you have tested that a specific release works correctly, you may find drush very useful to streamline your installations. If you are interested in using drush, you will find plenty of good information on the Internet. To install drush, the following command may be used:

    # pkg install drush

Creating a Database for Drupal

A database for Drupal must be created in the DB server installed in the previous section. The MySQL port sets no password for the root user connecting from localhost; for this reason, setting one is recommended:

    # mysql -u root
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    [...snip...]
    mysql> set password for root@localhost=PASSWORD('your-password');
    mysql> exit
    Bye

Once the password is set, you can reconnect and create the database and its user. Replace drupal_database_name, drupal_user and password with values of your choice:

    # mysql -u root -p
    Enter password:
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    [...snip...]
    mysql> create database drupal_database_name;
    mysql> create user 'drupal_user'@'localhost' identified by 'password';
    mysql> grant all privileges on drupal_database_name.* to 'drupal_user'@'localhost' with grant option;
    mysql> flush privileges;

Configuring the Apache HTTP Server

Now that everything is in place, we can configure the web server so that it starts serving the Drupal web application. The tasks to perform are the following:
Configuring the required modules.
Configuring a virtual host to serve Drupal.
Configuring a MIME type for PHP.
The modules required to run Drupal are mod_rewrite and the PHP module. The PHP module was configured automatically by the mod_php port. mod_rewrite can be enabled by uncommenting the following line in /usr/local/etc/apache24/httpd.conf:

    LoadModule rewrite_module libexec/apache24/mod_rewrite.so

The cleanest way to segregate the Drupal configuration is to create a virtual host for it.
An additional advantage of this approach is that Drupal will be served from the root path (/), and you won’t need any rewrite rule to achieve the same result. Assume the host name and port where Drupal will be published are drupal.host.name:80. Create a file named drupal.conf in /usr/local/etc/apache24/Includes and define the skeleton of the virtual host:

    <VirtualHost *:80>
      ServerName drupal.host.name

      # Drupal directory configuration placeholder

      ErrorLog ${APACHE_LOG_DIR}/drupal-error.log
      LogLevel warn
      CustomLog ${APACHE_LOG_DIR}/drupal-access.log combined
    </VirtualHost>

In the default configuration of Apache on FreeBSD, any .conf file in this directory is included automatically. No additional configuration is therefore needed to add the virtual host to the web server.

In this fragment I’ve used an environment variable (${APACHE_LOG_DIR}) to separate some server configuration variables that could be reused in external scripts. To define environment variables, a .env file must be created in /usr/local/etc/apache24/envvars.d such as:

The Drupal directory fragment defines the DocumentRoot of the virtual host and some of the required options:

    DocumentRoot /usr/local/www/drupal7
    <Directory "/usr/local/www/drupal7">
      Options Indexes FollowSymLinks
      AllowOverride All
      Require all granted
    </Directory>

The AllowOverride option must be set to All so that the Apache HTTP Server takes into account the .htaccess files shipped with Drupal.

This fragment uses the path of the Drupal installation directory of the FreeBSD port. If you installed Drupal by other means (such as drush), update the path accordingly.
The complete virtual host configuration file is:

    <VirtualHost *:80>
      ServerName drupal.host.name

      DocumentRoot /usr/local/www/drupal7
      <Directory "/usr/local/www/drupal7">
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
      </Directory>

      ErrorLog ${APACHE_LOG_DIR}/drupal-error.log
      LogLevel warn
      CustomLog ${APACHE_LOG_DIR}/drupal-access.log combined
    </VirtualHost>

Finally, the Apache HTTP Server must be instructed to execute the PHP code contained in PHP pages. To do so, add a MIME type for them with the following line in httpd.conf:

    <IfModule mime_module>
      # Content has been trimmed
      # Add MIME type for PHP
      AddType application/x-httpd-php .php
    </IfModule>

Once all the settings are in place, restart Apache and point your browser to http://drupal.host.name/. The Drupal installation wizard will welcome you and ask for the database configuration and other Drupal website settings. To restart the Apache HTTP Server, the following command can be used:

    # service apache24 restart

Configuring Drupal behind a Proxy Server

Machines connected to an enterprise network are often not connected directly to the Internet and must use a web proxy server instead. Drupal can be configured to use a web proxy server by setting the following variables in ${DRUPAL_HOME}/sites/default/settings.php. If this file does not exist, copy the file default.settings.php (shipped with Drupal) to settings.php. The configuration variables that enable proxy support are the following:

    /**
     * External access proxy settings:
     *
     * If your site must access the Internet via a web proxy then you can enter
     * the proxy settings here. Currently only basic authentication is supported
     * by using the username and password variables. The proxy_user_agent variable
     * can be set to NULL for proxies that require no User-Agent header or to a
     * non-empty string for proxies that limit requests to a specific agent. The
     * proxy_exceptions variable is an array of host names to be accessed directly,
     * not via proxy.
     */
    $conf['proxy_server'] = 'web-proxy';
    $conf['proxy_port'] = 3128;
    # $conf['proxy_username'] = '';
    # $conf['proxy_password'] = '';
    # $conf['proxy_user_agent'] = '';
    # $conf['proxy_exceptions'] = array('', 'localhost');

Depending on your proxy settings, different values may be used. Beware that although Drupal itself (Core) supports a proxy, many third-party modules still do not. One notable example at the time of writing is the reCaptcha module, which will not work without a direct Internet connection.

Setting Up Clean URLs

Last but not least, clean URL support may be enabled. Drupal performs a sanity check and will not allow you to enable the Clean URLs feature if the test does not pass. However, I have found plenty of false negatives when running Drupal 7 on FreeBSD. If the Clean URL test fails in your installation, check manually whether clean URLs actually work. If they do, use the workaround described in the official Drupal documentation to forcibly enable Clean URLs.

Reference: Installing Drupal on FreeBSD from our JCG partner Enrico Crisostomo at The Grey Blog.

Spring Boot / Java 8 / Tomcat 8 on Openshift with DIY

The DIY cartridge is an experimental cartridge that provides a way to test unsupported languages on OpenShift. It provides a minimal, free-form scaffolding which leaves all details of the cartridge to the application developer. This blog post illustrates how to run a Spring Boot / Java 8 / Tomcat 8 application with a PostgreSQL service bound to it.

Creating a new application

Prerequisite

Before we can start building the application, we need a free OpenShift account and the client tools installed.

Step 1: Create DIY application

To create an application using the client tools, type the following command:

    rhc app create boot diy-0.1

This command creates an application named boot using the DIY cartridge and clones its repository into the boot directory.

Step 2: Add PostgreSQL cartridge to application

The application we are creating will use a PostgreSQL database, hence we need to add the appropriate cartridge to the application:

    rhc cartridge add postgresql-9.2 --app boot

After creating the cartridge, it is possible to check its status with the following command:

    rhc cartridge status postgresql-9.2 --app boot

Step 3: Delete Template Application Source Code

OpenShift creates a template project that can be freely removed:

    git rm -rf .openshift README.md diy misc

Commit the changes:

    git commit -am "Removed template application source code"

Step 4: Pull Source Code from GitHub

    git remote add upstream https://github.com/kolorobot/openshift-diy-spring-boot-sample.git
    git pull -s recursive -X theirs upstream master

Step 5: Push changes

The basic template is ready to be pushed:

    git push

The initial deployment (build and application startup) will take some time (up to several minutes).
Subsequent deployments are a bit faster, although starting the Spring Boot application may take more than 2 minutes on a small gear:

    Tomcat started on port(s): 8080/http
    Started Application in 125.511 seconds

You can now browse to http://boot-yournamespace.rhcloud.com/manage/health and you should see:

    {
      "status": "UP",
      "database": "PostgreSQL",
      "hello": 1
    }

You can also browse the API. To find out what options you have, navigate to the root of the application. You should see the resource root with links to the available resources:

    {
      "_links" : {
        "person" : {
          "href" : "http://boot-yournamespace.rhcloud.com/people{?page,size,sort}",
          "templated" : true
        }
      }
    }

Navigating to http://boot-yournamespace.rhcloud.com/people should return all people from the database.

Step 6: Adding Jenkins

Using Jenkins has some advantages. One of them is that the build takes place in its own gear. To build with Jenkins, OpenShift needs a Jenkins server application and a Jenkins client cartridge attached to the application. Creating the Jenkins application:

    rhc app create ci jenkins

And attaching the Jenkins client to the application:

    rhc cartridge add jenkins-client --app boot

You can now browse to http://ci-.rhcloud.com and log in with the credentials provided. When you make and push further changes, the build will be triggered by Jenkins:

    remote: Executing Jenkins build.
    remote:
    remote: You can track your build at https://ci-<namespace>.rhcloud.com/job/boot-build
    remote:
    remote: Waiting for build to schedule.........

If you observe the build result, you will notice that the application starts a bit faster on Jenkins.

Under the hood

Why DIY?

A Spring Boot application can be deployed to the Tomcat cartridge on OpenShift, but at the time of writing there is no Tomcat 8 or Java 8 support, so DIY was selected. DIY has limitations: for example, it cannot be scaled. But it is perfect for trying and playing with new things.
Application structure

The application is a regular Spring Boot application, which can be bootstrapped with http://start.spring.io. The build system is Maven and the packaging type is jar. Tomcat 8 with Java 8 is used. Spring Boot uses Tomcat 7 by default; to change it, the following property was added:

    <properties>
      <tomcat.version>8.0.9</tomcat.version>
    </properties>

Maven was selected because currently only Gradle 1.6 can be used on OpenShift, due to a bug in Gradle. Gradle 2.2 fixes this issue.

Maven settings.xml

The settings.xml file is pretty important, as it contains the location of the Maven repository: ${OPENSHIFT_DATA_DIR}/m2/repository. On OpenShift, the application has write permission only in $OPENSHIFT_DATA_DIR.

Data source configuration

The application uses Spring Data REST to export repositories over REST. The required dependencies are:

spring-boot-starter-data-jpa – repositories configuration
spring-boot-starter-data-rest – exposing repositories over REST
hsqldb – for embedded database support
postgresql – for PostgreSQL support. Since OpenShift currently uses PostgreSQL 9.2, the matching driver version is used

Common properties – application.properties

By default (default profile, src/main/resources/application.properties), the application uses embedded HSQLDB and populates it with src/main/resources/data.sql. The data file works on both HSQLDB and PostgreSQL, so we don’t need to provide platform-specific files (which is possible with Spring Boot).

spring.datasource.initialize = true must be used, so Spring Boot picks up the data file and loads it into the database.
spring.jpa.generate-ddl = true makes sure that the schema will be exported.

OpenShift properties – application-openshift.properties

The OpenShift-specific configuration (src/main/resources/application-openshift.properties) allows the use of the PostgreSQL service.
The configuration uses OpenShift environment variables to set up the connection properties:

$OPENSHIFT_POSTGRESQL_DB_HOST – for the database host
$OPENSHIFT_POSTGRESQL_DB_PORT – for the database port
$OPENSHIFT_APP_NAME – for the database name
$OPENSHIFT_POSTGRESQL_DB_USERNAME – for the database username
$OPENSHIFT_POSTGRESQL_DB_PASSWORD – for the database password

Spring allows environment variables to be used directly in properties with the ${} syntax, e.g.:

    spring.datasource.username = ${OPENSHIFT_POSTGRESQL_DB_USERNAME}

To let Spring Boot activate the OpenShift profile, the spring.profiles.active property is passed to the application at startup: java -jar <name>.jar --spring.profiles.active=openshift.

Logging on OpenShift

The log file will be stored in $OPENSHIFT_DATA_DIR:

    logging.file=${OPENSHIFT_DATA_DIR}/logs/app.log

Actuator

The Actuator’s default management context path is /. It is changed to /manage because OpenShift exposes its own /health endpoint, which shadows the Actuator’s /health endpoint.

    management.context-path=/manage

OpenShift action_hooks

OpenShift executes action hook script files at specific points during the deployment process. All hooks are placed in the .openshift/action_hooks directory in the application repository. The files must be executable. On Windows, in Git Bash, the following command can be used:

    git update-index --chmod=+x .openshift/action_hooks/*

Deploying the application

The deploy script downloads Java and Maven, creates some directories and exports a couple of environment variables required to properly run the Java 8 / Maven build. The final command of the deployment runs the Maven goals:

    mvn -s settings.xml clean install

Starting the application

When the deploy script finishes successfully, the target directory will contain a single jar with the assembled Spring Boot application. The application is started and bound to the server address and port provided by OpenShift. In addition, the profile name is provided, so a valid data source will be created.
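The data source is only valid once the ${} placeholders in application-openshift.properties have been replaced with the values OpenShift exports. To make that substitution concrete, here is a minimal sketch of resolving ${NAME} placeholders against an environment map. It is not Spring’s implementation (Spring also supports defaults, nesting and multiple property sources). The class PlaceholderDemo, the resolve method and the JDBC URL template are hypothetical and serve only as illustration. The environment is passed in as a Map so the example runs without real OpenShift variables.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {

    // Matches ${NAME} and captures NAME (letters, digits, '_' and '.').
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Replaces every ${NAME} in the template with env.get(NAME).
    // Hypothetical helper: unlike Spring, it has no ${NAME:default} fallback
    // and fails fast on unknown names.
    public static String resolve(String template, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String value = env.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + m.group(1));
            }
            // quoteReplacement stops '$' or '\' in a value being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values standing in for variables OpenShift would export;
        // on a real gear you could pass System.getenv() instead.
        Map<String, String> env = new HashMap<>();
        env.put("OPENSHIFT_POSTGRESQL_DB_HOST", "127.0.0.1");
        env.put("OPENSHIFT_POSTGRESQL_DB_PORT", "5432");
        env.put("OPENSHIFT_APP_NAME", "boot");

        String url = resolve(
                "jdbc:postgresql://${OPENSHIFT_POSTGRESQL_DB_HOST}:${OPENSHIFT_POSTGRESQL_DB_PORT}/${OPENSHIFT_APP_NAME}",
                env);
        System.out.println(url); // jdbc:postgresql://127.0.0.1:5432/boot
    }
}
```

Failing fast on a missing variable is a deliberate choice in this sketch: on a misconfigured gear it surfaces the problem at startup rather than as a confusing connection error later.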
The final command that runs the application:

    nohup java -Xms384m -Xmx412m -jar target/*.jar --server.port=${OPENSHIFT_DIY_PORT} --server.address=${OPENSHIFT_DIY_IP} --spring.profiles.active=openshift &

Stopping the application

The stop script looks for a Java process and when it finds it… you know what happens.

Summary

I am pretty happy with the evaluation of OpenShift with the Do It Yourself cartridge. Not everything went as smoothly as I expected, mostly due to memory limitations on a small gear. I spent some time figuring it out and arriving at a proper configuration. But still, OpenShift with DIY is worth trying and playing with for a short while, especially since getting started is completely free.

References

The project source code used throughout this article can be found on GitHub: https://github.com/kolorobot/openshift-diy-spring-boot-sample
Spring Boot documentation: http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#cloud-deployment-openshift
Some OpenShift references used while creating this article:
https://blog.openshift.com/run-gradle-builds-on-openshift
https://blog.openshift.com/tips-for-creating-openshift-apps-with-windows

Reference: Spring Boot / Java 8 / Tomcat 8 on Openshift with DIY from our JCG partner Rafal Borowiec at the Codeleak.pl blog.

Let’s Stream a Map in Java 8 with jOOλ

I wanted to find an easy way to stream a Map in Java 8. Guess what? There isn’t one! What I would’ve expected for convenience is the following method:

    public interface Map<K, V> {

        default Stream<Entry<K, V>> stream() {
            return entrySet().stream();
        }
    }

But there’s no such method. There are probably a variety of reasons why such a method shouldn’t exist, e.g.:

There’s no “clear” preference for entrySet() being chosen over keySet() or values() as a stream source
Map isn’t really a collection. It’s not even an Iterable
That wasn’t the design goal
The EG didn’t have enough time

Well, there is a very compelling reason for Map to have been retrofitted to provide both an entrySet().stream() and to finally implement Iterable<Entry<K, V>>. And that reason is the fact that we now have Map.forEach():

    default void forEach(
            BiConsumer<? super K, ? super V> action) {
        Objects.requireNonNull(action);
        for (Map.Entry<K, V> entry : entrySet()) {
            K k;
            V v;
            try {
                k = entry.getKey();
                v = entry.getValue();
            } catch(IllegalStateException ise) {
                // this usually means the entry is no longer in the map.
                throw new ConcurrentModificationException(ise);
            }
            action.accept(k, v);
        }
    }

forEach() in this case accepts a BiConsumer that really consumes entries in the map. If you search through the JDK source code, there are very few references to the BiConsumer type outside of Map.forEach(), apart from perhaps a couple of CompletableFuture methods and a couple of stream collection methods. So, one could almost assume that BiConsumer was strongly driven by the needs of this forEach() method, which would be a strong case for making Map.Entry a more important type throughout the collections API (we would have preferred the type Tuple2, of course).

Let’s continue this line of thought. There is also Iterable.forEach():

    public interface Iterable<T> {
        default void forEach(Consumer<? super T> action) {
            Objects.requireNonNull(action);
            for (T t : this) {
                action.accept(t);
            }
        }
    }

Both Map.forEach() and Iterable.forEach() intuitively iterate the “entries” of their respective collection model, although there is a subtle difference:

Iterable.forEach() expects a Consumer taking a single value
Map.forEach() expects a BiConsumer taking two values: the key and the value (NOT a Map.Entry!)

This makes the two methods incompatible in a “duck typing” sense, which makes the two types even more different. Bummer!

Improving Map with jOOλ

We find that quirky and counter-intuitive. forEach() is really not the only use case of Map traversal and transformation. We’d love to have a Stream<Entry<K, V>>, or even better, a Stream<Tuple2<T1, T2>>. So we implemented that in jOOλ, a library which we’ve developed for our integration tests at jOOQ. With jOOλ, you can now wrap a Map in a Seq type (“Seq” for sequential stream, a stream with many more functional features):

    Map<Integer, String> map = new LinkedHashMap<>();
    map.put(1, "a");
    map.put(2, "b");
    map.put(3, "c");

    assertEquals(
      Arrays.asList(
        tuple(1, "a"),
        tuple(2, "b"),
        tuple(3, "c")
      ),

      Seq.seq(map).toList()
    );

What can you do with it? How about creating a new Map, swapping keys and values in one go:

    System.out.println(
      Seq.seq(map)
         .map(Tuple2::swap)
         .toMap(Tuple2::v1, Tuple2::v2)
    );

    System.out.println(
      Seq.seq(map)
         .toMap(Tuple2::v2, Tuple2::v1)
    );

Both of the above will yield:

    {a=1, b=2, c=3}

Just for the record, here’s how to swap keys and values with the standard JDK API:

    System.out.println(
      map.entrySet()
         .stream()
         .collect(Collectors.toMap(
             Map.Entry::getValue,
             Map.Entry::getKey
         ))
    );

It can be done, but the everyday verbosity of the standard Java API makes things a bit hard to read and write.

Reference: Let’s Stream a Map in Java 8 with jOOλ from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.
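As a JDK-only complement to the Map.forEach() / Iterable.forEach() discussion in this post, the sketch below shows three small helpers: one that streams a Map’s entries (the stream() method the author wished Map had), one that adapts a BiConsumer<K, V> into a Consumer<Map.Entry<K, V>> so the same callback works with both forEach() flavours, and an order-preserving key/value swap. The class MapStreams and the names stream, entryConsumer and swap are hypothetical; this is not jOOλ’s API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public final class MapStreams {

    private MapStreams() {}

    // The convenience method the article wishes Map had: a stream of its entries.
    public static <K, V> Stream<Map.Entry<K, V>> stream(Map<K, V> map) {
        return map.entrySet().stream();
    }

    // Adapts a two-argument BiConsumer so it can consume Map.Entry values one at a time.
    public static <K, V> Consumer<Map.Entry<K, V>> entryConsumer(BiConsumer<? super K, ? super V> action) {
        return e -> action.accept(e.getKey(), e.getValue());
    }

    // Swaps keys and values, keeping encounter order.
    // Two keys mapping to the same value cannot both survive, so that case throws.
    public static <K, V> Map<V, K> swap(Map<K, V> map) {
        return stream(map).collect(Collectors.toMap(
                Map.Entry::getValue,
                Map.Entry::getKey,
                (a, b) -> { throw new IllegalStateException("Keys " + a + " and " + b + " share a value"); },
                LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new LinkedHashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");

        // The same BiConsumer works with Map.forEach() and with an entry stream.
        BiConsumer<Integer, String> print = (k, v) -> System.out.println(k + " -> " + v);
        map.forEach(print);
        stream(map).forEach(entryConsumer(print));

        System.out.println(swap(map)); // {a=1, b=2, c=3}
    }
}
```

Collecting into a LinkedHashMap is a design choice here: it keeps the output order predictable, which the plain Collectors.toMap(Function, Function) overload used in the JDK example above does not promise.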

Stop Claiming that you’re Using a Schemaless Database

One of MongoDB’s arguments when evangelising MongoDB is the fact that MongoDB is a “schemaless” database:

Why Schemaless? MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field.

And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas:

The one in your head when you designed the data structures
The one that your database really implemented to store your data structures
The one you should have implemented to fulfill your requirements

Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review MongoDB’s point of view again here:

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

Everything above is true as well.

“Schema-less” vs. “Schema-ful”

But let’s translate this to SQL (or use any other “schema-ful” database instead):

    ALTER TABLE student ADD gpa VARCHAR(10);

And we’re done! Gee, we’ve added a column, and we’ve added it to ALL rows. It was transparent. It was automatic. We “just get back null” on existing students. And we can even “roll back our code”:

    ALTER TABLE student DROP gpa;

Not only are the existing objects unlikely to cause problems, we have actually rolled back our code AND database. Let’s summarise:

We can do exactly the same in “schema-less” databases as we can in “schema-ful” ones
We guarantee that a migration takes place (and it’s instant, too)
We guarantee data integrity when we roll back the change

What about more real-world DDL?
Of course, at the beginning of projects, when they still resemble the typical cat/dog/pet-shop or book/author/library sample application, we’ll just be adding columns. But what happens if we need to change the student-teacher 1:N relationship into a student-teacher M:N relationship? Suddenly, everything changes, and not only will the relational data model prove far superior to a hierarchical one that just yields tons of data duplication, it’ll also be moderately easy to migrate, and the outcome is guaranteed to be correct and tidy!

    CREATE TABLE student_to_teacher
    AS
    SELECT id AS student_id, teacher_id
    FROM student;

    ALTER TABLE student DROP teacher_id;

… and we’re done! (Of course, we’d be adding constraints and indexes.)

Think about the tedious task you’ll have transforming your JSON into the new JSON. You don’t even have XSLT or XQuery for the task, only JavaScript!

Let’s face the truth

Schemalessness is as misleading a term as NoSQL is:

“History of NoSQL according to @markmadsen #strataconf pic.twitter.com/XHXMJsXHjV” (Edd Dumbill, @edd, November 12, 2013)

And again, MongoDB’s blog post is telling the truth (and an interesting one, too):

Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. What we are trying to do here is make this mapping to the database natural.

When you say “schemaless”, you actually say “dynamically typed schema”, as opposed to statically typed schemas as they are available from SQL databases. JSON is still a completely schema-free data structure standard, as opposed to XML, which allows you to specify an XSD if you need one, or to operate on document-oriented, “schema-less” (i.e. dynamically typed) schemas. (And don’t say there’s json-schema. That’s a ridiculous attempt to mimic XSD.)

This is important to understand! You always have a schema, even if you don’t statically type it.
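To illustrate that a “schemaless” document still has a schema, only an implicit one, here is a small Java sketch. A document is modelled as a Map<String, Object> (an assumption standing in for a JSON document), and the code that reads the gpa field from the MongoDB example still has to assume a type at runtime; the Student class is the statically typed equivalent. Storing gpa as a number is itself an assumption for illustration, and none of the names here are MongoDB API.

```java
import java.util.HashMap;
import java.util.Map;

public class ImplicitSchemaDemo {

    // Statically typed schema: the compiler knows gpa is a Double (possibly null).
    static final class Student {
        final String name;
        final Double gpa; // null for students created before the field existed

        Student(String name, Double gpa) {
            this.name = name;
            this.gpa = gpa;
        }
    }

    // Dynamically typed schema: the type of "gpa" is only checked at runtime.
    // The type assumption still exists; it just lives in this reader, not in the data.
    public static Double readGpa(Map<String, Object> document) {
        Object raw = document.get("gpa");
        if (raw == null) {
            return null; // old documents: "we just get back null"
        }
        if (raw instanceof Number) {
            return ((Number) raw).doubleValue();
        }
        throw new IllegalStateException("Unexpected gpa type: " + raw.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = new HashMap<>();
        oldDoc.put("name", "Alice");

        Map<String, Object> newDoc = new HashMap<>();
        newDoc.put("name", "Bob");
        newDoc.put("gpa", 3.7);

        Map<String, Object> oddDoc = new HashMap<>();
        oddDoc.put("name", "Carol");
        oddDoc.put("gpa", "3.7"); // same field, different type: nothing stops this write

        System.out.println(readGpa(oldDoc)); // null
        System.out.println(readGpa(newDoc)); // 3.7
        try {
            readGpa(oddDoc);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Unexpected gpa type: String
        }

        // With the typed class, a String gpa is rejected by the compiler instead.
        Student typed = new Student("Bob", 3.7);
        System.out.println(typed.name + " " + typed.gpa); // Bob 3.7
    }
}
```

The point of the comparison is where the error surfaces: the map-based reader only discovers Carol’s string-typed gpa when that document is read, whereas the typed class rules it out before the code runs.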
If you’re writing JavaScript, you still have types, which you have to be fully aware of in your mental model of the code. Except that there’s no compiler (or IDE) that can help you infer the types with 100% certainty. An example:

“LOL JavaScript:
> null * 1
0
> null == 0
false”
pic.twitter.com/Hc2NR2tsMP (Lukas Eder, @lukaseder, October 15, 2014)

So, there’s absolutely nothing that is really easier with “schemaless” databases than with “schemaful” ones. You just defer the inevitable work of sanitising your schema to some later time, a time when you might care more than today, or a time when you’re lucky enough to have a new job and someone else does the work for you. You might have believed MongoDB when they said that “objects are unlikely to cause problems”. But let me tell you the ugly truth:

Anything that can possibly go wrong, does. (Murphy)

We wish you good luck with your dynamically typed languages and your dynamically typed database schemas, while we’ll stick with type-safe SQL.

Reference: Stop Claiming that you’re Using a Schemaless Database from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.

ConEmu – Windows console emulator with tabs

After switching to Git some time ago, I started working more and more with Git Bash on Windows. Git Bash is pretty cool, as it provides (apart from Git) a Bash shell with basic Unix tools, including curl and ssh. Git Bash on Windows has some limitations, though, including limited customization options and a lack of good copy & paste support with keyboard shortcuts. Fortunately, there is ConEmu, which not only fills that gap but adds various features that make working with console applications more productive and more enjoyable for me.

Introduction

ConEmu is a Windows console emulator with tabs, which presents multiple consoles and simple GUI applications as one customizable GUI window with various features. And not only is working with Git Bash far better with ConEmu, but so is working with the other tools I use:

Far Manager – a program for managing files and archives in Windows – handy
Notepad++ – source code editor and Notepad replacement – naturally!
cmd (Windows command prompt) – I still use it, rarely but still

In practice, running any tool should not be a problem. Let’s say I want to run my favorite password manager in ConEmu; I can execute the following command:

    $ <KeePassHome>/keepass.exe -new_console

The -new_console switch instructs ConEmu to start an application in a new console.

Working with tabs

Controlling tabs

I disabled most confirmations for tab actions (Settings > Main > Confirm), so now I can fully control tab creation, closing and switching between tabs with shortcuts, without additional confirmations. The most commonly used shortcuts for working with tabs:

Win + N – show the New Console Dialog (e.g. for running tasks with no shortcuts assigned)
Win + X – new cmd console
Win + Delete – close the active tab
Win + <Num> – switch between tabs (alternatively Ctrl+Tab)

Split Screen

ConEmu may split any tab into several panes. The most common shortcuts for working with Split Screen:

Win+N – show the New Console Dialog and select the Split Screen options
Ctrl+Shift+O – duplicate the shell from the active pane and split horizontally
Ctrl+Shift+E – duplicate the shell from the active pane and split vertically

You navigate between panes in Split Screen mode just like you navigate between tabs.

Tasks

Git Bash is my favorite shell on Windows, therefore I made it a startup task in ConEmu. In addition, I added Far Manager and Notepad++ tasks and associated hot keys with them: Win+B, Win+F and Win+P for Git Bash, Far Manager and Notepad++ respectively. Even if you choose shortcuts that are used by Windows, ConEmu will intercept them (while ConEmu is active).

Working with text

Highlighting, copying & pasting with the mouse and keyboard shortcuts is really convenient. This is one of the features I appreciate most in ConEmu. Shortcuts:

copying the current selection with Ctrl+C
pasting with Shift+Insert, Ctrl+V (only the first line) or with a right mouse click
selecting text with Shift+Arrow Keys/Home/End or with a right click and drag

Additionally, scrolling the buffer is easy with the Ctrl+Up/Down/PgUp/PgDown shortcuts.

Notepad++

Notepad++ is one of my favorite editors for Windows. ConEmu can run Notepad++ in a tab with no problem. I created a task for Notepad++ so I can start it in a new tab whenever I want. I also made it possible to launch it from the console with a file, passed as an argument, already loaded. This is very easy with Git Bash.
Make sure Notepad++ is in the PATH and create an alias:

    alias edit="notepad++ -new_console"

Now, edit filename will run Notepad++ with filename loaded in a new tab. If you want this alias to be always available, create a .bashrc file in your home directory (if it does not exist) and add the alias there, so it is loaded automatically on Git Bash startup.

Summary

I’ve been using ConEmu for several weeks now and I am far from knowing everything about it, but I already can’t imagine my Windows without it! With ConEmu I can use my favorite tools like Git Bash, cmd, Far Manager, and Notepad++ in one application, with a great tabbing experience supported by shortcuts. Font anti-aliasing, transparency (which can be configured separately for active and inactive windows), full screen, split screen and great mark, highlight, copy & paste options make ConEmu a complete application and a great choice for developers looking for improved productivity on Windows. I truly recommend ConEmu to every professional!

References

Project home page: https://code.google.com/p/conemu-maximus5

Reference: ConEmu – Windows console emulator with tabs from our JCG partner Rafal Borowiec at the Codeleak.pl blog.

Neo4j: Cypher – Avoiding the Eager

Although I love how easy Cypher’s LOAD CSV command makes it to get data into Neo4j, it currently breaks the rule of least surprise in the way it eagerly loads all rows for some queries, even those using periodic commit. This is something that my colleague Michael noted in the second of his blog posts explaining how to use LOAD CSV successfully:

The biggest issue that people ran into, even when following the advice I gave earlier, was that for large imports of more than one million rows, Cypher ran into an out-of-memory situation. That was not related to commit sizes, so it happened even with PERIODIC COMMIT of small batches.

I recently spent a few days importing data into Neo4j on a Windows machine with 4GB RAM, so I was seeing this problem even earlier than Michael suggested. Michael explains how to work out whether your query is suffering from unexpected eager evaluation:

If you profile that query you see that there is an “Eager” step in the query plan. That is where the “pull in all data” happens.

You can profile queries by prefixing them with the word ‘PROFILE’. You’ll need to run your query in the console of /webadmin in your web browser or with the Neo4j shell. I did this for my queries and was able to identify query patterns that get evaluated eagerly, and in some cases we can work around them. We’ll use the Northwind data set to demonstrate how the Eager pipe can sneak into our queries, but keep in mind that this data set is small enough not to cause issues.
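Checking PROFILE plans by eye gets tedious once an import script has several statements. As a minimal sketch, assuming the PROFILE output has been captured as text in the ASCII-table form printed by the Neo4j shell (rows prefixed with ==> and columns separated by |), a few lines of Java can list the operators and flag any Eager step. The class EagerCheck and its methods are hypothetical helpers, not part of any Neo4j API, and the sample input is a trimmed version of the first plan in this post.

```java
import java.util.ArrayList;
import java.util.List;

public class EagerCheck {

    // Extracts operator names from Neo4j shell PROFILE output.
    // Assumes each data row looks like "==> | Operator | Rows | DbHits | ... |".
    public static List<String> operators(String profileOutput) {
        List<String> ops = new ArrayList<>();
        for (String line : profileOutput.split("\\R")) { // \R matches any line break (Java 8+)
            String row = line.trim();
            if (row.startsWith("==>")) {
                row = row.substring(3).trim();
            }
            if (!row.startsWith("|")) {
                continue; // border lines start with '+'
            }
            String[] cells = row.split("\\|");
            // cells[0] is the empty string before the first '|'
            String op = cells.length > 1 ? cells[1].trim() : "";
            if (!op.isEmpty() && !op.equals("Operator")) { // skip the header row
                ops.add(op);
            }
        }
        return ops;
    }

    // True if any operator in the plan is the Eager pipe.
    public static boolean hasEager(String profileOutput) {
        return operators(profileOutput).contains("Eager");
    }

    public static void main(String[] args) {
        String plan = String.join("\n",
                "==> +----------------+------+--------+",
                "==> | Operator       | Rows | DbHits |",
                "==> +----------------+------+--------+",
                "==> | EmptyResult    |    0 |      0 |",
                "==> | UpdateGraph(0) |    0 |      0 |",
                "==> | Eager          |    0 |      0 |",
                "==> | UpdateGraph(1) |    0 |      0 |",
                "==> | Slice          |    0 |      0 |",
                "==> | LoadCSV        |    1 |      0 |",
                "==> +----------------+------+--------+");
        System.out.println(operators(plan)); // [EmptyResult, UpdateGraph(0), Eager, UpdateGraph(1), Slice, LoadCSV]
        System.out.println(hasEager(plan));  // true
    }
}
```

Matching on the exact operator name rather than a substring avoids false positives from the Identifiers or Other columns, which can contain arbitrary text.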
This is what a row in the file looks like:

    $ head -n 2 data/customerDb.csv
    OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerCompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,EmployeeID,LastName,FirstName,Title,TitleOfCourtesy,BirthDate,HireDate,Address,City,Region,PostalCode,Country,HomePhone,Extension,Photo,Notes,ReportsTo,PhotoPath,OrderID,ProductID,UnitPrice,Quantity,Discount,ProductID,ProductName,SupplierID,CategoryID,QuantityPerUnit,UnitPrice,UnitsInStock,UnitsOnOrder,ReorderLevel,Discontinued,SupplierID,SupplierCompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,HomePage,CategoryID,CategoryName,Description,Picture
    10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,VINET,Vins et alcools Chevalier,Paul Henriot,Accounting Manager,59 rue de l'Abbaye,Reims,,51100,France,,,5,Buchanan,Steven,Sales Manager,Mr.,1955-03-04,1993-10-17,14 Garrett Hill,London,,SW1 8JR,UK,(71) 555-4848,3453,\x,"Steven Buchanan graduated from St. Andrews University, Scotland, with a BSC degree in 1976. Upon joining the company as a sales representative in 1992, he spent 6 months in an orientation program at the Seattle office and then returned to his permanent post in London. He was promoted to sales manager in March 1993. Mr. Buchanan has completed the courses ""Successful Telemarketing"" and ""International Sales Management."" He is fluent in French.",2,http://accweb/emmployees/buchanan.bmp,10248,11,14,12,0,11,Queso Cabrales,5,4,1 kg pkg.,21,22,30,30,0,5,Cooperativa de Quesos 'Las Cabras',Antonio del Valle Saavedra,Export Administrator,Calle del Rosal 4,Oviedo,Asturias,33007,Spain,(98) 598 76 54,,,4,Dairy Products,Cheeses,\x

MERGE, MERGE, MERGE

The first thing we want to do is create a node for each employee and each order and then create a relationship between them. We might start with the following query:

    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
    MERGE (employee:Employee {employeeId: row.EmployeeID})
    MERGE (order:Order {orderId: row.OrderID})
    MERGE (employee)-[:SOLD]->(order)

This does the job, but if we profile the query like so…

    PROFILE LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
    WITH row LIMIT 0
    MERGE (employee:Employee {employeeId: row.EmployeeID})
    MERGE (order:Order {orderId: row.OrderID})
    MERGE (employee)-[:SOLD]->(order)

…we’ll notice an ‘Eager’ lurking on the third line:

    ==> +----------------+------+--------+----------------------------------+-----------------------------------------+
    ==> | Operator       | Rows | DbHits | Identifiers                      | Other                                   |
    ==> +----------------+------+--------+----------------------------------+-----------------------------------------+
    ==> | EmptyResult    |    0 |      0 |                                  |                                         |
    ==> | UpdateGraph(0) |    0 |      0 | employee, order, UNNAMED216      | MergePattern                            |
    ==> | Eager          |    0 |      0 |                                  |                                         |
    ==> | UpdateGraph(1) |    0 |      0 | employee, employee, order, order | MergeNode; :Employee; MergeNode; :Order |
    ==> | Slice          |    0 |      0 |                                  | { AUTOINT0}                             |
    ==> | LoadCSV        |    1 |      0 | row                              |                                         |
    ==> +----------------+------+--------+----------------------------------+-----------------------------------------+

You’ll notice that when we profile each query we’re stripping off the periodic commit section and adding a ‘WITH row LIMIT 0’. This allows us to generate enough of the query plan to identify the ‘Eager’ operator without actually importing any data.

We want to split that query into two so it can be processed in a non-eager manner:

    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
    WITH row LIMIT 0
    MERGE (employee:Employee {employeeId: row.EmployeeID})
    MERGE (order:Order {orderId: row.OrderID})

    ==> +-------------+------+--------+----------------------------------+-----------------------------------------+
    ==> | Operator    | Rows | DbHits | Identifiers                      | Other                                   |
    ==> +-------------+------+--------+----------------------------------+-----------------------------------------+
    ==> | EmptyResult |    0 |      0 |                                  |                                         |
    ==> | UpdateGraph |    0 |      0 | employee, employee, order, order | MergeNode; :Employee; MergeNode; :Order |
    ==> | Slice       |    0 |      0 |                                  | { AUTOINT0}                             |
    ==> | LoadCSV     |    1 |      0 | row                              |                                         |
    ==> +-------------+------+--------+----------------------------------+-----------------------------------------+

Now that we’ve created the employees and orders we can join them together:

    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
    MATCH (employee:Employee {employeeId: row.EmployeeID})
    MATCH (order:Order {orderId: row.OrderID})
    MERGE (employee)-[:SOLD]->(order)

    ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
    ==> | Operator       | Rows | DbHits | Identifiers                   | Other                                                     |
    ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
    ==> | EmptyResult    |    0 |      0 |                               |                                                           |
    ==> | UpdateGraph    |    0 |      0 | employee, order, UNNAMED216   | MergePattern                                              |
    ==> | Filter(0)      |    0 |      0 |                               | Property(order,orderId) == Property(row,OrderID)          |
    ==> | NodeByLabel(0) |    0 |      0 | order, order                  | :Order                                                    |
    ==> | Filter(1)      |    0 |      0 |                               | Property(employee,employeeId) == Property(row,EmployeeID) |
    ==> | NodeByLabel(1) |    0 |      0 | employee, employee            | :Employee                                                 |
    ==> | Slice          |    0 |      0 |                               | { AUTOINT0}                                               |
    ==> | LoadCSV        |    1 |      0 | row                           |                                                           |
    ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+

Not an Eager in sight!

MATCH, MATCH, MATCH, MERGE, MERGE

If we fast forward a few steps, we may now have refactored our import script to the point where we create our nodes in one query and the relationships in another query. Our create query works as expected:

    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
    MERGE (employee:Employee {employeeId: row.EmployeeID})
    MERGE (order:Order {orderId: row.OrderID})
    MERGE (product:Product {productId: row.ProductID})

    ==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+
    ==> | Operator    | Rows | DbHits | Identifiers                                        | Other                                                        |
    ==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+
    ==> | EmptyResult |    0 |      0 |                                                    |                                                              |
    ==> | UpdateGraph |    0 |      0 | employee, employee, order, order, product, product | MergeNode; :Employee; MergeNode; :Order; MergeNode; :Product |
    ==> | Slice       |    0 |      0 |                                                    | { AUTOINT0}                                                  |
    ==> | LoadCSV     |    1 |      0 | row                                                |                                                              |
    ==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+

We’ve now got employees, products and orders in the graph.
Now let’s create relationships between the trio:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (employee:Employee {employeeId: row.EmployeeID})
MATCH (order:Order {orderId: row.OrderID})
MATCH (product:Product {productId: row.ProductID})
MERGE (employee)-[:SOLD]->(order)
MERGE (order)-[:PRODUCT]->(product)

If we profile that, we’ll notice that Eager has sneaked in again!

==> +----------------+------+--------+-----------------------------+-----------------------------------------------------------+
==> | Operator       | Rows | DbHits | Identifiers                 | Other                                                     |
==> +----------------+------+--------+-----------------------------+-----------------------------------------------------------+
==> | EmptyResult    | 0    | 0      |                             |                                                           |
==> | UpdateGraph(0) | 0    | 0      | order, product, UNNAMED318  | MergePattern                                              |
==> | Eager          | 0    | 0      |                             |                                                           |
==> | UpdateGraph(1) | 0    | 0      | employee, order, UNNAMED287 | MergePattern                                              |
==> | Filter(0)      | 0    | 0      |                             | Property(product,productId) == Property(row,ProductID)    |
==> | NodeByLabel(0) | 0    | 0      | product, product            | :Product                                                  |
==> | Filter(1)      | 0    | 0      |                             | Property(order,orderId) == Property(row,OrderID)          |
==> | NodeByLabel(1) | 0    | 0      | order, order                | :Order                                                    |
==> | Filter(2)      | 0    | 0      |                             | Property(employee,employeeId) == Property(row,EmployeeID) |
==> | NodeByLabel(2) | 0    | 0      | employee, employee          | :Employee                                                 |
==> | Slice          | 0    | 0      |                             | { AUTOINT0}                                               |
==> | LoadCSV        | 1    | 0      | row                         |                                                           |
==> +----------------+------+--------+-----------------------------+-----------------------------------------------------------+

In this case the Eager happens on our second call to MERGE, as Michael identified in his post:

The issue is that within a single Cypher statement you have to isolate changes that affect matches further on, e.g. when you CREATE nodes with a label that are suddenly matched by a later MATCH or MERGE operation.

We can work around the problem in this case by creating the relationships in separate queries:

LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (employee:Employee {employeeId: row.EmployeeID})
MATCH (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)

==> +----------------+------+--------+-----------------------------+-----------------------------------------------------------+
==> | Operator       | Rows | DbHits | Identifiers                 | Other                                                     |
==> +----------------+------+--------+-----------------------------+-----------------------------------------------------------+
==> | EmptyResult    | 0    | 0      |                             |                                                           |
==> | UpdateGraph    | 0    | 0      | employee, order, UNNAMED236 | MergePattern                                              |
==> | Filter(0)      | 0    | 0      |                             | Property(order,orderId) == Property(row,OrderID)          |
==> | NodeByLabel(0) | 0    | 0      | order, order                | :Order                                                    |
==> | Filter(1)      | 0    | 0      |                             | Property(employee,employeeId) == Property(row,EmployeeID) |
==> | NodeByLabel(1) | 0    | 0      | employee, employee          | :Employee                                                 |
==> | Slice          | 0    | 0      |                             | { AUTOINT0}                                               |
==> | LoadCSV        | 1    | 0      | row                         |                                                           |
==> +----------------+------+--------+-----------------------------+-----------------------------------------------------------+

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (order:Order {orderId: row.OrderID})
MATCH (product:Product {productId: row.ProductID})
MERGE (order)-[:PRODUCT]->(product)

==> +----------------+------+--------+----------------------------+--------------------------------------------------------+
==> | Operator       | Rows | DbHits | Identifiers                | Other                                                  |
==> +----------------+------+--------+----------------------------+--------------------------------------------------------+
==> | EmptyResult    | 0    | 0      |                            |                                                        |
==> | UpdateGraph    | 0    | 0      | order, product, UNNAMED229 | MergePattern                                           |
==> | Filter(0)      | 0    | 0      |                            | Property(product,productId) == Property(row,ProductID) |
==> | NodeByLabel(0) | 0    | 0      | product, product           | :Product                                               |
==> | Filter(1)      | 0    | 0      |                            | Property(order,orderId) == Property(row,OrderID)       |
==> | NodeByLabel(1) | 0    | 0      | order, order               | :Order                                                 |
==> | Slice          | 0    | 0      |                            | { AUTOINT0}                                            |
==> | LoadCSV        | 1    | 0      | row                        |                                                        |
==> +----------------+------+--------+----------------------------+--------------------------------------------------------+

MERGE, SET

I try to make LOAD CSV scripts as idempotent as possible so that if we add more rows or columns of data to our CSV we can rerun the query without having to recreate everything.

This can lead you towards the following pattern where we’re creating suppliers:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (supplier:Supplier {supplierId: row.SupplierID})
SET supplier.companyName = row.SupplierCompanyName

We want to ensure that there’s only one Supplier with that SupplierID, but we might be incrementally adding new properties and decide to just replace everything by using the ‘SET’ command.
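Every profile in this post is checked by eye for an Eager row. For longer plans, a minimal Python sketch like the one below can scan copied neo4j-shell output instead. `plan_operators` and `has_eager` are hypothetical helpers, not part of Neo4j, and the parsing assumes the pipe-delimited layout shown in this post (one operator per row, operator name in the first column), which may differ in other Neo4j versions.

```python
from typing import List


def plan_operators(profile_output: str) -> List[str]:
    """Return operator names, top to bottom, from a text PROFILE table."""
    operators = []
    for raw_line in profile_output.splitlines():
        line = raw_line.strip()
        if line.startswith("==>"):
            line = line[3:].strip()  # drop the neo4j-shell output prefix
        if not line.startswith("|"):
            continue  # skip "+----+" borders and any non-table text
        first_cell = line.split("|")[1].strip()
        if first_cell and first_cell != "Operator":  # skip the header row
            operators.append(first_cell)
    return operators


def has_eager(profile_output: str) -> bool:
    """True if the plan contains an Eager operator row."""
    return "Eager" in plan_operators(profile_output)


if __name__ == "__main__":
    supplier_plan = """
==> | Operator       | Rows |
==> | UpdateGraph(0) | 0    |
==> | Eager          | 0    |
==> | UpdateGraph(1) | 0    |
==> | LoadCSV        | 1    |
"""
    print(plan_operators(supplier_plan))
    print(has_eager(supplier_plan))
```

Matching the whole cell against "Eager" (rather than a prefix) keeps the check from firing on other operators whose names merely start with those letters.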
If we profile that query, the Eager lurks:

==> +----------------+------+--------+--------------------+----------------------+
==> | Operator       | Rows | DbHits | Identifiers        | Other                |
==> +----------------+------+--------+--------------------+----------------------+
==> | EmptyResult    | 0    | 0      |                    |                      |
==> | UpdateGraph(0) | 0    | 0      |                    | PropertySet          |
==> | Eager          | 0    | 0      |                    |                      |
==> | UpdateGraph(1) | 0    | 0      | supplier, supplier | MergeNode; :Supplier |
==> | Slice          | 0    | 0      |                    | { AUTOINT0}          |
==> | LoadCSV        | 1    | 0      | row                |                      |
==> +----------------+------+--------+--------------------+----------------------+

We can work around this, at the cost of a bit of duplication, by using ‘ON CREATE SET’ and ‘ON MATCH SET’:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (supplier:Supplier {supplierId: row.SupplierID})
ON CREATE SET supplier.companyName = row.SupplierCompanyName
ON MATCH SET supplier.companyName = row.SupplierCompanyName

==> +-------------+------+--------+--------------------+----------------------+
==> | Operator    | Rows | DbHits | Identifiers        | Other                |
==> +-------------+------+--------+--------------------+----------------------+
==> | EmptyResult | 0    | 0      |                    |                      |
==> | UpdateGraph | 0    | 0      | supplier, supplier | MergeNode; :Supplier |
==> | Slice       | 0    | 0      |                    | { AUTOINT0}          |
==> | LoadCSV     | 1    | 0      | row                |                      |
==> +-------------+------+--------+--------------------+----------------------+

With the data set I’ve been working with, I was able to avoid OutOfMemory exceptions in some cases and reduce the time it took to run the query by a factor of 3 in others.

As time goes on I expect all of these scenarios will be addressed, but as of Neo4j 2.1.5 these are the patterns that I’ve identified as being overly eager.
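Both workarounds are mechanical rewrites of the query text: one relationship MERGE per query, with only the MATCH clauses it needs, and a plain SET replaced by identical ON CREATE SET and ON MATCH SET clauses. If your import scripts are generated from code, a minimal Python sketch of those two rewrites might look like this. The function names and the dictionary-based inputs are hypothetical, and the code only builds strings; it does not connect to Neo4j or verify that the resulting plans are free of Eager.

```python
from typing import Dict, List, Tuple


def split_relationship_merges(
    load_csv: str,
    matches: Dict[str, str],
    relationships: List[Tuple[str, str, str]],
) -> List[str]:
    """Build one query per relationship MERGE, matching only the two nodes it connects.

    `matches` maps an identifier (e.g. "order") to its MATCH clause;
    `relationships` holds (start identifier, relationship type, end identifier).
    """
    queries = []
    for start, rel_type, end in relationships:
        clauses = [load_csv, matches[start], matches[end], f"MERGE ({start})-[:{rel_type}]->({end})"]
        queries.append("\n".join(clauses))
    return queries


def expand_set(merge_clause: str, assignments: Dict[str, str]) -> str:
    """Append identical ON CREATE SET and ON MATCH SET clauses to a MERGE clause."""
    body = ", ".join(f"{prop} = {value}" for prop, value in assignments.items())
    return f"{merge_clause}\nON CREATE SET {body}\nON MATCH SET {body}"


if __name__ == "__main__":
    header = (
        "USING PERIODIC COMMIT 1000\n"
        'LOAD CSV WITH HEADERS FROM "file:/path/to/customerDb.csv" AS row'
    )
    matches = {
        "employee": "MATCH (employee:Employee {employeeId: row.EmployeeID})",
        "order": "MATCH (order:Order {orderId: row.OrderID})",
        "product": "MATCH (product:Product {productId: row.ProductID})",
    }
    rels = [("employee", "SOLD", "order"), ("order", "PRODUCT", "product")]
    for query in split_relationship_merges(header, matches, rels):
        print(query, end="\n\n")
    print(expand_set(
        "MERGE (supplier:Supplier {supplierId: row.SupplierID})",
        {"supplier.companyName": "row.SupplierCompanyName"},
    ))
```

Profiling each generated query is still the way to confirm that no Eager is left, since the planner’s behaviour is expected to change after Neo4j 2.1.5.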
If you know of any other overly eager patterns, do let me know and I can add them to the post or write a second part.

Reference: Neo4j: Cypher – Avoiding the Eager from our JCG partner Mark Needham at the Mark Needham Blog.
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd.