

Git: Committing vs Pushing vs Stashing – OR: “what to do if I just want to work locally”

Many people ask me for advice when they’re trying to modify some code locally without the changes finding their way into the remote repository or – gods forbid – production. This makes me realize that there’s some gap in understanding what Git is and how it works.

When you perform a ‘git clone’, what you’re saying is “I’d like to make some contribution to the project at the remote repository” (or fork it, but that’s another use case that doesn’t really interest most of you, so we’ll ignore it). Git copies the remote repository to your local machine and marks your copy as a remote tracking branch of the repository you cloned from.

After your changes have reached a point where they form a meaningful, atomic change to the system, you commit them into your local repository using ‘git commit’. By that, you’re saying “Gee, this looks like a good idea, I’d like to contribute it back to the project. I don’t know when exactly I’ll physically send my copy back to the original repository, but I definitely want it to get there.” Then, at some point, you send your changes to the remote repository using ‘git push’, thereby saying “I have reached a point in time where the original project needs / could use my work, so let’s send it there now”. In most cases, before you can do that you’ll have to update your local copy with the changes that occurred in the remote repository since your clone / since the last time you updated. You do that by using ‘git pull’ (or ‘git pull --rebase’, if you prefer, like me, to rebase rather than merge). After merging/rebasing your local repository against its remote counterpart, you can push your changes, making the product guys happy that your change is finally available for testing / deployment.

All that is jolly fun, but what if you want to make some temporary, local changes that you never want to merge with the remote repository? If you’re working on a new file, you can just add it to your local .gitignore file, making Git ignore this file for all eternity. But if you’re just modifying some pre-existing file, you’re out of luck – you’ll just have to edit the file and remember never to commit it. Which is fine, until you have to pull changes from the remote repository, whereupon Git explodes, telling you that you have some uncommitted / untracked changes. At this point, you have to make your changes “temporarily go away” so that Git can successfully complete the pull. The proper way to do that is by using ‘git stash’. This will make all local, uncommitted changes go away temporarily, until you call ‘git stash apply’, which will then merge your local changes with the latest updates from the remote repository. Sure, you could always commit your changes, pull from the remote repository, then revert your commit, but that’s kinda like beating a baby seal to death with a baseball bat. You just don’t do that, no matter how tempting this might sound to your deranged mind.

In summary:
- Your local repository is a full copy of the remote repository. It’s not a working copy.
- When you commit a file, you’re saying “This change should eventually reach production”.
- When you push your local repository to its remote counterpart, you’re saying “The production could sure use my amazing new code right about now”.
- If you don’t want your change to reach production, don’t commit it.
- If you’re having trouble merging/rebasing because of untracked local changes, perform the following:
  1. git stash
  2. git pull (or git pull --rebase)
  3. Make sure that the operation completed successfully.
     (You should read what Git reports after performing a pull; often it will tell you about conflicts that leave your rebase partial. In that case you should resolve the conflicts and continue the rebase using ‘git rebase --continue’.)
  4. git stash apply

Reference: Git: Committing vs Pushing vs Stashing – OR: “what to do if I just want to work locally” from our JCG partner Shai Yallin at the Wix IO blog.
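Putting the steps above together, the whole keep-my-changes-local workflow is just a handful of commands (standard Git, nothing beyond what is discussed above):

git stash                # temporarily shelve your local, uncommitted changes
git pull --rebase        # update from the remote repository (or plain 'git pull' if you prefer merging)
git status               # make sure the pull/rebase completed cleanly
# if the rebase stopped on conflicts: resolve them, 'git add' the files,
# then run 'git rebase --continue'
git stash apply          # re-apply your local changes on top of the updated code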

Recycle Bin in Database

Oracle introduced a new feature called ‘Recycle Bin’ in Database 10g to store dropped database objects. If a table is dropped, then any objects associated with this table, such as indexes, constraints and any other dependent objects, are simply renamed with a prefix of BIN$.

Use of the Recycle Bin
If a user drops an important object accidentally and wants to get it back, the Recycle Bin feature lets him easily restore the dropped object.

Enable and Disable the Recycle Bin
You can use the query below to check whether the Recycle Bin is enabled or not:

SELECT Value FROM V$parameter WHERE Name = 'recyclebin';

It will return on or off: on means that the Recycle Bin is enabled, off means it is disabled. You can enable and disable the Recycle Bin per session or per system, therefore you can use the statements below:

ALTER SYSTEM SET recyclebin = ON;
ALTER SESSION SET recyclebin = ON;
ALTER SYSTEM SET recyclebin = OFF;
ALTER SESSION SET recyclebin = OFF;

Get the Contents of the Recycle Bin
To see the dropped objects in the Recycle Bin, you can use any one of the query statements below:

SELECT * FROM RECYCLEBIN;
SELECT * FROM USER_RECYCLEBIN;
SELECT * FROM DBA_RECYCLEBIN;

Restore Dropped Objects
You can use the syntax below to restore dropped objects:

FLASHBACK TABLE <<Dropped_Table_Name>> TO BEFORE DROP RENAME TO <<New_Table_Name>>;

Note that the RENAME TO portion of the restore statement is optional; use it if you want to restore the dropped object under a new name.

Clearing the Recycle Bin
You can clear specific entries in the Recycle Bin or the complete Recycle Bin.
a- Clear a specific table: PURGE TABLE <<Table_Name>>;
b- Clear a specific index: PURGE INDEX <<Index_Name>>;
c- Clear every object associated with a specific tablespace: PURGE TABLESPACE <<Tablespace_Name>>;
d- Clear the objects of a specific user in a tablespace: PURGE TABLESPACE <<Tablespace_Name>> USER <<User_Name>>;
e- Clear the complete Recycle Bin: PURGE RECYCLEBIN;
f- Skip the Recycle Bin entirely while dropping a table: DROP TABLE <<Table_Name>> PURGE;

Demo
Now let’s walk through a demo to clarify the Recycle Bin feature.
1- Enable the Recycle Bin feature: ALTER SYSTEM SET recyclebin = ON;
2- Create the DEMO_RECYCLEBIN database table: CREATE TABLE DEMO_RECYCLEBIN (COL1 NUMBER);
3- Insert one record into the DEMO_RECYCLEBIN table: INSERT INTO DEMO_RECYCLEBIN (COL1) VALUES (1); COMMIT;
4- Drop the DEMO_RECYCLEBIN table: DROP TABLE DEMO_RECYCLEBIN;
5- Query the Recycle Bin contents: SELECT * FROM USER_RECYCLEBIN; The output shows the dropped table under its system-generated BIN$ name (see the query sketch below).
6- Restore the DEMO_RECYCLEBIN table from the Recycle Bin: FLASHBACK TABLE DEMO_RECYCLEBIN TO BEFORE DROP;
7- Query DEMO_RECYCLEBIN after restoring: SELECT * FROM DEMO_RECYCLEBIN; It will return the data that existed before dropping.
8- Drop the table again and clear it from the Recycle Bin: DROP TABLE DEMO_RECYCLEBIN PURGE;

Reference: Recycle Bin in Database from our JCG partner Mahmoud A. ElSayed at the Dive in Oracle blog.
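If you only want the columns that matter when deciding what to restore, a query along these lines can be used. This is a sketch: the columns listed (OBJECT_NAME, ORIGINAL_NAME, TYPE, DROPTIME, CAN_UNDROP) are a subset of what the USER_RECYCLEBIN view exposes, and the system-generated BIN$ name it returns can also be referenced directly (in double quotes) in FLASHBACK TABLE or PURGE statements:

-- List dropped objects with the most useful columns
SELECT object_name,    -- system-generated BIN$... name
       original_name,  -- the name the object had before it was dropped
       type,           -- TABLE, INDEX, ...
       droptime,       -- when it was dropped
       can_undrop      -- YES if FLASHBACK TABLE ... TO BEFORE DROP can restore it
  FROM user_recyclebin
 ORDER BY droptime DESC;

-- Hypothetical example of restoring by the system-generated name:
-- FLASHBACK TABLE "BIN$abcdef0123456789$0" TO BEFORE DROP RENAME TO demo_recyclebin_restored;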

ADT Bundle – Just a single step to setup android development environment

I got many queries from college students and engineers regarding the installation and setup of the Android development environment, so here is good news for them. Before this post, I used to send them the steps below for installing and setting up the Android development environment:
1. Download Eclipse
2. Download the JDK, install it, and set the environment path
3. Download the ADT plugin inside Eclipse
4. Set the preferences with the Android SDK path
5. Download the latest platform-tools and everything else

But now I would suggest that new Android developers simply download the ADT Bundle.

ADT Bundle: The ADT Bundle provides everything you need to start developing apps, including a version of the Eclipse IDE with built-in ADT (Android Developer Tools) to streamline your Android app development. So now we can say it is a single-step download to set up the Android development environment.

In short, with a single download, the ADT Bundle includes everything you need to begin developing apps:
- Eclipse + ADT plugin
- Android SDK Tools
- Android Platform-tools
- The latest Android platform
- The latest Android system image for the emulator

There are also other options if you want to use an existing version of Eclipse or any other IDE. Here you go for the download: ADT Bundle.

Setting Up the ADT Bundle: As you have downloaded the ADT Bundle, follow the steps below to set it up (a sample command sequence is shown below):
- Unpack the ZIP file (named adt-bundle-<os_platform>.zip) and save it to an appropriate location, such as a “Development” directory in your home directory.
- Open the adt-bundle-<os_platform>/eclipse/ directory and launch Eclipse.

That’s it! The IDE is already loaded with the Android Developer Tools plugin and the SDK is ready to go.

Reference: ADT Bundle – Just a single step to setup android development environment from our JCG partner Paresh N. Mayani at the TechnoTalkative blog.
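For reference, the two setup steps could look like this from a terminal on Linux or Mac; adt-bundle-<os_platform>.zip is the generic placeholder name used above, so substitute the file you actually downloaded:

mkdir -p ~/Development
unzip adt-bundle-<os_platform>.zip -d ~/Development    # step 1: unpack into a Development directory
cd ~/Development/adt-bundle-<os_platform>/eclipse      # step 2: go to the bundled Eclipse
./eclipse &                                            # launch Eclipse with ADT already installed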

Google Guava EventBus and Java 7 WatchService for Event Programming

This post is going to cover using the Guava EventBus to publish changes to a directory or sub-directories detected by the Java 7 WatchService. The Guava EventBus is a great way to add publish/subscribe communication to an application. The WatchService, new in the Java 7 java.nio.file package, is used to monitor a directory for changes. Since the EventBus and WatchService have been covered in previous posts, we will not be covering these topics in any depth here. For more information, the reader is encouraged to view the EventBus and WatchService posts. [NOTE: post updated on 02/28/2012 for clarity.]

Why Use the EventBus
There are two main reasons for using the EventBus with a WatchService:
- We don’t want to poll for events, but would rather receive asynchronous notification.
- Once events are processed, the WatchKey.reset method needs to be called to enable any new changes to be queued. While the WatchKey object is thread safe, it’s important that the reset method is called only after all threads have finished processing events, leading to somewhat of a coordination hassle. Using a single thread to process the events, invoke the reset method, then publish the changes via the EventBus eliminates this problem.

Our plan to accomplish this is simple and will involve the following steps:
- Instantiate an instance of the WatchService.
- Register every directory recursively, starting with a given Path object.
- Take events off the WatchService queue, then process and publish those events.
- Start up a separate thread for taking events off the queue and publishing.

The code examples that follow are the more relevant highlights from the DirectoryEventWatcherImpl class that is going to do all of this work.

Registering Directories with the WatchService
While adding or deleting a sub-directory will generate an event, any changes inside a sub-directory of a watched directory will not. We are going to compensate for this by recursively going through all sub-directories (via the Files.walkFileTree method) and registering each one with the WatchService object (previously defined in the example here):

private void registerDirectories() throws IOException {
    Files.walkFileTree(startPath, new WatchServiceRegisteringVisitor());
}

private class WatchServiceRegisteringVisitor extends SimpleFileVisitor<Path> {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        dir.register(watchService, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
        return FileVisitResult.CONTINUE;
    }
}

On line 2 the Files.walkFileTree method uses the WatchServiceRegisteringVisitor class defined on line 5 to register every directory with the WatchService. The registered events are creation of files/directories, deletion of files/directories, or updates to a file.

Publishing Events
The next step is to create a FutureTask that will do the work of checking the queue and publishing the events.
private void createWatchTask() {
    watchTask = new FutureTask<>(new Callable<Integer>() {
        private int totalEventCount;
        @Override
        public Integer call() throws Exception {
            while (keepWatching) {
                WatchKey watchKey = watchService.poll(10, TimeUnit.SECONDS);
                if (watchKey != null) {
                    List<WatchEvent<?>> events = watchKey.pollEvents();
                    Path watched = (Path) watchKey.watchable();
                    PathEvents pathEvents = new PathEvents(watchKey.isValid(), watched);
                    for (WatchEvent event : events) {
                        pathEvents.add(new PathEvent((Path) event.context(), event.kind()));
                        totalEventCount++;
                    }
                    watchKey.reset();
                    eventBus.post(pathEvents);
                }
            }
            return totalEventCount;
        }
    });
}

private void startWatching() {
    new Thread(watchTask).start();
}

On line 7, we are checking the WatchService every 10 seconds for queued events. When a valid WatchKey is returned, the first step is to retrieve the events (line 9), then get the directory where the events occurred (line 10). On line 11 a PathEvents object is created, taking a boolean and the watched directory as constructor arguments. Lines 12-15 loop over the events retrieved on line 9, using the target Path and event type as arguments to create a PathEvent object. The WatchKey.reset method is called on line 16, setting the WatchKey state back to ready, making it eligible to receive new events and be placed back into the queue. Finally, on line 17 the EventBus publishes the PathEvents object to all subscribers. It’s important to note here that the PathEvents and PathEvent classes are immutable. The totalEventCount that is returned from the Callable is never exposed in the API, but is used for testing purposes. The startWatching method on line 25 starts the thread to run the watching/publishing task defined above.

Conclusion
By pairing the WatchService with the Guava EventBus we are able to manage the WatchKey and process events in a single thread, and notify any number of subscribers asynchronously of the events. It is hoped the reader found this example useful. As always, comments and suggestions are welcomed.

Resources
- Source code and unit test for this post
- EventBus API
- WatchService API
- Previous post on the WatchService
- Previous post on the EventBus

Reference: Event Programming Example: Google Guava EventBus and Java 7 WatchService from our JCG partner Bill Bejeck at the Random Thoughts On Coding blog.
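For completeness, a subscriber for the published events could look roughly like the sketch below. Only the @Subscribe annotation and EventBus.register() come from the Guava API; the getWatchedDirectory() accessor on the article’s PathEvents class is a hypothetical name used for illustration:

import com.google.common.eventbus.EventBus;
import com.google.common.eventbus.Subscribe;

public class DirectoryChangeListener {

    // Handler methods only need the @Subscribe annotation and a single parameter
    // of the published type (PathEvents, as posted by the watcher above).
    @Subscribe
    public void onDirectoryChange(PathEvents pathEvents) {
        // getWatchedDirectory() is a hypothetical accessor on the immutable
        // PathEvents class, adjust to the real API from the post's source code.
        System.out.println("Changes detected under " + pathEvents.getWatchedDirectory());
    }

    public static void main(String[] args) {
        EventBus eventBus = new EventBus();
        eventBus.register(new DirectoryChangeListener()); // subscribe this instance
        // ... hand the same EventBus instance to DirectoryEventWatcherImpl, and every
        // posted PathEvents object will be delivered to onDirectoryChange().
    }
}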

Using Cryptography in Java Applications

This post describes how to use the Java Cryptography Architecture (JCA) that allows you to use cryptographic services in your applications. Java Cryptography Architecture Services The JCA provides a number of cryptographic services, like message digests and signatures. These services are accessible through service specific APIs, like MessageDigest and Signature. Cryptographic services abstract different algorithms. For digests, for instance, you could use MD5 or SHA1. You specify the algorithm as a parameter to the getInstance() method of the cryptographic service class: MessageDigest digest = MessageDigest.getInstance("MD5"); You find the value of the parameter for your algorithm in the JCA Standard Algorithm Name Documentation. Some algorithms have parameters. For instance, an algorithm to generate a private/public key pair will take the key size as a parameter. You specify the parameter(s) using the initialize() method: KeyPairGenerator generator = KeyPairGenerator.getInstance("DSA"); generator.initialize(1024);If you don’t call the initialize() method, some default value will be used, which may or may not be what you want. Unfortunately, the API for initialization is not 100% consistent across services. For instance, the Cipher class uses init() with an argument indicating encryption or decryption, while the Signature class uses initSign() for signing and initVerify() for verification.   Java Cryptography Architecture Providers The JCA keeps your code independent from a particular cryptographic algorithm’s implementation through the provider system. Providers are ranked according to a preference order, which is configurable (see below). The best preference is 1, the next best is 2, etc. The preference order allows the JCA to select the best available provider that implements a given algorithm. Alternatively, you can specify a specific provider in the second argument to getInstance(): Signature signature = Signature.getInstance("SHA1withDSA", "SUN"); The JRE comes with a bunch of providers from Oracle by default. However, due to historical export restrictions, these are not the most secure implementations. To get access to better algorithms and larger key sizes, install the Java Cryptography Extension Unlimited Strength Jurisdiction Policy Files. Update: Note that the above statement is true for the Oracle JRE. OpenJDK doesn’t have the same limitation. Make Your Use of Cryptography Configurable You should always make sure that the cryptographic services that your application uses are configurable. If you do that, you can change the cryptographic algorithm and/or implementation without issuing a patch. This is particularly valuable when a new attack on an (implementation of an) algorithm becomes available. The JCA makes it easy to configure the use of cryptography. The getInstance() method accepts both the name of the algorithm and the name of the provider implementing that algorithm. You should read both and any values for the algorithm’s parameters from some sort of configuration file. Also make sure you keep your code DRY and instantiate cryptographic services in a single place. Check that the requested algorithm and/or provider are actually available. The getInstance() method throws NoSuchAlgorithmException when a given algorithm or provider is not available, so you should catch that. The safest option then is to fail and have someone make sure the system is configured properly. 
If you continue despite a configuration error, you may end up with a system that is less secure than required. Note that Oracle recommends not specifying the provider. The reasons they provide is that not all providers may be available on all platforms, and that specifying a provider may mean that you miss out on optimizations. You should weigh those disadvantages against the risk of being vulnerable. Deploying specific providers with known characteristics with your application may neutralize the disadvantages that Oracle mentions. Adding Cryptographic Service Providers The provider system is extensible, so you can add providers. For example, you could use the open source Bouncy Castle or the commercial RSA BSAFE providers. In order to add a provider, you must make sure that its jar is available to the application. You can put it on the classpath for this purpose. Alternatively, you can make it an installed extension by placing it in the $JAVA_HOME/lib/ext directory, where $JAVA_HOME is the location of your JDK/JRE distribution. The major difference between the two approaches is that installed extensions are granted all permissions by default whereas code on the classpath is not. This is significant when (part of) your code runs in a sandbox. Some services, like Cipher, require the provider jar to be signed. The next step is to register the provider with the JCA provider system. The simplest way is to use Security.addProvider(): Security.addProvider(new BouncyCastleProvider()); You can also set the provider’s preference order by using the Security.insertProviderAt() method: Security.insertProviderAt (new JsafeJCE(), 1); One downside of this approach is that it couples your code to the provider, since you have to import the provider class. This may not be an important issue in an modular system like OSGi. Another thing to look out for is that code requires SecurityPermission to add a provider programmatically. The provider can also be configured as part of your environment via static registration by adding an entry to the java.security properties file (found in $JAVA_HOME/jre/lib/security/java.security):   security.provider.1=com.rsa.jsafe.provider.JsafeJCE security.provider.2=sun.security.provider.Sun The property names in this file start with security.provider. and end with the provider’s preference. The property value is the fully qualified name of the class implementing Provider. Implementing Your Own Cryptographic Service Provider Don’t do it. You will get it wrong and be vulnerable to attacks. Using Cryptographic Service Providers The documentation for the provider should tell you what provider name to use as the second argument to getInstance(). For instance, Bouncy Castle uses BC, while RSA BSAFE uses JsafeJCE. Most providers have custom APIs as well as JCA conformant APIs. Do not use the custom APIs, since that will make it impossible to configure the algorithms and providers used. Not All Algorithms and Implementations Are Created Equal It’s important to note that different algorithms and implementations have different characteristics and that those may make them more or less suitable for your situation. For instance, some organizations will only allow algorithms and implementations that are FIPS 140-2 certified or are on the list of NSA Suite B cryptographic algorithms. Always make sure you understand your customer’s cryptographic needs and requirements.  Using JCA in an OSGi environment The getInstance() method is a factory method that uses the Service Provider Interface (SPI). 
That is problematic in an OSGi world, since OSGi violates the SPI framework’s assumption that there is a single classpath. Another potential issue is that JCA requires some jars to be signed. If those jars are not valid OSGi bundles, you can’t run them through bnd to make them so, since that would make the signature invalid. Fortunately, you can kill both birds with one stone. Put your provider jars on the classpath of your main program, that is the program that starts the OSGi framework. Then export the provider package from the OSGi system bundle using the org.osgi.framework.system.packages.extra system property. This will make the system bundle export that package. Now you can simply use Import-Package on the provider package in your bundles. There are other options for resolving these problems if you can’t use the above solution.   Reference: Using Cryptography in Java Applications from our JCG partner Remon Sinnema at the Secure Software Development blog. ...
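Tying the configuration advice above together, here is a minimal sketch of a single place that instantiates a digest from configurable algorithm and provider names. The property names and the fail-fast behaviour are assumptions for illustration; the MessageDigest.getInstance() calls and the exceptions are the standard JCA API discussed above:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;
import java.util.Properties;

public final class Digests {

    private Digests() { }

    // Single point of instantiation, so the algorithm/provider can be changed
    // in configuration without touching the rest of the code base.
    public static MessageDigest newDigest(Properties config) {
        String algorithm = config.getProperty("digest.algorithm", "SHA-256");
        String provider = config.getProperty("digest.provider"); // optional
        try {
            return provider == null
                    ? MessageDigest.getInstance(algorithm)
                    : MessageDigest.getInstance(algorithm, provider);
        } catch (NoSuchAlgorithmException | NoSuchProviderException e) {
            // Fail fast: silently falling back to a weaker default could leave
            // the system less secure than required, as discussed above.
            throw new IllegalStateException(
                    "Misconfigured digest: " + algorithm + "/" + provider, e);
        }
    }
}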

Google Guava MultiMaps

Guava? This is the first in a series of posts where I’ll be attempting to explain and explore Google’s awesome Guava Java library. I first came across Guava whilst searching for generic versions of the Apache Commons Collections – I needed a Bimap and was fed up with having to pepper my code with casts – however, what I found was much, much better. Not only does it contain various implementations of more complex (but useful) collection types – Multimaps, Multisets, Bimaps – which I’ll discuss in detail, but also facilities to support a more functional style of programming with immutable collections, and function and predicate objects. This has both completely changed the way I write Java, and at the same time made me increasingly frustrated with Java’s sometimes clunky syntax, something I intend to explore in further posts. Anyway, enough with the introduction, and on with the good stuff. The first thing I’d like to take a look at is the Multimap, which is probably the single Guava feature I’ve made the most use of.

Multimaps
So, how often have you needed a data structure like the following?

Map<String, List<MyClass>> myClassListMap = new HashMap<String, List<MyClass>>();

If you’re anything like me, fairly frequently. And don’t you find yourself writing the same boilerplate code over and over again? To put a key/value pair into this map, you need to first check if a list already exists for your key, and if it doesn’t, create it. You’ll end up writing something along the lines of the following:

void putMyObject(String key, MyClass value) {
    List<MyClass> myClassList = myClassListMap.get(key);
    if (myClassList == null) {
        myClassList = new ArrayList<MyClass>();
        myClassListMap.put(key, myClassList);
    }
    myClassList.add(value);
}

Bit of a pain, and what if you need methods to check a value exists, or remove a value, or even iterate over the entire data structure? That can be quite a lot of code. Never fear, Guava is here! Just like the standard Java collections, Guava defines several interfaces and matching implementations. Usually you want to code to an interface, and only worry about the implementation when you create it. In this case we’re interested in Multimaps. So using a multimap, we could replace the data structure declaration with the following:

Multimap<String, Object> myMultimap = ArrayListMultimap.create();

There are a few things to note here. The generic type declaration should look very familiar; this is exactly how you would declare a normal Map. You may have been expecting to see new ArrayListMultimap<String, Object>() on the right-hand side of the equals. Well, all Guava collection implementations offer a create method, which is usually more concise and has the advantage that you do not have to duplicate the generic type information. Guava in fact adds similar functionality to the standard Java collections. For example, if you examine com.google.common.collect.Lists, you’ll see static newArrayList() and newLinkedList() methods, so you can take advantage of this conciseness even with the standard Java collections. (I’ll aim to cover this in more detail in a future post.) So we’ve declared and instantiated a multimap; how do we go about using it? Easy, just like a normal map!
public class MultiMapTest {
    public static void main(String... args) {
        Multimap<String, String> myMultimap = ArrayListMultimap.create();

        // Adding some key/value pairs
        myMultimap.put("Fruits", "Banana");
        myMultimap.put("Fruits", "Apple");
        myMultimap.put("Fruits", "Pear");
        myMultimap.put("Vegetables", "Carrot");

        // Getting the size
        int size = myMultimap.size();
        System.out.println(size); // 4

        // Getting values
        Collection<String> fruits = myMultimap.get("Fruits");
        System.out.println(fruits); // [Banana, Apple, Pear]

        Collection<String> vegetables = myMultimap.get("Vegetables");
        System.out.println(vegetables); // [Carrot]

        // Iterating over the entire Multimap
        for (String value : myMultimap.values()) {
            System.out.println(value);
        }

        // Removing a single value
        myMultimap.remove("Fruits", "Pear");
        System.out.println(myMultimap.get("Fruits")); // [Banana, Apple]

        // Remove all values for a key
        myMultimap.removeAll("Fruits");
        System.out.println(myMultimap.get("Fruits")); // [] (Empty Collection!)
    }
}

One thing you may be wondering is why the get method returns a Collection and not a List, which would be much more useful. Indeed it would. The problem is that there are several different implementations available: some use Lists – ArrayListMultimap, LinkedListMultimap etc. – and some use Sets – HashMultimap, TreeMultimap among others. To handle this – if you need to work directly with the Lists or Sets in the map – there are several subinterfaces defined: ListMultimap, SetMultimap, and SortedSetMultimap. These all do what you’d expect, and their methods that return collections will return one of the appropriate type, i.e.

ListMultimap<String, String> myMultimap = ArrayListMultimap.create();

List<String> myValues = myMultimap.get("myKey"); // Returns a List, not a Collection.

That’s basically all there is to them. I recommend looking at the API: http://docs.guava-libraries.googlecode.com/git-history/release09/javadoc/com/google/common/collect/Multimap.html, where you can find the various implementations; you should be able to find one that suits your needs. A couple of handy read-only views are shown in the short example below.

Reference: Multimaps – Google Guava from our JCG partner Tom Jefferys at the Tom’s Programming Blog blog.
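As a small follow-up to the examples above, the sketch below uses two read-only views that every Multimap exposes, entries() and asMap(); both are standard Guava API, and the data is the same fruit and vegetable sample used in the article:

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import java.util.Collection;
import java.util.Map;

public class MultimapViewsExample {
    public static void main(String... args) {
        Multimap<String, String> myMultimap = ArrayListMultimap.create();
        myMultimap.put("Fruits", "Banana");
        myMultimap.put("Fruits", "Apple");
        myMultimap.put("Vegetables", "Carrot");

        // entries() flattens the multimap into individual key/value pairs
        for (Map.Entry<String, String> entry : myMultimap.entries()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }

        // asMap() exposes the classic Map<K, Collection<V>> view without the boilerplate
        for (Map.Entry<String, Collection<String>> entry : myMultimap.asMap().entrySet()) {
            System.out.println(entry.getKey() + " has " + entry.getValue().size() + " values");
        }
    }
}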

How much memory do I need?

What is retained heap? How much memory will I need? This is a question you might have asked yourself (or others) when building a solution, creating a data structure or choosing an algorithm. Will this graph of mine fit in my 3G heap if it contains 1,000,000 edges and I use a HashMap to store it? Can I use the standard Collections API while building my custom caching solution, or is the overhead posed by them too much? Apparently, the answer to this simple question is a bit more complex. In this post we’ll take a first peek at it and see how deep the rabbit hole actually is. The answer to the question in the headline comes in several parts. First, we need to understand whether you are interested in shallow or retained heap sizes.

The shallow heap is easy – it consists of only the heap occupied by the object itself. There are some nuances to how to calculate it, but for the scope of this article we leave it as is. Stay tuned for future posts on the same topic. The retained heap is in many ways more interesting. Only rarely are you interested in the shallow heap; in most cases your actual question can be translated to “If I remove this object from memory, how much memory can now be freed by the garbage collector?”. Now, as we all remember, all Java garbage collection (GC) algorithms follow this logic:
- There are some objects which are considered “important” by the GC. These are called GC roots and are (almost) never discarded. They are, for example, a currently executing method’s local variables and input parameters, application threads, references from native code and similar “global” objects.
- Any objects referenced from those GC roots are assumed to be in use and hence not discarded by the GC. One object can reference another in different ways in Java; in the most common case an object A is stored in a field of an object B. In such a case we say “B references A”.
- The process is repeated until all objects that can be transitively reached from GC roots are visited and marked as “in use”. Everything else is unused and can be thrown away.

Now, to illustrate how to calculate the retained heap, let’s follow the aforementioned algorithm with the following example objects (the diagram from the original post shows O1 referencing O2 and O3, O2 referencing O3, and O3 referencing O4). To simplify the sample, let’s assume that all the objects O1-O4 have a shallow heap of 1024B = 1kB. Let’s start calculating the retained sizes of those objects:
- O4 has no references to other objects, so its retained size is equal to its shallow size of 1kB.
- O3 has a reference to O4. Garbage collecting O3 would thus mean O4 would also be eligible for garbage collection, and so we can say that O3’s retained heap is 2kB.
- O2 has a reference to O3. But it is now important to note that removing the pointer from O2 to O3 does not make O3 eligible for GC, as O1 still has a pointer to it. So O2’s retained heap is only 1kB.
- O1, on the other hand, is the object keeping all the references in this small graph, so if we removed O1, everything on this graph would be garbage collected. So O1’s retained heap is 4kB.

Which implications does this have in practice? In fact, understanding the differences between shallow and retained heap sizes makes it possible to work with tools such as memory profilers and heap dump analyzers – for example, digging into Eclipse MAT might prove to be impossible if you don’t know how to distinguish these two types of heap size measurements.

What is shallow heap? This article is the second post in the series where we try to answer those questions.
The last post explained the difference between retained and shallow sizes of an object. In the article we also offered an example of how to calculate retained heap size of a data structure. In today’s article we will expand on what we called “simple” in the previous post. Namely - what is and how to measure shallow heap used by an object. In the first post we pushed a whole lot of complexity away by stating that calculating shallow heap size is easy – it consists of only the heap occupied by the object itself. But how do you calculate how much memory does the object “itself” require? Apparently there is a formula for it: Shallow Heap Size = [reference to the class definition] + space for superclass fields + space for instance fields + [alignment] Does not seem too helpful, eh? Let’s try to apply the formula using the following sample code: class X {    int a;    byte b;    java.lang.Integer c = new java.lang.Integer(); } class Y extends X {    java.util.List d;    java.util.Date e; } Now, the question we strive to answer is – how much shallow heap size does an instance of a Y require? Lets start calculating it, assuming that we are on a 32-bit x86 architecture: As a starting point – Y is a subclass of X, so its size includes “something” from the superclass. Thus, before calculating the size of Y, we look into calculating the shallow size of X. Jumping into the calculations on X, first 8 bytes are used to refer its class definition. This reference is always present in all Java objects and is used by JVM to define the memory layout of the following state. It also has three instance variables – an int, an Integer and a byte. Those instance variables require heap as follows:a byte is what it is supposed to be. 1 byte in a memory. an int in our 32bit architecture requires 4 bytes. a reference to the Integer requires also 4 bytes. Note that when calculating retained heap, we should also take into account the size of a primitive wrapped into the Integer object, but as we are calculating shallow heap here, we only use the reference size of 4 bytes in our calculations.So – is that it? Shallow heap of X = 8 bytes from reference to the class definition + 1 byte (the byte) + 4 bytes (the int) + 4 bytes (reference to the Integer) = 17 bytes? In fact – no. What now comes into play is called alignment (also called padding). It means that the JVM allocates the memory in multiples of 8 bytes, so instead of 17 bytes we would allocate 24 bytes if we would create an instance of X. If you could follow us until here, good, but now we try to get things even more complex. We are NOT creating an instance of X, but an instance of Y. What this means is – we can deduct the 8 bytes from the reference to the class definition and the alignment. It might not be too obvious at first place but – did you note that while calculating the shallow size of X we did not take into account that it also extends java.lang.Object as all classes do even if you do not explicitly state it in your source code? We do not have to take into account the header sizes of superclasses, because JVM is smart enough to check it from the class definitions itself, instead of having to copy it into the object headers all the time. The same goes for alignment – when creating an object you only align once, not at the boundaries of superclass/subclass definitions. So we are safe to say that when creating a subclass to X you will only inherit 9 bytes from the instance variables. Finally we can jump to the initial task and start calculating the size of Y. 
As we saw, we have already lost 9 bytes to the superclass fields. Let’s see what will be added when we actually construct an instance of Y:
- Y’s header referring to its class definition consumes 8 bytes. The same as with the previous ones.
- The Date is a reference to an object. 4 bytes. Easy.
- The List is a reference to a collection. Again 4 bytes. Trivial.

So in addition to the 9 bytes from the superclass we have 8 bytes from the header and 2×4 bytes from the two references (the List and the Date). The total shallow size for the instance of Y would be 25 bytes, which gets aligned to 32. (The diagram in the original post lays this out across a 32-byte, 8-byte-aligned scale: X occupies an object header plus the fields a, b, c followed by alignment padding; Y occupies an object header plus the fields a, b, c, d, e followed by alignment padding.)

What can you do with this knowledge? Together with the skills to calculate the size of the retained heap (covered in my recent post), you now possess the ultimate power to calculate how much memory your data structures actually require. To make things even more interesting, we have created a utility that measures the sizes of both shallow and retained heap for your objects. In the very near future we will release the tool for free use. Stay tuned by subscribing to our Twitter feed!

Measure, don’t guess
What looks like an easy task can in reality become somewhat complicated. There is a whole lot of different aspects you have to bear in mind when calculating the memory footprint of your objects:
- Do I need to measure shallow or retained heap size?
- Do I make the calculations for a 32 or 64-bit architecture?
- Am I running on x86, SPARC, POWER or on something even beyond imagination?
- Do I use compressed or uncompressed ordinary object pointers?
- [enter something else you are afraid of or do not completely understand here]

Bearing all those aspects in mind when trying to estimate the size of your data structures is simply unreasonable when trying to meet yet another deadline. So we went ahead and packaged the code published by Java Champion Heinz Kabutz as a Java agent and provided an easy way to add it to your application. Adding the agent gives you an easy way to trace how much memory your data structures take on your actual environment. And it does so without the complexity introduced by the alternatives. In the following four easy steps you are up and running and finally understand how much memory your precious caches actually consume:

Step 1: Download the agent. Don’t worry, it’s just a few kilobytes.

Step 2: Unzip the downloaded agent. You will see it is packaged along with its source code and a sample of how to use it. Feel free to play around with the code.

nikita-mb:sizeof nikita$ ls -l
total 16
-rw-r--r--  1 nikita  staff  1696 Aug 28 22:12 build.xml
-rw-r--r--  1 nikita  staff  3938 Aug 28 22:33 sizeofagent.jar
drwxr-xr-x  5 nikita  staff   170 Aug 28 10:44 src

Step 3: Experiment with the bundled testcase. The bundled testcase measures the same data structure we described in our blog post about shallow heap size measurement. For those who do not bother clicking back and forth, here is the code again:

class X {
    int a;
    byte b;
    java.lang.Integer c = new java.lang.Integer();
}
class Y extends X {
    java.util.List d;
    java.util.Date e;
}

The testcase is shipped with Ant tests to compile and run the samples. Run ant test, or ant test-32 if you are on a 32-bit architecture.
You should see the following output when running all the tests with ant test:nikita-mb:sizeof nikita$ ant testBuildfile: /Users/nikita/workspace/sizeof/build.xmlinit:compile:test32:[java] java.lang.Object: shallow size=8 bytes, retained=8 bytes [java] eu.plumbr.sizeof.test.X: shallow size=24 bytes, retained=40 bytes [java] eu.plumbr.sizeof.test.Y: shallow size=32 bytes, retained=48 bytestest64+UseCompressedOops:[java] java.lang.Object: shallow size=16 bytes, retained=16 bytes [java] eu.plumbr.sizeof.test.X: shallow size=24 bytes, retained=40 bytes [java] eu.plumbr.sizeof.test.Y: shallow size=32 bytes, retained=48 bytestest64-UseCompressedOops:[java] java.lang.Object: shallow size=16 bytes, retained=16 bytes [java] eu.plumbr.sizeof.test.X: shallow size=32 bytes, retained=56 bytes [java] eu.plumbr.sizeof.test.Y: shallow size=48 bytes, retained=72 bytestest:BUILD SUCCESSFUL Total time: 2 secondsFrom the test above you can see for example that on 32bit architecture, the shallow heap of Y consumes 32 bytes and retained heap 48 bytes. On 64bit architecture with -XX:-UseCompressedOops the shallow size increases to 48 bytes and retained heap size to 72 bytes. If it bedazzles you how do we calculate those numbers, then check out what is and how to calculate shallow and retained heap sizes from our previous posts in the series. Step 4: Attach the agent to your very own Java application. To do this, add -javaagent:path-to/sizeofagent.jar to your JVM startup scripts. Now you can measure shallow heap consumption by invoking MemoryCounterAgent.sizeOf(yourObject) or measure retained heap consumption by invoking MemoryCounterAgent.deepSizeOf(yourObject) directly in your code. See the bundled ant scripts and eu.plumbr.sizeof.test.SizeOfSample class also in case you get confused while doing it. Of course you have got numerous alternatives, especially in forms of memory profilers and APM solutions. But this small agent will do its task quickly and requires next to no set-up nor learning. Well, at minimum we had fun playing with it. Instead of crunching through our product backlog. PS. While writing this article, the following online resources were used for inspiration:http://memoryanalyzer.blogspot.com/2010/02/heap-dump-analysis-with-memory-analyzer.html http://www.javamex.com/tutorials/memory/object_memory_usage.shtml http://www.javamex.com/tutorials/memory/instrumentation.shtml http://kohlerm.blogspot.com/2008/12/how-much-memory-is-used-by-my-java.html http://www.javaspecialists.eu/archive/Issue142.htmlAnd – do not forget to send your congratulations for this code to Heinz Kabutz, who published it originally in its Java Specialists’ Newsletter in March 2007.   Reference: How much memory do I need (part 1) – What is retained heap?, How much memory do I need (part 2) – What is shallow heap?, How much memory do I need (part 3) – measure, don’t guess from our JCG partner Nikita Salnikov Tarnovski at the Plumbr Blog blog. ...
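As a sketch of step 4, calling the agent from your own code could look like the snippet below. MemoryCounterAgent is the class named above that ships with the downloaded agent (its exact package is omitted here), and the HashMap is just an arbitrary object to measure:

// Launch the JVM with the agent attached, e.g.:
//   java -javaagent:path-to/sizeofagent.jar SizeOfDemo
public class SizeOfDemo {
    public static void main(String[] args) {
        java.util.Map<String, String> cache = new java.util.HashMap<>();
        cache.put("key", "value");
        // sizeOf(...) reports the shallow size of the map object itself,
        // deepSizeOf(...) follows references and reports the retained size.
        System.out.println("shallow:  " + MemoryCounterAgent.sizeOf(cache) + " bytes");
        System.out.println("retained: " + MemoryCounterAgent.deepSizeOf(cache) + " bytes");
    }
}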

Chef Happens – Managing Solaris with Chef

Adding Solaris servers to be managed by Chef was the most annoying entry in our Wix.com DevOps backlog for almost a year. We moved our MySQL databases to Solaris more than a year ago. We automate everything, but getting Solaris into the Chef kitchen was not that trivial. There is minimal support for Solaris in Chef, so I have made several additions to Chef which other happy Solaris Chef masters might find useful.

My first challenge in setting up Chef on Solaris was that there is no omnibus installer for Solaris 5.10 x86. Unfortunately, it takes quite a bit of work to go from a bare Solaris install to one that can install the chef gem. So I’ve written a bootstrap file that does that work for you. This bootstrap file does the following:
- Adds /opt/csw/lib and /usr/local/lib to the library path (via crle).
- Installs pkgutil from OpenCSW.
- Installs libgcc_s1, coreutils, libssl1_0_0, wget, gsed, binutils and gmake via pkgutil.
- Installs ruby from http://www.sunfreeware.com/ (the ruby from OpenCSW does not work correctly).
- Renames some files so that ruby can build new gems.
- Installs the ohai and chef gems.
- Adds a patch so that adding users to groups works (see CHEF-3245).
- Creates the initial Chef files.

You can get this bootstrap file from GitHub. Once you have downloaded this file, put it in .chef/bootstrap/solaris.erb in the root of your Chef repository. If you are the only user who needs it, you can put it in your home directory instead. Once you have the bootstrap file (or if you are using another bootstrap file), you can install Chef.

Installing Chef on Solaris:
- Login to the machine you want to install Chef on.
- Set the hostname.
- Enable root login via SSH. (Set PermitRootLogin yes in /etc/ssh/sshd_config)
- svcadm restart ssh
- cd into the root of the Chef Git repository
- knife bootstrap -d solaris
- login to the machine as root and run: chef-client

Using OpenCSW packages:
My next challenge was that I wanted to be able to install OpenCSW packages from Chef. So I’ve written an LWRP for pkgutil and uploaded it to the Opscode community cookbook site. You can install this to your Chef repository by doing “knife cookbook site install pkgutil”. Once you have done this, you can start using OpenCSW packages in your cookbooks. In the cookbook that has the pkgutil_package resources, add a dependency on the pkgutil cookbook in your metadata.rb file, like this:

depends 'pkgutil'

Then use the resources as follows:

pkgutil_package 'vim'

Or:

pkgutil_package 'vim' do
  action :install
end

pkgutil_package 'top' do
  action :upgrade
end

pkgutil_package 'less' do
  action :remove
end

Using zpools, zfs and zones:
The next challenge was managing zpools, zfs filesystems and zones via Chef. To do that, I’ve written LWRPs for them as well, which you can install as you did for pkgutil. To use these resources, in the cookbook that has the resources, add a dependency on the appropriate cookbook in your metadata.rb file, like this:

depends 'zpool'
depends 'zfs'

Or:

depends 'zone'

Then on the global zone, include a recipe like this:

zpool 'zones' do
  disks [ 'c0t2d0s0' ]
end

zfs 'zones/test'

directory '/zones/test' do
  mode '0700'
end

zone 'test' do
  path '/zones/test'
  limitpriv 'default,dtrace_proc,dtrace_user'
  password 'whbFxl4vH5guE'
  nets [ '192.168.0.9/24:e1000g0' ]
end

Putting it all together:
My final challenge was to combine this all into a single step to create new zones. This was done in order to ease the transition into Chef for our Solaris administrators, who are used to creating new zones with a bunch of shell scripts.
You can get this script on GitHub. In order to run the script, ruby 1.9 is required, as well as the chef, git and net/ssh gems. Chef must be installed on the global zone and the zpool for the zone must already be created. It is very strongly recommended to set up DNS for the new zone before beginning! To see all of the options, run the script with -h:

shell$ create_zone.rb -h
Usage: ./create_zone.rb (options)
    -d, --debug                    Turn on debugging messages
    -t, --git                      Add/commit/push new recipes to git
    -g, --global GLOBAL            The FQDN of the server to create the zone on (required)
    -c, --config KNIFE_CONFIG      Knife configuration file (defaults to ~/.chef/knife.rb)
    -n, --net NET                  Network information for the new zone, in the form: ipaddress[/cidr]:interface (required)
    -r, --run_list RUN_LIST        Specify the run_list for the new zone
    -s, --password SSH_PASSWORD    SSH password to use (required)
    -P, --port SSH_PORT            SSH port to use (defaults to 22)
    -z, --zone ZONE                The hostname of the new zone (required)
    -p, --zpool ZPOOL              Name of the zpool to use (defaults to rpool)
    -h, --help                     Show this message

Here is an example of how to create a new zone named test on the host global.example.com and install mysql in the new zone:

shell$ create_zone.rb -n 192.168.0.9/24:e1000g0 -z test -g global.example.com -s testpw -p zones -r 'recipe[mysql::server]'

The script will:
- Generate the recipes to create the zfs filesystems and the zone.
- If -t is specified, add, commit and push the new recipes to git.
- Upload the cookbook to the Chef server (using the knife.rb configuration specified with -c).
- Add the new recipe to the global host.
- Run Chef on the global host, creating the new zfs and zone.
- Knife bootstrap the new zone, with an initial run_list specified with -r.

You can take this script and modify it for your environment (you might want to change the template for the generated recipe), but hopefully you will find it helpful!

Reference: Chef Happens – Managing Solaris with Chef from our JCG partner Yoav Abrahami at the Wix IO blog.

Nurturing Leadership

I had an email conversation with a colleague about when you let people fail versus when you rescue them—how you nurture leadership. The context is with people who are new to management, or new to a particular piece of work in a project. If you’re agile, I say you pair these people and be done with it. No problem. But my colleague is not agile. So, the first option is not going to work. Well, fine. We need more options. The next question is when is the point of no return? When is the trigger point for when the risk turns into a disaster? If you are coaching people, or helping people learn something new, you want to help them see their options before the disaster. We need to find the most responsible moment to make a decision. Let’s talk about the most responsible moment for a little bit. Notice that I did not say the last responsible moment. I said the most responsible moment. That’s because while I might be able to take more time to make a decision, someone with less experience might need more time to make the decision. Hmm, I bet we need an example about now. My colleague is a test manager in a plan-driven organization. Let’s imagine he is grooming a tester to shoulder more responsibility as a technical lead. This tester is not as able to make the decisions about what the defect data means, nor about what the test case plan/run/pass data means. The tester doesn’t quite understand the fault feedback ratio either. My colleague is coaching the tester to see the big picture from the data. This is fine. My colleague decides to work with the tester once a week on the test dashboard and ask the tester for the tester’s interpretation about the project’s status. They discuss it, write down their interpretation, and continue through the project. Near the end, as the pressure mounts, they start to meet every other day, and finally every day. The big question is this: When do the testers have to retest which areas of the system? Part of that question is depends on what the developers change. Part of that question depends on what has regressed. So the aspiring technical lead needs to know about the guts of the system with respect to regressions, and decide when the testers have done enough regression testing. In a plan-driven project, this is a critical decision. Too much regression testing, and you never release. Too little regression testing and you release defects. You need to understand the release criteria and how the changes affect the release criteria. It’s a delicate balance. Definitely a decision that requires solution space domain expertise. You can’t make the decision the day before the release; everyone needs some notice. What should my colleague do? This is where my colleague’s brilliance of writing down their interpretations all the way through the project shines. They have a history of their conclusions based on whatever data they’ve had throughout the project. They have a way of discussing their decisions. The test manager can be transparent about the decision with aspiring lead, for this project and the next. Maybe after one or two projects, the test manager can leave the decision with the lead. To recap the options:Try pairing if possible. Actually, the practice of working with the manager and discussing the decisions during the first couple of projects is a form of pairing. Work with someone and be transparent about your decisions even if you are making the decisions so they can see how you think. 
Help people see the date/time by which they need to act or decide when the decision is theirs. Help people create plans with multiple checkpoints when the decision is theirs. Practice making decisions on smaller projects/with less risky decisions when the decision is theirs.Things to not do:Don’t tell people they can make the decision and then make it for them. That’s grabbing the decision back and is not nurturing leadership. Don’t tell people the decision point is artificially early. That’s artificially rescuing them and doesn’t allow them to even come close to failing. It also closes options too early. If you think you are seeing someone “fail,” offer feedback. Offer help. Do not interfere. No matter how much it hurts you. Unless you see someone in physical distress. Do not be offended if people don’t want anything from you. Tough.When you nurture new leaders, you ask them the questions before you let them fail. You don’t wait for them to go past the decision point. You don’t take over the decision from them. The problem is when is the decision point. And what happens if people “fail”? Well how bad can the failure possibly be? When you nurture leaders, you help people learn how to learn from their failures. If you rescue people all the time, they can’t possibly learn.   Reference: Nurturing Leadership from our JCG partner Johanna Rothman at the Managing Product Development blog. ...

Calculating A Co-Occurrence Matrix with Hadoop

This post continues with our series of implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. This time we will be creating a word co-occurrence matrix from a corpus of text. Previous posts in this series are:Working Through Data-Intensive Text Processing with MapReduce Working Through Data-Intensive Text Processing with MapReduce – Local Aggregation Part IIA co-occurrence matrix could be described as the tracking of an event, and given a certain window of time or space, what other events seem to occur. For the purposes of this post, our “events” are the individual words found in the text and we will track what other words occur within our “window”, a position relative to the target word. For example, consider the phrase “The quick brown fox jumped over the lazy dog”. With a window value of 2, the co-occurrence for the word “jumped” would be [brown,fox,over,the]. A co-occurrence matrix could be applied to other areas that require investigation into when “this” event occurs, what other events seem to happen at the same time. To build our text co-occurrence matrix, we will be implementing the Pairs and Stripes algorithms found in chapter 3 of Data-Intensive Text Processing with MapReduce. The body of text used to create our co-occurrence matrix is the collective works of William Shakespeare. Pairs Implementing the pairs approach is straightforward. For each line passed in when the map function is called, we will split on spaces creating a String Array. The next step would be to construct two loops. The outer loop will iterate over each word in the array and the inner loop will iterate over the “neighbors” of the current word. The number of iterations for the inner loop is dictated by the size of our “window” to capture neighbors of the current word. At the bottom of each iteration in the inner loop, we will emit a WordPair object (consisting of the current word on the left and the neighbor word on the right) as the key, and a count of one as the value. Here is the code for the Pairs implementation: public class PairsOccurrenceMapper extends Mapper<LongWritable, Text, WordPair, IntWritable> { private WordPair wordPair = new WordPair(); private IntWritable ONE = new IntWritable(1);@Override protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { int neighbors = context.getConfiguration().getInt('neighbors', 2); String[] tokens = value.toString().split('\\s+'); if (tokens.length > 1) { for (int i = 0; i < tokens.length; i++) { wordPair.setWord(tokens[i]);int start = (i - neighbors < 0) ? 0 : i - neighbors; int end = (i + neighbors >= tokens.length) ? tokens.length - 1 : i + neighbors; for (int j = start; j <= end; j++) { if (j == i) continue; wordPair.setNeighbor(tokens[j]); context.write(wordPair, ONE); } } } } } The Reducer for the Pairs implementation will simply sum all of the numbers for the given WordPair key: public class PairsReducer extends Reducer<WordPair,IntWritable,WordPair,IntWritable> { private IntWritable totalCount = new IntWritable(); @Override protected void reduce(WordPair key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int count = 0; for (IntWritable value : values) { count += value.get(); } totalCount.set(count); context.write(key,totalCount); } }   Stripes Implementing the stripes approach to co-occurrence is equally straightforward. 
The approach is the same, but all of the “neighbor” words are collected in a HashMap with the neighbor word as the key and an integer count as the value. When all of the values have been collected for a given word (the bottom of the outer loop), the word and the hashmap are emitted. Here is the code for our Stripes implementation: public class StripesOccurrenceMapper extends Mapper<LongWritable,Text,Text,MapWritable> { private MapWritable occurrenceMap = new MapWritable(); private Text word = new Text();@Override protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { int neighbors = context.getConfiguration().getInt('neighbors', 2); String[] tokens = value.toString().split('\\s+'); if (tokens.length > 1) { for (int i = 0; i < tokens.length; i++) { word.set(tokens[i]); occurrenceMap.clear();int start = (i - neighbors < 0) ? 0 : i - neighbors; int end = (i + neighbors >= tokens.length) ? tokens.length - 1 : i + neighbors; for (int j = start; j <= end; j++) { if (j == i) continue; Text neighbor = new Text(tokens[j]); if(occurrenceMap.containsKey(neighbor)){ IntWritable count = (IntWritable)occurrenceMap.get(neighbor); count.set(count.get()+1); }else{ occurrenceMap.put(neighbor,new IntWritable(1)); } } context.write(word,occurrenceMap); } } } } The Reducer for the Stripes approach is a little more involved due to the fact we will need to iterate over a collection of maps, then for each map, iterate over all of the values in the map: public class StripesReducer extends Reducer<Text, MapWritable, Text, MapWritable> { private MapWritable incrementingMap = new MapWritable();@Override protected void reduce(Text key, Iterable<MapWritable> values, Context context) throws IOException, InterruptedException { incrementingMap.clear(); for (MapWritable value : values) { addAll(value); } context.write(key, incrementingMap); }private void addAll(MapWritable mapWritable) { Set<Writable> keys = mapWritable.keySet(); for (Writable key : keys) { IntWritable fromCount = (IntWritable) mapWritable.get(key); if (incrementingMap.containsKey(key)) { IntWritable count = (IntWritable) incrementingMap.get(key); count.set(count.get() + fromCount.get()); } else { incrementingMap.put(key, fromCount); } } } }   Conclusion When looking at the two approaches, we can see that the Pairs algorithm will generate more key value pairs compared to the Stripes algorithm. Also, the Pairs algorithm captures each individual co-occurrence event while the Stripes algorithm captures all co-occurrences for a given event. Both the Pairs and Stripes implementations would benefit from using a Combiner. Because both produce commutative and associative results, we can simply re-use each Mapper’s Reducer as the Combiner. As stated before, creating a co-occurrence matrix has applicability to other fields beyond text processing, and represent useful MapReduce algorithms to have in one’s arsenal. Thanks for your time. ResourcesData-Intensive Processing with MapReduce by Jimmy Lin and Chris Dyer Hadoop: The Definitive Guide by Tom White Source Code and Tests from blog Hadoop API MRUnit for unit testing Apache Hadoop map reduce jobs  Reference: Calculating A Co-Occurrence Matrix with Hadoop from our JCG partner Bill Bejeck at the Random Thoughts On Coding blog. ...
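The post shows the mappers and reducers but not the job setup, so here is a sketch of a driver for the Pairs job. The wiring is an assumption for illustration (in particular, WordPair must be a WritableComparable, as in the post’s accompanying source code); the 'neighbors' window size is put into the Configuration exactly where the mappers read it from:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PairsOccurrenceDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("neighbors", 2); // window size read by the mapper via context.getConfiguration()

        Job job = new Job(conf, "co-occurrence (pairs)");
        job.setJarByClass(PairsOccurrenceDriver.class);
        job.setMapperClass(PairsOccurrenceMapper.class);
        job.setCombinerClass(PairsReducer.class); // sums are commutative/associative, so the reducer doubles as combiner
        job.setReducerClass(PairsReducer.class);
        job.setOutputKeyClass(WordPair.class);    // WordPair must implement WritableComparable
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}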