

When using direct memory can be faster

Overview

Using direct memory is no guarantee of improving performance. Given that it adds complexity, it should be avoided unless you have a compelling reason to use it. This excellent article by Sergio Oliveira Jr, Which one is faster: Java heap or native memory?, shows that it's not simply a matter of using direct memory to improve performance.

Where direct memory and memory mapped files can help is when you have large amounts of data and/or you have to perform some IO with that data.

Time series data

Time series data tends to have both a large number of entries and involve IO to load and store the data. This makes it a good candidate for memory mapped files and direct memory.

I have provided an example here (main and tests) where the same operations are performed on regular objects and on memory mapped files. Note: I am not suggesting that access to the objects is slow; it is the overhead of using objects which is the issue, e.g. loading, creating, the size of the object headers, garbage collection and saving the objects.

The test loads time series data with the time and two columns, bid and ask prices (normalised as int values). This is used to calculate and save a simple mid price basis point movement. The test performs one GC to include the overhead of managing the objects involved.

Storage   1 million   10 million   30 million   100 million   250 million
Objects   0.55s       4.4s         16.7s        86.7s         225s
MMap      0.056s      0.46s        1.3s         4.5s          11s

Not only is memory mapped data 10x faster for smaller data sets, it also scales better for large data sizes because:
- Memory mapped data is available as soon as it is mapped into memory.
- It creates only a small number of objects, so it has almost no heap footprint, which reduces GC times.
- It can be arranged in memory as you desire, reducing the overhead per row because it doesn't need an object per row.
- It doesn't have to do anything extra to save the data.

Using direct memory and memory mapped files is not as simple as using Java objects, but if you have big data requirements it can make a big difference. Using direct memory and memory mapped files can also make a big difference for low latency requirements, something I have discussed in previous articles.

Reference: When using direct memory can be faster from our JCG partner Peter Lawrey at the Vanilla Java blog. ...
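The sketch below makes the memory-mapped side of When using direct memory can be faster concrete. It is not the author's linked example. The 20-byte row layout (time, bid, ask, mid move), the class name and the method names are assumptions chosen for illustration. Rows are read and written with absolute get/put calls on a MappedByteBuffer, so no object is created per row. The computed column is "saved" simply by writing it into the mapping.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MmapTimeSeries {
    // Hypothetical fixed-width row layout (an assumption, not the author's format):
    // long time @0, int bid @8, int ask @12, int mid move in basis points @16
    static final int ROW_BYTES = 20;

    static MappedByteBuffer map(FileChannel ch, int rows) throws IOException {
        // A READ_WRITE mapping grows the file if it is too small
        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) ROW_BYTES * rows);
        buf.order(ByteOrder.nativeOrder()); // avoid byte swapping on this machine
        return buf;
    }

    static void writeRows(Path file, long[] time, int[] bid, int[] ask) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = map(ch, time.length);
            for (int i = 0; i < time.length; i++) {
                int off = i * ROW_BYTES;
                buf.putLong(off, time[i]);      // absolute puts: no per-row objects
                buf.putInt(off + 8, bid[i]);
                buf.putInt(off + 12, ask[i]);
            }
            buf.force(); // push dirty pages to the file now rather than later
        }
    }

    /** Computes the mid price movement in basis points and stores it in each row. */
    static int[] computeMidMoves(Path file, int rows) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = map(ch, rows);
            int[] moves = new int[rows];
            long prevMid2 = 0; // 2 * mid = bid + ask, kept integral to avoid halving
            for (int i = 0; i < rows; i++) {
                int off = i * ROW_BYTES;
                long mid2 = (long) buf.getInt(off + 8) + buf.getInt(off + 12);
                int bps = i == 0 ? 0 : (int) ((mid2 - prevMid2) * 10_000 / prevMid2);
                buf.putInt(off + 16, bps); // "saving" is just a write into the mapping
                moves[i] = bps;
                prevMid2 = mid2;
            }
            return moves;
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("ticks", ".bin");
        writeRows(f, new long[]{1, 2, 3}, new int[]{10_000, 10_100, 10_000}, new int[]{10_000, 10_100, 10_000});
        System.out.println(Arrays.toString(computeMidMoves(f, 3))); // [0, 100, -99]
    }
}
```

The rows never become Java objects, so the heap footprint stays flat however many rows the file holds. That is the effect the article credits for the better scaling.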

When Premature Optimization Isn’t

Earlier this month, I decided I wanted to write a post on not all optimization being premature optimization, after hearing more than one developer use this mantra in the same week as an excuse for not making a better decision. Bozhidar Bozhanov beat me to it with his post Not All Optimization Is Premature. It makes some excellent points, different from the ones I had planned to make, in postulating that there is nothing wrong with early optimizing ‘if neither readability nor maintainability are damaged and the time taken is negligible.’ Bozhanov’s post reminded me that I wanted to write this post. I use this post to provide additional examples and support to back up my claim that too many in our community have allowed ‘avoid premature optimization’ to become a ‘bumper sticker practice.’

In my opinion, some developers have taken the appropriate advice to ‘avoid premature optimization’ out of context, or do not want to spend the time and effort to really think about the reasons behind this statement. It may seem easier to blindly apply it to all situations, but that is just as dangerous as prematurely optimizing.

Good Design is Not Premature Optimization

I like Rod Johnson’s differentiation between ‘design’ and ‘code optimization’ in his book Expert One-on-One J2EE Design and Development (2002). Perhaps the most common situation in which I have seen developers make bad decisions under the pretense of ‘avoiding premature optimization’ is in making bad architecture or design choices. The incident earlier this month that prompted this post was a developer’s assertion that we should not design our services to be coarser grained than he wanted. He argued that doing so was premature optimization, and that his idea of making numerous successive calls on the same service was ‘easiest’ to implement.
In my mind, this developer was mistaking the valid warning about not prematurely optimizing code for an excuse not to consider an appropriate design that might require a barely noticeable amount of extra effort. In his differentiation between design and code optimization, Johnson highlighted, ‘Minimize the need for optimization by heading off problems with good design.’ In that same section he warns against ‘code optimization’ for four main reasons: ‘optimization is hard’ (‘few things in programming are harder than optimizing existing code’), ‘most optimization is pointless,’ ‘optimization causes many bugs,’ and ‘inappropriate optimization may reduce maintainability forever.’ Johnson states (and I agree), ‘There is no conflict between designing for performance and maintainability, but optimization may be more problematic.’

Applying Appropriate Language Practices is Not Premature Optimization

Most programming languages I’m familiar with offer multiple ways to accomplish the same thing. In many cases, one of the alternatives has well-known advantages. In such cases, is it really premature optimization to use the better performing alternative? For example, if I’m writing Java code to append characters onto a String within a loop, why would I ever do this with a Java String instead of a StringBuilder? Using StringBuilder is not much different in terms of maintainability or readability, even for a relatively new Java developer. It also has a known performance benefit that requires no profiling to recognize. It seems silly to write it with String in the name of ‘avoiding premature optimization’ and only change it to StringBuilder when the profiler shows it’s a performance issue. That being stated, it would be just as silly to use a StringBuilder for simple concatenations outside of a loop ‘just in case’ that code is ever placed within a loop.
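Here is a minimal sketch of the String versus StringBuilder case described above. The class and method names are illustrative. Both methods return the same text, but the first one creates a brand-new String on every iteration.

```java
public class ConcatInLoop {
    // Each += creates a new String and copies everything built so far,
    // so the total amount of copying grows quadratically with n.
    static String withString(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += (char) ('a' + i % 26);
        }
        return s;
    }

    // StringBuilder appends into one growable buffer (amortized constant time per append).
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n); // final length is known, so presize it
        for (int i = 0; i < n; i++) {
            sb.append((char) ('a' + i % 26));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withBuilder(30)); // abcdefghijklmnopqrstuvwxyzabcd
    }
}
```

Outside a loop, a single expression such as a + b + c is fine as it is, which matches the last point above.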
Similarly, it’s not ‘premature optimization’ to write a conditional so that the most likely condition is encountered first, as long as doing so does not make the code confusing. Another example is the common use of short-circuit evaluation in conditionals, which can be effective without being premature optimization. Finally, there are cases where certain data structures or collections are more likely to be appropriate than others for a given operation or set of expected operations.

Writing Cleaner Code is Not Premature Optimization

Some developers might confuse more ‘efficient’ (cleaner) source code with premature optimization. Optimizing source code for readability and maintainability (such as in refactoring or carefully crafting original code) has nothing to do with Knuth’s original quote. Writing cleaner code often leads to better performance, but this does not mean writing cleaner code is a form of premature optimization.

Others’ Thoughts on Misunderstanding of Premature Optimization

Besides the Not All Optimization Is Premature post, other posts on the misapplication of the ‘avoid premature optimization’ mantra include Joe Duffy’s The ‘premature optimization is evil’ myth and James Hague’s ‘Avoid Premature Optimization’ Does Not Mean ‘Write Dumb Code’. Duffy puts it so well that I’ll quote him directly here:

I have heard the ‘premature optimization is the root of all evil’ statement used by programmers of varying experience at every stage of the software lifecycle, to defend all sorts of choices, ranging from poor architectures, to gratuitous memory allocations, to inappropriate choices of data structures and algorithms, to complete disregard for variable latency in latency-sensitive situations, among others. Mostly this quip is used defend sloppy decision-making, or to justify the indefinite deferral of decision-making.
In other words, laziness.

The James Hague post points out one of the signs of potentially having crossed into premature optimization: ‘The warning sign is when you start sacrificing clarity and reliability while chasing some vague notion of performance.’ Hague also writes, ‘What’s often missed in these discussions is that the advice to “avoid premature optimization” is not the same thing as “write dumb code.”’ I like this last sentence because I believe that, just as some developers have adulterated the agile concept to mean (to them) ‘no documentation,’ some developers have adulterated the sound advice to ‘avoid premature optimization’ to mean (to them) ‘blindly write code.’

Premature Optimization is a Real Problem

Premature optimization is a problem we developers must guard against. As Johnson states in the previously cited book, ‘Few things in programming are harder than optimizing existing code. Unfortunately, this is why optimization is uniquely satisfying to any programmer’s ego. The problem is that the resources devoted to such optimization may well be wasted.’ There are many examples of premature optimization wasting significant resources and, in some cases, even making things perform worse. There is indeed a reason that the well-regarded Knuth wrote that ‘premature optimization is the root of all evil.’ I’m not saying that premature optimization doesn’t exist or that it’s not harmful. I’m just saying that avoiding this admittedly dysfunctional behavior is often used as an excuse to avoid thinking or to avoid implementing sound decisions.

Conclusion

Like pithy bumper stickers on cars that naively boil down complex issues to a few clever and catchy words, ‘avoid premature optimization’ is often applied much more broadly than it was intended. Even the best of recommended practices can cause more harm than benefit when applied improperly, and the misuse of ‘avoid premature optimization’ is one of the best examples of this.
I have seen the high cost paid in maintainability, readability, and even performance when supposed ‘optimization’ was implemented too early, at the expense of readability and maintainability, for no measurable benefit. However, just as high a cost can be incurred by blindly using ‘avoid premature optimization’ as an excuse to avoid designing and writing better performing software. ‘Avoiding premature optimization’ is not an excuse to stop thinking.

Reference: When Premature Optimization Isn’t from our JCG partner Dustin Marx at the Inspired by Actual Events blog. ...

Forcing Tomcat to log through SLF4J/Logback

So you have your executable web application in a JAR with bundled Tomcat (make sure to read that one first). However, there are these annoying Tomcat logs at the beginning. They are independent of our application logs and not customizable:

Nov 24, 2012 11:44:02 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8080"]
Nov 24, 2012 11:44:02 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Tomcat
Nov 24, 2012 11:44:02 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.30
Nov 24, 2012 11:44:05 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]

I would really like to quiet them down, or even better, save them somewhere, since they sometimes reveal important failures. But I definitely don't want to have a separate java.util.logging configuration. Did you wonder, after reading the previous article, how I knew that the runnable Tomcat JAR supports the -httpPort parameter and a few others?
Well, I checked the sources, but it's simpler to just ask for help:

$ java -jar target/standalone.jar -help
usage: java -jar [path to your exec war jar]
 -ajpPort <ajpPort>                     ajp port to use
 -clientAuth                            enable client authentication for https
 -D <arg>                               key=value
 -extractDirectory <extractDirectory>   path to extract war content, default value: .extract
 -h,--help                              help
 -httpPort <httpPort>                   http port to use
 -httpProtocol <httpProtocol>           http protocol to use: HTTP/1.1 or org.apache.coyote.http11.Http11NioProtocol
 -httpsPort <httpsPort>                 https port to use
 -keyAlias <keyAlias>                   alias from keystore for ssl
 -loggerName <loggerName>               logger to use: slf4j to use slf4j bridge on top of jul
 -obfuscate <password>                  obfuscate the password and exit
 -resetExtract                          clean previous extract directory
 -serverXmlPath <serverXmlPath>         server.xml to use, optional
 -uriEncoding <uriEncoding>             connector uriEncoding default ISO-8859-1
 -X,--debug                             debug

The -loggerName parameter looks quite promising. First try:

$ java -jar target/standalone.jar -loggerName slf4j
WARNING: issue configuring slf4j jul bridge, skip it

No good. A quick look at the source code again shows that the SLF4J library is missing. This parameter is interpreted during Tomcat bootstrapping, long before the web application is deployed. That means slf4j-api.jar inside my web application is not enough. It has to be available to the root class loader (the equivalent of the /lib directory in a packaged Tomcat).
Luckily, the plugin exposes an <extraDependencies/> configuration parameter:

<configuration>
  <path>/standalone</path>
  <enableNaming>false</enableNaming>
  <finalName>standalone.jar</finalName>
  <charset>utf-8</charset>
  <extraDependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.2</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jul-to-slf4j</artifactId>
      <version>1.7.2</version>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.0.7</version>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-core</artifactId>
      <version>1.0.7</version>
    </dependency>
  </extraDependencies>
</configuration>

Running Tomcat and… success!

00:01:27.110 [main] INFO o.a.coyote.http11.Http11Protocol - Initializing ProtocolHandler ["http-bio-8080"]
00:01:27.127 [main] INFO o.a.catalina.core.StandardService - Starting service Tomcat
00:01:27.128 [main] INFO o.a.catalina.core.StandardEngine - Starting Servlet Engine: Apache Tomcat/7.0.33
00:01:29.645 [main] INFO o.a.coyote.http11.Http11Protocol - Starting ProtocolHandler ["http-bio-8080"]

Well, not quite. If you use Logback on a daily basis, you will recognize its default console logging pattern, which means we are not picking up any logback.xml. From my experience, placing logback.xml externally somewhere in your file system is superior to putting it inside your binary, especially with the auto-refresh feature turned on:

<configuration scan="true" scanPeriod="5 seconds">
  <!-- ... -->
</configuration>

Put a fallback logback.xml file in the root of your CLASSPATH in case no other file is specified. Then point to the external one like below, and voilà:

$ java -jar standalone.jar -httpPort=8081 -loggerName=slf4j \
    -Dlogback.configurationFile=/etc/foo/logback.xml

Finally, clean and consistent logging, most likely to a single file.
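This note is for readers curious about what -loggerName slf4j changes under the hood. The jul-to-slf4j bridge works by installing a java.util.logging Handler on the JUL root logger, typically after removing the default console handler. That handler passes each record on to SLF4J. The sketch below shows that mechanism with the JDK alone. ForwardingHandler, its in-memory sink and install() are hypothetical stand-ins for the real bridge handler and for Logback, so no SLF4J API is used or implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogManager;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class JulRedirect {
    /** Hypothetical stand-in for a bridge handler: forwards JUL records to another sink. */
    static class ForwardingHandler extends Handler {
        final List<String> sink = new ArrayList<>(); // stands in for the SLF4J/Logback side

        @Override
        public void publish(LogRecord r) {
            if (isLoggable(r)) { // honours this handler's level and filter
                sink.add(r.getLevel() + " " + r.getLoggerName() + " - " + r.getMessage());
            }
        }

        @Override public void flush() {}
        @Override public void close() {}
    }

    /** Removes the existing handlers (e.g. ConsoleHandler) from the JUL root logger and installs ours. */
    static ForwardingHandler install() {
        Logger root = LogManager.getLogManager().getLogger("");
        for (Handler h : root.getHandlers()) {
            root.removeHandler(h);
        }
        ForwardingHandler fwd = new ForwardingHandler();
        root.addHandler(fwd);
        return fwd;
    }
}
```

Any JUL logger that propagates to the root, including Tomcat's org.apache.catalina.* loggers, then ends up in the single sink instead of on the console. Putting that single sink in place is exactly what the extra dependencies above make possible during bootstrap.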
Reference: Forcing Tomcat to log through SLF4J/Logback from our JCG partner Tomasz Nurkiewicz at the Java and neighbourhood blog. ...

Event streaming with MongoDB

MongoDB is a really great “NoSQL” database, with a very wide range of applications. In one project that we are developing at SoftwareMill, we used it as a replicated event storage, from which we stream the events to other components.

Introduction

The basic idea is pretty simple (see also Martin Fowler's article on Event Sourcing). Our system generates a series of events. These events are persisted in the event storage. Other components in the system follow the stream of events and do “something” with them. For example, they can get aggregated and written into a reporting database (this, on the other hand, resembles CQRS). Such an approach has many advantages:
- reading and writing of the events is decoupled (asynchronous)
- any following component may die and then “catch up”, provided it wasn't dead for too long
- there may be multiple followers; the followers may read the data from slave replicas, for better scalability
- bursts of event activity have a reduced impact on event sinks; at worst, the reports will get generated more slowly

The key component here is, of course, a fast and reliable event storage. The three key features of MongoDB that we used to implement one are:
- capped collections and tailable cursors
- fast collection appends
- replica sets

Collection

As the base, we are using a capped collection, which by definition is size-constrained. If writing a new event would cause the collection to exceed the size limit, the oldest events are overwritten. This gives us something similar to a circular buffer for events. (Plus, we are also quite safe from out-of-disk-space errors.) Until version 2.2, capped collections didn't have an _id field by default (and hence no index). However, as we wanted the events to be written reliably across the replica set, both the _id field and an index on it are mandatory.

Writing events

Writing events is a simple Mongo insert operation; inserts can also be done in batches.
Depending on how tolerant we are of event loss, we may use various Mongo write concerns (e.g. waiting for a write confirmation from a single node or from multiple nodes).

All of the events are immutable. Apart from giving us nicer, thread-safe Java code, this is a necessity for event streaming: if the events were mutable, how would the event sink know what was updated? This also has good Mongo performance implications. As the data is never changed, the documents that are written to disk never shrink or expand, so there is no need to move blocks on disk. In fact, in a capped collection, Mongo doesn't allow a document that was once written to grow.

Reading events

Reading the event stream is a little bit more complex. First of all, there may be multiple readers, each at a different position in the stream. Secondly, if there are no events in the stream, we would like the reader to wait until some events are available, and avoid active polling. Finally, we would like to process the events in batches, to improve performance.

Tailable cursors solve these problems. To create such a cursor we have to provide a starting point: the id of an event from which we'll start reading. If an id is not provided, the cursor will return events starting from the oldest one available. Thus each reader must store the last event that it has read and processed. More importantly, tailable cursors can optionally block for some amount of time if no new data is available, solving the active polling problem.

(By the way, the oplog collection that Mongo uses to replicate data across a replica set is also a capped collection. Slave Mongo instances tail this collection, streaming the “events”, which are database operations, and applying them locally in order.)

Reading events in Java

When using the Mongo Java Driver, there are a few “catches”. First of all, you need to initialise the cursor.
To do that, we need to provide three things:
(1) the last event id, if present;
(2) the order in which we want to read the events (here: natural, that is, the insertion order);
(3) two crucial cursor options: that we want the cursor to be tailable, and that we want it to block if there's no new data.

DBObject query = lastReceivedEventId.isPresent()
    ? BasicDBObjectBuilder.start("_id", BasicDBObjectBuilder
            .start("$gte", lastReceivedEventId.get()).get())
        .get()
    : null;

DBObject sortBy = BasicDBObjectBuilder.start("$natural", 1).get();

DBCollection collection = ... // must be a capped collection
DBCursor cursor = collection
    .find(query)
    .sort(sortBy)
    .addOption(Bytes.QUERYOPTION_TAILABLE)
    .addOption(Bytes.QUERYOPTION_AWAITDATA);

You may wonder why we used >= last_id instead of >. That is needed here because of the way Mongo ObjectIds are generated. With a simple > last_id we may miss some events that were generated in the same second as the last_id event, but after it. This also means that our Java code must take care of this fact and discard the first event that was received.

The cursor's class implements the basic Java Iterator interface, so it's fairly easy to use. So now we can take care of batching. When iterating over a cursor, the driver receives the data from the Mongo server in batches. We may therefore simply call hasNext() and next(), as with any other iterator, to receive subsequent elements, and only some calls will actually cause network communication with the server.

In the Mongo Java driver, the call that is actually potentially blocking is hasNext(). If we want to process the events in batches, we need two things:
(a) read the elements as long as they are available;
(b) have some way of knowing, before getting blocked, that there are no more events and that we can process the events already batched.
Since hasNext() can block, we can't do this directly. That's why we introduced an intermediate queue (LinkedBlockingQueue).
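The intermediate-queue scheme can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not the SoftwareMill code. A plain Iterator stands in for the tailable DBCursor, and its hasNext() is where a real cursor would block. The names Event, EventBatcher, startReader and nextBatch are hypothetical. The reader thread also drops the repeated first event that the $gte resume query returns.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EventBatcher {
    /** Hypothetical immutable event; in the article these come from a tailable cursor. */
    static final class Event {
        final long id;
        final String payload;
        Event(long id, String payload) { this.id = id; this.payload = payload; }
    }

    private final BlockingQueue<Event> queue;

    EventBatcher(int capacity) {
        // Bounded: when the queue is full, put() blocks the reader thread
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Reader side: copies events from a (possibly blocking) iterator onto the queue. */
    Thread startReader(Iterator<Event> source, Long lastId) {
        Thread t = new Thread(() -> {
            boolean first = true;
            try {
                while (source.hasNext()) { // a tailable cursor would block here
                    Event e = source.next();
                    // A $gte resume query returns the last processed event again: skip it
                    boolean repeat = first && lastId != null && e.id == lastId;
                    first = false;
                    if (!repeat) queue.put(e);
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Consumer side: wait for one event, then drain whatever else is already queued. */
    List<Event> nextBatch(long timeout, TimeUnit unit) throws InterruptedException {
        List<Event> batch = new ArrayList<>();
        Event first = queue.poll(timeout, unit); // blocks for up to the timeout
        if (first == null) return batch;         // nothing arrived in time
        batch.add(first);
        queue.drainTo(batch);                    // non-blocking; may add 0 elements
        return batch;
    }
}
```

A real tailable cursor never "ends", so its reader thread keeps running. The finite iterator here only makes the sketch easy to exercise. Under load the queue fills up and nextBatch returns large batches, while in quiet periods it returns batches of one. This matches the "self-regulating" behaviour described in the summary.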
In a separate thread, events read from the cursor are put on the queue as they come. If there are no events, the thread will block on cursor.hasNext(). The blocking queue has an optional size limit, so if it's full, putting an element will block as well until space is available. In the event-consumer thread, we first read a single element from the queue in a blocking fashion (using the timed .poll, so here we wait until an event is available). Then we drain the whole content of the queue into a temporary collection (using .drainTo) to build the batch. The drain may add 0 elements, but we always have the first one.

An important thing to mention is that if the collection is empty, Mongo won't block, so we have to fall back to active polling. We also have to take care of the fact that the cursor may die during this wait. To check this, we should verify that cursor.getCursorId() != 0, where 0 is the id of a “dead cursor”. In such a case we simply need to re-instantiate the cursor.

Summing up

To sum up, we got a very fast event sourcing/streaming solution. It is “self-regulating”, in the sense that if there's a peak of event activity, the events will be read by the event sinks with a delay, in large batches. If the event activity is low, they will be processed quickly in small batches. We're also using the same Mongo instance for other purposes. Having a single DB system to cluster and maintain, both for regular data and for events, is certainly great from an ops point of view.

Reference: Event streaming with MongoDB from our JCG partner Adam Warski at the Blog of Adam Warski blog. ...

Not All Optimization Is Premature

The other day the reddit community dismissed my advice to switch from text-based to binary serialization formats. It was labeled “premature optimization”. I'll zoom out of the particular case and discuss why not all optimization is premature.

Everyone has heard Donald Knuth's phrase “[..] premature optimization is the root of all evil”. And as with every well-known phrase, this one is usually misinterpreted, to such an extent that people think optimizing something which is not a bottleneck is bad. As a result, many systems are unnecessarily heavy and consume a lot of resources… because there is no bottleneck.

What did Knuth mean? That it is wrong to optimize if that is done at the cost of other important variables: readability, maintainability, time. Optimizing an algorithm can make it harder to read. Optimizing a big system can make it harder to maintain. Optimizing anything can take time that should probably be spent implementing functionality or fixing bugs. In practice, this means you should not add sneaky if-clauses and memory workarounds to your code. You shouldn't introduce new tools or layers into your system for the sake of some potential gains in processing speed. And you shouldn't spend a week gaining 5% in performance.

However, most interpretations say “you shouldn't optimize for performance until it hits you”. And that's where my opinion differs. If you wait for something to “hit” you, then you are potentially in trouble. You must make your system optimal before it goes into production, otherwise it may be too late (meaning a lot of downtime, lost customers, huge bills for hardware/hosting). Furthermore, “bottlenecks” are not that obvious in big systems. If you have 20 servers, will you notice that one piece of code takes up 70% more resources than it should? What if there are 10 such pieces? There is no obvious bottleneck, but optimizing the code may save you 2-3 servers.
That's why writing optimal code is not optional and is certainly not “premature optimization”. But let me give a few examples:
- You notice that some algorithms that are supposed to be invoked thousands of times use a linked list where random access is required. Is it premature optimization to change it to an array/array list? No: it takes very little time and does not make the code harder to read. Yet it may increase the speed of the application a lot (how much is ‘a lot’ doesn't even matter in that case).
- You realize that a piece of code (including db access) is executed many times, but the data doesn't change. This rarely accounts for a big percentage of the time needed to process a request. Is it premature optimization to cache the results, provided you have a caching framework that can handle cache invalidation, cache lifetime, etc.? No: caching would save tons of database requests without making your code harder to read (with declarative caching it will be just an annotation).
- You measure that switching from a text to a binary format for messages between internal components makes them 50%+ faster with half the memory. The system does not have huge memory needs, and the CPU is steady below 50%. Is replacing the text format with a binary one a premature optimization? No. It costs 1 day, and you don't lose code readability (the change is only one line of configuration). You also don't lose the means to debug your messages: you can dump them before/after serialization, or switch to a text-based format in development mode. (Yeah, that's my case from the previous blogpost.)

So, with these kinds of things, you saved a lot of processing power and memory even though you didn't have any problems with that. But you didn't see the problems because you had enough hardware to mask them, or because you didn't have enough traffic/utilization to actually notice them. And performance tests/profiling didn't show a clear bottleneck.
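The first example above (a linked list used where random access is required) can be illustrated with the JDK collections. This is a minimal sketch; the class and method names are illustrative, and the printed timings will vary by machine. Switching lists is a one-token change of the constructor, and the index-based loop stays exactly the same.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class RandomAccessDemo {
    // Index-based loop: get(i) is O(1) on ArrayList but O(n) on LinkedList
    // (it walks from the nearer end), so this loop is O(n^2) for a LinkedList.
    static long sumByIndex(List<Integer> values) {
        long sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);
        }
        return sum;
    }

    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) target.add(i);
        return target;
    }

    public static void main(String[] args) {
        int n = 20_000;
        for (List<Integer> list : List.of(fill(new LinkedList<>(), n), fill(new ArrayList<>(), n))) {
            long start = System.nanoTime();
            long sum = sumByIndex(list);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(list.getClass().getSimpleName() + ": sum=" + sum + " in " + ms + " ms");
        }
    }
}
```

Both lists produce the same sum. Only the access cost differs, which is why the change does not hurt readability.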
In such cases you optimize “in advance”, but not prematurely.

An important note here is that I mainly mean web applications. For desktop applications the deficiencies do not multiply. If you have a piece of desktop code that makes the system consume 20% more memory (ask Adobe), then whatever: people have a lot of memory nowadays. But if your web application consumes 20% more memory for each user on the system, and you have 1 million users, then the absolute value is huge (although it's still “just” 20%).

The question is: is there a fine line between premature and proper optimization? Anything that makes the code “ugly” and does not solve a big problem is premature. Anything that takes two weeks to improve performance by 5% is premature. Anything that is explained with “but what if some day trillions of aliens use our software” is premature. But anything that improves performance without affecting readability is a must. And anything that improves performance by just a better configuration is a must. And anything that makes the system consume 30% fewer resources and takes a day to implement is a must.

To summarize: if neither readability nor maintainability is damaged and the time taken is negligible, go for it. If every optimization is labeled as “premature”, a system may fail without any visible performance bottleneck. So assess each optimization rather than automatically concluding it's premature.

Reference: Not All Optimization Is Premature from our JCG partner Bozhidar Bozhanov at the Bozho’s tech blog blog. ...

How many Java developers are there in the world?

Oracle says it's 9,000,000. Wikipedia claims it's 10,000,000. And the guys from NumberOf.net seem to be the most precise: they know that there are exactly 9,007,346 Java developers out there. Nice numbers. I have used those articles as reference points while speaking about the potential market size for our memory leak detection tool. But something in these numbers has bothered me for years: there is no trustworthy, public analysis behind them. They are just conjured up from thin air. So I finally decided to do something about it and try to figure it out for good.

It proved to be a challenging task. After all, with more than seven billion people on our planet, I couldn't call everyone and ask them. Well, maybe I could. But if every call took 20 seconds on average, I would need at least 4,439 years to complete the survey, and that is without sleeping, eating or resting. So I had to find other ways to estimate. After playing around with different sources of information, I decided to dig into four of them for a closer look:
- Labour statistics provided by different governments
- Language popularity sites such as Tiobe and Langpop
- Employment portals, using Indeed.com and Monster.com
- Download numbers of popular Java tools and libraries, namely Eclipse and Tomcat

Using that information, I wanted to estimate the number with three different calculations, based on language popularity indexes, labour statistics and download figures. So, here we go.

How many programmers could there be in total?

World population is currently above seven billion. Out of those seven billion we can leave out sub-Saharan Africa (900M) and rural Asia (about 50% of its 2.2B population) as negligible. This leaves us with approximately 5 billion people living in regions where the overall economic and cultural background can be considered suitable for software industries to spawn. Now, out of those 5,000,000,000, how many could actually be developing software?
A good answer at StackExchange gives us some pointers as to where we can find information on the percentage of software developers in different countries. Using the US, Japan, Canada, the EU27 and the UK as a baseline, we can estimate that 0.86% of the population is employed as a software developer or programmer:

Country   Population    Developers   %
Canada    33,476,688    387,000      1.16%
EU27      502,486,499   5,900,000    1.17%
Japan     127,799,000   1,016,929    0.80%
UK        63,162,000    333,000      0.53%
US        313,931,000   1,336,300    0.43%
Weighted average:                    0.86%

0.86% of five billion is 43,000,000. Let's remember this number, as it will be used as a baseline in the following calculations.

Popularity contests

For the popularity contest we will use two sources of data: the TIOBE index and the Langpop one. Other sources such as Dataist figures were hard to interpret, so we'll stick to just those two. For background, the TIOBE ratings are calculated by counting hits on the most popular search engines. The search query used is +"<language> programming", e.g. +"Java programming" in our case. Langpop uses more sources of input besides search engine queries. It weighs open job positions, book titles, search engine results, the number of open source projects and other data equally to calculate its popularity score.

Simplifying the TIOBE and Langpop results, we can conclude that 17% (according to TIOBE) and ~15% (according to Langpop) of the programmers in the world are using Java. Averaging those numbers, we can say that around 16% of the 43,000,000 developers in the world use Java. This translates to 6,880,000 Java developers out there.

Job portals

Job portals, especially when considering both available positions and uploaded resumes, are definitely a good source of information. The larger ones also provide nice reports on the labour market, which we will dig into next.
Note that we used Indeed.com and Monster.com; if you can point us towards more and/or better sources of information, we would be glad to correct our calculations. Using this analysis from Monster.com and the aggregated statistics from Indeed.com, we can say that ~18% of Monster.com applicants can program in Java and ~16% of the open engineering/programming positions scanned by Indeed.com are looking for Java talent. Averaging those numbers, we arrive at 17%, which out of 43,000,000 programmers in total translates to 7,310,000 Java guys and girls in the world.

Software downloads

Every Java developer uses something to build an application. At the very least we expect them to use a JVM and a compiler. If you happen to know anyone who can get away without those two, please let us know. We would hire him immediately.

But most of us use more than just a compiler and a virtual machine: IDEs, application servers, build tools, etc. So we looked at the publicly available download numbers of these tools and tried to estimate the number of developers from them. When calculating the total number of developers from the estimated number of users, we take into account the market share of the corresponding software. To estimate market share we use ZeroTurnaround’s statistics gathered in the spring of 2012.

Eclipse downloads

Eclipse Juno was released on June 27 and has been downloaded 1,200,000 times during its first 20 days. Looking at the historical data published by eclipse.org, we predict that Juno will be downloaded approximately 8,000,000 times in total. The last four major Eclipse releases have all followed a yearly release calendar, and all of them took place in June:

Juno – 8,000,000 downloads (projected over a year, assuming the trend continues; currently 1,200,000 downloads in the first 20 days)
Indigo – 6,000,000 downloads
Helios – 4,100,000 downloads
Galileo – 2,200,000 downloads

Averaging the Juno estimate and the Indigo result, we can say that Eclipse is downloaded approximately 7,000,000 times a year. Based on ZeroTurnaround’s statistics, we expect 68% of Java developers to use Eclipse as their primary IDE. If we now make the bold claim that each Java developer on Eclipse downloads the IDE exactly once a year, take the number of downloads per year to be 7,000,000, and consider that 32% of Java developers do not use Eclipse at all, we conclude that there should be 10,300,000 Java developers in total (7,000,000 / 0.68).

Apache Tomcat downloads

Vadim Gritsenko has put together some nice statistics on top of the Apache logs. From there we can see that during the last year Tomcat was downloaded approximately 550,000 times a month, giving a yearly total of 6,600,000 Tomcat downloads. Applying the statistics from the same report used for Eclipse’s market share, we estimate that 59% of Java developers use Tomcat as one of their development platforms. If we again make the bold claim that each Java developer on Tomcat downloads every major release exactly once, and consider that 41% of Java developers do not use Tomcat, we conclude that there should be 11,186,000 Java developers out there (6,600,000 / 0.59). Averaging the Eclipse and Tomcat numbers, we end up with 10,743,000 Java developers.

Conclusions

We used three different sources for our estimates: popularity contests, job market analysis and the download numbers of popular Java development infrastructure products. The numbers varied quite a bit, from 6,880,000 to 10,743,000. Aggressively averaging the three numbers, we conclude that there are 8,311,000 Java developers out there. Not quite as many as Oracle or Wikipedia think, but still enough to build a business that provides development tools for the Java community.

Lies. Damn lies. And statistics.
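The arithmetic behind these estimates is simple enough to re-run mechanically. The sketch below is not from the original post: the class and method names are hypothetical and the inputs are just the figures quoted above. Note that a population-weighted average of the per-country shares reduces to total developers divided by total population. Without rounding the intermediate Eclipse figure to 10,300,000, the download-based estimate comes out at roughly 10,740,000 and the overall average at roughly 8,310,000, within about a thousand of the numbers in the text.

```java
/**
 * Hypothetical helper that re-runs the estimates above.
 * All inputs are figures quoted in the text; nothing here is measured.
 */
final class JavaDeveloperEstimate {
    // Country table rows, in order: Canada, EU27, Japan, UK, US.
    static final long[] POPULATION = {33_476_688L, 502_486_499L, 127_799_000L, 63_162_000L, 313_931_000L};
    static final long[] DEVELOPERS = {387_000L, 5_900_000L, 1_016_929L, 333_000L, 1_336_300L};

    /** Population-weighted share of developers = total developers / total population. */
    static double weightedShare() {
        long population = 0;
        long developers = 0;
        for (int i = 0; i < POPULATION.length; i++) {
            population += POPULATION[i];
            developers += DEVELOPERS[i];
        }
        return (double) developers / population;
    }

    /** Users implied by yearly downloads, assuming one download per user per year. */
    static long impliedUsers(long yearlyDownloads, double marketShare) {
        return Math.round(yearlyDownloads / marketShare);
    }

    /** Arithmetic mean, rounded to the nearest whole developer. */
    static long average(long... values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return Math.round((double) sum / values.length);
    }

    /** The three-source estimate, without rounding the intermediate results. */
    static long overallEstimate() {
        long baseline = Math.round(0.0086 * 5_000_000_000L);          // 0.86% of five billion = 43,000,000
        long popularity = Math.round(0.16 * baseline);                // TIOBE 17% and Langpop 15%, averaged
        long jobPortals = Math.round(0.17 * baseline);                // Monster 18% and Indeed 16%, averaged
        long eclipse = impliedUsers(average(8_000_000L, 6_000_000L), 0.68); // Juno estimate + Indigo actual
        long tomcat = impliedUsers(550_000L * 12, 0.59);              // 550,000 downloads a month
        long downloads = average(eclipse, tomcat);
        return average(popularity, jobPortals, downloads);
    }

    public static void main(String[] args) {
        System.out.printf("weighted share: %.2f%%%n", weightedShare() * 100);
        System.out.printf("overall estimate: %,d%n", overallEstimate());
    }
}
```

The `impliedUsers` method encodes the same "bold claim" as the text (one download per user per year), so it is only as good as that assumption and the market-share figures fed into it.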
Reference: How many Java developers are there in the world? from our JCG partner Nikita Salnikov Tarnovski at the Plumbr Blog blog. ...

Interview Prep For Geeks

Failing an interview due to a lack of qualifications is forgivable, but it is tragic when highly qualified candidates do not get an offer because they were unprepared. With the amount of information freely available today, the time and effort required to prepare for an interview is extremely low, a relatively small investment to make in your career. Typically a candidate will have at least two or three days’ advance notice to do some research and prepare for any interview. Here is a checklist of things for technologists to investigate to be sure you are ready for what will come your way.

Company intel – Learn as much as you can about the company, and try to have at least one minute of material memorized to answer the “What do you know about us?” question. Be prepared to talk about the company history, the products or services the company provides, details of the business model, and the industry itself (competitors, health of the market, etc.). For technologists, the ability to give an eloquent response to the “Describe what the company does” question is a huge asset that should not be overlooked and could be a significant factor in your success. Gathering substantial information on a young company’s funding status or finances might be difficult, but there will generally be at least some info in press releases from venture partners.

Tech environment – Get specific details about the technical environment by doing some basic web research, reviewing any available job descriptions or LinkedIn employee profiles, and talking to your recruiter or any appropriate company contacts you may have. What frameworks, languages, databases, operating systems, and hardware are they using? Even if the details aren’t all entirely relevant to your interview, knowing them will show that you are taking this process seriously. Look up any buzzwords or acronyms you don’t recognize so you can at least discover whether you have experience with a related item (“I haven’t worked with ______, but I’m familiar with ________, which appears to be a similar tool/language”).

Tech moves – Knowing the company’s current tech details is valuable, but knowing some of the company’s technical history will show great initiative while also providing potential insight into how the company views technology and makes tech decisions. Has the company made significant changes to its stack, and if so, why? Is it heavily invested in open source? Does it seem closely linked to a specific vendor? Does the company have an engineering blog or a company GitHub account for you to explore that might contain this information?

Interviewer intel – Insight into the technical background and past employers of the individual(s) you will meet is a great advantage, as you may share some history. Personal GitHub or Twitter accounts? Technical blog posts? A LinkedIn or web search of the interviewer(s) might turn up some helpful details to use during the interview, as long as you use the info wisely. Showing that you did some research displays initiative, as long as you respect personal space.

Confirm the basics – Where are you going, and who should you ask for when you get there? Who are you meeting with, and what is his/her/their role in the company? What is the preferred dress code? (Note: some companies actually ask candidates to dress more casually, so be sure to ask.)

Prepare questions and anecdotes – Most interviews will give you at least a brief opportunity to ask questions. Although ideally you want to have these memorized, it is generally a good idea to have your questions written down so you don’t forget them if you find yourself under pressure. There are also some fairly standard questions in the “tell me about a time when…” family, which are commonly answered with anecdotes. Give some thought to past challenges, failures, and successes, and especially to the lessons you learned from each project.

Documents – Some companies may ask you to fill out an application and other relevant documents before the interview. Find out if this is the case, and if so, get them completed before interview day. Make sure to print out at least three copies of your resume and one copy of your list of questions. Think about who you will list as references if asked on the application, and have their info (name, email) available.

Keep in mind that making a solid impression in an interview can have a huge impact down the road, whether or not you get the job. Interviewers remember candidates who impressed, and they absolutely will remember those who crashed and burned as well. Do your homework and take interviews seriously, not just for the sake of getting this job but for opportunities later in your career.

Reference: Interview Prep For Geeks from our JCG partner Dave Fecak at the Job Tips For Geeks blog. ...

Why Scrum Won

In the 1990s and early 2000s a number of different lightweight ‘agile’ development methods sprang up. Today a few shops use Extreme Programming, most notably ThoughtWorks and Industrial Logic. But if you ask around, especially in enterprise shops, almost everybody who is “doing Agile” today is following Scrum or something based on Scrum.

What happened? Why did Scrum win out over XP, FDD, DSDM, Crystal, Adaptive Software Development, Lean, and all of the other approaches that have come and gone? Why are most organizations following Scrum or planning to adopt Scrum, and not the Agile Unified Process or Crystal Clear (or Crystal Yellow, or Orange, Red, Violet, Magenta or Blue, Diamond or Sapphire for that matter)? Is Scrum that much better than every other idea that came out of the Agile development movement?

Simplicity wins out

Scrum’s principal strength is that it is simpler to understand and follow than most other methods, almost naively simple. There isn’t much to it: short incremental sprints, daily standup meetings, a couple of other regular and short planning and review meetings around the start and end of each sprint, some work to prioritize (or order) the backlog and keep it up-to-date, simple progress reporting, and a flat, simple team structure. You can explain Scrum in detail in a few pages and understand it in less than an hour.

This means that Scrum is easy for managers to get their heads around and easy to implement, at a team level at least (how to successfully scale to enterprise-level Scrum in large integrated programs with distributed teams, using Scrum of Scrums or Communities of Practice or however you are supposed to do it, is still fuzzy as hell).

Scrum is easy for developers to understand too, and easy for them to follow. Unlike XP or some of the other more well-defined Agile methods, Scrum is not prescriptive and doesn’t demand a lot of technical discipline. It lets the team decide what they should do and how they should do it.
They can get up to speed and start “doing Agile” quickly and cheaply.

But simplicity isn’t the whole answer

But there’s more to Scrum’s success than simplicity. The real trick that put Scrum out front is certification. There’s no such thing as a Certified Extreme Programmer, but there are thousands of certified ScrumMasters and certified product owners and advanced certified developers and even more advanced certified professionals, and the certified trainers and coaches and officially registered training providers that certified them. And now the PMI has got involved with its PMI-ACP Certified Agile Practitioner designation, which basically ensures that people understand Scrum, with a bit of XP, Lean and Kanban thrown in to be comprehensive.

Whether Scrum certification is good or bad or useful at all is beside the point. Certification helped Scrum succeed for several reasons. First, certification led to early codification and standardization of what Scrum is all about. Consultants still have their own ideas and continue to fight among themselves over the only right way to do Scrum, the future direction of Scrum, and what should and shouldn’t be in Scrum, but the people who are implementing Scrum don’t need to worry about the differences or get caught up in politics and religious wars.

Certification is a win win win…

Managers like standardization and certification, especially busy, risk-averse managers in large mainstream organizations. If they are going to “do Agile”, they want to make sure that they do it right. By paying for standardized certified training and coaching on a standardized method, they can be reassured that they should get the same results as everyone else. Because of standardization and certification, getting started with Scrum is low risk: it’s easy to find registered certified trainers and coaches offering good quality professional training programs and implementation assistance.
Scrum has become a product: everyone knows what it looks like and what to expect. Certification also makes it easier for managers to hire new people (demand a certification upfront and you know that new hires will understand the fundamentals of Scrum and be able to fit in right away), and it’s easier to move people between teams and projects that are all following the same standardized approach.

Developers like this too, because certification (even the modest CSM) helps make them more employable, and it doesn’t take a lot of time, money or work to get certified.

But most importantly, certification has created a small army of consultants and trainers who are constantly busy training and coaching a bigger army of certified Scrum practitioners. There is serious competition between these providers, pushing each other to do something to get noticed in the crowd, saturating the Internet with books and articles and videos and webinars and blogs on Scrum and Scrumness, effectively drowning out everything else about Agile development.

The standardization of Scrum has also helped create an industry of companies selling software tools to help manage Scrum projects, another thing that managers in large organizations like, because these tools help them get some control over what teams are doing and give them even more confidence that Scrum is real. The tool vendors are happy to sponsor studies and presentations and conferences about Agile (er, Scrum), adding to the noise and momentum behind Scrum.

Scrum certification is a win win win: for managers, developers, authors, consultants and vendors.

It looks like David Anderson may be trying to do a similar thing with Kanban certification. It’s hard to see Kanban taking over the world of software development: while it’s great for managing support and maintenance teams, and helps to control work flow at a micro level, Kanban doesn’t fit larger project work. But then neither does Scrum.
And who would have picked Scrum as the winner 10 years ago?

Reference: Why Scrum Won from our JCG partner Jim Bird at the Building Real Software blog. ...

Type-safe Empty Collections in Java

I have blogged before on the utility of the Java Collections class and have specifically blogged on Using Collections Methods emptyList(), emptyMap(), and emptySet(). In this post, I look at the sometimes subtle but significant differences between using the relevant fields of the Collections class for accessing an empty collection versus using the relevant methods of the Collections class for accessing an empty collection. The following code demonstrates accessing Collections’s fields directly to specify empty collections.

Using Collections’s Fields for Empty Collections

    /**
     * Instantiate my collections with empty versions using Collections fields.
     * This will result in javac compiler warnings stating 'warning: [unchecked]
     * unchecked conversion'.
     */
    public void instantiateWithEmptyCollectionsFieldsAssigment() {
        this.stringsList = Collections.EMPTY_LIST;
        this.stringsSet = Collections.EMPTY_SET;
        this.stringsMap = Collections.EMPTY_MAP;
    }

The code above compiles with javac, but leads to the following warning message (generated by NetBeans and Ant in this case):

    -do-compile:
        [javac] Compiling 1 source file to C:\java\examples\typesafeEmptyCollections\build\classes
        [javac] Note: C:\java\examples\typesafeEmptyCollections\src\dustin\examples\Main.java uses unchecked or unsafe operations.
        [javac] Note: Recompile with -Xlint:unchecked for details.
Specifying -Xlint:unchecked as an argument to javac (in this case via javac.compilerargs=-Xlint:unchecked in the NetBeans project.properties file) produces more specific warning messages for the code listed earlier:

    [javac] Compiling 1 source file to C:\java\examples\typesafeEmptyCollections\build\classes
    [javac] C:\java\examples\typesafeEmptyCollections\src\dustin\examples\Main.java:27: warning: [unchecked] unchecked conversion
    [javac]         this.stringsList = Collections.EMPTY_LIST;
    [javac]                                       ^
    [javac]   required: List<String>
    [javac]   found:    List
    [javac] C:\java\examples\typesafeEmptyCollections\src\dustin\examples\Main.java:28: warning: [unchecked] unchecked conversion
    [javac]         this.stringsSet = Collections.EMPTY_SET;
    [javac]                                      ^
    [javac]   required: Set<String>
    [javac]   found:    Set
    [javac] C:\java\examples\typesafeEmptyCollections\src\dustin\examples\Main.java:29: warning: [unchecked] unchecked conversion
    [javac]         this.stringsMap = Collections.EMPTY_MAP;
    [javac]                                      ^
    [javac]   required: Map<String,String>
    [javac]   found:    Map

NetBeans will also show these warnings if the appropriate hint box is checked in its options. The next three images demonstrate ensuring that the appropriate hint is set to see these warnings in NetBeans and provide an example of how NetBeans presents the code shown above with warnings.

Fortunately, it is easy to take advantage of the utility of the Collections class and access empty collections in a typesafe manner that won’t lead to these javac warnings and the corresponding NetBeans hints. That approach is to use Collections’s methods rather than its fields. This is demonstrated in the next simple code listing.

Using Collections’s Methods for Empty Collections

    /**
     * Instantiate my collections with empty versions using Collections methods.
     * This will avoid the javac compiler warnings alluding to 'unchecked conversion'.
     */
    public void instantiateWithEmptyCollectionsMethodsTypeInferred() {
        this.stringsList = Collections.emptyList();
        this.stringsSet = Collections.emptySet();
        this.stringsMap = Collections.emptyMap();
    }

The above code compiles without warnings, and no NetBeans hints are shown either. The Javadoc documentation for the fields of the Collections class does not address why these warnings occur, but the documentation for each of the like-named methods does. Specifically, the documentation for Collections.emptyList(), Collections.emptySet(), and Collections.emptyMap() each states, ‘(Unlike this method, the field does not provide type safety.)’

The Collections methods for empty collections shown in the last code listing provide type safety without the need to explicitly specify the types stored within the collection, because the type is inferred from the assignment to already declared instance attributes with explicitly specified element types. When the type cannot be inferred, compiler errors will result when using the Collections methods without an explicitly specified type.
This is shown in the next screen snapshot of attempting to do this in NetBeans. The specific compiler error message is:

    [javac] C:\java\examples\typesafeEmptyCollections\src\dustin\examples\Main.java:62: error: method populateList in class Main cannot be applied to given types;
    [javac]         populateList(Collections.emptyList());
    [javac]         ^
    [javac]   required: List<String>
    [javac]   found: List<Object>
    [javac]   reason: actual argument List<Object> cannot be converted to List<String> by method invocation conversion
    [javac] C:\java\examples\typesafeEmptyCollections\src\dustin\examples\Main.java:63: error: method populateSet in class Main cannot be applied to given types;
    [javac]         populateSet(Collections.emptySet());
    [javac]         ^
    [javac]   required: Set<String>
    [javac]   found: Set<Object>
    [javac]   reason: actual argument Set<Object> cannot be converted to Set<String> by method invocation conversion
    [javac] C:\java\examples\typesafeEmptyCollections\src\dustin\examples\Main.java:64: error: method populateMap in class Main cannot be applied to given types;
    [javac]         populateMap(Collections.emptyMap());
    [javac]         ^
    [javac]   required: Map<String,String>
    [javac]   found: Map<Object,Object>
    [javac]   reason: actual argument Map<Object,Object> cannot be converted to Map<String,String> by method invocation conversion
    [javac] 3 errors

These compiler errors are avoided, and type safety is achieved, by explicitly specifying the types of the collections’ elements in the code, as shown in the next code listing.

Explicitly Specifying Element Types with Collections’s Empty Methods

    /**
     * Pass empty collections to another method for processing and specify those
     * empty methods using Collections methods. This will result in javac compiler
     * ERRORS unless the type is explicitly specified.
     */
    public void instantiateWithEmptyCollectionsMethodsTypeSpecified() {
        populateList(Collections.<String>emptyList());
        populateSet(Collections.<String>emptySet());
        populateMap(Collections.<String, String>emptyMap());
    }

The Collections class’s methods for obtaining empty collections are preferable to its similarly named fields because of the type safety the methods provide. This allows greater leveraging of Java’s static type system, a key theme of books such as Effective Java. A nice side effect is the removal of cluttering warnings and marked NetBeans hints, but the more important result is better, safer code.

Reference: Type-safe Empty Collections in Java from our JCG partner Dustin Marx at the Inspired by Actual Events blog. ...
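The contrast between the fields and the methods in the post on type-safe empty collections can be tried in one small self-contained class. This is a sketch, not the original Main.java: EmptyCollectionsDemo and its populateList/populateSet/populateMap methods are hypothetical stand-ins for the article's code. One caveat worth knowing: the errors quoted in that post come from the javac of the time (Java 7); since Java 8's improved target typing, populateList(Collections.emptyList()) also infers List<String> in argument position, although the explicit type witness still compiles everywhere and documents intent.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo class; the populateXxx methods stand in for the article's own. */
final class EmptyCollectionsDemo {
    private List<String> stringsList;
    private Set<String> stringsSet;
    private Map<String, String> stringsMap;

    /** Raw fields: compiles, but with an unchecked-conversion warning unless suppressed. */
    @SuppressWarnings("unchecked")
    void instantiateWithFields() {
        stringsList = Collections.EMPTY_LIST;
        stringsSet = Collections.EMPTY_SET;
        stringsMap = Collections.EMPTY_MAP;
    }

    /** Generic methods: element types are inferred from the assignment targets. */
    void instantiateWithMethods() {
        stringsList = Collections.emptyList();
        stringsSet = Collections.emptySet();
        stringsMap = Collections.emptyMap();
    }

    // Hypothetical consumers that only report the size of what they receive.
    static int populateList(List<String> list) { return list.size(); }
    static int populateSet(Set<String> set) { return set.size(); }
    static int populateMap(Map<String, String> map) { return map.size(); }

    /** Explicit type witnesses: compile on Java 7 as well as later releases. */
    static int passExplicitlyTyped() {
        return populateList(Collections.<String>emptyList())
                + populateSet(Collections.<String>emptySet())
                + populateMap(Collections.<String, String>emptyMap());
    }

    boolean allEmpty() {
        return stringsList.isEmpty() && stringsSet.isEmpty() && stringsMap.isEmpty();
    }

    public static void main(String[] args) {
        EmptyCollectionsDemo demo = new EmptyCollectionsDemo();
        demo.instantiateWithMethods();
        System.out.println(demo.allEmpty());       // true
        System.out.println(passExplicitlyTyped()); // 0
        try {
            demo.stringsList.add("x");             // the shared empty instances are immutable
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot add to an empty collection from Collections");
        }
    }
}
```

Compiling this with -Xlint:unchecked should report nothing, because the only raw assignments sit in a method annotated with @SuppressWarnings("unchecked"); removing that annotation brings back the three unchecked-conversion warnings quoted in the post.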

Wasting time by saving memory

You might say that at your company the hardware is 10x more expensive, but it is also likely that your time costs the company about that much more. In any case, this article attempts to demonstrate that there is a tipping point where it no longer makes sense to spend time saving memory, or even thinking about it.

Each row compares an amount of developer time with the amount of memory or disk that costs about the same:

Activity              Time spent  Cheap memory  Expensive memory  Cheap disk  Expensive disk
a screen refresh      20 ms       27 KB         150 bytes         1 MB        24 KB
one trivial change    ~1 sec      1.4 MB        7.6 KB            60 MB       1.2 MB
one command           ~5 sec      7 MB          50 KB             400 MB      6 MB
a line of code        ~1 min      84 MB         460 KB            3,600 MB    72 MB
a small change        ~20 min     1,600 MB      9 MB              72,000 MB   1.4 GB
a significant change  ~1 day      40 GB         0.2 GB            1,700 GB    35 GB
a major change        ~2 weeks    390 GB        2 GB              17,000 GB   340 GB

Your mileage may vary, but just today someone asked how to save a few bytes by passing short instead of int as method arguments (Java doesn’t save any memory if you do). Even if it did save as much as it might, the time taken to ask the question, let alone implement and test it, could have been worth 10,000,000 times the cost of the memory it could have saved.

In short, don’t fall into the trap of a mind-boggling imbalance of scale.

Reference: Wasting time by saving memory from our JCG partner Peter Lawrey at the Vanilla Java blog. ...
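The time-versus-memory table in "Wasting time by saving memory" is a break-even calculation: developer time is converted into money, then into the amount of memory or disk that money would buy. A minimal sketch follows. The class name, the hourly cost and the price per GiB are illustrative assumptions, not the figures behind the original table, so its output will not match the table row for row.

```java
/**
 * Hypothetical break-even calculator: how many bytes of memory (or disk)
 * cost the same as a given amount of developer time.
 * Both prices are assumptions supplied by the caller.
 */
final class TimeVersusMemory {
    static final double BYTES_PER_GIB = 1L << 30;

    /**
     * @param seconds     developer time spent
     * @param costPerHour assumed fully loaded cost of one developer hour
     * @param costPerGiB  assumed cost of one GiB of storage, same currency
     * @return number of bytes whose cost equals the time spent
     */
    static double breakEvenBytes(double seconds, double costPerHour, double costPerGiB) {
        double timeCost = seconds * costPerHour / 3600.0; // money spent on the time
        return timeCost / costPerGiB * BYTES_PER_GIB;     // storage that money buys
    }

    public static void main(String[] args) {
        // Illustrative prices only; substitute your own.
        double costPerHour = 100.0;
        double costPerGiB = 10.0;
        String[] labels = {"a screen refresh (20 ms)", "one command (~5 s)", "a small change (~20 min)"};
        double[] seconds = {0.020, 5, 20 * 60};
        for (int i = 0; i < labels.length; i++) {
            double mib = breakEvenBytes(seconds[i], costPerHour, costPerGiB) / (1 << 20);
            System.out.printf("%-26s %,12.1f MiB%n", labels[i], mib);
        }
    }
}
```

Because the result scales linearly with time, a 10x difference in hardware price only shifts every row by a factor of ten, which is small next to the many orders of magnitude between a screen refresh and a two-week change.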
Java Code Geeks and all content copyright © 2010-2015, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.