

Best Of The Week – 2012 – W10

Hello guys,

Time for the “Best Of The Week” links for the week that just passed. Here are some links that drew Java Code Geeks’ attention:

* How to Hire a Programmer: Some tips on how to hire a programmer, such as checking for a portfolio, having a phone screen, assigning an audition project etc. Also check out our brand new Jobs section.
* Introducing Spring Integration Scala DSL: Spring introduces the Spring Integration Scala DSL, a Domain Specific Language written in Scala with the goals of providing a strongly-typed alternative to XML configuration for Spring Integration, providing first-class integration with various Scala frameworks and products such as Akka, and providing seamless integration with Java so that Scala developers can still leverage their existing Java investments.
* Akka 2.0 Released!: On a similar note, Akka version 2.0 has been released. Akka is a Scala/Java framework suitable for programming for concurrency, fault-tolerance and scalability.
* Zero Defects: Baking Quality into the Agile Process: This presentation explains how to use testing and defect management in an Agile project to ensure product quality, addressing design quality, legacy systems, and how build management affects quality. Also check out 4 Warning Signs that Agile Is Declining and Agile’s Customer Problem.
* What Every Programmer Should Know About SEO: As the title says, this article provides some insights on SEO and how to optimize for it.
* Continuous Integration in the Mobile World: This presentation discusses using CI for iOS and Android apps, headless emulators, tools for unit and functional testing, and mobile app deployment.
The whole process is described, from creating a new project to emailing the project’s artifacts. Also see the “Android Full Application Tutorial” series and The Ten Minute Build.
* How sheep-like behavior breeds innovation in Silicon Valley: A nice article on how we try to apply pattern-matching in a world of low probability, usually with bad results. I liked the mention of Survivorship Bias, i.e. that we draw our pattern-recognition from well-publicized successful companies while ignoring the negative data from companies that might have done many of the same things, but ended up as unpublicized failures.
* Say Hello to Jelastic: This brief tutorial shows how to get started with Jelastic by deploying a simple Spring MongoDB application to the cloud. Also see OpenShift Express: Deploy a Java EE application (with AS7 support).
* Speeding Up Java Test Code: An article that provides pointers on how to reduce test execution times, with tips like avoiding “sleeps” in the code, reducing the hierarchy of test classes, avoiding hitting the database, minimizing file I/O etc.
* DRY and Duplicated Code: This post briefly discusses the DRY (Don’t Repeat Yourself) principle and its most common violation, code duplication. It gives an example of a DRY violation and discusses how to identify and treat it. Also see Integrating Maven with Ivy, which talks about building an infrastructure that can identify DRY principle violations.
* I’m an Engineer, Not a Compiler: This article discusses phone interviews and the fact that very often “nano-questions” are used in the process, i.e. questions that could be answered with a quick Google search. Those carry a lot of disadvantages, including generating false negatives.
* You can’t achieve REST without client and server participation: In this article the author suggests that when building a RESTful system both client and server have to be good citizens.
A RESTful system has a shared set of responsibilities between client and server, so both sides should receive proper attention from the developers/architects.
* Basic Application Development with Spring Roo and SQLFire: This presentation introduces Roo (a Rapid Application Development framework) and SQLFire (a memory-oriented clustered database), along with a demonstration of using AspectJ for SQLFire administration.
* Java Book Recommendations: Top and Best Java Books: What the title says. Also see Top 10 Java Books you don’t want to miss and Java Developer Most Useful Books.
* Why Do IT Pros Make Awful Managers?: A very interesting article on why IT professionals usually make awful managers. It basically boils down to choosing the computer interface over the human interface, and to the fact that successful people believe that their success and smartness apply to pretty much everything they do.
* Ask For Forgiveness Programming – Or How We’ll Program 1000 Cores: An interesting article on parallel programming and concurrency. It states that the only way we’re going to be able to program the new multicore chips of the future is to sidestep Amdahl’s Law and program without serialization, without locks, embracing non-determinism. In essence, removing serialization is the only way to use all the available cores.

That’s all for this week. Stay tuned for more, here at Java Code Geeks.

Cheers,
Ilias Tsagklis...

Mentorship in Software Craftsmanship

First, a little bit of background and metaphor

In medieval times, apprentices would work in workshops and would be mentored by senior craftsmen (journeymen) or by the master craftsman himself. The apprentice had the responsibility to learn, observing the master’s and everyone else’s work, questioning everything and practising as much as he could. This was different from the teacher/student relationship, where the teacher had the responsibility to teach. Here it was the apprentice who had to push his own development.

The apprentice, mentored by the master, would learn and refine his skills on a daily basis. Over time, the master would reach the limit of what he could teach the apprentice, and the gap in knowledge and skills between the two would not be that big any more. The master would then publicly recognise the apprentice as a professional who could take on work on his own and deliver it with the same quality that the master would deliver himself. The master, at this point, would be putting his own reputation on the line, since he was the one who trained the apprentice and was now vouching for his ability.

This would be the graduation point. Now the apprentice was ready to start his own journey, as a journeyman. As a journeyman, he would go from town to town, working for and learning from different masters, up to the point where he would be recognised by all of these masters and other craftsmen as a master himself. He would then be ready to have his own shop and start mentoring other journeymen and apprentices.

Back to the present

From now on, instead of master/apprentice, I’ll be using mentor/mentee.
The main reason is that you don’t need to be a master craftsman to mentor someone. You also don’t need to be an apprentice to have a mentor. Besides that, each developer has different areas of expertise. They can be very senior in certain areas and completely ignorant in others. As we all know, software development is not as limited as a blacksmith’s workshop in medieval times.

The role of the mentor

Deciding to mentor someone is a big responsibility. The role of a mentor is more than just being available on the phone and answering a few questions here and there. Although this can be very helpful, the role of a mentor goes way beyond that. The mentor has the responsibility to make the mentee a good professional, and that includes both the technical and the personal side. More than just teaching the mentee a specific framework or technique, the mentor will also be teaching the mentee how to become a professional software developer, including ethics and code of conduct.

From the technical perspective, the mentor will do his best to teach everything he knows. If they don’t work together, the mentor is expected to reserve formal time to work with the mentee. From the personal perspective, a mentor should help the mentee in his career (journey), giving guidance and advice, opening doors or showing the doors he has already opened himself, introducing the mentee to his own professional circle, and doing whatever else the mentor judges to be important for the mentee.

The role of the mentee

The mentee is expected to do whatever he or she can to learn from the mentor. The mentee must be open-minded, be able to take criticism on board, be able to listen and also be very proactive in terms of perpetuating the knowledge. Practice is key. The mentee is expected to practise as much as possible, making him or herself better every day. Mentees are also expected to produce something.
Something where the mentor can measure their progress and possibly identify areas of improvement. Besides the direct benefits to the mentee, this is also the best way to keep mentors excited to be helping. Mentees must show passion and a clear desire to learn, otherwise mentors will probably lose interest and find it a waste of time.

Gains and sacrifices

Mentors have the opportunity to perpetuate their knowledge, since they need to organise their ideas to teach someone. Mentors will also need to be studying and practising hard to keep feeding their mentees, which is obviously a good thing. They will get the satisfaction of helping developers to progress in their careers, with a good foundation, ethics and professionalism. They will be doing their part in raising the bar of our industry, training the next generation, and this, on its own, is a very noble cause.

But there are sacrifices, and the main one is time. Mentors are expected to dedicate time to their mentees. If mentor and mentee work together on a daily basis, they won’t need much time outside work. However, if they don’t, mentors need to be clear that they will need to find and reserve time for their mentees, ideally on a regular basis.

Mentees have the opportunity to speed up their development as professional software developers. They benefit from the mentor’s experience acquired over the years, shortening the time they would take to learn something and avoiding making the same mistakes. They also have someone they trust who can offer a different perspective on the things they are going through (technical issues, problems with a manager, process, bureaucracy, deadlines, estimation, etc). Without any scientific evidence to back me up, I would dare to say that with the right attitude from both parties and a good amount of time together, the mentee could learn in two years what the mentor learned in ten. That gives mentees a massive head start in their careers.
Time should never be called a sacrifice from the mentee’s perspective. When someone is getting a lot of knowledge and help for free, possibly saving years of their career, complaining about time (or lack of it) would be very short-sighted, to say the least.

Mutual Respect

Both mentors and mentees should respect each other. In order to have a healthy and prosperous relationship, there must be a commitment from both parties. This commitment is the mutual respect. The mentor shows respect through regular interactions with the mentee and time commitment. The mentee shows respect through his commitment to excel and demonstration of progress. That’s the minimum a mentee could do for someone who is sacrificing time with family and friends in order to help him.

So far I’ve given a bit of background history and also described the roles and responsibilities of mentors and mentees according to the Software Craftsmanship principles. Now I’ll be covering a few other areas related to the relationship itself. Remember that the focus here is on mentorships outside work. Let’s start from the beginning.

How do we find mentors or mentees?

That’s probably the most difficult question to answer. I can think of three possibilities (in no particular order):

* Try to find someone who works with you on a daily basis. It will make the approach easier.
* Via user groups and communities. If you are an active member of a user group or community, you will have the opportunity to meet a lot of people and maybe find the mentor or mentee you are looking for.
* Via referral: A friend could recommend or introduce you to someone.

Choosing a mentor

Although not a rule, normally, in Software Craftsmanship, it is the mentee who chooses the mentor or, at least, starts the conversation. The mentee needs to have an idea of the path he wants to take.
For example, he may want to learn more about mobile development, work as a consultant, work on high-performance low-latency systems, learn about game development, improve his automated testing techniques or, in the best case, just learn how to become a better developer. Whatever the mentee’s ambitions are, the mentee needs to make sure that his future mentor is a person who can provide the knowledge he or she is looking for.

Unfortunately, this may be easier said than done. Depending on where the mentee is in his career, he may not know exactly what he wants. He may not even know what options he has. Although good developers are good developers and the foundation and principles of software development always apply, we all know that the skill sets used in different types of systems can vary a lot. A very experienced developer who spent the majority of his career doing web development and became an expert in user experience may take a while to perform well in a completely server-side, asynchronous application with no GUI. Working on an application that relies heavily on algorithms and doesn’t use any database can be totally different from developing a mobile client application. In the same way, working for a consultancy company can be very different from working for a bank or a startup. Those are some of the things that developers at early stages of their careers may not even have a clue about.

If you are at this stage in your career, the best suggestion is that you focus purely on being a great developer and learn the basics: things like Test-Driven Development, clean code, refactoring, Object-Oriented principles, different languages and paradigms. Once you have that, it will be easier for you to choose your next move.

Choosing a mentor is not an easy task. Maybe a single mentor won’t be able to teach everything a mentee wants to know.
Mentees should focus on their very next step(s) instead of focusing on everything they want to learn throughout their entire career. Mentees may change their minds quite often as new opportunities and options are presented to them. They should keep their options open and choose different mentors as time goes by.

Choosing a mentee

Mentors, before accepting a mentee, should ask themselves: Would I be able to dedicate time to this person? Can I really help his or her career? If the answer is yes, fantastic.

The first thing to look for in a mentee is passion. Since mentors will be allocating time to help someone, they should make sure that this person deserves their help. Make sure that the mentee is committed to getting better.

Another thing mentors need to decide is what seniority level their future mentee should have. Some mentors will prefer to take graduates, others prefer juniors with one or two years of experience, and others prefer seniors or mentees on the verge of becoming seniors. Choosing a mentee is a very personal thing and different mentors will have completely different criteria.

If the mentor already knows the mentee, then it is easier. When the mentor doesn’t know the mentee, I could suggest a few options that could either be combined or used independently:

* Introduction letter: The mentor could ask the mentee for a summary of his career so far (not a CV). In this summary, the mentee would describe what he has done, where he thinks he is in his career and the things he has learnt, provide links (if any) to his public profile, such as a github account, twitter, blog or web/mobile applications, and, most importantly, say what he expects from the mentor.
* Coding challenge: The mentor can set up a challenge before considering speaking to and accepting a mentee. E.g.
the mentee, using his preferred language, needs to write a simple blog or wiki application and have it deployed somewhere (heroku, cloudbees, Google App Engine, Amazon Beanstalk, etc), and the code should be public (on github, for example). Or it could be simpler, like solving a few katas using two different languages, or anything along these lines.
* Blog: The mentee should create a blog, if he or she does not have one, and publish a few posts related to things he has been learning on his own.

The mentor, if going down this route, should set the challenges according to the level of seniority he is expecting from the mentee. Once the mentor is satisfied with the initial effort by the potential mentee, he can decide whether or not to accept the mentee.

Mentorship common misconceptions

The mentor is not better than the mentee. In general the mentor will have more knowledge and experience in the areas where the mentee has chosen to be mentored. However, the mentee can easily have more knowledge than the mentor in other areas. Mentees should not expect the mentor to have all the answers to all the problems, and mentors should not be naive enough to think that the mentee doesn’t know anything. Both mentors and mentees have skills and limitations, and this needs to be understood by both parties.

Mentors should not be looking for themselves a few years younger. This can be frustrating for the mentor. People are different, and mentors will be missing out if they are not more open about the type of mentee they are prepared to take on board. There is no reason to be so restrictive. Mentoring someone with a different personality, and possibly slightly different ambitions, can be extremely enriching for both mentors and mentees. However, it is important that both have the same values and principles.

Mentees should not expect that mentors will change their lives.
The mentors will be there for the mentees, giving them advice and teaching what they know, but it is up to the mentee to decide what to do with it. Mentees should do their part to improve and not think they are going to be spoon-fed by mentors. It’s up to the mentee to look after his or her own career.

Walking the long road together (or at least part of it)

Once the relationship between mentor and mentee is established, it is fair to say that they will be on a journey together. Every software craftsman is on a personal journey towards mastery. They are all walking the long road. Both mentors and mentees will be learning while they share part of their journey with each other.

What should mentors and mentees do during the mentorship?

Well, this may vary a lot depending on the type of mentorship. In general, they are expected to write code, loads of code. Ideally they should build something together, where the mentee should be the most active in terms of writing the code. The mentor is expected to pair-program with the mentee whenever possible and to review his code.

Agreeing on creating an application would probably be the best option, since that would involve not just writing the code but also thinking about requirements, being able to prioritize, defining a development process, deployment strategies, the production environment and everything else that is related to a software project in real life.

Working on katas is also valid. It’s a simpler and quicker approach for the mentee to learn basic skills. This could be used when mentees are interested in the basics of TDD, naming, refactoring, programming language paradigms, etc. However, after learning a few basic skills using katas, they should aim to do something larger that can really mimic the sort of applications they would be working on in their professional environments.

Establishing goals and tracking progress

Establishing goals is definitely a good practice in a mentorship.
It keeps mentors and mentees focused on whatever they want to achieve. Working towards an objective is always the best way to study and learn something. Goals should be achievable and used to motivate and direct the mentee, not treated as hard deadlines. However, they should be as concrete as they can be: for example, writing an application with features X, Y and Z and having it deployed into production, doing a number of katas using TDD, writing blog posts, reading books, submitting patches to open source projects, or whatever the pair agrees on.

It’s important that progress is tracked so both mentors and mentees can see how they are getting on and how much the mentee is evolving. Tracking progress is all about feedback. The quicker and shorter the feedback loop is, the better. It’s also a good tool to trigger conversations about improvements and refinements.

How long should a mentorship last?

Unfortunately that’s another question that does not have an exact answer. This will depend a lot on the type of mentorship, how much the mentee wants to learn from the mentor and also how much the mentor has to offer. Some say that this should be a lifetime commitment, some say that it should last between 2 and 5 years and some say that it could be as short as a few months. Some mentorships start with very technical and specific things, like learning the basics of a language or test disciplines. However, they can evolve to cover an entire project lifecycle or even longer-term career advice, networking, help with books and conferences, etc. I, personally, would never try to define that upfront. Just enjoy the journey and let time tell when the mentorship should terminate.

How and when does it terminate?

For the majority of relationships, both mentors and mentees at some point need to continue their journeys separately. This does not mean that they will never talk to each other again or that they don’t like each other.
This just means that they need to move on, to teach or learn from other people. Also, we all change our minds about what our next steps will be and we need to react to that.

One important thing is that regardless of who wants to terminate the relationship, this termination should be explicit. It should be clear to both parties that the mentorship is over.

Professionalism, Reputation and Public recognition

Software craftsmanship is all about professionalism, and it is almost impossible to talk about professionalism without talking about reputation. Throughout the mentorship, it is important that the mentor advertises the progress of her mentee. Mentors should publicly recognise all the skills acquired by the mentee, which would help the mentee to start building her own reputation. However, mentors also need to be aware that every time they vouch for someone, they are also putting their own reputations on the line. Mentees are also expected to recognise their mentors for all the things they’ve been learning. This mutual recognition is one of the things that may help both to build their reputations.

Raising the bar

Mentorship is at the heart of software craftsmanship, and this is probably one of the most efficient ways for us, developers, to help raise the bar of our industry. Sharing our knowledge and experiences with each other is what will help us learn from our mistakes and successes. We can probably teach many things to someone in a fraction of the time that it took us to learn them. With this, a mentee could absorb a lot of knowledge in a much shorter space of time, making her, over time, a much more complete professional than any of the mentors she had in her career. Regardless of our level of seniority or whether we are being mentored by someone else, if we all take the responsibility to mentor someone, we will all be helping to raise the bar of our industry.
Reference: Mentorship in Software Craftsmanship – part 1, Mentorship in Software Craftsmanship – part 2 & Mentorship in Software Craftsmanship – part 3 from our JCG partner Sandro Mancuso at the Crafted Software blog....

Agile Lifecycles for Geographically Distributed Teams

I’ve been working with geographically distributed and dispersed teams for the past couple of years. Some of them were on quite large programs, some of them reasonably small. What they all have in common is that they all want to transition to agile.

Most of them start this way: someone takes a Scrum class and gets all excited. This is good. Then reality hits. Scrum is meant for geographically collocated, cross-functional teams. Uh oh. Almost all of these teams are separated by function: the developers are in one place, the testers are in another, the business analysts are in a third place, the project managers are in a fourth place, and if there are product owners (or what passes for product owners) they are often in a fifth location. It’s not uncommon for every single function of the team to be separate from every other function of the team.

So, the teams don’t fit the Scrum criteria. Uh oh. Since Scrum has so much brand recognition, these people think if they can’t do Scrum, they can’t do Agile. Nope, not so. What they need to do is start from the values and principles of the Agile Manifesto, and go from there. They create their own lifecycle, and their very own brand of Agile.

When I worked with one client, that client thought they could extend their iteration. Nope, if anything, being distributed means you keep the iterations even shorter, because you need more frequent feedback when no one is in the same place. Well, there were words. And more words. But, if you start from the values, you see that short iterations are the way to go if you want to be agile. Otherwise, you get staged delivery, which is a lovely lifecycle, but not agile.

I’m blogging a series of examples.
Please don’t ask me why the people ended up in these locations. I have no idea. All I know is that’s where the people are.

Example 1: Using a Project Manager With Iterations, Silo’d Teams

One IT organization has teams with developers in Ukraine, testers in India, product managers and project managers in the UK, and enterprise architecture and corporate management in the eastern US. This organization moved to two-week iterations. The developers were 3.5 hours ahead of the testers, which was not terrible. This organization had these problems:

* The product managers had to learn to be product owners and write stories that were small enough to finish inside one iteration.
* The enterprise architects had to stop dictating the architecture without features to hang off the architecture.
* The developers and testers had to learn to implement by feature so the architects could help the team see the evolving architecture.

This organization had a ton of command-and-control to start. The project managers needed to facilitate the teams, not control them. The architects needed to help the teams see how to organize the product, not to tell the developers what to do. The testers needed to not be order-takers, as in taking orders from the developers.

You might ask why the organization wanted to move to agile. Senior management wanted agile because the releases got longer and longer and longer, and could not accommodate change. Agile was a complete cultural shift. The two-week iterations, along with an agile roadmap of features, helped a lot.

The pilot project team consisted of the developers, testers, a product manager, and a project manager. The team rejected the enterprise architect as a member of the team because the architect refused to write code.

Release planning: The project manager and the product manager do an initial cut at release planning as a strawman and present it to the team. “Can you do this?
What do you think?”

Iteration planning: The team does iteration planning together, making sure every story is either small, medium, or large, where a large story can be done by the entire team in fewer than three days. The team makes sure they get every started story to done at the end of the iteration.

Daily commitment: The team does a daily checkin, not a standup. They timebox the checkin to 15 minutes. They ask these questions:

* What did you complete and with whom yesterday? (This reinforces the idea that people work together.)
* What are you working on and with whom today?
* What are your impediments?

The project manager, who acts as a servant leader and not a command/controller, manages the impediments. The pilot project has two experienced agile people: the project manager and a developer. Both act as servant leaders.

Measurements: burnup charts, impediment charts

The pilot team has been together for six months now, and is successful. This is not Scrum. It’s not Kanban. It’s agile and it’s working. They are ready to start another project team, working by attraction.

Example 2: Using a Project Manager with Kanban, Silo’d Teams

This is a product development organization with developers in Italy, testers in India, more developers in New York, and product owners and project managers in California. This organization first tried iterations, but the team could never get to done. The problem was that the stories were too large. Normally I suggest shorter iterations, but one of the developers suggested they move to kanban.

The New York developers had a problem with biting off more than they could chew. So nothing moved across their board. The Italy developers had a board where the work did move across the board. The teams took pictures of their boards every day and shared the work across a project-based wiki. That allowed the New York-based developers to see the work move across the Italy board. And, that encouraged the New Yorkers to call the Italians and ask some questions.
That helped the New Yorkers to change the size of their work by working with the product owners.

Now, why did the New Yorkers have such trouble originally? Because the developers “knew better” than the product owners, so they changed the stories into architectural features when they had originally received them. (Now they don’t. They leave the stories as real stories.)

Release planning: Management in California plans with agile roadmaps. They have features planned specifically week-by-week for the next 6 weeks, and have more of a quarter-by-quarter approach after that.

Iteration planning: No iteration planning because they are using kanban.

Daily commitment: No daily commitment needed because they use kanban. They do have a checkin a few times a week with each other as a technical team to make sure they don’t create bottlenecks and that they respect the WIP (work in progress) limits.

At one point, both the New York and Italy developer teams created automated tests so that the testers could catch up and stay caught up with regression tests. They add a story like that every couple of weeks, and they are paying down their automated testing debt.

The project manager keeps an eye on the WIP. The project manager also shepherds the product owner into keeping the queue of incoming work full and properly ranked. The product owner is notorious for changing the incoming work queue all the time. The project manager makes sure the team does retrospectives, but is a little unclear how to do them in such a distributed team. The project manager is not so sure their retrospectives are working, and has started an obstacle list to make sure the team has transparency for their obstacles.

Measurements: cumulative flow, average time to release a feature into the product.
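The two measurements named for this team can be sketched in a few lines. This is only an illustration, not the team’s actual tooling: the item names, dates, and the three states (`not_started`, `in_progress`, `done`) are all hypothetical, and a real cumulative flow diagram would plot these per-day counts over the life of the project.

```python
from collections import Counter
from datetime import date

# Hypothetical work items: (name, date started, date released or None if unreleased).
ITEMS = [
    ("login story", date(2012, 3, 1), date(2012, 3, 6)),
    ("search story", date(2012, 3, 2), date(2012, 3, 9)),
    ("regression tests", date(2012, 3, 5), None),
]


def average_cycle_time_days(items):
    """Mean number of days from start to release, over released items only."""
    durations = [(done - started).days for _, started, done in items if done is not None]
    return sum(durations) / len(durations) if durations else None


def cumulative_flow(items, day):
    """Count items per state on one day: a single column of a cumulative flow diagram."""
    counts = Counter()
    for _, started, done in items:
        if done is not None and done <= day:
            counts["done"] += 1
        elif started <= day:
            counts["in_progress"] += 1
        else:
            counts["not_started"] += 1
    return dict(counts)


print(average_cycle_time_days(ITEMS))
print(cumulative_flow(ITEMS, date(2012, 3, 7)))
```

Computing the per-day counts for every day and stacking them gives the diagram; a widening `in_progress` band is the signal that work is entering faster than it is leaving.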
Example 3: Using a Project Manager with Iterations and Kanban and Silo’d Teams

Here, the developers were in Cambridge, MA, the product owners were in San Francisco, the testers were in Bangalore, and the project manager was always flying somewhere, because the project manager was shared among several projects. The developers knew about timeboxed iterations, so they used timeboxes. Senior management had made the decision, over the developers’ objections, to fire all the local testers, buy cheaper tester time, and move the testing to Bangalore.

The Indian testers were very smart, but unfamiliar with the product, so the developers suggested the testers test feature by feature inside the iteration. The project manager suggested they use cumulative flow diagrams and cycle time measurements to make sure the developers were not developing “too fast” for the testers. The developers, still smarting over the loss of “their testers”, were at first peeved about this. They then realized the truth of this statement, and developed this kanban board.

You can see in this board that four items are waiting to go into system test. Uh oh. The developers are out-producing what the testers can take. This is precisely what a kanban board can show you. The testers aren’t stupid or slow. They are new. They cannot keep up with the developers. It’s a fact of life, not a mystery of life. The developers have to act in some way to help the testers or the entire project will fail.

The reason they are working in timeboxes as well as using kanban is that they have several contractual deliverables that management, bless their tiny little hearts, committed to. The timebox allows the team or the product owners to meet with their customers and show them their progress. (They were deciding who would meet when I last worked with the team.) The kanban board helps make the progress even more transparent.
Iteration planning: The product owner and the project manager jointly work on the agile feature roadmap, and the product owner owns the responsibility for it. The product owner owns and generates the backlog. The product owner and the agile project manager present a strawman iteration backlog to the team at the start of the iteration. They have had difficulty finding iteration planning time that allows everyone to be awake and functioning, bless the senior managers’ little hearts.

Daily commitment: They do a handoff, asking each other what they completed that day and what the impediments are. If you have read Manage It!, you know I modified the three questions to “What did you complete, what are you planning to complete, what is in your way?”

Measurements: cumulative flow, average time to release a feature into the product. They are experimenting with burnup charts and impediment charts.

They are still having trouble bringing the testers up to speed fast enough. Yes, they do retrospectives at the end of each iteration. Yes, the product owners own the backlogs.

I’ll summarize in the final part, the next entry.

(Want to learn to work more effectively on your geographically distributed team? Join Shane Hastie and me in a workshop April 17-18, 2012.)

Reference: Agile Lifecycles for Geographically Distributed Teams, Part 1, Agile Lifecycles for Geographically Distributed Teams, Part 2 & Agile Lifecycles for Geographically Distributed Teams, Part 3 from our JCG partner Johanna Rothman at the Managing Product Development blog.

A Tale of Two Cultures: Hackers and Enterprise Developers

Today I found myself thinking again of what I see as two distinct cultures in the development world: Hackers and Enterprise Developers. This really isn’t any kind of a rant, just an observation that I’ve been thinking over lately.

Hackers are really bleeding edge. They have no problem using the command line, using multiple languages, or contributing back to open source. They’ll find and fix bugs in the open source software they use and issue pull requests frequently. They’ll always be willing to use new tools that help them produce better software, even when there might not be any good IDE support. Finally, they’re constantly investigating new technologies and techniques to give them a competitive edge in the world.

Now when I say hacker I don’t mean someone who just hacks lots of random shit together and calls it a day (that kind of developer isn’t good for anyone). Just someone who isn’t afraid to shake up the status quo, isn’t afraid to be a bit different and go against the grain. They’re the polar opposite of enterprise developers.

Enterprise Developers, on the other hand, are fairly conservative with their software development methodology. I’m not saying that a lack of standards is a good thing, but enterprise developers want standards for doing everything and they want it standardized across the company. If there isn’t IDE support for a tool they’ll refuse to use it. Want to use mongodb, riak, etc? Not unless there’s a fancy GUI client for interacting with it. If they find a bug they’ll back away from the framework they’re using and simply declare that the company shouldn’t use the framework until the bug is fixed externally.
I find this group prefers to play it safe and work on solidifying their existing practices rather than explore new ideas.

Now don’t get me wrong, this isn’t another rant on IDEs or developers who don’t use the command line. But give me a couple days in any organization and I can quickly point out who the Hackers and Enterprise Developers are. The hackers are always pushing the envelope, trying new ideas out, giving presentations. Most likely they’re facing off against enterprise developers on a daily basis who attempt to rebuff their ideas. The enterprise developers, on the other hand, are pretty content to do their same daily routine for the rest of their lives without any change or growth. To paraphrase Q from the Star Trek episode Tapestry, “He learned to play it safe. And he never, ever got noticed by anybody.”

What I’ve been considering though is whether or not both are beneficial to an organization. It’s no secret I associate myself with the hacker group (and thus I am a bit biased) but I keep wondering if enterprise developers truly are just the right fit for some organizations. I always think hackers are perfect because they push the envelope and come up with all kinds of interesting solutions to scalability problems, such as using BitTorrent to deploy to thousands of servers. Enterprise developers, on the other hand, rarely exhibit such innovation and would require shelling out several million dollars for an application to copy a file to multiple destinations. In a nutshell, you can really get more done with hackers (who will seek to automate manual tasks as much as possible) while you can use enterprise developers in bulk to brute force through any problem.

To repeat the beginning of my post… this isn’t a rant. And I don’t mean to put “enterprise developers” in a negative light. These are all just some random thoughts going through my mind about the two cultures I commonly see in every organization I have been in. What’s your opinion?
Reference: A Tale of Two Cultures from our JCG partner James Carr at the Rants and Musings of an Agile Developer blog.

Functional programming with Map and Fold in Java

In functional programming, Map and Fold are two extremely useful operators, and they belong to every functional language. If the Map and Fold operators are so powerful and essential, how do you explain that we can do our job using Java even though the Java programming language is lacking these two operators? The truth is that you already do Map and Fold when you code in Java, except that you do them by hand each time, using loops.

Disclaimer: I’m not a reference in functional programming and this article is nothing but a gentle introduction; FP aficionados may not appreciate it much.

You’re already familiar with it

Imagine a List<Double> of VAT-excluded amounts. We want to convert this list into another corresponding list of VAT-included amounts. First we define a method to add the VAT to one single amount:

    // Returns the amount with VAT added; rate is a fraction such as 0.196
    public double addVAT(double amount, double rate) {
        return amount * (1 + rate);
    }

Now let’s apply this method to each amount in the list:

    public List<Double> addVAT(List<Double> amounts, double rate) {
        final List<Double> amountsWithVAT = new ArrayList<Double>();
        for (double amount : amounts) {
            amountsWithVAT.add(addVAT(amount, rate));
        }
        return amountsWithVAT;
    }

Here we create another output list, and for each element of the input list, we apply the method addVAT() to it and then store the result into the output list, which has the exact same size. Congratulations: we have just done, by hand, a Map of the method addVAT() over the input list. Let’s do it a second time.
Now we want to convert each amount into another currency using the currency rate, so we need a new method for that:

    public double convertCurrency(double amount, double currencyRate) {
        return amount / currencyRate;
    }

Now we can apply this method to each element in the list:

    public List<Double> convertCurrency(List<Double> amounts, double currencyRate) {
        final List<Double> amountsInCurrency = new ArrayList<Double>();
        for (double amount : amounts) {
            amountsInCurrency.add(convertCurrency(amount, currencyRate));
        }
        return amountsInCurrency;
    }

Notice how the two methods that accept a list are similar, except for the method being called at step 2:

1. create an output list,
2. call the given method for each element from the input list and store the result into the output list,
3. return the output list.

You do that often in Java, and that’s exactly what the Map operator is: apply a given method someMethod(T):T to each element of a List<T>, which gives you another List<T> of the same size.

Functional languages recognize that this particular need (apply a method on each element of a collection) is very common, so they encapsulate it directly into the built-in Map operator. This way, given the addVAT(double, double) method, we could directly write something like this using the Map operator:

    List amountsWithVAT = map (addVAT, amounts, rate)

Yes, the first parameter is a function: functions are first-class citizens in functional languages, so they can be passed as parameters. Using the Map operator is more concise and less error-prone than the for-loop, and the intent is also much more explicit, but we don’t have it in Java…

So the point of these examples is that you are already familiar, without even knowing, with a key concept of functional programming: the Map operator.

And now for the Fold operator

Coming back to the list of amounts, now we need to compute the total amount as the sum of each amount.
Super-easy, let’s do that with a loop:

    public double totalAmount(List<Double> amounts) {
        double sum = 0;
        for (double amount : amounts) {
            sum += amount;
        }
        return sum;
    }

Basically we’ve just done a Fold over the list, using the function ‘+=’ to fold each element into one element, here a number, incrementally, one at a time. This is similar to the Map operator, except that the result is not a list but a single element, a scalar.

This is again the kind of code you commonly write in Java, and now you have a name for it in functional languages: « Fold » or « Reduce ». The Fold operator is usually recursive in functional languages, and we won’t describe it here. However we can achieve the same intent in an iterative form, using some mutable state to accumulate the result between iterations. In this approach, the Fold takes a method with internal mutable state that expects one element, e.g. someMethod(T), and applies it repeatedly to each element from the input List<T>, until we end up with one single element T, the result of the fold operation.

Typical functions used with Fold are summation, logical AND and OR, List.add() or List.addAll(), StringBuilder.append(), max or min, etc. The mindset with Fold is similar to aggregate functions in SQL.

Thinking in shapes

Thinking visually, Map takes a list of size n and returns another list of the same size. On the other hand, Fold takes a list of size n and returns a single element (scalar).

You may remember my previous articles on predicates, which are often used to filter collections into collections with fewer elements. In fact this filter operator is the third standard operator that complements Map and Fold in most functional languages.

Eclipse template

Since Map and Fold are quite common, it makes sense to create Eclipse templates for them, for example one for Map.

Getting closer to map and fold in Java

Map and Fold are constructs that expect a function as a parameter, and in Java the only way to pass a method is to wrap it into an interface.

In Apache Commons Collections, two interfaces are particularly interesting for our needs: Transformer, with one method transform(T):T, and Closure, with one single method execute(T):void. The class CollectionUtils offers the method collect(Iterator, Transformer), which is basically a poor man’s Map operator for Java collections, and the method forAllDo(), which can emulate the Fold operator using closures.

With Google Guava the class Iterables offers the static method transform(Iterable, Function), which is basically the Map operator.

    // needs java.util.Arrays, java.util.List,
    // com.google.common.base.Function and com.google.common.collect.Iterables
    List<Double> exVat = Arrays.asList(new Double[] { 99., 127., 35. });
    Iterable<Double> incVat = Iterables.transform(exVat, new Function<Double, Double>() {
        public Double apply(Double exVat) {
            return exVat * (1.196);
        }
    });
    System.out.println(incVat); // print [118.404, 151.892, 41.86]

Similar methods are also available on other Guava classes: Lists.transform() for Lists, and Maps.transformValues() for Maps.

To emulate the Fold operator in Java, you can use a Closure interface, e.g. the Closure interface in Apache Commons Collections, with one single method with only one parameter, so you must keep the current (mutable) state internally, just like ‘+=’ does.
Unfortunately there is no Fold in Guava, though it is regularly asked for, and there is not even a closure-like function. It is not hard to create your own, though. For example, you can implement the grand total above with something like this:

    // the closure interface with same input/output type
    public interface Closure<T> {
        T execute(T value);
    }

    // an example of a concrete closure
    public class SummingClosure implements Closure<Double> {
        private double sum = 0;

        public Double execute(Double amount) {
            sum += amount; // apply '+=' operator
            return sum; // return current accumulated value
        }
    }

    // the poor man's Fold operator
    public final static <T> T foreach(Iterable<T> list, Closure<T> closure) {
        T result = null;
        for (T t : list) {
            result = closure.execute(t);
        }
        return result;
    }

    @Test // example of use
    public void testFold() throws Exception {
        SummingClosure closure = new SummingClosure();

        List<Double> exVat = Arrays.asList(new Double[] { 99., 127., 35. });
        Double result = foreach(exVat, closure);
        System.out.println(result); // print 261.0
    }

Not only for collections: Fold over trees and other structures

The power of Map and Fold is not limited to simple collections, but can scale to any navigable structure, in particular trees and graphs.

Imagine a tree using a class Node with its children. It may be a good idea to code once the Depth-First and the Breadth-First searches (DFS & BFS) into two generic methods that accept a Closure as single parameter:

    public class Node ...{
        ...
        public void dfs(Closure closure){...}
        public void bfs(Closure closure){...}
    }

I have regularly used this technique in the past, and I can tell it can cut the size of your classes big time, with only one generic method instead of many similar-looking methods that would each redo their own tree traversal. More importantly, the traversal can be unit-tested on its own using a mock closure. Each closure can also be unit-tested independently, and all that just makes your life so much simpler.
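As a rough illustration of the tree idea above, here is a minimal sketch of a depth-first traversal that accepts a closure. It is not the author's Node class: the names NodeClosure, TreeNode, SumClosure and TreeFoldDemo are hypothetical, and the closure returns void (in the style of the Commons Collections Closure) rather than the accumulated value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these types come from the
// original article or from a library.
interface NodeClosure<T> {
    void execute(T value); // called once per node, in traversal order
}

final class TreeNode<T> {
    private final T value;
    private final List<TreeNode<T>> children = new ArrayList<TreeNode<T>>();

    TreeNode(T value) {
        this.value = value;
    }

    TreeNode<T> add(TreeNode<T> child) {
        children.add(child);
        return this; // returning this allows chaining while building a tree
    }

    // Pre-order depth-first traversal: visit this node, then each subtree.
    void dfs(NodeClosure<T> closure) {
        closure.execute(value);
        for (TreeNode<T> child : children) {
            child.dfs(closure);
        }
    }
}

// A closure that folds every visited value into a running sum.
final class SumClosure implements NodeClosure<Double> {
    private double sum = 0;

    public void execute(Double value) {
        sum += value; // mutable state kept inside the closure
    }

    double total() {
        return sum;
    }
}

final class TreeFoldDemo {
    // Fold over the whole tree: one traversal, one closure.
    static double sumOf(TreeNode<Double> root) {
        SumClosure closure = new SumClosure();
        root.dfs(closure);
        return closure.total();
    }

    public static void main(String[] args) {
        TreeNode<Double> root = new TreeNode<Double>(99.0)
                .add(new TreeNode<Double>(127.0).add(new TreeNode<Double>(35.0)))
                .add(new TreeNode<Double>(1.0));
        System.out.println(sumOf(root)); // prints 262.0
    }
}
```

The traversal is written once in dfs(); a different fold (max, count, collecting values into a list) only needs another NodeClosure implementation.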
A very similar idea can be realized with the Visitor pattern, and you are probably already familiar with it. I have seen many times in my code and in the code of several other teams that Visitors are well-suited to accumulate state during the traversal of the data structure. In this case the Visitor is just a special case of closure to be passed for use in the folding.

One word on Map-Reduce

You have probably heard of the Map-Reduce pattern, and yes, the words « Map » and « Reduce » in it refer to the same functional operators Map and Fold (aka Reduce) we’ve just seen. Even though the practical application is more sophisticated, it is easy to notice that Map is embarrassingly parallel, which helps a lot for parallel computations.

Reference: Thinking functional programming with Map and Fold in your everyday Java from our JCG partner Cyrille Martraire at the Cyrille Martraire’s blog.
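A note on the Map and Fold article above: since Java 8, which postdates the article, the standard Stream API offers map and reduce directly. The sketch below is only an illustration under that assumption. The class and method names are hypothetical, and the 0.196 rate is taken from the article's Guava example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: class and method names are hypothetical.
// Requires Java 8 or later.
final class StreamMapFoldSketch {

    // Map: apply the VAT to every amount; the result has the same size.
    static List<Double> addVat(List<Double> amounts, double rate) {
        return amounts.stream()
                .map(amount -> amount * (1 + rate))
                .collect(Collectors.toList());
    }

    // The same Map evaluated in parallel. Each element is independent of
    // the others, and collecting an ordered stream keeps the input order.
    static List<Double> addVatInParallel(List<Double> amounts, double rate) {
        return amounts.parallelStream()
                .map(amount -> amount * (1 + rate))
                .collect(Collectors.toList());
    }

    // Fold (reduce): start from 0.0 and add each amount to the running total.
    static double total(List<Double> amounts) {
        return amounts.stream().reduce(0.0, Double::sum);
    }

    public static void main(String[] args) {
        List<Double> exVat = Arrays.asList(99.0, 127.0, 35.0);
        System.out.println(addVat(exVat, 0.196));
        System.out.println(total(exVat)); // prints 261.0
    }
}
```

Here the accumulator is passed explicitly to reduce() instead of living as mutable state inside a closure, which is closer to how Fold is usually expressed in functional languages.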

That’s Not Agile!

If you work with a bunch of agile-minded developers, you often hear the phrase “That’s not Agile!” It’s quite humorous to hear, because it comes up all the time. Recently I have been reading Andy Hunt’s books and I find them very insightful. The latest book I am reading is “Practices of an Agile Developer”, which he co-authored with Venkat Subramaniam. At the beginning of each section they place a little quote which represents a “devilish” thought. They are entertaining to read, so I thought I would pick out some of my favorites and post them along with my thoughts. If you are agile-minded like myself, you will certainly think “That’s not Agile!”

Blame Game

“The first and most important step in addressing a problem is to determine who caused it. Find that moron! Once you’ve established fault, then you can make sure the problem doesn’t happen again. Ever.”

This attitude is rooted in the blame game. Agile is about providing solutions, not assigning blame. If you run into this atmosphere, try to bring a positive outlook to it and solve the problem first. Allow room for a retrospective to mitigate problems in the future, but for Pete’s sake, don’t blame.

Hack

“You don’t need to really understand that piece of code; it seems to work OK as is. Oh, but it just needs one small tweak. Just add one to the result, and it works. Go ahead and put that in; it’s probably fine.”

Under time pressure, this thought will definitely come up in any reasonable person’s mind. If you think about it, this mindset is a hack. Responsible developers should understand what they are getting into.
This doesn’t mean getting into analysis paralysis, but always look for ways to understand, refactor, and improve the code. The trade-off in doing that must always be a consideration, but a hack mindset only leads to distressed code in the end. Weigh the options. Refactoring and understanding the code you are working on will pay for itself quickly.

Egotism

“You have a lot invested in your design. You’ve put your heart and soul into it. You know it’s better than anyone else’s. Don’t even bother listening to their ideas; they’ll just confuse the issue.”

Agile is about collaboration and learning. I have run into this egotistic attitude many times in my career. I would hope an agile team is about ideas, not who is behind the idea. In addition, even if you have a design, it means nothing until you prove it out in code. Tracer bullet the idea instead of arguing, and you will probably come up with a better design in the end anyway if you consider others’ input. Don’t invest too much in upfront design. If you do, you are missing out on evolving your design.

Stagnant

“That’s the way you’ve always done it, and with good reason. It usually works for you just fine. The ways you learned when you first started are clearly the best ways. Not much has changed since then, really.”

I’ll just say it. This is my favorite. That’s not agile! At all. In an environment of continuous improvement and value creation, this is the last thing you should hear. Especially in the technology field, we need to brace for change and accept that new ideas may be better than the old. I’ll borrow a quote from the athletic arena, “If you ain’t improving constantly, you are getting passed up.”

Feel free to add your own quotes based on your past experiences.

Reference: That’s Not Agile! from our JCG partner Nirav Assar at the Assar Java Consulting blog.

Why I will use Java EE instead of Spring in new Enterprise Java Projects in 2012

The question comes up often. It came up in my new project in November 2011, too. I will use Java EE (JEE) instead of the Spring framework in this new Enterprise Java project.

I know: Several articles, blogs and forum discussions are available regarding this topic. Why is there a need for one more? Because many blogs talk about older versions of Java EE or because they are not neutral (I hope to be neutral). And because many people still think that EJBs are heavy! And because times have changed: It is Java EE 6 time now, J2EE is dead. Finally! Finally, because not only is JEE 6 available, but so are several application servers (not just Glassfish as reference implementation). I do not want to start a flame war (too many exist already), I just want to describe my personal opinion of the JEE vs. Spring „fight“…

Therefore, I think it is very important to start with a short overview and history of both alternatives. Afterwards, I will list the differences of both and explain why these differences lead me to JEE instead of Spring for most new Java projects. I am explicitly talking about new applications. If you have to extend an existing application, continue using the existing framework!

One more disclaimer: I am talking about mission-critical Enterprise Java applications. I am not talking about a little internal application or other uncritical stuff.
I also would prefer using a combination of Scala, Groovy and Clojure persisting to a NoSQL database while being deployed at a PaaS cloud service such as JBoss OpenShift or VMware CloudFoundry…

General Information about JEE and Spring

First, I want to summarize some general information about JEE and Spring:

* In the end, both alternatives consist of several libraries which can be used by developers to create enterprise applications.
* Both can be used in most use cases; they have very similar functionality (business logic, transactions, web-frameworks, whatever…). They only differ in realization (e.g. declarative transactions in Spring vs. conventions in JEE).
* You also can use only one or some of the available libraries.
* You can even combine JEE and Spring stuff.
* Usually, the crucial question is: „Should I use JEE (i.e. especially EJB, JPA, CDI, etc.) or the Spring core framework (i.e. especially Spring Application Context, Spring beans, etc.) for realizing my new application?“ Mostly, you can choose either; it does not matter from the point of view of the end user. But you should not merge both, this only creates higher complexity.
* There always was a debate about which alternative to choose. It is very difficult to discuss this question in a neutral way. That’s why almost all discussions end up in praising one framework and bashing the other one (I hope to be neutral in this blog post).

History: J2EE was horrible, so Spring helped!

J2EE was horrible. So much XML configuration, so many interfaces, and such lame application servers. This is why the Spring framework was created. It solved many problems of J2EE. It was lightweight, easy to use, and applications could be deployed in a web container (such as Tomcat) instead of a heavy J2EE application server. Deployment took seconds instead of 15 minutes. Unfortunately, JRebel did not exist at that time. The Spring framework is not a standard like J2EE; nevertheless it became very widespread and a large community arose.
Today: J2EE is dead. JEE „stole“ the lightweight Spring ideas!

Everything started with a little change of abbreviation. J2EE was dead. The new abbreviation was JEE. JEE 5 was born in 2006. It „stole“ many good, lightweight ideas such as „convention over configuration“ or „dependency injection“ from Spring and other frameworks. Yes, JEE application servers still were heavy, and testing was almost impossible. Nevertheless, developing JEE applications was fun with JEE 5. You did not have to write 20 interfaces when creating an EJB. Wow, amazing!

Then, in 2009, JEE 6 was released. Development is so easy. Finally! For example, you have to add only one annotation and your EJB is ready! Of course, the developers of the Spring framework did not sleep. Much new stuff was added. Today, you can create a Spring application without a single XML file, as I have read in a „No Fluff Just Stuff“ article some weeks ago. Besides, several really cool frameworks were added to the Spring stack, e.g. Spring Integration, Spring Batch or Spring Roo.

Today (November, 2011), both JEE and Spring are very widespread and have a large community. Much information is available for both, e.g. books, blogs, tutorials, etc.

So, after I have described the evolution of JEE and Spring, why will I use JEE in most new Java projects?

Pros and Cons of JEE and Spring

A decision must be made. Which alternative to use in new projects? Let’s look at the pros and cons of both. I will add a „BUT“ to the Spring advantages; these „BUTs“ are the reason why I prefer JEE to Spring.

Advantages of JEE

* JEE is a set of standard specifications, thus it is vendor-independent. Usually, several implementations exist of a specification.
* Sustainability: Well, this is the advantage of a standard which is supported by several big players.
* Yes, believe it or not, testing is possible! Lightweight application servers and frameworks such as Arquillian arrived in the JEE world!
* Convention over configuration is everywhere, instead of explicit configuration (I know that some people will disagree that this is an advantage).

Advantages of Spring

* You do not need a heavy JEE application server, you can deploy your application in a web container such as Tomcat.
  BUT: JEE application servers are not as heavy as they were some years ago. Besides, the JEE web profile can be used, too. You do not have to use a Tomcat or Jetty to be lightweight!
* Spring offers features which are not available as JEE standards, such as Spring Batch.
  BUT: You can add such a library to a JEE project without problems. You can also add other Spring libraries such as JdbcTemplate or JmsTemplate (which help reduce some boilerplate code) if you want.
* Spring offers much more flexibility and power, e.g. aspect-oriented programming is more powerful than JEE interceptors.
  BUT: In most projects you do not need this flexibility or power. If you do need it, then use Spring, not JEE, of course!
* Faster releases (because it is no standard and there is only one vendor). The reaction to market requirements is much faster. Some current examples: cloud, mobile, social computing.
  BUT: All enterprise projects, of many different clients, which I have seen are not that flexible. Enterprise applications do not change every month or year. If there is a project where you can change your version very easily, Spring might be better than JEE under some circumstances. But in most enterprise projects, you cannot simply upgrade from Spring 2.5 to Spring 3.x or from JEE 5 to JEE 6. I wish this were possible, but low flexibility and politics rule in large companies with thousands of employees.

Conclusion: I will use JEE in most new Enterprise Java Projects

Due to the reasons I explained against Spring in the „BUT“ parts, I will choose JEE in most new Enterprise Java projects. Nevertheless, I will sometimes use Spring libraries, too (such as Spring Batch).
Sometimes, I will even have to use Spring (if I need its flexibility or power), but only then will I choose it. Of course, for existing projects, I will continue using the framework that is used already. I would probably not migrate a Spring 2.5 application to JEE, but I would migrate it to Spring 3.x instead!

So, I have described my reasons why I will use JEE in most new Enterprise Java projects. If I have missed something, or if you have got another opinion (probably many guys have), you can bash me in the comments. I appreciate all „non-flame-war“ discussions…

Reference: Why I will use Java EE instead of Spring in new Enterprise Java Projects in 2012 from our JCG partner Kai Wahner at the Blog about Java EE / SOA / Cloud Computing blog.

Observations on Dev / Ops culture

I am and always will be a student of leadership & design. I like to see things work, but I like it more when things work a little better or a little different than I’ve seen them work in the past. More than anything I like change, which I think of as progress when the goal is learning and improvement. When things are working well, I wonder if they could work better and I end up tinkering with them anyway. I am not a fan of the mantra “If it ain’t broke don’t fix it” because I think most things can be improved, some people just don’t see how.

I have worked for 3 SaaS companies now and all 3 have had a meaningful influence on the way I think about Operations & Engineering today. Sometimes I learn what works, sometimes I learn what doesn’t work, but I always learn. Of all the things I’ve learned though, I’m coming to view culture, that indescribable, unspoken law of the workplace, to be the most meaningful measure of a company and the one that defines whether bad judgement does indeed contribute to experience and, in turn, good judgement.

For some, culture is this frilly thing that companies toss around with expensive parties, expensive computers, beer on tap, scooters flying down the halls, and so forth. Companies do this in hopes of attracting top talent by investing capital. For me though, culture has nothing to do with those things. Culture is what allows someone in Customer Support to become CTO. Culture is what allows bad decisions to be talked about and changed, or not. Culture is what keeps employees working at companies for years and years, or causes them to hop jobs every 2-3. Culture is the reality of the way a company operates and it cannot be manufactured.
This is a long post; if you just want my conclusion you can skip to the bottom.

Company A

In 2006 tech companies were just starting to recover from the dot-bomb crash. Most companies were buttoned down and while it wasn’t too hard to find a job as a Sysadmin, companies were looking for experience. “A” was an anomaly during this period. They had landed tons of funding and were ready to take over the world of mobile video. They wanted Rock Stars and they wanted a lot of them, because they needed them. “A” hired by building a front: they had lots of money, a service nobody else had, and a team of really bright folks who wanted to do something new, video to mobile. Their culture was “go fast”, and that was about it.

“A” built and deployed software on multi-month release cycles & every time they released it was a mess. They would cram as much as they could into each release, test it in environments that didn’t resemble production and then, starting at 10pm some night, we would deploy. 6-8 hours later, as the sun was beginning to come up, we would all go home… It sucked.

When production problems would arise, Ops took a first crack at fixing things. If it was someone who had been there a while they could sometimes deal with it, but largely Developers had to get involved and that could sometimes be a lengthy process. Multi-hour outages were very common at “A”.

Over the 3.5 years I spent at “A” I saw countless examples of what doesn’t work:

* It doesn’t work to deliver service based applications on long release cycles where the requirements literally change as you develop the software. It breaks.
* It doesn’t work to have Dev design new applications without having any understanding of how production works. They fail.
* It doesn’t work relying on having top-notch Ops team members to keep your service up. They can’t.
* It doesn’t work adding more machines instead of investing in Engineering effort. The costs add up.
* It doesn’t work documenting changes to prevent bad decisions. It just doesn’t.

Over time people began to realize that these things were wrong, but turning this ship was like navigating the Titanic. “A” had developed a culture of analysis paralysis. Because releases & production changes had gone so poorly, more “controls” were put in place: more rigor and more process. It was harder, not easier, to get changes into production. Every production change had to have a document associated with it, and those documents required approval. I’m responsible for putting some of that in place & I will never do it again. The company became focused on establishing deniability in everything you did, a culture of CYA. It may as well have been hog-tied and hanging from a tree; it couldn’t move.

Eventually they formed a separate organization called the “CTO Office” where design decisions could be made, where Engineers could work in isolation on new ideas. The designs that came out of this organization rarely hit the mark. The team had brilliant members, but they were isolated from the realities of the day to day operation. They were a huge help when they came in on a production issue and had to turn around a fast fix: they rarely had tight competing deadlines & they were top notch engineers. But when it came to designing something from the ground up it was difficult.

What did I learn?

* Change control done wrong can be very bad.
* Long release cycles & big-bang releases are bad.
* Developers not knowing how production works is bad.
* Testers who don’t know how production works is worse.
* Relying on Operations to compensate for code & design quality will fail.
* Isolating engineers who build new components is bound to fail.
* Money does not buy happiness or a good culture.

Company “B”

In looking for my next role I wanted the opposite. I wanted freedom and I found it at “B”. The company was small, about 30 people, and was on a 6 week release cycle.
There were posters on the walls of the office reading “Be Nice, Be Scrappy, Be Creative” and that seemed to be how things worked. “B” had very different problems than “A”. At “B” the Developers were very aware of how production worked, often more so than Ops. Developers had production access & when problems arose in production (which was much less frequent than at company “A”) the Developers were looking at it right alongside Ops.

The problems with “B” were largely Operational initially. Monitoring coverage was actually much better than at company “A”, however monitoring was so noisy you couldn’t tell what was good and what was bad. Disks would fill because logs weren’t being rotated properly. Services would die and there were manual processes to recover them. Disks went bad just about every day in their large storage service & keeping up with those alone was a full time job. Operations spent all their time shaving yaks, never having any time to make things better.

One thing Company “B” did reasonably well was configuration management & automated deployments. Overall most things were managed via puppet or our deployment scripts. If we took an outage, we could deploy to the entire system (>150 systems) in around 10-15 minutes. Rolling releases were more of a pain point but were certainly common and mostly automated.

As with Company “A”, Company “B” had a strong push to move fast, get features out, gain customers. While Dev seemed to focus a lot of time on building features they also did prioritize infrastructure fixes for some things. This would change over time, but it didn’t seem so bad initially. The service ran pretty well and things stayed pretty stable.

Over time I observed that company “B” had a lot of broken windows. The logs were filled with exceptions; you couldn’t tell what was a “normal error” and what was bad. Just about every application ran under a restarter script, an unwieldy perl script that would restart the app if it died and send an email off to alert people.
Often our earliest indicator that there were problems with the service was the frequency of application crashes. It became difficult to know if the service was operating correctly or not & hard to pinpoint problems when you knew there was one. Their technical debt grew with each release & Operations spent much of their time focused on break/fix efforts instead of long-term improvements.

Company “B” was also growing subscribers at a very fast pace and wasn’t optimizing their software as fast as they were gaining customers. Provisioning hardware fast enough became a real problem & keeping up with database sharding and storage growth was also a big problem. They were trying, but there was still a heavy focus on new features & growing more customers. The culture of company “B” would have allowed for a pivot to address these issues if only funding & the appetite for features had allowed for it.

Their product had arguably more features than any other in the marketplace and they continued to add more. They were obsessed with building what everyone else had and more. Yet they still weren’t #1; there were other competitors who beat them on subscription counts & on familiarity in the market. I had to explain to people what my company did by asking if they had heard of our competitor – they always had. “We do what they do”, I would say. I always felt crappy saying that.

What did I learn? Configuration management & deployment automation are awesome – it was here that I decided these were key capabilities. “Broken windows syndrome” is real and has a dramatic effect on an organization’s ability to address problems. Prioritizing infrastructure fixes & reduction of technical debt is as important as being aware of problems. Developer ownership of functionality of code *in production* is critical – knowing it works in testing is irrelevant if it doesn’t work in prod.
Monitoring & trending & analytics are fundamental to understanding what your system is doing & an opportunity many companies miss. Company “C” I have only been at Company “C” for 5 months, but I have been taken to school. When I went out looking for something after Company “B” I was looking for a place that understood what it meant to run a service – somewhere I could stop ringing the bell of continuous delivery and get down to the business of actually doing it. While this company isn’t perfect and I’m sure I’ll have more learnings after a few years, what I’ve observed so far has changed my view on many things. Company “C” has some key cultural aspects that make it what it is:Decisions are driven through structured collaboration and consensus. Individuals have the ability to drive decisions, but there is a strong culture of sharing, collaborating and adjusting. Individuals are encouraged to find their own place in the company. You are hired for a loosely defined role in the company and there’s an expectation that you will fill that role until someone else does, but you are encouraged to find your passion and put your energy into that area of the company. Leaders of the company are there to help support team decisions. Like any company, there is some steering that comes from management but it is most often done in a shared and collaborative way. They focus heavily on hiring for personality fit than on technical skill.This core culture makes the company what it is and has led to some interesting responses to issues. When the service began to fall over and had significant availability issues they formed an architecture council, a group of individuals who are passionate about how the service is built. This group includes Engineering, Operations & Product members (about 20 people total). Any significant change to the service is presented to this group and discussed before being built. 
Major problems that require architectural change are raised to this group, discussed and prioritized. Like other things at this company, this group was probably formed by sending an email asking for volunteers who are passionate about fixing these things. The other thing this company does is make decisions based on data. They have no less than 3 people focused entirely on the infrastructure and analysis of test & production metrics. This includes load testing & performance testing environments. Logging of metrics for every request made in production & in testing. Detailed analytics down to the amount of DB time, CPU time, heap, etc of each request made to the system. If there is an outage, this group can typically pinpoint the # of people impacted and who they are. If there is a customer misbehaving, they can typically find them – fast. If there are performance problems they can describe them very precisely and usually tie those problems to a specific commit. This makes conversations around performance tuning much easier. I have observed a consistent strategy to try things and be willing to change them when they aren’t working. Many companies try new things with the argument that they can change if they aren’t working but few companies have the structured process to make sure that change happens. This company does. It requires discipline. Not coincidentally, this company also shares their experience with their customers and uses their own organization as an experimentation ground for new ideas. As a result of all of this – despite having the smallest infrastructure of any company I’ve ever worked for they:Use feature flags for every new feature that goes out, often even for changes to existing features. Release weekly, the fastest pace of any of the above companies. Have fully automated & unattended (scheduled) deployments. Have the highest availability of any company I have worked for.What am I learning? 
I’m learning that allowing people to do their job is more valuable than telling them how to do it & allowing them to follow their passion produced incredible results. That collaboration done right can be immensely valuable and while it doesn’t allow rock stars to shine as brightly – it generates much more consistent results and overall has a more positive effect. I’m learning that the companies I’ve worked for do collaboration wrong, and that doing it right requires discipline and training. I’m learning that building a great culture has everything to do with the leadership in the company & very little to do with the product or funding. So what does all this mean? I’ve come to believe that every company has skilled team members. Every company has challenges growing the business and scaling their systems. Every company comes up with clever ways to solve problems. What separates the companies I have worked for is that some understand that they can’t know everything and have built a structured process to make sure change can happen as new things are learned. The others seem to believe that change will happen organically – that important issues will get fixed if they are important enough. This doesn’t happen though because the business doesn’t allow teams to choose to prioritize those important things. They don’t allow pride of ownership. I have also observed that adding process needs to be resisted and questioned at every opportunity. Process is good when it makes decisions better, when it makes the organization more effective. Process is bad when it stands in the way of effectiveness, when it stifles agility & when it causes people to avoid good decisions because they are too much work. I have learned that broken windows syndrome is real and in technical companies it takes a lot of work to keep those windows fixed. Knowing when problems are real and not “normal problems” is important. 
If something isn’t important enough to build it right then maybe it isn’t important enough in the first place. Leaving broken windows in place means you are ok with a lower quality product and the bar will only move lower – it will never go higher.

The single greatest tool I’ve seen to avoid these issues is to empower individuals to gather teams and act. Encourage them to collaborate and create opportunities for teams to come together and share experiences. Retrospectives are excellent for this. Have Operations go to Engineering meetings, have Engineering come to Operations meetings. Have everyone share their experiences – good and bad. You will spend a lot of time in meetings and it will feel less efficient – because it is. Efficiency is being traded for effectiveness. When you do act, you will know better why you are doing it and what the expected result is. There’s not much point in efficiency if you are doing things poorly. Cranking out broken things faster makes no sense.

Lastly, I’ve learned that the book “Good to Great” by Jim Collins is right. Period.

Reference: Observations on Dev / Ops culture from our JCG partner Aaron Nichols at the Operation Bootstrap blog.

Adding Ehcache to Openxava application

Introduction

This article shows how to quickly enable Ehcache in Openxava applications and thereby improve performance. When viewing an entity and its graph, relationships are loaded. Adding a second-level cache speeds up the retrieval of associated elements, since elements that have already been loaded are retrieved from the cache and not from the database. Finally, this page explains how MinuteProject handles this aspect while keeping its promise: write nothing. As an example, we will take the Lazuly MinuteProject showcase.

Openxava-Ehcache integration

In Openxava, you describe your model as Java annotated POJOs. The annotations come from the standard JPA2 ORM plus Openxava-specific ones, but nothing prevents you from adding others. This is what is done to add caching. There are also a couple of configuration steps to undertake to enable caching.

List of actions

* Add an ehcache.xml config file at the root of your sources
* Modify persistence.xml to include the second-level cache
* Add the caching annotation (alongside JPA2)

Remark: Openxava comes with the ehcache.jar, so there is no need to add a dependency.

Detailed actions

Add ehcache.xml

In /persistence place the ehcache.xml file

    <ehcache>
      <defaultCache maxElementsInMemory="1000" eternal="false" timeToIdleSeconds="300" timeToLiveSeconds="300" overflowToDisk="false" diskPersistent="false" diskExpiryThreadIntervalSeconds="300" memoryStoreEvictionPolicy="LRU" />
      <cache name="your.domain.object" maxElementsInMemory="5000" eternal="false" timeToIdleSeconds="300" timeToLiveSeconds="600" overflowToDisk="false" />
    </ehcache>

Modify persistence.xml

The persistence.xml file contains information related to the persistence unit, such as connection pool info and the classes or configuration to load.
‘persistence.xml’ is located in /persistence/META-INF. We will append properties for the L2 cache.

    <properties>
      <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>
      <property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.SingletonEhCacheProvider" />
      <property name="net.sf.ehcache.configurationResourceName" value="/ehcache.xml" />
      <property name="hibernate.cache.use_query_cache" value="true" />
      <property name="hibernate.cache.use_second_level_cache" value="true" />
      <property name="hibernate.generate_statistics" value="true" />
    </properties>

Add cache annotation

Here the Hibernate annotation is used instead of the standard one (Cacheable in fact does not seem to work). Place the Cache annotation at class level on your domain object.

    @org.hibernate.annotations.Cache(usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE)

Example

Lazuly application

Lazuly is a sample database holding conference information, used for MinuteProject showcase purposes. MinuteProject generates a comprehensive set of artefacts to speed up the release of an OX application. Further information can be found in Minuteproject 4 Openxava Lazuly showcase. In this part we focus on the artefacts generated specifically for caching.

For generation, MinuteProject relies on a configuration file in which we define the data model to reverse engineer. This configuration has an enrichment part where you can add information. One piece of this information deals with the type of content held in an entity. There are 4 possibilities (reference-data, master-data, pseudo-static-data, live-business-data). If you enrich your entity with content-type="master-data" or "reference-data", MinuteProject 4 Openxava will generate the associated caching. This is done here for the entity Country.
<entity name="COUNTRY" content-type="reference-data"> Here are the cache related artefacts ehcache.xml <ehcache><!-- Sets the path to the directory where cache files are created.If the path is a Java System Property it is replaced by its value in the running VM.The following properties are translated: * user.home - User's home directory * user.dir - User's current working directory * java.io.tmpdir - Default temp file pathSubdirectories can be specified below the property e.g. java.io.tmpdir/one --> <!--MP-MANAGED-UPDATABLE-BEGINNING-DISABLE @ehcache-main-config-conference@--> <diskStore path="java.io.tmpdir"/><!-- Mandatory Default Cache configuration. These settings will be applied to caches created programmtically using CacheManager.add(String cacheName) --> <defaultCache maxElementsInMemory="1000" eternal="false" timeToIdleSeconds="300" timeToLiveSeconds="300" overflowToDisk="false" diskPersistent="false" diskExpiryThreadIntervalSeconds="300" memoryStoreEvictionPolicy="LRU" /> <!-- The unnamed query cache --> <cache name="org.hibernate.cache.StandardQueryCache" maxElementsInMemory="1000" eternal="false" timeToLiveSeconds="300" overflowToDisk="false" /> <!--MP-MANAGED-UPDATABLE-ENDING--><!--MP-MANAGED-UPDATABLE-BEGINNING-DISABLE @cache-entity-country-conference@--> <cache name="net.sf.mp.demo.conference.domain.admin.Country" maxElementsInMemory="5000" eternal="false" timeToIdleSeconds="300" timeToLiveSeconds="600" overflowToDisk="false" /> <!--MP-MANAGED-UPDATABLE-ENDING--><!--MP-MANAGED-ADDED-AREA-BEGINNING @custom-cache-definition@--> <!--MP-MANAGED-ADDED-AREA-ENDING @custom-cache-definition@--></ehcache>Persistence.xml <persistence xmlns="http://java.sun.com/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" version="1.0"> <!-- Tomcat + Hypersonic --> <persistence-unit name="default"> 
<non-jta-data-source>java:comp/env/jdbc/conferenceDS</non-jta-data-source> <class>org.openxava.session.GalleryImage</class> <properties> <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/> <property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.SingletonEhCacheProvider" /> <property name="net.sf.ehcache.configurationResourceName" value="/ehcache.xml" /> <property name="hibernate.cache.use_query_cache" value="true" /> <property name="hibernate.cache.use_second_level_cache" value="true" /> <property name="hibernate.generate_statistics" value="true" /> <!--MP-MANAGED-ADDED-AREA-BEGINNING @properties@--> <!--MP-MANAGED-ADDED-AREA-ENDING @properties@--> </properties> <!--MP-MANAGED-ADDED-AREA-BEGINNING @persistence-unit@--> <!--MP-MANAGED-ADDED-AREA-ENDING @persistence-unit@--> </persistence-unit><!--MP-MANAGED-ADDED-AREA-BEGINNING @persistence@--> <!--MP-MANAGED-ADDED-AREA-ENDING @persistence@--></persistence>Class annotation @org.hibernate.annotations.Cache(usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE) //MP-MANAGED-ADDED-AREA-BEGINNING @class-annotation@ //MP-MANAGED-ADDED-AREA-ENDING @class-annotation@ public class Country {@Hidden @Id @Column(name="id" ) @GeneratedValue(strategy = GenerationType.AUTO) private Integer id; ...Generated code remark The generated code has markers inside file extension comment. Within MP-MANAGED-ADDED-AREA-BEGINNING and  MP-MANAGED-ADDED-AREA-ENDING you can place customized code Within MP-MANAGED-UPDATABLE-BEGINNING-DISABLE and  MP-MANAGED-UPDATABLE-ENDING you can alter the code. To keep your modifications please change MP-MANAGED-UPDATABLE-BEGINNING-DISABLE into MP-MANAGED-UPDATABLE-BEGINNING-ENABLE. Updatable code prevent you to lose your customisation over consecutive generations. For more information on updatable code see Minuteproject updatable code. 
GenerationPlace the following file mp-config-LAZULY-OPENXAVA.xml in /mywork/config on a prompt execute mp-model-generation(.sh/cmd) mp-config-LAZULY-OPENXAVA.xml  the resulting artefacts in /DEV/output/openxava/conference To generate use the updated version of mp-config-LAZULY-OPENXAVA.xml <!DOCTYPE root> <generator-config> <configuration> <conventions> <target-convention type="enable-updatable-code-feature" /> </conventions> <model name="conference" version="1.0" package-root="net.sf.mp.demo"> <data-model> <driver name="mysql" version="5.1.16" groupId="mysql" artifactId="mysql-connector-java"></driver> <dataSource> <driverClassName>org.gjt.mm.mysql.Driver</driverClassName> <url>jdbc:mysql://127.0.0.1:3306/conference</url> <username>root</username> <password>mysql</password> </dataSource> <!-- for Oracle and DB2 please set the schema <schema> </schema> --> <primaryKeyPolicy oneGlobal="true"> <primaryKeyPolicyPattern name="autoincrementPattern"></primaryKeyPolicyPattern> </primaryKeyPolicy> </data-model> <business-model> <!-- <generation-condition> <condition type="exclude" startsWith="DUAL"></condition> </generation-condition> --> <business-package default="conference"> <condition type="package" startsWith="STAT" result="statistics"></condition> <condition type="package" startsWith="COUNTRY" result="admin"></condition> <condition type="package" startsWith="ROLE" result="admin"></condition> </business-package> <enrichment> <conventions> <column-naming-convention type="apply-strip-column-name-suffix" pattern-to-strip="_ID" /> <reference-naming-convention type="apply-referenced-alias-when-no-ambiguity" is-to-plurialize="true" /> </conventions><entity name="COUNTRY" content-type="reference-data"> <semantic-reference> <sql-path path="NAME" /> </semantic-reference> </entity> <entity name="CONFERENCE_MEMBER"> <semantic-reference> <sql-path path="FIRST_NAME" /> <sql-path path="LAST_NAME" /> </semantic-reference> <field name="STATUS"> <property tag="checkconstraint" 
alias="conference_member_status"> <property name="PENDING" value="PENDING" /> <property name="ACTIVE" value="ACTIVE" /> </property> </field> <field name="EMAIL"> <stereotype stereotype="EMAIL" /> </field> </entity> <entity name="SPEAKER"> <field name="BIO"> <stereotype stereotype="HTML_TEXT" /> </field> <field name="PHOTO"> <stereotype stereotype="PHOTO" /> </field> <field name="WEB_SITE_URL"> <stereotype stereotype="WEBURL" /> </field> </entity> <entity name="PRESENTATION"> <field name="STATUS"> <property tag="checkconstraint" alias="presentation_status"> <property name="PROPOSAL" value="PROPOSAL" /> <property name="ACTIVE" value="ACTIVE" /> </property> </field> </entity> <entity name="SPONSOR"> <field name="STATUS"> <property tag="checkconstraint" alias="sponsor_status"> <property name="PENDING" value="PENDING" /> <property name="ACTIVE" value="ACTIVE" /> </property> </field> <field name="PRIVILEGE_TYPE"> <property tag="checkconstraint" alias="sponsor_privilege"> <property name="GOLDEN" value="Golden" /> <property name="SILVER" value="Silver" /> <property name="BRONZE" value="Bronze" /> </property> </field> </entity> <!-- views --> <entity name="stat_mb_per_ctry_conf" alias="MEMBER_PER_COUNTRY_AND_CONFERENCE"> <virtual-primary-key isRealPrimaryKey="true"> <property name="virtualPrimaryKey" value="ID" /> </virtual-primary-key> </entity> <entity name="stat_mb_by_role" alias="MEMBER_PER_ROLE_COUNTRY_AND_CONFERENCE"> <virtual-primary-key isRealPrimaryKey="true"> <property name="virtualPrimaryKey" value="id" /> </virtual-primary-key> <field name="stat_mb_per_ctry_conf_ID" linkToTargetEntity="stat_mb_per_ctry_conf" linkToTargetField="id"></field> </entity> </enrichment> </business-model> </model> <targets> <!-- openxava --> <target refname="OpenXava" name="OpenXava" fileName="mp-template-config-openxava-last-features.xml" outputdir-root="../../DEV/output/openxava/conference" templatedir-root="../../template/framework/openxava"> </target><target refname="JPA2-LIB" 
fileName="mp-template-config-JPA2-LIB.xml" templatedir-root="../../template/framework/jpa"> </target> <target refname="BSLA-LIB" fileName="mp-template-config-bsla-LIB-features.xml" templatedir-root="../../template/framework/bsla"> </target><target refname="CACHE-LIB" fileName="mp-template-config-CACHE-LIB.xml" templatedir-root="../../template/framework/cache"> </target> </targets> </configuration> </generator-config>

Test

To ensure that the caching is working properly:

* Enable Hibernate logging. Add the following snippet as extra properties in persistence.xml.

    <property name="hibernate.show_sql" value="true" />
    <property name="hibernate.format_sql" value="true" />

* Navigate to an entity that references country (for example Address).
* When you view the detail of this entity, you will notice that the associated entity ‘country’ is loaded.
* The second time you access the details of this entity (or of another entity referencing the same country instance), the country is not loaded from the database again.

Reference: Adding Ehcache to Openxava application from our JCG partner Florian Adler at the minuteproject blog.

A SMALL cross-section of BIG Data

Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set.

IDC estimated the digital universe to be around 1.8 zettabytes by 2011. How big is a zettabyte? It’s one billion terabytes. The current world population is 7 billion – that is, even if you gave a 250 GB hard disk to each person on earth, that storage still wouldn’t be sufficient.

Many sources contribute to this flood of data…

1. The New York Stock Exchange generates about one terabyte of new trade data per day.
2. Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
3. Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
4. The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.
5. The Large Hadron Collider near Geneva will produce about 15 petabytes of data per year.
6. Every day people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions & social networks.

Facebook, Yahoo! and Google found themselves collecting data on an unprecedented scale. They were the first massive companies collecting tons of data from millions of users. They quickly overwhelmed traditional data systems and techniques like Oracle and MySQL. Even the best, most expensive vendors using the biggest hardware could barely keep up and certainly couldn’t give them tools to powerfully analyze their influx of data.
In the early 2000s they developed new techniques like MapReduce, BigTable and Google File System to handle their big data. Initially these techniques were kept proprietary. But they realized that making the concepts public, while keeping the implementations hidden, would benefit them, since more people would contribute to those concepts and the graduates they hired would have a good understanding of them before joining.

Around 2004/2005 Facebook, Yahoo! and Google started sharing research papers describing their big data technologies. In 2004 Google published the research paper “MapReduce: Simplified Data Processing on Large Clusters”.

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in this paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Google’s implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable. A typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.
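To make the map and reduce roles described above concrete, here is a minimal single-process sketch of a word count. It is not Google’s or Hadoop’s API: the class name `WordCountSketch` and the `map`, `reduce` and `run` methods are hypothetical names chosen for illustration, and the `run` method merely stands in for the run-time system (a real one would partition the input and distribute the work across machines).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of the MapReduce programming model.
class WordCountSketch {

    // map: turns one input pair (document name, contents) into
    // a list of intermediate (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String docName, String contents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : contents.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // reduce: merges all intermediate values that share the same key.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Stand-in for the run-time system: applies map to every input,
    // groups the intermediate pairs by key, then applies reduce per key.
    public static Map<String, Integer> run(Map<String, String> inputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> input : inputs.entrySet()) {
            for (Map.Entry<String, Integer> pair : map(input.getKey(), input.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new TreeMap<>();
        inputs.put("doc1", "big data is big");
        inputs.put("doc2", "data about data");
        System.out.println(run(inputs)); // prints {about=1, big=2, data=3, is=1}
    }
}
```

Because `map` only looks at one input pair and `reduce` only looks at the values for one key, both can in principle be executed in parallel on different machines, which is the property the paper relies on.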
Doug Cutting, who worked on Nutch, an open-source search technology project now managed through the Apache Software Foundation, read this paper as well as another Google paper on Google’s distributed file system [GFS]. He figured that GFS would solve their storage needs and that MapReduce would solve the scaling issues they had encountered with Nutch, and implemented both. They named the GFS implementation for Nutch the Nutch Distributed Filesystem [NDFS]. NDFS and the MapReduce implementation in Nutch were applicable beyond the realm of search, and in February 2006 they moved out of Nutch to form an independent subproject of Lucene called Hadoop, and NDFS became HDFS [Hadoop Distributed File System], which is an implementation of GFS. Around the same time Yahoo! extended its support for Hadoop and hired Doug Cutting.

At a very high level, this is how HDFS works. Say we have a 300 MB file. [Hadoop also does really well with files of terabytes and petabytes.] The first thing HDFS is going to do is split this up into blocks. The default block size on HDFS right now is 128 MB. Once split into blocks we will have two blocks of 128 MB and another of 44 MB. Now HDFS will make ‘n’ ['n' is configurable - say 'n' is three] copies/replicas of each of these blocks. HDFS will then store these replicas on different DataNodes of the HDFS cluster. We also have a single NameNode, which keeps track of the replicas and the DataNodes. The NameNode knows where a given replica resides. Whenever it detects that a given replica is corrupted [DataNodes keep running checksums on replicas] or that the corresponding HDFS node is down, it will find out where else that replica is in the cluster and tell other nodes to bring that block back up to ‘n’ replicas. The NameNode is a single point of failure. To avoid that we can have a secondary NameNode which is in sync with the primary, and when the primary is down the secondary can take control.
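The block arithmetic in the 300 MB example above can be checked with a few lines of Java. This is only a sketch of the calculation, not the HDFS client API: the class `BlockSplitSketch` and its methods are hypothetical names introduced for illustration, and the 128 MB block size and replication factor of three are the values assumed in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: reproduces the block-size arithmetic, not real HDFS behaviour.
class BlockSplitSketch {

    public static final long MB = 1024L * 1024L;

    // Returns the sizes (in bytes) of the blocks a file of the given size
    // would be cut into: full blocks first, then one smaller trailing block.
    public static List<Long> splitIntoBlocks(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Long> blocks = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > 0) {
            long size = Math.min(blockSize, remaining);
            blocks.add(size);
            remaining -= size;
        }
        return blocks;
    }

    // Total number of block replicas the cluster has to store.
    public static long totalReplicas(int blockCount, int replicationFactor) {
        return (long) blockCount * replicationFactor;
    }

    public static void main(String[] args) {
        List<Long> blocks = splitIntoBlocks(300 * MB, 128 * MB);
        for (long block : blocks) {
            System.out.println(block / MB + " MB"); // 128 MB, 128 MB, 44 MB
        }
        System.out.println(totalReplicas(blocks.size(), 3) + " replicas"); // 9 replicas
    }
}
```

With three blocks and three copies of each, the cluster stores nine replicas in total, spread over different DataNodes.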
The Hadoop project is currently working on implementing distributed NameNodes.

Again in 2006, Google published another paper, “Bigtable: A Distributed Storage System for Structured Data”.

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. The paper describes the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and describes the design and implementation of Bigtable.

BigTable maps two arbitrary string values (row key and column key) and a timestamp (hence a three-dimensional mapping) into an associated arbitrary byte array. It is not a relational database and can be better defined as a sparse, distributed, multi-dimensional sorted map. Basically, BigTable discussed how to build a distributed data store on top of GFS.

HBase, part of the Hadoop project, is an implementation of BigTable. HBase is a distributed, column-oriented database which uses HDFS for its underlying storage and supports both batch-style computation using MapReduce and point queries.

Amazon published a research paper in 2007 on “Dynamo: Amazon’s Highly Available Key-value Store”. Dynamo is a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience.

Apache Cassandra brings together Dynamo’s fully distributed design and BigTable’s data model. It is written in Java and was open sourced by Facebook in 2008.
Cassandra is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010. In fact, much of the initial development work on Cassandra was performed by two Dynamo engineers recruited to Facebook from Amazon. However, Facebook abandoned Cassandra in late 2010 when they built the Facebook Messaging platform on HBase.

Besides borrowing Bigtable's data model, Cassandra has properties inspired by Amazon’s Dynamo, such as eventual consistency, the gossip protocol, and a master-master way of serving read and write requests. One of the important properties, eventual consistency, means that given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate through the system and all the replicas will become consistent.

I used the term ‘NoSQL’ when talking about Cassandra. NoSQL (sometimes expanded to “not only SQL”) is a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.

The name “NoSQL” was in fact first used by Carlo Strozzi in 1998 as the name of a file-based database he was developing. Ironically, it is a relational database, just one without a SQL interface. The term resurfaced in 2009 when Eric Evans used it to name the current surge in non-relational databases.

There are four categories of NoSQL databases.

1. Key-value stores : This is based on Amazon’s Dynamo paper.
2. ColumnFamily / BigTable clones : Examples are HBase, Cassandra
3. Document Databases : Examples are CouchDB, MongoDB
4. Graph Database : Examples are AllegroGraph, Neo4j

As per Marin Dimitrov, the following are the use cases for NoSQL databases. In other words, these are the cases where relational databases do not perform well.

1. Massive Data Volumes
2. Extreme Query Volume
3. Schema Evolution

With NoSQL, we get advantages such as massive scalability, high availability, lower cost (than competitive solutions at that scale), predictable elasticity and schema flexibility.

For application programmers, the major difference between relational databases and Cassandra is its data model, which is based on Bigtable. The Cassandra data model is designed for distributed data on a very large scale. It trades ACID-compliant data practices for important advantages in performance, availability, and operational manageability.

If you want to compare Cassandra with HBase, then this is a good one. Another HBase vs Cassandra debate is here.

References: A SMALL cross-section of BIG Data from our JCG partner Prabath Siriwardena at the Facile Login blog.

* MapReduce: Simplified Data Processing on Large Clusters
* Bigtable: A Distributed Storage System for Structured Data
* Dynamo: Amazon’s Highly Available Key-value Store
* The Hadoop Distributed File System
* ZooKeeper: Wait-free coordination for Internet-scale systems
* An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics
* Cassandra – A Decentralized Structured Storage System
* NOSQL Patterns
* BigTable Model with Cassandra and HBase
* LinkedIn Tech Talks : Apache Hadoop – Petabytes and Terawatts
* O’Reilly Webcast: An Introduction to Hadoop
* Google Developer Day : MapReduce
* WSO2Con 2011 – Panel: Data, data everywhere: big, small, private, shared, public and more
* Scaling with Apache Cassandra
* HBase vs Cassandra: why we moved
* A Brief History of NoSQL...
Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
