Thursday, 1 September 2011

Problems with ORMs


Object Relational Mapping tools like Hibernate have helped developers make huge productivity gains in dealing with relational databases in the past several years. ORMs free developers to focus on application logic and avoid writing a lot of boilerplate SQL for simple tasks like inserts or queries.

However, the well-documented problems with object-relational impedance mismatch inevitably cause headaches for developers. Relational databases are a specialized technology built on sound concepts, but they don’t necessarily line up to the object-oriented world. There are a few approaches and styles to using ORMs which have various pros and cons.

One of the fundamental choices to using an ORM is deciding whether the you will generate your ORM mappings from the database schema or the other way around, generating your database schema off of some ORM definition (possibly an XML configuration file, annotations or something like XDoclet).

The former approach of generating your ORM layer from your database schema means you have to deal with the database in its own language and terms, whether you deal with DDL specific to the database or have some abstraction layer, but nevertheless you are forced to treat the database as what it is. Unfortunately, it means you need expertise in the technology and it may take more work than allowing your schema to be generated. However, this forces developers to understand and deal with the RDBMS properly – it is dangerous and harmful to treat a DBMS as a simple datastore. Developers need to consider the impact of keys, indexes, etc. when designing applications, and shielding them from the realities of relational databases can be dangerous, and in my experience this always turns out badly. A related issue is the use of POJOs, which end up being manipulated by the ORM framework. While this sounds nice in theory, in practice you can hit various kinds of issues and it can be tempting to mix application logic with what should really amount to data access objects. Developers and architects like to praise the separation of concerns by using Spring and other frameworks, and there’s no real reason why the same concept shouldn’t be applied here. One other minor issue is the need to maintain the POJOs and mapping definition, but this is usually not too much work.

The second approach of generating your ORM mappings and code from your schema is my preferred approach. In my experience of using both approaches, having beans generated off of your schema allows your beans to be intelligently designed and be only as complicated as required, while making available fetch by PK, by index, etc. all for free. It also becomes easier to manage things like lazy collections and referenced objects since it is all managed within the persistent class itself. This approach also avoids the need for writing boilerplate POJOs and forces you to treat your data access objects separately from your domain objects and business logic. In my experience with generating data access beans off your schema, the beans end up being richer, more usable, perform better, and once you have your infrastructure in place, maintenance costs are lower. One may think that you end up needing additional data-wrapper classes, but in practice the need for separate bean classes is independent of what is going on in your data access layer. One issue here is the availability of frameworks to do this generation work for you – in the past, I have worked with custom-built solutions that worked well and paid off, but required initial up-front work. On smaller projects there may not be enough pay-off to be worth investing in such an effort. At the same time, there are ORMs out there that take this approach and generate persistent entity classes, such as jooq, but I have to try them.

Hibernate is the most popular ORM in the Java world and while it is a step up from dealing with writing copious amounts of SQL, it has its problems. Generally the approach is to define your mappings and POJOs and then let Hibernate manage your SQL generation. The problem with this is your defined schema is often less than ideal and things like proper indexing end up overlooked. Hibernate also forces you to end up using their transaction and query mechanisms, though the extent you choose to use their proprietary stuff is up to you. This isn’t necessarily a bad thing in all cases, but personally I have a distaste for the HQL language which is frequently used, since it introduces a familiar-yet-different language to developers which others will later have to maintain and try to figure out. There are also issues with query optimization that can crop up, and having done significant work on performance tuning in the past, access to actual queries for tuning is a must for me. I also believe that trying to implement inheritance in persistence classes is just a bad idea – it’s not worth trying to force a concept onto a technology which naturally does not accommodate it. Hibernate tempts developers to implement inheritance in the database through support for table-per-hierarchy and table-per-class mechanisms, but it is a mistake in my mind since you end up with a poor data model and problems later managing and extending the hierarchy. I also do not like to populate what should be a clean relational model – you can’t pretend relational DBs are object-oriented datastores.

If you take one thing away from this post, it should be to not ignore the actual technologies you are using. Treat an RDBMS for what it is, and learn to use it. Do the same for object-oriented systems. By all means, try to make your life easier by using ORMs to avoid writing boilerplate code and unnecessary SQL, but don’t think you can avoid dealing with some kind of translation or code to deal with the natural mismatch that occurs. Do not expect a framework or tool to solve the problem for you. Developers are paid to think and discerning the best road to take, so we shouldn’t be afraid to deal with problems as they come up and solve them intelligently. As with many things, the 80-20 rule seems to apply here. Use ORMs to take care of 80% of the work for you, and be prepared to write SQL and some persistence code for the other 20%. Don’t expect too much or you will end up with several types of problems – strange bugs, performance issues, a poorly designed object-oriented model, etc.

I’d love to hear your experience and thoughts on ORMs in any language and the issues you’ve faced, along with how you dealt with them. This is one of those topics where experience is incredibly valuable, so please share your thoughts.


Reference: Problems with ORMs from our JCG partners at the Carfey Software blog.


Related Articles :

11 comments:

  1. "Hibernate also forces you to end up using their transaction and query mechanisms"

    this is not true, you can use JPA standards that Hibernate implements.

    Hibernate is very sharp, but it requires rather good knowledge of this framework to make proper use of it. If used incorrectly, it might have disastrous results.

    ReplyDelete
  2. Your post makes totally sense to me.

    Using an ORM shouldn't be a one way trip.

    I usually use JPA(EclipseLink) in my projects. However, the same projects also use simple JDBC database connections when:
    1. Performance is a priority
    2. Tables are created during runtime
    3. Complexity and mutability of queries requires not using an ORM abstraction
    4. The critical level of the feature

    Thinking deeper about the point 4, I believe it is very important to consider about this trade off: Is fast present development = less future maintenance? Sometimes when we develop faster could mean we have lots of abstractions. When some problem appears during maintenance, it is hard to troubleshoot it having so many abstraction layers to dig.

    Features that are high critical for users I personally try to keep then simple and not having so many abstractions. When a problem appears, I have full liberty to check it because the code is almost mine :). Ohhh!!! My precious… :).

    We know that data layers are always a pain in the *** and its good design is extremely relevant :). So my advices are:
    1. be careful;
    2. be wise;
    3. don’t feel in a hurry;
    4. don't become dogmatic about a technology…
    …because when you maintain code in a production environment… all kind of Gods must be with you all the time :).

    ReplyDelete
  3. Personally, I dislike Hibernate. It first creates the illusion of being easy and by the time developers realize it is not they are stuck with a poorly designed database model, unnecessarily complicated queries and likely some performance issues.

    I wouldn't use it again and instead would likely go for a more lightweight approach. The most tedious thing about jdbc is dealing with rows of data and converting those to objects for easy manipulation. Luckily there are many good solutions here. For example, spring comes with a quite convenient JdbcTemplate, including a row mapper. Using it is not that hard. And it is not the kind of code that you will spend a lot of time writing or maintaining to begin with.

    Secondly, you want to denormalize your data as much as your querying needs allow. Using hibernate in full automagic mode achieves the exact opposite. Result: ridiculously normalized table structures. Applying a good dose of YAGNI here will keep your design simple. Problems you have with transactionality, lazy loading, etc. will also get a lot less complicated.

    Finally, if you need flexible indexing, consider using something like solr. That way you can keep your database model simple, your queries fast and simple, and new queries don't require you to deal with the database layer. Solr is great stuff and nicely complements SQL querying abilities. We use a simple denormalized schema and a solr index. It replaced a hibernate generated mess and insanely complex queries that took ages to run.

    ReplyDelete
  4. My experience with Hibernate / JPA is that it is good for mapping table rows to objects. But I allways do the following steps:

    1. Make a datamodel in SQL and with ER-diagram. I try to make this model "perfect" from an SQL perspective with respect to normalization and performance.

    2. Write the objects by hand. I add all the relations between objects as "@Transistent", such that I can consentrate on making the objects as simple and clear as possible.

    3. Write the relation mapping annotations on the objects. The hibernate / JPA annotations have several dimmentions: foreign key column / field, fetching and cascading.

    4. DAO: At last I write the Data-Access-Objects. First interface for these, and then the implementations. I strive to make the interface as simple and clear as possible. I also make unit-tests to test the dao.

    ReplyDelete
  5. "I also believe that trying to implement inheritance in persistence classes is just a bad idea – it’s not worth trying to force a concept onto a technology which naturally does not accommodate it"

    Hibernate does not require you to use domain object inheritance - it just provides support for it so that applications with complex and properly designed domain object models can use the power of OOP..

    Like with any framework - if you don't understand Hibernate, and not willing to invest the time in learning it - don't use it, you can always enjoy writing raw JDBC code.... oh.. wait.. you need to understand well what you are doing there too.. if you want to write working code..

    Sounds like this article is written by a DBA, or at least a person coming from PL/SQL background, and not understanding OOP concepts well.

    ReplyDelete
  6. Relational db were invented in the days of lotus123 tables when object oriented programming were not popular.

    Imo, it has outlived its lives and usefulness. We shld be using object db instead.

    ReplyDelete
  7. Recently I see many topics like ORM vs ADO.NET(JDBC). I totally disagree with these types comparisons.

    I totally agree with 80-20 rule while using ORM. And I also think that RDMS will stay for a while. Object DB's also intrduces new problems

    ReplyDelete
  8. I have to say I feel like you just stated the obvious here...

    ReplyDelete
  9. I really like this post but I think that the perspective (Relational>OO or vice-versa) is usually not an option but is set by the context: either a new application must use existing DB (usually RDBs with Beans) or it can be developped together with its own OODB.
    More generally, since the pivotal mismatch comes from inheritance, the best option should be to build OO models using qualified subtyping constructs with relational mapping semantics.

    ReplyDelete
  10. I totally agree with the premise of the article. What ever you do, be sure to know fundamentals of your stack. Just jamming in Pojos and a bit of JPA annotations is not enough, learn SQL. And if you are coming from SQL, learn how to minimize repetition by taking help where necessary (don't hack up your own gremlin ORM)

    ReplyDelete
  11. Nicely put. The Object-Relational Impedance mismatch is not going to go away, as the two paradigms will continue to coexist and JPA/CriteriaQuery just copies off of Hibernate (including its problems), instead of going into a similar direction as .NET’s LINQ. Truly, ignoring the fact that the two worlds are different will always be the most important source of problems in any software dealing with an RDBMS.

    While many applications don’t really need SQL, some do, extensively. It makes a big difference whether you use IN (…) or EXISTS (…), whether you use OR, or UNION’s. And many times, grouping is better/worse than using window functions. As you said, this requires lots of expertise, which should not be neglected in larger projects. See also this article:

    http://lukaseder.wordpress.com/2011/09/02/oracle-scalar-subquery-caching/

    Thanks anyway for linking to jOOQ. I’d be curious about your feedback, should you decide to try it for yourself!

    Cheers
    Lukas

    ReplyDelete

Related Posts Plugin for WordPress, Blogger...