The Data Access Layer, Part 1 – The Basics (And When Things Aren’t So Basic)

by Mark Adelsberger on August 17, 2012

Today, I’d like to talk about the data access layer (DAL). I’ll start with a brief review: the motivation behind the DAL, the design principles that define a theoretically “good” DAL, and the relationship between the DAL and modern persistence technologies. Then, I’ll look at how simple DAL designs might start to break down under some workloads. I’ll outline a design approach that aims to address those situations without making the kinds of design compromises one might typically consider.

In Part 2 of this series, I’ll describe JPA’s Criteria API. Then I’ll return to the DAL design approach from Part 1 and step through a more concrete example using a JPA persistence mechanism, with the Criteria API playing a key role in the implementation.

I’m sure we’re all familiar with the tension between quality of design and speed of delivery that seems to pervade the IT world. Sometimes this tension is artificial – people tend to assume that a solid design will take too long to implement, when in fact a solid structure often shortens the end-to-end project timeline. Even when the tradeoffs are real, too often they are expressed in terms of “business needs vs. technical needs” – a dichotomy that I always encourage people to rethink.

A full discussion of the business value of good design is beyond the scope that I wish to cover here. Perhaps some of you are already the design advocates on your teams, pointing out when so-called “technical debt” might incur excessive interest payments. Others might tend to prefer quick-and-dirty solutions that let you move on to the next thing. Either way, hopefully, we can all agree that a good separation of concerns is preferable when we can afford it; so in that spirit, let’s look at how to keep the separation between business logic and persistence details affordable.

The DAL sometimes gets marginalized a bit in light of modern persistence technologies. ORM tools like Hibernate abstract away many details of the DBMS (for typical use cases at least; but beware of the occasional need for native queries). JPA further provides a common API for use with various ORM tools (or providers as it calls them). At first glance, it may seem a little much to suggest that even JPA should be abstracted away from the business logic; but on closer inspection, there are reasons to say just that.

The insulation from the DBMS that JPA provides isn’t perfect. Even if you avoid native queries, information about the database schema still tends to leak through, as JPA entities tend to mirror tables very closely, and the relationships between the entities likewise tend to reflect the database’s structure. For a young application with a custom schema, the tables may have been designed to reflect business entities pretty closely anyway; but even when that’s so, as the application evolves, there may be performance reasons for the database to deviate from the “pure” business understanding of the data, and/or you may find that you’d prefer not to restructure the database to follow a change in the business layer data model.

From a separation of concerns point of view, then, JPA behaves less like a data access abstraction to be called from the business layer, and more like a persistence detail to be abstracted away by a DAL interface. By no means does this undercut the value of a good ORM tool. We just have to recognize that like any tool, it is valuable when applied to the right problems and costly when applied to the wrong ones.

So getting back to basics, a good DAL has two qualities:

  1. The interface (through which the business layer invokes DAL operations) is written in the business layer’s terms. Today’s implementation might use JPA; but tomorrow you might find that the native Hibernate API better meets your needs, or decide to get away from ORM and use JDBC, or even abandon the use of a relational database entirely. It should be possible to gut your DAL and write a new one around the new persistence mechanism, changing none of the DAL interfaces (and therefore changing none of the business layer that uses those interfaces). Such a rewrite may not be easy, but without a solid DAL interface you’d typically have to rewrite broad swaths of business logic that had become hopelessly entangled with details of the old persistence mechanism. A DAL rewrite will usually be easier, and will almost always be less risky, than that.
  2. The DAL contains no business logic. In one extreme case, I saw an application whose “business layer” was nothing but a series of pass-through calls into the DAL, which housed all of the application’s real functionality. Of course, saying that you never have to change the business layer is a matter of semantics in this case; the entire purpose of the DAL has been defeated, because re-implementing the DAL would require rewriting the business logic just the same as if there were no DAL.

A typical textbook example of a DAL design consists of a collection of data access objects (DAOs), each focusing on one business entity. The DAO for a business entity will provide basic CRUD operations for the corresponding database records. It will probably include some basic “find” methods (e.g. “find order items by order number”) and perhaps some simple aggregate calculations (“return the total revenues from all transactions on this account”).
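To make the textbook scenario concrete, here is a minimal sketch of what such a DAO might look like. The entity, the interface, and the in-memory implementation are all hypothetical illustrations; a real implementation would delegate to JPA, JDBC, or whatever persistence mechanism is hidden behind the DAL. The point to notice is that nothing in the interface exposes SQL, entities-as-tables, or any other persistence detail – only business-layer terms.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business entity, trimmed to two fields for brevity.
class OrderItem {
    final long orderNumber;
    final BigDecimal price;
    OrderItem(long orderNumber, BigDecimal price) {
        this.orderNumber = orderNumber;
        this.price = price;
    }
}

// A textbook DAO interface: a basic CRUD-style save plus a simple
// finder and a simple aggregate, all expressed in business terms.
interface OrderItemDao {
    void save(OrderItem item);
    List<OrderItem> findByOrderNumber(long orderNumber);
    BigDecimal totalRevenue();
}

// A throwaway in-memory implementation, standing in for the real
// persistence mechanism. Because the interface above is persistence-
// agnostic, swapping this for a JPA- or JDBC-backed version changes
// nothing in the business layer.
class InMemoryOrderItemDao implements OrderItemDao {
    private final List<OrderItem> items = new ArrayList<>();

    public void save(OrderItem item) { items.add(item); }

    public List<OrderItem> findByOrderNumber(long orderNumber) {
        List<OrderItem> result = new ArrayList<>();
        for (OrderItem i : items)
            if (i.orderNumber == orderNumber) result.add(i);
        return result;
    }

    public BigDecimal totalRevenue() {
        BigDecimal total = BigDecimal.ZERO;
        for (OrderItem i : items) total = total.add(i.price);
        return total;
    }
}
```

This is exactly the shape of the “gut the DAL and write a new one” rewrite described above: the business layer keeps calling `findByOrderNumber` regardless of what sits behind it.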

As long as data volumes are reasonably small, the DAL may never have to grow beyond this textbook scenario. Sure, the business logic might infer some complex relationship or define some complex calculation; but you can still avoid encoding such business logic within the DAL by simply using the DAL to retrieve the handful of objects involved and resolving the relationship or calculation in code (i.e. in the business layer of the application). This may mean you aren’t using the DBMS to its full potential, but for tens, hundreds, or perhaps even small thousands of records, it’s unlikely that any performance penalty will be noticed.

As volumes climb into the 10,000 range, you may start being more careful about your access paths and query patterns. Fetching 15 records and issuing a follow-up query for each one isn’t usually a huge deal, but issuing a follow-up query for each of 10,000 records will almost certainly take longer than you would accept in an interactive context. Even giving more care to your individual database interactions, you may find that you’re straining the limits of the textbook design.
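The follow-up-query pattern described above (often called the “N+1 query” problem) looks harmless in code, which is part of why it sneaks in. The toy sketch below makes the cost visible by counting simulated round trips instead of hitting a real database; all of the names are hypothetical illustrations, not a real API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each static method below stands in for one database round trip,
// counted so the difference between the two access patterns is visible.
class RoundTripCounter {
    static int queries = 0;

    // Simulated table: order number -> item names, 100 orders.
    static final Map<Long, List<String>> ITEMS = new HashMap<>();
    static {
        for (long n = 1; n <= 100; n++)
            ITEMS.put(n, List.of("item-" + n));
    }

    // One query returning all order numbers for a customer.
    static List<Long> findOrderNumbers() {
        queries++;
        return new ArrayList<>(ITEMS.keySet());
    }

    // One query per order: the follow-up query in the anti-pattern.
    static List<String> findItems(long orderNumber) {
        queries++;
        return ITEMS.get(orderNumber);
    }

    // One query returning all items for all orders at once.
    static Map<Long, List<String>> findAllItems() {
        queries++;
        return ITEMS;
    }

    // Anti-pattern: fetch the orders, then issue a follow-up query
    // for each one. Costs 1 + N round trips.
    static int naive() {
        queries = 0;
        for (long n : findOrderNumbers()) findItems(n);
        return queries;
    }

    // Letting the DAL fetch everything in one interaction costs 1.
    static int batched() {
        queries = 0;
        findAllItems();
        return queries;
    }
}
```

At 100 orders the naive version already issues 101 round trips; at 10,000 it issues 10,001, which is where the interactive-latency budget described above gets blown.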

And what happens if you have millions of records? The optimizations your DBMS and physical data model might provide for high-volume operations become much harder to ignore, and the prospect of pulling all that data across the network into app server memory becomes much less attractive. How often would a transaction-oriented business application have to deal with that much data? Well, both the sizes of corporate data sets and the demand for immediate access to those data sets keep growing; so it’s a reasonable bet that the answer will be more often in the future than in the past.

Faced with this problem, one solution is to start writing stored procedures. Creating a layer of business code in the DBMS isn’t the worst thing in the world, but it does mean that you have database-specific implementations of business logic. It also may mean you need multiple pieces of code implementing the same logic. (Reuse isn’t always easy in stored procedure languages, so two procedures doing similar or related tasks may each need code that implements the same logic; and/or you may still need a bit of code that implements the same logic in the application’s business layer.) This is the traditional way to make maintenance more expensive.

A variation on this theme is to hand-code more sophisticated queries and embed them in special-purpose methods of your DAOs. This may get the DBA off your back if he or she doesn’t want business logic in the database, but in practice it shares the same structural drawbacks as the stored procedure approach.

With either of these solutions, you compromise the design goals (by mixing business logic with persistence details) for optimization. Sometimes you have to do that, but in this case it may be worth exploring another option: dynamic query building.

In addition to methods that immediately return data, your DAOs can provide methods for creating, adding logic to, and ultimately executing queries. Each operation should be simple and devoid of business logic – basically the same level of complexity afforded the individual CRUD methods. And, like the CRUD methods, their names and parameters should preserve the abstraction of the persistence mechanism. You might borrow terminology from the relational database world (e.g. “join”), but ultimately you’re just describing operations that your DAOs will be responsible for implementing through whatever persistence mechanism is hidden within.
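Here is one possible shape for such query-building methods, again with all names invented for illustration. Each building block is trivial and free of business logic; the business layer composes them to express its logic, and the DAL is free to translate the composed query into whatever the hidden persistence mechanism understands (Part 2 will show a JPA Criteria translation; the in-memory version below just stands in for it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical query-builder interface exposed by a DAO. Each method
// adds one simple condition; execute() runs whatever has been built.
interface AccountQuery {
    AccountQuery withStatus(String status);
    AccountQuery withMinBalance(double minBalance);
    List<Account> execute();
}

// Hypothetical business entity, trimmed for brevity.
class Account {
    final String status;
    final double balance;
    Account(String status, double balance) {
        this.status = status;
        this.balance = balance;
    }
}

// In-memory stand-in: a real DAL might accumulate JPA Criteria
// predicates instead of composing java.util.function.Predicate.
class InMemoryAccountQuery implements AccountQuery {
    private final List<Account> data;
    private Predicate<Account> filter = a -> true;

    InMemoryAccountQuery(List<Account> data) { this.data = data; }

    public AccountQuery withStatus(String status) {
        filter = filter.and(a -> a.status.equals(status));
        return this;
    }

    public AccountQuery withMinBalance(double minBalance) {
        filter = filter.and(a -> a.balance >= minBalance);
        return this;
    }

    public List<Account> execute() {
        List<Account> out = new ArrayList<>();
        for (Account a : data) if (filter.test(a)) out.add(a);
        return out;
    }
}
```

The business layer then writes something like `query.withStatus("OPEN").withMinBalance(100).execute()` – the business rule lives in the *selection and sequence* of calls, not inside the DAL.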

With this approach, the running query reflects business logic to be sure; but that logic “lives” safely outside the DAL, being injected into the query at run time through the selection of a particular sequence of query-building calls. The ability to construct and execute queries of (theoretically) arbitrary complexity is both a strength and a risk of this approach. The instinct when you see a flurry of small database interactions bogging down your application is to minimize the number of queries; but at some point you have to balance this against the cost of running each individual query. One query that evaluates 10 conditions for each of 10,000 records will almost certainly be more efficient than 10,000 queries that each fetch one record so the application can evaluate the conditions; but in some cases, 5 queries that each evaluate 2 of the conditions for all 10,000 records will be more efficient still.

To make this approach work, you need a mechanism for building queries. Obviously that mechanism is going to be closely related to persistence details, so you also need to define abstractions that provide the business layer the ability to invoke the mechanism without knowing much (if anything) about it.

In Part 2 of this series, I’ll review the JPA Criteria API, which I’ll then use as the query building mechanism in a JPA-based example of the design.

— Mark Adelsberger, asktheteam@keyholesoftware.com


5 Responses to “The Data Access Layer, Part 1 – The Basics (And When Things Aren’t So Basic)”

  1. Phil says:

    I was all prepared to throw down upon reading the title of the article, but this is really good. Very solid principles for data access being espoused here, and I especially liked the point about the interface being in business terms and abstracting away the data access implementation specifics. Ayende would totally disagree with you, but I think your approach is dead on and has brought me tangible cost savings and business value in past projects.

    • Thanks for your comment; always nice to come in ahead of expectations :)

      Of course you’re right that this is not a universally-held view on data access design, and in fact it’s possible that it’s a minority opinion. Hopefully if nothing else, those who disagree might find the Criteria API review in Part 2 to be useful. I always aim to at least have SOMETHING for everyone.

      • Phil says:

        I found the Criteria API very useful in making query objects and executing them, especially since NHibernate’s LINQ support is a little on the slim side right now.


