The Data Access Layer, Part 1 – The Basics (And When Things Aren’t So Basic)
Today, I’d like to talk about the data access layer (DAL). I’ll start with a brief review: the motivation behind the DAL, the design principles that define a theoretically “good” DAL, and the relationship between the DAL and modern persistence technologies. Then, I’ll look at how simple DAL designs might start to break down under some workloads. I’ll outline a design approach that aims to address those situations without making the types of design compromise one might typically consider.
In Part 2 of this series, I’ll describe JPA’s Criteria API. Then I’ll return to the DAL design approach from Part 1 and step through a more concrete example using a JPA persistence mechanism, with the Criteria API playing a key role in the implementation.
I’m sure we’re all familiar with the tension between quality of design and speed of delivery that seems to pervade the IT world. Sometimes this tension is artificial – people tend to assume that a solid design will take too long to implement, though in fact it is often the case that having structure can make the end-to-end project timeline shorter. Even when the tradeoffs are real, too often they are expressed in terms of “business needs vs. technical needs” – a dichotomy that I always encourage people to rethink.
A full discussion of the business value of good design is beyond the scope that I wish to cover here. Perhaps some of you are already the design advocates on your teams, pointing out when so-called “technical debt” might incur excessive interest payments. Others might tend to prefer quick-and-dirty solutions that let you move on to the next thing. Either way, hopefully, we can all agree that a good separation of concerns is preferable when we can afford it; so in that spirit, let’s look at how to keep the separation between business logic and persistence details affordable.
The DAL sometimes gets marginalized a bit in light of modern persistence technologies. ORM tools like Hibernate abstract away many details of the DBMS (for typical use cases at least; but beware of the occasional need for native queries). JPA further provides a common API for use with various ORM tools (or providers as it calls them). At first glance, it may seem a little much to suggest that even JPA should be abstracted away from the business logic; but on closer inspection, there are reasons to say just that.
The insulation from the DBMS that JPA provides isn’t perfect. Even if you avoid native queries, information about the database schema still tends to leak through, as JPA entities tend to mirror tables very closely, and the relationships between the entities likewise tend to reflect the database’s structure. For a young application with a custom schema, the tables may have been designed to reflect business entities pretty closely anyway; but even when that’s so, as the application evolves, there may be performance reasons for the database to deviate from the “pure” business understanding of the data, and/or you may find that you’d prefer not to restructure the database to follow a change in the business layer data model.
From a separation of concerns point of view, then, JPA behaves less like a data access abstraction to be called from the business layer, and more like a persistence detail to be abstracted away by a DAL interface. By no means does this undercut the value of a good ORM tool. We just have to recognize that like any tool, it is valuable when applied to the right problems and costly when applied to the wrong ones.
So getting back to basics, a good DAL has two qualities:
- The interface (through which the business layer invokes DAL operations) is written in the business layer’s terms. Today’s implementation might use JPA; but tomorrow you might find that the native Hibernate API better meets your needs, or decide to get away from ORM and use JDBC, or even abandon the use of a relational database entirely. It should be possible to gut your DAL and write a new one around the new persistence mechanism, changing none of the DAL interfaces (and therefore changing none of the business layer that uses those interfaces). Such a rewrite may not be easy, but without a solid DAL interface you’d typically have to rewrite broad swaths of business logic that had become hopelessly entangled with details of the old persistence mechanism. A DAL rewrite will usually be easier, and will almost always be less risky, than that.
- The DAL contains no business logic. In one extreme case, I saw an application whose “business layer” was nothing but a series of pass-through calls into the DAL, which housed all of the application’s real functionality. Of course, saying that you never have to change the business layer is a matter of semantics in this case; the entire purpose of the DAL has been defeated, because re-implementing the DAL would require rewriting the business logic just the same as if there were no DAL.
A typical textbook example of a DAL design consists of a collection of data access objects (DAOs), each focusing on one business entity. The DAO for a business entity will provide basic CRUD operations for the corresponding database records. It will probably include some basic “find” methods (e.g. “find order items by order number”) and perhaps some simple aggregate calculations (“return the total revenues from all transactions on this account”).
As long as data volumes are reasonably small, the DAL may never have to grow beyond this textbook scenario. Sure, the business logic might infer some complex relationship or define some complex calculation; but you can still avoid encoding such business logic within the DAL by simply using the DAL to retrieve the handful of objects involved and resolving the relationship or calculation in code (i.e. in the business layer of the application). This may mean you aren’t using the DBMS to its full potential, but for tens, hundreds, or perhaps even small thousands of records, it’s unlikely that any performance penalty will be noticed.
As volumes climb into the 10,000 range, you may start being more careful about your access paths and query patterns. Fetching 15 records and issuing a follow-up query for each one isn’t usually a huge deal, but issuing a follow-up query for each of 10,000 records will almost certainly take longer than you would accept in an interactive context. Even giving more care to your individual database interactions, you may find that you’re straining the limits of the textbook design.
And what happens if you have millions of records? The optimizations your DBMS and physical data model might provide for high-volume operations become much harder to ignore, and the prospect of pulling all that data across the network into app server memory becomes much less attractive. How often would a transaction-oriented business application have to deal with that much data? Well, both the sizes of corporate data sets and the demand for immediate access to those data sets keep growing; so it’s a reasonable bet that the answer will be more often in the future than in the past.
Faced with this problem, one solution is to start writing stored procedures. Creating a layer of business code in the DBMS isn’t the worst thing in the world, but it does mean that you have database-specific implementations of business logic. It also may mean you need multiple pieces of code implementing the same logic. (Reuse isn’t always easy in stored procedure languages, so two procedures doing similar or related tasks may each need code that implements the same logic; and/or you may still need a bit of code that implements the same logic in the application’s business layer.) This is the traditional way to make maintenance more expensive.
A variation on this theme is to hand-code more sophisticated queries and embed them in special-purpose methods of your DAOs. This may get the DBA off your back if he or she doesn’t want business logic in the database, but in practice it shares the same structural drawbacks as the stored procedure approach.
With either of these solutions, you compromise the design goals (by mixing business logic with persistence details) for optimization. Sometimes you have to do that, but in this case it may be worth exploring another option: dynamic query building.
In addition to methods that immediately return data, your DAOs can provide methods for creating, adding logic to, and ultimately executing queries. Each operation should be simple and devoid of business logic – basically the same level of complexity afforded the individual CRUD methods. And, like the CRUD methods, their names and parameters should preserve the abstraction of the persistence mechanism. You might borrow terminology from the relational database world (e.g. “join”), but ultimately you’re just describing operations that your DAOs will be responsible for implementing through whatever persistence mechanism is hidden within.
With this approach, the running query reflects business logic to be sure; but that logic “lives” safely outside the DAL, being injected into the query at run time through the selection of a particular sequence of query-building calls. The ability to construct and execute queries of (theoretically) arbitrary complexity is both a strength and a risk of this approach. The instinct when you see a flurry of small database interactions bogging down your application is to minimize the number of queries; but at some point you have to balance this against the cost of running each individual query. One query that evaluates 10 conditions for each of 10,000 records will almost certainly be more efficient than 10,000 queries that each fetch one record so the application can evaluate the conditions; but in some cases, 5 queries that each evaluate 2 of the conditions for all 10,000 records will be more efficient still.
To make this approach work, you need a mechanism for building queries. Obviously that mechanism is going to be closely related to persistence details, so you also need to define abstractions that provide the business layer the ability to invoke the mechanism without knowing much (if anything) about it.
In Part 2 of this series, I’ll review the JPA Criteria API, which I’ll then use as the query building mechanism in a JPA-based example of the design.
— Mark Adelsberger, firstname.lastname@example.org