Class Loaders, Data Sources, Service Providers and JDBC
The Rabbit Hole
What starts out as a simple question (where do I put my JDBC drivers in Tomcat?) quickly leads down a rabbit hole…
Actually, this is a straightforward configuration topic - and not the true source of the rabbit hole. That starts with an unusually opinionated paragraph found in this page of the official Tomcat documentation (emphasis mine):
java.sql.DriverManager supports the service provider mechanism. This feature is that all the available JDBC drivers that announce themselves by providing a META-INF/services/java.sql.Driver file are automatically discovered, loaded and registered, relieving you from the need to load the database driver explicitly before you create a JDBC connection. However, the implementation is fundamentally broken in all Java versions for a servlet container environment. The problem is that java.sql.DriverManager will scan for the drivers only once.
So, on the one hand, for my simple Java web app being hosted on Tomcat, there are some straightforward JDBC set-ups which are well understood and commonly used. But on the other hand, there is clearly something more going on here… and down the hole we go.
Some JDBC History
In the olden days, when you wanted to use a JDBC driver to connect to your relational database from a Java application, you needed to explicitly load the relevant JDBC Driver class.
I’ll use MySQL as my example:
Warning: This is from a very old version of the MySQL JDBC driver - version 5.1.5 to be specific. Do not use this in your modern code!
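A minimal sketch of that old style might look like the following. The URL, user name and password are placeholder assumptions, and the legacy driver class is only ever referred to by its name, as a string:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class OldWayDemo {

    // Placeholder connection details (assumptions, for illustration only).
    static final String URL = "jdbc:mysql://localhost:3306/mydb";
    static final String USER = "dbuser";
    static final String PASS = "dbpass";

    static Connection connectUsingDriverManagerTheOldWay()
            throws ClassNotFoundException, SQLException {
        // Force the legacy driver class to load. Its static initializer
        // registers the driver with DriverManager.
        Class.forName("com.mysql.jdbc.Driver");
        // From here on, only java.sql types are involved.
        return DriverManager.getConnection(URL, USER, PASS);
    }

    public static void main(String[] args) {
        try (Connection conn = connectUsingDriverManagerTheOldWay()) {
            System.out.println("Connected: " + !conn.isClosed());
        } catch (ClassNotFoundException e) {
            System.out.println("Driver class not on the runtime classpath: " + e.getMessage());
        } catch (SQLException e) {
            System.out.println("Driver loaded, but the connection failed: " + e.getMessage());
        }
    }
}
```

Without the legacy driver JAR on the runtime classpath, the Class.forName call fails with a ClassNotFoundException before any connection is attempted.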
In the above fragment, the relevant line is the call to Class.forName("com.mysql.jdbc.Driver").
The Class.forName() call is not directly related to JDBC - it is just a general purpose Java method for explicitly instructing the current classloader to load a class object from its fully-qualified name.
This is explained in the Java JDBC Tutorial as follows:
In previous versions of JDBC, to obtain a connection, you first had to initialize your JDBC driver by calling the method Class.forName.
and:
Any JDBC 4.0 drivers that are found in your class path are automatically loaded. (However, you must manually load any drivers prior to JDBC 4.0 with the method Class.forName.)
Why was Class.forName Ever Needed?
But why did we ever need to use Class.forName? We always had our JDBC driver (the JAR file) on the classpath. Why could its classes not be accessed just like any other of our classes in core Java or in libraries in JAR files?
The specific technical reason is eloquently explained in this Stack Overflow answer from Joachim Sauer:
If you’re not using a current JDK (or if you have a JDBC driver that does not have the appropriate files set up to use that mechanism) then the driver needs to be registered with the DriverManager using registerDriver. That method is usually called from the static initializer block of the actual driver class, which gets triggered when the class is first loaded, so issuing the Class.forName() ensures that the driver registers itself (if it wasn’t already done).
That answer also makes the points that you need (a) a JDBC 4.0 (or later) driver, which supports the automatic loading mechanism; and (b) a recent enough version of Java in which the loading mechanism is actually provided (it has been available since Java 6).
Classpaths
Another way to answer this “why the need…?” question is to consider that none of the code in the above example makes a direct (compile-time) reference to any of the vendor’s JDBC driver classes: the driver is named only in a string. Everything else is handled via Java’s java.sql.* classes.
There is nothing in that code which would cause the Java runtime to try to load the required MySQL JDBC driver class - hence the need for an explicit line of code to force the issue.
There are different classpaths. The compile-time classpath contains all libraries and dependencies needed to compile the source code. We can see that there is nothing in our source code for connectUsingDriverManagerTheOldWay() which directly depends on the MySQL driver.
It is only at runtime (when we attempt to connect to the MySQL database) that we need the MySQL driver available - on the runtime classpath.
Type 4 Drivers
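“Type 4” drivers are easily confused with JDBC 4.0 drivers, but the two are different things.
A “type 4” driver is a JDBC driver which is implemented in pure Java and which talks to the database directly, using the database’s own network protocol. It is the most recently introduced “type” of JDBC driver - and all mainstream DBMSs almost certainly provide a type 4 driver, these days. The driver “type” is separate from the JDBC specification version: it is the JDBC 4.0 specification (part of Java 6) which introduced automatic loading. In practice, a current type 4 driver will almost certainly also implement JDBC 4.0 or later.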
If you are interested in reading about other types, a good overview is provided by the Wikipedia page. In summary:
- Type 1 driver: JDBC-ODBC bridge
- Type 2 driver: Native-API driver
- Type 3 driver: Network-Protocol (middleware) driver
- Type 4 driver: Database-Protocol driver/Thin Driver (Pure Java)
How do you know whether your driver is a JDBC 4.0 driver?
The simplest way is to locate the driver JAR file and look inside it (e.g. with a tool such as 7-Zip). There should be a META-INF directory containing a MANIFEST.MF file. For my above example, the manifest included the following entries:
Specification-Title: JDBC
Specification-Version: 4.0
SPI and Automatic Driver Loading
Coming back to the following sentence from the Java JDBC tutorial:
Any JDBC 4.0 drivers that are found in your class path are automatically loaded
How does that automatic loading work?
Side note: It’s not actually guaranteed that a Type 4 JDBC driver will be automatically loaded: there are some very early type 4 driver versions which may not have implemented this feature.
For example, the MySQL example I referenced above was from MySQL 5.1.5 - and the automatic loading mechanism was not supported. But by version 5.1.6, it was supported.
It is worth adding that MySQL version 5.1.x was first released in November of 2008 - so this is (in technology terms) bordering on ancient history. These are not versions you should be using today.
Back to automatic loading…
Automatic loading is provided via the Java SPI - the Service Provider Interface. The ServiceLoader documentation provides an overview. This is a general purpose mechanism used not only by JDBC but also by other areas such as JNDI, JAXP and so on.
A JDBC driver typically implements support for SPI by providing a file in META-INF/services named java.sql.Driver.
The contents of that file will vary from provider to provider. For example, for the 5.1.6 version of MySQL’s JDBC driver, the file contains the fully qualified class name com.mysql.jdbc.Driver.
That is to say, it’s the same value as the one we needed to use in our Class.forName example earlier.
MySQL tip: Modern MySQL JDBC drivers now use the following: com.mysql.cj.jdbc.Driver. The earlier example (com.mysql.jdbc.Driver) is from a legacy MySQL JDBC driver.
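To see which drivers the SPI mechanism can actually find, a small sketch such as the following can be run with your driver JARs on the classpath. It uses only the standard ServiceLoader API; the class name DriverDiscovery is just an illustrative name of mine.

```java
import java.sql.Driver;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class DriverDiscovery {

    /** Class names of every java.sql.Driver provider the service loader can see. */
    static List<String> discoveredDriverClassNames() {
        List<String> names = new ArrayList<>();
        // The service loader reads each META-INF/services/java.sql.Driver file
        // it can see, and instantiates the classes named there as we iterate.
        for (Driver driver : ServiceLoader.load(Driver.class)) {
            names.add(driver.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = discoveredDriverClassNames();
        if (names.isEmpty()) {
            System.out.println("No JDBC drivers found via the service loader");
        }
        names.forEach(System.out::println);
    }
}
```

With no driver JARs on the classpath the list may well be empty, since the JDK itself does not necessarily ship any JDBC drivers.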
Automatic Loading Steps
We can now take a closer look at how loading happens…
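To illustrate the auto-loading process I will assume the following:
a) A simple JDBC connection to a MySQL database, made by calling DriverManager.getConnection with a connection URL, user ID and password.
b) Two JDBC driver JAR files included on the classpath, declared as Maven dependencies in my pom.xml: the MySQL driver and the H2 driver.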
Why two drivers for two different databases? Because this will illustrate an important point about how automatic loading works.
The steps for my scenario are as follows:
- The DriverManager.getConnection method is invoked from my code. The calling class (i.e. the class containing my code) is retrieved by the driver manager. This class is used in the next step, along with my connection URL, user ID and password.
From the DriverManager JavaDoc:
When the method getConnection is called, the DriverManager will attempt to locate a suitable driver from amongst those loaded at initialization and those loaded explicitly using the same class loader as the current application.
Note also that the
- The driver manager calls a getConnection worker method which obtains the classloader for the calling class, using caller.getClassLoader().
From the DriverManager JavaDoc again:
The drivers loaded and available to an application will depend on the thread context class loader of the thread that triggers driver initialization
Classloaders in Java are a rabbit-hole unto themselves - and I don’t propose to go too far into this topic here. However, it is crucially important to how automatic loading works, because it relates to which resources (in our case which JDBC drivers) can be located, in different runtime configurations.
- The ensureDriversInitialized method is called. This only happens once for our application - another critical point with important consequences. The DriverManager class uses a boolean driversInitialized field to track whether initialization has already happened, for any subsequent DriverManager.getConnection calls.
- The ensureDriversInitialized method calls the ServiceLoader class to perform driver class loading, using ServiceLoader.load(Driver.class).
- The service loader handles scanning the runtime classpath for resources (JAR files) which meet the required criteria - which in our case, is (a) the existence of a file named META-INF/services/java.sql.Driver, which (b) must contain the fully qualified name of the JDBC driver class to be loaded by the service loader. These are “registered” - which simply means their classes are added to a list of available JDBC driver classes (registeredDrivers) in the driver manager.
Specifically, iterating over the result of ServiceLoader.load(Driver.class) causes a new instance of each JDBC driver to be created. Typically, JDBC drivers contain a static initialization block which is executed when the driver class is first initialized. In the MySQL JDBC driver, for example, that static block calls DriverManager.registerDriver, passing a new instance of the driver.
- That call back into the driver manager causes the driver to be added to the list of registered drivers.
- Once all drivers have been registered, the driver manager sets driversInitialized to true. For our set-up, this means we end up with two registered drivers: the MySQL driver and the H2 driver. So, even though we were not even attempting to perform any H2-related database access, that driver is still registered at this point in the process.
- The driver manager’s getConnection method then walks through the list of registeredDrivers, and invokes each one’s connect method.
- The first successful connection from the list is returned to the client application.
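The registration and selection steps above can be sketched without any real database, using a stand-in driver. DemoDriver, RegistrationDemo and the jdbc:demo: URL prefix are all hypothetical names invented for illustration. This is not the MySQL driver’s source, and the stand-in cannot open real connections.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

/** Hypothetical stand-in for a vendor driver. */
class DemoDriver implements Driver {

    // Runs once, when the class is initialized (for example by Class.forName,
    // or when the service loader instantiates the driver).
    static {
        try {
            DriverManager.registerDriver(new DemoDriver());
        } catch (SQLException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @Override
    public boolean acceptsURL(String url) {
        // Cheap sanity check, so other vendors' URLs are rejected quickly.
        return url != null && url.startsWith("jdbc:demo:");
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url)) {
            return null; // "not mine": the driver manager tries the next driver
        }
        throw new SQLException("DemoDriver is a sketch and cannot connect");
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override public int getMajorVersion() { return 1; }
    @Override public int getMinorVersion() { return 0; }
    @Override public boolean jdbcCompliant() { return false; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("no logging in this sketch");
    }
}

class RegistrationDemo {

    /** Class name of the registered driver which accepts the URL, or null if none does. */
    static String driverFor(String url) {
        try {
            // Initializing the class triggers the static block above.
            Class.forName(DemoDriver.class.getName());
            return DriverManager.getDriver(url).getClass().getName();
        } catch (ClassNotFoundException | SQLException e) {
            return null;
        }
    }
}
```

Calling RegistrationDemo.driverFor with a jdbc:demo: URL should find the stand-in driver, while a URL with an unknown prefix should find nothing (assuming no other registered driver claims it).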
Some points in summary:
- This process means that we do not need any explicit references to specific driver implementations. The only database-specific details we need are the string properties needed for the URL, user ID and password.
- Drivers are loaded only once for my application. That happens “lazily”, the first time a DriverManager.getConnection statement is encountered. The DriverManager class cannot be instantiated - it uses private DriverManager(){}. Amongst other things, it acts as a holder for static fields, as we saw for the registeredDrivers list. These static fields will not be garbage collected until the class loader which loaded the DriverManager is itself eligible to be garbage collected. Because DriverManager is a core Java class, loaded by one of the JVM’s own class loaders rather than by the application, that is typically not until the end of the program. Therefore the list of registered drivers will remain available - and driversInitialized will remain true.
- If my application has more than one JDBC driver (as shown in the pom.xml example at the beginning of this section), then it’s possible that multiple connection attempts could be made, before one actually works. However, well-behaved drivers will typically perform a lightweight sanity-check on the provided URL, before attempting a more expensive connection.
- This process is the reason why modern JDBC drivers no longer require you to explicitly use Class.forName in your JDBC code. As we shall see later on, however, there can be exceptions where Class.forName may still be needed.
You could choose to sidestep the SPI process entirely by replacing the DriverManager call with code which refers directly to the MySQL implementation class com.mysql.cj.jdbc.Driver, for example by instantiating that class yourself.
This is discouraged because now you have a hard-coded reference to a vendor-specific class: com.mysql.cj.jdbc.Driver.
Using DataSource instead of DriverManager
In the DriverManager JavaDoc it states:
The javax.sql.DataSource interface provides another way to connect to a data source.
And:
The use of a DataSource object is the preferred means of connecting to a data source.
What’s that all about? Have we been doing it wrong up to now?
My earlier MySQL example can be rewritten to use a DataSource, obtaining the connection with ds.getConnection() instead of DriverManager.getConnection().
What are the advantages of using DataSource? Well, in my naive example, nothing, really. In fact, it requires me to import a specific implementation class from the MySQL driver.
So, in that sense, it’s worse than the original DriverManager version.
But if you look at the official Java JDBC tutorial for data sources, you will see the following advantages mentioned:
- DataSource objects can provide connection pooling and distributed transactions.
- Programmers no longer have to hard code the driver name or JDBC URL in their applications, which makes them more portable.
- DataSource properties make maintaining code much simpler.
The last two points in particular are more relevant to applications running in containers - such as web apps in a Tomcat application server. And it could be argued that “much simpler” is not always the case. What may be simpler for you as a developer may be more work for the web server administrator. The work is really just moved somewhere else.
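To show what “no hard-coded driver name or JDBC URL in the application” can look like, here is a rough sketch in which the calling code (a hypothetical ReportDao) depends only on the javax.sql.DataSource interface. SimpleDataSource is a deliberately naive, non-pooling stand-in of my own which delegates to DriverManager; a real vendor or pool implementation would normally take its place.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.logging.Logger;
import javax.sql.DataSource;

/** Hypothetical, non-pooling DataSource which simply delegates to DriverManager. */
class SimpleDataSource implements DataSource {
    private final String url;
    private final String user;
    private final String password;

    SimpleDataSource(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    @Override
    public Connection getConnection() throws SQLException {
        return getConnection(user, password);
    }

    @Override
    public Connection getConnection(String username, String pass) throws SQLException {
        return DriverManager.getConnection(url, username, pass);
    }

    // The remaining methods are required by the interface but unused here.
    @Override public PrintWriter getLogWriter() { return null; }
    @Override public void setLogWriter(PrintWriter out) { }
    @Override public void setLoginTimeout(int seconds) { }
    @Override public int getLoginTimeout() { return 0; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("no logging in this sketch");
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        throw new SQLException("Not a wrapper for " + iface.getName());
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this);
    }
}

/** Calling code: knows nothing about URLs, credentials or vendors. */
class ReportDao {
    private final DataSource dataSource;

    ReportDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    boolean canConnect() {
        try (Connection conn = dataSource.getConnection()) {
            return true;
        } catch (SQLException e) {
            return false;
        }
    }
}
```

The connection details now live wherever the DataSource is created, which in a container is typically the server configuration rather than the application code.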
(I will take a closer look at connection pooling next - but distributed transactions will not be discussed here.)
Basic Connection Pooling
I will use HikariCP, added to my project as a Maven dependency.
Hikari supports both the “old school” DriverManager approach using their jdbcUrl parameter, and the “preferred” DataSource approach, using their dataSourceClassName parameter - although, as they take pains to point out:
We recommended using dataSourceClassName instead of jdbcUrl, but either is acceptable. We’ll say that again, either is acceptable.
To begin with I will use a MySQL pool using jdbcUrl. The simple reason for this is that currently, there is a known issue using the MySQL data source approach:
The MySQL DataSource is known to be broken with respect to network timeout support. Use jdbcUrl configuration instead.
It looks like an issue regarding this was opened in the MySQL bug tracker:
Incorrect implementation of Connection.setNetworkTimeout()
The issue was discussed from 2015 through 2017, but appears to have languished ever since.
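A very simple implementation of a HikariCP pool can use an enum (here named PoolDemo, with a single INST constant) to give us a singleton. The enum builds the Hikari configuration, including the jdbcUrl, and passes it to new HikariDataSource(config).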
When the pool is used, it is the execution of new HikariDataSource(config) which triggers the same DriverManager auto-registration and loading process as previously outlined. In other words, Hikari takes care of the JDBC driver loading.
In the calling code, this:
Connection conn = ds.getConnection()
has become this:
Connection conn = PoolDemo.INST.getConnection()
To use Hikari with a data source class name, instead of a connection URL, set the dataSourceClassName parameter in place of jdbcUrl.
Tomcat with a Servlet
At long last, we can take a look at JDBC using Tomcat 10.0 (in this case, with a very simple JSP page and a servlet). The page displays a “success”/“fail” message depending on whether a database connection was made.
The servlet code:
The JSP:
The Maven dependencies and build configuration:
I build a WAR file using the above code and Maven pom.xml, and manually deploy it to my Tomcat’s webapps directory.
Driver Manager with a Local JAR
Our standard DriverManager.getConnection() code is used.
The MySQL JDBC driver is placed in the webapp’s WEB-INF/lib folder. I use Maven’s runtime scope for the driver dependency to achieve this.
This gives a runtime exception: typically a java.sql.SQLException reporting that no suitable driver was found for the JDBC URL.
Why does this fail? Why can’t my webapp find the JDBC driver bundled with the webapp?
This is because the driver manager’s one-time driver scan has already run, as a part of Tomcat’s startup process. This is the point made by the Tomcat documentation I referenced at the start of this article.
It means that only libraries visible to the common class loader and its parents will be scanned for database drivers.
Therefore driversInitialized is already true and ensureDriversInitialized() does not scan again.
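The “scan only once” behaviour can be modelled in a few lines. This is a simplified model of my own, not the JDK’s DriverManager source; the class names and the driver names in the two lists are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Simplified model of a registry which scans for drivers only once. */
class ScanOnceRegistry {
    private static final List<String> registered = new ArrayList<>();
    private static boolean initialized = false;

    /** Scans using the caller's view of the classpath, but only on the first call. */
    static synchronized List<String> ensureInitialized(Supplier<List<String>> visibleDrivers) {
        if (!initialized) {
            registered.addAll(visibleDrivers.get());
            initialized = true;
        }
        return List.copyOf(registered);
    }
}

class ScanOnceDemo {
    static List<String> run() {
        // Container start-up triggers the scan: only the container's own
        // lib directory is visible at this point.
        ScanOnceRegistry.ensureInitialized(() -> List.of("driver.in.tomcat.lib"));
        // Later, a web app asks for a connection. Its bundled driver is
        // visible to it, but the scan is not repeated.
        return ScanOnceRegistry.ensureInitialized(() -> List.of("driver.in.WEB-INF.lib"));
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the second supplier is never consulted, so the driver bundled with the web application never makes it into the registry.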
Why does Tomcat do this?
Memory Leaks in Servlet Containers
This requires another digression - and I will use the following presentation notes from Mark Thomas to help explain:
Diagnosing and Fixing Memory Leaks in Web Applications: Tips from the Front Line
A class is uniquely identified by:
• Its name
• The class loader that loaded it
Hence, you can have a class with the same name loaded multiple times in a single JVM, each in a different class loader. Web containers use this for isolating web applications. Each web application gets its own class loader.
An object retains a reference to the class it is an instance of. A class retains a reference to the class loader that loaded it. The class loader retains a reference to every class it loaded.
Retaining a reference to a single object from a web application pins every class loaded by the web application in the JVM’s memory.
These references often remain after a web application reload. With each reload, more classes get pinned in memory and eventually it fills up.
Note: The original presentation refers to the “Permanent Generation” memory area in the JVM. This has been replaced in more recent versions of the Oracle JVM with the Metaspace memory region - but the point about memory leaks still holds true.
So, one of the fundamental requirements of a servlet container (to manage multiple different web applications simultaneously) is in conflict with the way the DriverManager class handles automated driver loading.
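The first point in those notes (a class is identified by its name and by the class loader that loaded it) can be demonstrated with a small sketch. IsolatingLoader is a hypothetical loader of mine which defines one specific class itself instead of delegating to its parent. This mimics, very loosely, the way each web application gets its own copy of a class.

```java
import java.io.IOException;
import java.io.InputStream;

class LoaderIdentityDemo {

    /** Trivial class which we will load more than once. */
    static class Payload { }

    /** Defines Payload itself; delegates every other class to its parent. */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(LoaderIdentityDemo.class.getClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Payload.class.getName())) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Read the same bytecode the parent would use, but define
                    // the class in this loader.
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** True if two loaders yield distinct classes which share one name. */
    static boolean sameNameDifferentClass() {
        try {
            String name = Payload.class.getName();
            Class<?> a = new IsolatingLoader().loadClass(name);
            Class<?> b = new IsolatingLoader().loadClass(name);
            return a.getName().equals(b.getName()) && a != b && a != Payload.class;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Same name, different classes: " + sameNameDifferentClass());
    }
}
```

Each of those Class objects holds a reference to its own loader, which is why a single stray reference to one object can keep an entire web application’s classes in memory.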
Tomcat's Leak Prevention Listener
This is why Tomcat triggers the driver scan during Tomcat startup (via a configuration option which can be changed). It moves this (and other similar core Java singletons) from each web application to the Tomcat platform, where they will not proliferate as web apps are re-loaded. But the consequence of this is:
Drivers packaged in web applications (in WEB-INF/lib) and in the shared class loader (where configured) will not be visible and will not be loaded automatically.
Instead, drivers can be placed in $CATALINA_HOME/lib, or, if it is provided, in $CATALINA_BASE/lib. These will be visible to the Tomcat classloader used to trigger driver loading.
More details can be found in the JreMemoryLeakPreventionListener documentation. See specifically the driverManagerProtection attribute.
Class.forName() - Still Needed
There are further consequences arising from this approach. For example, if you have different web applications which may rely on different versions of a JDBC driver, then placing these different driver JARs in the Tomcat lib directory may not work, since it is not guaranteed that each application will use the correct version of the driver.
In this case, you may still need to bundle each JDBC driver in the WEB-INF/lib directory of the related web application - and use Class.forName() to load it.
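A minimal sketch of such an explicit load follows. It assumes the driver JAR is in the web application’s WEB-INF/lib, and that the code runs on a thread whose context class loader is the web application’s class loader (as is normally the case for Tomcat request threads). ExplicitDriverLoader is a hypothetical helper name.

```java
class ExplicitDriverLoader {

    /**
     * Tries to load and initialize the named driver class, which lets its
     * static initializer register the driver. Returns false if the class
     * cannot be found.
     */
    static boolean loadDriver(String driverClassName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            // "true" asks for the class to be initialized, not merely loaded.
            Class.forName(driverClassName, true, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("MySQL driver available: " + loadDriver("com.mysql.cj.jdbc.Driver"));
    }
}
```

A web application could call something like this once, before its first DriverManager.getConnection() call.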
Tomcat with a Servlet and JNDI
Another approach is to take advantage of the previously discussed DataSource approach - this time via JNDI.
In this case, a JDBC connection can be defined via configuration, instead of in your web application’s code. The typical approach is to create this configuration in the Tomcat conf/context.xml file, along with a resource reference in your application’s WEB-INF/web.xml file.
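Examples:
The context.xml entry is a Resource element which declares the data source: its JNDI name, its type (javax.sql.DataSource), the driver class name, the JDBC URL and the credentials.
The web.xml entry is a resource-ref element which refers to the same JNDI name.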
My web application can then access this data source as follows:
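As a rough sketch of such a lookup, using only the standard JNDI and JDBC APIs. The resource name jdbc/myDb is a placeholder assumption: it needs to match whatever name is used in context.xml and web.xml.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

class JndiLookupDemo {

    // Hypothetical resource name (an assumption for this sketch).
    static final String RESOURCE_NAME = "jdbc/myDb";

    static DataSource lookupDataSource() throws NamingException {
        Context initial = new InitialContext();
        // "java:comp/env" is the standard location of a component's
        // environment entries.
        Context env = (Context) initial.lookup("java:comp/env");
        return (DataSource) env.lookup(RESOURCE_NAME);
    }

    /** True if the container supplied a data source and a connection could be opened. */
    static boolean canConnect() {
        try (Connection conn = lookupDataSource().getConnection()) {
            return true;
        } catch (NamingException | SQLException e) {
            return false;
        }
    }
}
```

Outside a container there is no JNDI environment to look things up in, so the lookup fails with a NamingException and canConnect() returns false.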
Just to note: This still needs to follow the same guidelines as described above regarding the correct location for the JDBC driver JAR file. It still relies on automatic driver registration.
Conclusions
There is nothing new in any of the above discussions, but I found much of the information somewhat scattered around different documents, presentations and discussions. Having a more detailed walkthrough has helped me gain a better understanding of the “why” behind the “how” regarding where to place my JDBC driver files.
Author northCoder
LastMod 08-Nov-2021