Integration Testing Your SOLR Index with Maven

Introduction

This article describes how to use Maven to automatically:

  1. prepare the SOLR WAR dependency to log with log4j and repack it as part of the release artifact
  2. start a SOLR instance with the resources in /src/main/resources/ as SOLR_HOME
  3. import data into SOLR by running a shell script
  4. run integration tests on that SOLR instance via SolrJ
  5. package an artifact containing the resources in a SOLR-convenient structure, including two SOLR WAR artifacts (one as provided via Maven and one repackaged with log4j support) and any custom libraries that go into SOLR_HOME

The result is a good foundation for a Continuous Integration setup.

Existing Solutions

There have been earlier solutions for creating and running a Maven SOLR configuration. What is different here?

Simple Alternative

Launching Solr from Maven for rapid development

Here, the web.xml of SOLR WAR is copied to:

/src/main/webapp/WEB-INF/ 

The article does not explain how to automate that step so that it works in a Continuous Integration environment.

SOLR Packager (Maven Archetype)

I haven’t used the packager yet, so I cannot tell how it compares to my solution. According to its website, it uses a different structure from the one I propose in this article: instead of mirroring the SOLR_HOME structure right in

/src/main/resources/ 

it uses the custom folder

/src/main/solr/ 

I’m not sure whether all of its archetypes can run the SOLR WAR on Jetty out of the box, which is essential for integration testing.

SOLR Quickstart (Maven Archetype)

This archetype uses an overlay to run the SOLR WAR. That way, logging cannot be changed to log4j.

Using Cargo instead of Dependency Plugin with Jetty

StackOverflow: How to run jetty:run-war using a war defined by maven coordinates?

In an earlier attempt some time ago, I noticed that Cargo had problems initializing the JNDI environment, which I use for setting SOLR_HOME. This might have been solved by now, but I have yet to try it again. I’m also not sure how it handles modifying the WAR file that is to be deployed. If you have experience with Cargo in that matter, do not hesitate to leave some feedback.

Project Structure

As general advice, stick to Maven’s and SOLR_HOME’s conventional structures as closely as possible, even if it means more folder levels. Of course, if multiple cores tend to repeat their configuration, you might want to configure some clever automatic copy-and-modify via Maven, but that is outside the scope of this article. (The SOLR Packager might offer some help for such use cases.)

Any SOLR configuration files go into

/src/main/resources/ 

in the structure SOLR_HOME requires. This setup is for a multicore index, and core0 contains solritas configuration (velocity templates and layout):

/src/main/resources
|_/core0
  |_/conf
    |_/velocity
      |_/css
      |_/js
      results.vm
    |_/xslt
    data-config.xml
    schema.xml
    solrconfig.xml
|_/core1
  |_/conf
    data-config.xml
    schema.xml
    solrconfig.xml
log4j.xml
solr.xml

/src/test/java
  com.itagenten.test
    ITCore0.java
    SomeUnitTest.java

/src/test/resources
  jetty-env.xml

/src/main/java
  com.itagenten.solr.handler.dataimport
    RecursingSolrEntityProcessor

/src/assembly/
  distribution.xml
  library-java.xml
  solr-log4j.xml

pom.xml
 

Dependencies

Instead of listing all of my dependencies, I will mention the two central ones: SOLR and Jetty, and the dependencies required to make SOLR log via Log4J.

Specify SOLR with type “war” and scope “runtime”:
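A minimal sketch of such a declaration, assuming the coordinates org.apache.solr:solr that the Dependency Plugin section refers to. The exclusion anticipates the logging setup discussed further down.

```xml
<!-- Sketch only: goes into the <dependencies> section of pom.xml -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr</artifactId>
  <version>${solr.version}</version>
  <type>war</type>
  <scope>runtime</scope>
  <exclusions>
    <!-- keep the JDK14 logging binding off the classpath -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-jdk14</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```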

${solr.version} 

is a property specified in pom.xml so that all other SOLR dependencies can use it as well and upgrading the version stays easy. (See the Properties section below.)

I prefer Log4J, so I’ve excluded the Slf4J dependency provided by SOLR, and specified the required Slf4J and Log4J dependencies in exchange. If you also happen to depend on other SOLR packages like, for example, dataimporthandler make sure that all of them exclude the slf4j-jdk14 artifact. Use some Maven dependency hierarchy browser (like the one in Eclipse) to spot whether it is still listed. Check the output when the integration tests run: Slf4J will complain if there are multiple implementations of the Logger in the classpath.
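As a sketch of the logging dependencies described above: the property names slf4j.version and log4j.version are hypothetical and would have to be defined in the properties section, and solr-dataimporthandler only stands in for any further SOLR package you depend on.

```xml
<!-- Sketch only: additional entries for the <dependencies> section -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-dataimporthandler</artifactId>
  <version>${solr.version}</version>
  <exclusions>
    <!-- every SOLR package has to exclude the JDK14 binding -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-jdk14</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Slf4J binding for Log4J, plus Log4J itself -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>${slf4j.version}</version> <!-- hypothetical property -->
</dependency>
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>${log4j.version}</version> <!-- hypothetical property -->
</dependency>
```

Running mvn dependency:tree is a command-line alternative to the Eclipse hierarchy browser for checking that slf4j-jdk14 is really gone.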

The Jetty dependency uses the Eclipse variant:

The Assembly and the Dependency plugin will be referring to some dependencies not mentioned here like the solr-dataimporthandler or lucene-analyzers-icu artifacts. Take them as placeholders for any dependencies you will be defining for your own project.

Maven Lifecycle

As an overview, here are the essential steps mapped onto the Maven lifecycle:

process-resources

Antrun:

adjust execute file mode on import script (import_data.sh: chmod to executable)

process-test-resources

Dependency:

  1. unpack SOLR WAR, remove slf4j-jdk14
  2. copy slf4j-log4j and log4j libraries to WEB-INF/lib of exploded SOLR WAR
  3. copy SOLR extensions to /target/classes/lib/ (SOLR_HOME/lib)

Resources:

copy log4j.xml from

/src/main/resources/ 

to WEB-INF/classes of the exploded SOLR WAR

package

Assembly:

  1. build library artifact
  2. repackage SOLR WAR with log4j support and log4j.xml configuration
  3. build distribution artifact (containing the artifacts produced in (1) and (2))

Resources:

copy library artifact (Assembly (1)) to

/target/classes/lib/ 

(SOLR_HOME/lib)

pre-integration-test

Jetty:

start server (goal: deploy-war)

Exec:

execute import script

post-integration-test

Jetty:

stop server

Properties

The versions of SOLR and Jetty are set via properties. Also, the file encoding is set to UTF-8 for sources and for reporting output. As the location of the exploded SOLR WAR is referenced from different plugins, it is also extracted to a property.
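A properties block along these lines would cover the points above. The property names integration-test.dependencies.dir and solr.server.war are the ones used in the Jetty Plugin section; all values, including the version numbers, are example assumptions.

```xml
<properties>
  <!-- example versions, adjust to your project -->
  <solr.version>4.0.0</solr.version>
  <jetty.version>8.1.7.v20120910</jetty.version>

  <!-- UTF-8 for sources and for reporting output -->
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>

  <!-- where the SOLR WAR is exploded; the values are assumptions -->
  <integration-test.dependencies.dir>${project.build.directory}/dependencies</integration-test.dependencies.dir>
  <solr.server.war>solr-${solr.version}</solr.server.war>
</properties>
```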

Dependency Plugin

There are three steps, each configured via its own “execution” element:

Step 1

We will be using the dependency plugin to get the WAR into the right place for Jetty to run it: unpack the dependency org.apache.solr:solr:war into the specified output directory (which is below /target/). Unfortunately, unpack-dependencies does not honour the exclusion elements from the dependencies section so an explicit exclusion of slf4j-jdk14 is required.

Step 2

Copy the slf4j-log4j and log4j dependencies into the exploded SOLR WAR.
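Steps 1 and 2 could be configured roughly as follows. This is a sketch: the execution ids are made up, and removing slf4j-jdk14 through a file pattern in excludes is one possible way to implement the explicit exclusion, not necessarily the one used in the original setup.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <!-- Step 1: explode the SOLR WAR below /target/ -->
    <execution>
      <id>unpack-solr-war</id>
      <phase>process-test-resources</phase>
      <goals>
        <goal>unpack-dependencies</goal>
      </goals>
      <configuration>
        <includeGroupIds>org.apache.solr</includeGroupIds>
        <includeArtifactIds>solr</includeArtifactIds>
        <includeTypes>war</includeTypes>
        <!-- file pattern: skip the JDK14 binding while unpacking -->
        <excludes>WEB-INF/lib/slf4j-jdk14-*.jar</excludes>
        <outputDirectory>${integration-test.dependencies.dir}/${solr.server.war}</outputDirectory>
      </configuration>
    </execution>
    <!-- Step 2: add the Log4J binding and Log4J to the exploded WAR -->
    <execution>
      <id>copy-log4j-libs</id>
      <phase>process-test-resources</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <includeArtifactIds>slf4j-log4j12,log4j</includeArtifactIds>
        <outputDirectory>${integration-test.dependencies.dir}/${solr.server.war}/WEB-INF/lib</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```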

Step 3

I decided that SOLR_HOME would be at:

/target/classes/ 

simply because that is where the /src/main/resources/ stuff ends up by default. Thus, any extension libraries for SOLR should go into:

/target/classes/lib/ 

This is accomplished by the third execution of the dependency plugin.

I am using an explicit include list to avoid bloating the lib directory. My selection requires that I set excludeTransitive to false so that the plugin can find all the artifacts from the inclusion list.
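A sketch of that execution, to be placed inside the executions element of the maven-dependency-plugin. The execution id is made up, and the two artifact ids are the placeholders mentioned in the Dependencies section.

```xml
<!-- Step 3: copy SOLR extension libraries to SOLR_HOME/lib -->
<execution>
  <id>copy-solr-home-libs</id>
  <phase>process-test-resources</phase>
  <goals>
    <goal>copy-dependencies</goal>
  </goals>
  <configuration>
    <!-- explicit include list keeps the lib directory small -->
    <includeArtifactIds>solr-dataimporthandler,lucene-analyzers-icu</includeArtifactIds>
    <!-- transitive artifacts stay visible to the include list -->
    <excludeTransitive>false</excludeTransitive>
    <outputDirectory>${project.build.outputDirectory}/lib</outputDirectory>
  </configuration>
</execution>
```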

Resources Plugin

Like the dependency plugin, the resources plugin is used for more than one step; here there are two.

Step 1

As mentioned in the dependency section, we want to use Log4J for handling SOLR’s log output. Therefore, the logging configuration which is located directly in

/src/main/resources/ 

needs to be added to the

/WEB-INF/classes/ 

folder of the exploded SOLR WAR.
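This first step might be sketched with the copy-resources goal as follows. The execution id is made up, and the phase follows the lifecycle overview above.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <executions>
    <!-- Step 1: put log4j.xml on the classpath of the exploded WAR -->
    <execution>
      <id>copy-log4j-config</id>
      <phase>process-test-resources</phase>
      <goals>
        <goal>copy-resources</goal>
      </goals>
      <configuration>
        <outputDirectory>${integration-test.dependencies.dir}/${solr.server.war}/WEB-INF/classes</outputDirectory>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
            <includes>
              <include>log4j.xml</include>
            </includes>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>
```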

Step 2

In this step, the actual project artifact is copied to the

/target/classes/lib 

folder (SOLR_HOME/lib). Note that this is only necessary if you have custom code in

/src/main/java/ 

like the entity processor extension that I listed in the structure as an example. If you do not provide any custom code but plan to use SOLR via configuration alone, you can skip this step. See the Assembly Plugin section for more information on how and when to build the artifact so that it is ready to copy when this execution step of the resources plugin is called. (The artifact is built by Assembly.)
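A sketch of this second execution, to be placed inside the executions element of the maven-resources-plugin. It assumes that the library artifact carries the classifier lib and therefore ends in -lib.jar, and that it has already been built when this execution runs. Within a single phase, Maven runs plugin executions in the order in which they appear in the POM.

```xml
<!-- Step 2: copy the project's library JAR to SOLR_HOME/lib -->
<execution>
  <id>copy-library-artifact</id>
  <phase>package</phase>
  <goals>
    <goal>copy-resources</goal>
  </goals>
  <configuration>
    <outputDirectory>${project.build.outputDirectory}/lib</outputDirectory>
    <resources>
      <resource>
        <directory>${project.build.directory}</directory>
        <includes>
          <!-- assumed file name: final name plus classifier "lib" -->
          <include>${project.build.finalName}-lib.jar</include>
        </includes>
        <!-- never filter binary files -->
        <filtering>false</filtering>
      </resource>
    </resources>
  </configuration>
</execution>
```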

Jetty Plugin

The SOLR_HOME variable needs to be set in jetty-env.xml which is located in /src/test/resources/:
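A minimal jetty-env.xml might look like the following sketch. It binds the JNDI name solr/home, which SOLR looks up under java:comp/env. The relative path assumes that Maven is started from the project directory.

```xml
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <!-- JNDI entry that SOLR reads as SOLR_HOME -->
  <New class="org.eclipse.jetty.plus.jndi.EnvEntry">
    <Arg>solr/home</Arg>
    <!-- assumption: resolved against the directory Maven runs in -->
    <Arg type="java.lang.String">target/classes</Arg>
    <!-- true: override a value that web.xml may define -->
    <Arg type="boolean">true</Arg>
  </New>
</Configure>
```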

The Jetty plugin is configured with the WAR file located where the dependency plugin unpacked it:

${integration-test.dependencies.dir}/${solr.server.war} 

See the Properties section for the actual values.
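The plugin configuration could be sketched as below. It is written against the 8.x line of jetty-maven-plugin (group id org.mortbay.jetty); element names differ in other major versions. The stop key, the stop port, the execution ids and the way jetty-env.xml is wired in through webApp are assumptions; port 9090 and the context path /solr match the URL given further down.

```xml
<plugin>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-maven-plugin</artifactId>
  <version>${jetty.version}</version>
  <configuration>
    <!-- the exploded SOLR WAR prepared by the dependency plugin -->
    <war>${integration-test.dependencies.dir}/${solr.server.war}</war>
    <webApp>
      <contextPath>/solr</contextPath>
      <!-- assumption: jetty-env.xml is handed over via the web app context -->
      <jettyEnvXml>${basedir}/src/test/resources/jetty-env.xml</jettyEnvXml>
    </webApp>
    <connectors>
      <connector implementation="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <port>9090</port>
      </connector>
    </connectors>
    <!-- needed by the stop goal; values are arbitrary -->
    <stopKey>stop-solr</stopKey>
    <stopPort>9999</stopPort>
  </configuration>
  <executions>
    <execution>
      <id>start-jetty</id>
      <phase>pre-integration-test</phase>
      <goals>
        <goal>deploy-war</goal>
      </goals>
      <configuration>
        <!-- return control to Maven once the server is up -->
        <daemon>true</daemon>
      </configuration>
    </execution>
    <execution>
      <id>stop-jetty</id>
      <phase>post-integration-test</phase>
      <goals>
        <goal>stop</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```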

The execution elements set up the Jetty server to be run during integration testing or when started explicitly via:

jetty:run-war 

You start the SOLR application with your configuration by calling:

mvn jetty:run-war 

This will make SOLR available at:

http://localhost:9090/solr 

This is a SOLR 4.0 configuration, and the URL points to the Admin UI of the multicore index.

Or only run the integration tests (start, test, stop):

mvn verify 

Assembly Plugin

Step 1: library artifact (classifier: lib)

This section applies if you have custom code that you want to place as a JAR artifact into SOLR_HOME/lib. For that, the Assembly plugin has to build the artifact before the Resources plugin moves it to /target/classes/lib (SOLR_HOME/lib). See also the Resources Plugin section.

Step 2: SOLR WAR repackaged to support Log4J

Repackage the modified WAR in order to add it to the release artifact and spare whoever is deploying it to production from having to modify it themselves (removing the Slf4J dependencies and adding the log4j ones). Its final name is:

solr-${solr.version}-log4j.war 

This WAR file will include a sample log4j.xml file which is also added to the release artifact directly, simply for convenience.

Step 3: distribution artifact (classifier: dist)

The other use case for which Assembly is set up is the distribution artifact, which includes the complete SOLR configuration. For deployment, it can be unpacked and used as SOLR_HOME. It also includes any additional files such as the Log4J configuration, the SOLR WAR file and any release documents you might want to include.
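The three steps above might be bound as three executions of the Assembly plugin, sketched here. The descriptor paths follow the project structure; distribution.xml is assumed to belong to Step 3, and the execution ids are made up. The finalName of the second execution relies on Assembly appending the descriptor id (assumed to be log4j) to produce solr-${solr.version}-log4j.war.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <!-- Step 1: library artifact (classifier: lib) -->
    <execution>
      <id>library</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <descriptors>
          <descriptor>src/assembly/library-java.xml</descriptor>
        </descriptors>
      </configuration>
    </execution>
    <!-- Step 2: SOLR WAR repackaged with Log4J support -->
    <execution>
      <id>solr-log4j-war</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <finalName>solr-${solr.version}</finalName>
        <descriptors>
          <descriptor>src/assembly/solr-log4j.xml</descriptor>
        </descriptors>
      </configuration>
    </execution>
    <!-- Step 3: distribution artifact (classifier: dist) -->
    <execution>
      <id>distribution</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <descriptors>
          <descriptor>src/assembly/distribution.xml</descriptor>
        </descriptors>
      </configuration>
    </execution>
  </executions>
</plugin>
```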

The library-jar descriptor includes only the actual source code:

Of course, if you require property files to go into the JAR file you will have to modify the descriptor accordingly. The JAR file does not include any dependencies because these are included as JAR files in SOLR_HOME/lib in their own right.
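A library descriptor of that kind could look like this sketch. The include pattern com/** is an assumption based on the package names in the project structure. It is needed because /target/classes/ also holds the SOLR configuration, which should stay out of the JAR.

```xml
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <!-- the id becomes the classifier of the artifact -->
  <id>lib</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <fileSet>
      <directory>${project.build.outputDirectory}</directory>
      <outputDirectory>/</outputDirectory>
      <includes>
        <!-- compiled classes only, no SOLR configuration -->
        <include>com/**/*.class</include>
      </includes>
    </fileSet>
  </fileSets>
</assembly>
```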

The descriptor for the SOLR WAR repackaging bundles up the exploded and modified contents of the SOLR WAR dependency (the final name has been set in the execution configuration):
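As a sketch, assuming the descriptor id log4j so that the file name ends in -log4j.war:

```xml
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <!-- appended to the final name set in the execution -->
  <id>log4j</id>
  <formats>
    <format>war</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <!-- the exploded WAR, already stripped of slf4j-jdk14 and
         extended with the Log4J libraries and log4j.xml -->
    <fileSet>
      <directory>${integration-test.dependencies.dir}/${solr.server.war}</directory>
      <outputDirectory>/</outputDirectory>
    </fileSet>
  </fileSets>
</assembly>
```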

The distribution descriptor includes all SOLR configuration in the structure required for SOLR_HOME, and additional files for a complete release like the vanilla SOLR WAR as well as the one created for Log4J support and the log4j.xml file – and any documentation or other resource files you may find appropriate.
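A distribution descriptor along these lines might look like the following sketch. The archive format, the exclude pattern and the layout inside the archive are assumptions; documentation files would be added as further file or fileSet entries.

```xml
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <id>dist</id>
  <formats>
    <format>zip</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <!-- SOLR_HOME: cores, solr.xml, log4j.xml and lib/ -->
    <fileSet>
      <directory>${project.build.outputDirectory}</directory>
      <outputDirectory>/</outputDirectory>
      <excludes>
        <!-- compiled classes already live in the library JAR -->
        <exclude>com/**</exclude>
      </excludes>
    </fileSet>
  </fileSets>
  <files>
    <!-- the WAR repackaged with Log4J support -->
    <file>
      <source>${project.build.directory}/solr-${solr.version}-log4j.war</source>
      <outputDirectory>/</outputDirectory>
    </file>
  </files>
  <dependencySets>
    <!-- the vanilla SOLR WAR as provided via Maven -->
    <dependencySet>
      <includes>
        <include>org.apache.solr:solr:war</include>
      </includes>
      <outputDirectory>/</outputDirectory>
      <useProjectArtifact>false</useProjectArtifact>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
</assembly>
```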

Antrun Plugin

The Resources plugin does not retain unix file modes, as explained here:

StackOverflow: Keep permissions on files with Maven resources:testResources

The Antrun plugin is a workaround to adjust the file mode of the import script after it has been copied to the build directory:
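A sketch of such an execution, assuming that import_data.sh is copied to /target/classes/ together with the other resources. The execution id is made up, and older Antrun versions expect a tasks element instead of target.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>make-import-script-executable</id>
      <phase>process-resources</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <target>
          <!-- assumed location of the copied script -->
          <chmod file="${project.build.outputDirectory}/import_data.sh" perm="755"/>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
```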

The Antrun plugin needs to be executed before the Exec plugin which imports the data into SOLR. I have configured it to run right after the resources are copied.

Exec Plugin

The Exec plugin starts the script that imports the data into SOLR. The script calls the DataImportHandler via curl and waits until the import has finished. In this case, importing does not take very long, so waiting is fine. For integration testing, choose a sensible amount of data that offers enough meat for the tests to chew on while still importing fast enough to keep the overall test run acceptably short.

The script is executed in the pre-integration-test phase and integration tests are run right afterwards.
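A sketch of the Exec configuration, again assuming the script location /target/classes/import_data.sh and a made-up execution id. Because Jetty is started in the same phase, this plugin should be declared after the Jetty plugin in the POM so that the server is up before the import begins.

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>import-data</id>
      <phase>pre-integration-test</phase>
      <goals>
        <goal>exec</goal>
      </goals>
      <configuration>
        <!-- assumed location of the import script -->
        <executable>${project.build.outputDirectory}/import_data.sh</executable>
      </configuration>
    </execution>
  </executions>
</plugin>
```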

Failsafe Plugin

The configuration for integration tests is as usual:
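A typical minimal setup, shown as a sketch: Failsafe picks up classes named IT*.java by default, which matches ITCore0.java from the project structure.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <!-- run the tests, then fail the build in verify if any failed -->
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Splitting the work between the integration-test and verify goals lets the post-integration-test phase stop Jetty even when a test has failed.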

A short integration test (JUnit 4) assures that it is called correctly after the import:
