Sunday, October 24, 2010

Testing Solr Indexing Software - Full Stack Tests

In a previous post, I talked about the different levels of testing for code that writes to a Solr index. This post will go into detail about Full Stack Tests: acceptance tests of search results that exercise the full UI wrapping.

The testing mantra: if you would have to test it manually, then it's worth having automated tests. How many times can you ask human testers to bang on your application? How often do they repeat searches that worked in the past?

Our UI for http://searchworks.stanford.edu is based on Project Blacklight (http://projectblacklight.org/), a Ruby on Rails application. Cucumber (http://cukes.info/) gives us a great way to test the RoR application stack from the user input forms all the way to the html that gets returned. Our cucumber tests fake user input into the UI, run it through our RoR code, send the request to Solr, run the response back through our RoR code, and then check the resulting html for the desired data.

Here are some example cucumber tests:

Scenario: Query for "cooking" should have exact word matches before stemmed ones
  Given a SOLR index with Stanford MARC data
  And I go to the catalog page
  When I fill in "q" with "cooking"
  And I press "search"
  Then I should get ckey 4779910 in the first 1 results
  And I should get result titles that contain "cooking" as the first 20 results

Scenario: Stopwords in author searches should be ignored
  Given I am on the home page
  When I fill in "q" with "king of scotland"
  And I select "Author" from "search_field"
  And I press "search"
  Then I should get at least 20 total results
  And I should get the same number of results as an author search for "king scotland"
  And I should get more results than an author search for "\"king of scotland\""

Scenario: Two term query with COLON, no Stopword
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "Jazz : photographs"
  And I press "search"
  Then I should get ckey 2955977 in the results
  And I should get the same number of results as a search for "Jazz photographs"
  And I should get the same number of results as a search for "Jazz: photographs"

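Steps like "I should get ckey ... in the first n results" aren't built into cucumber; they come from custom step definitions. As a rough sketch of what such step definitions can look like (assuming Capybara and RSpec, and making up the CSS selectors and result markup, which are not our actual view code):

Then /^I should get ckey (\S+) in the first (\d+) results?$/ do |ckey, max|
  # assumes each result row exposes its catalog key in an id like "document_4779910"
  ckeys = page.all("div.document").first(max.to_i).map { |doc| doc[:id].to_s.sub("document_", "") }
  ckeys.should include(ckey)
end

Then /^I should get at least (\d+) total results?$/ do |min|
  # assumes the page shows the total hit count in an element with class "total_hits"
  page.find(".total_hits").text.gsub(/\D/, "").to_i.should >= min.to_i
end
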
Yes, these are executable tests, and they give us a huge safety net for ensuring that Solr configuration changes and indexing changes don't break anything: if we change boost values in a Request Handler, if we change a field type in Solr, if we tweak the UI code handling raw user queries, these tests tell us whether search behavior changed.

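To make that last point concrete: the "Jazz : photographs" scenario above guards the bit of query pre-processing that keeps a stray colon from being handed to Solr's query parser, where it has special field:value meaning. A purely illustrative sketch of that kind of handling (the method name and its placement are made up, not our actual Blacklight code):

# replace colons (special to Solr's query parser) with spaces, so that
# "Jazz : photographs" and "Jazz: photographs" search just like "Jazz photographs"
def preprocess_user_query(raw_q)
  raw_q.to_s.strip.gsub(/\s*:\s*/, ' ')
end
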
Whenever we get a user-feedback message, or an email from a staff member about expected search behavior, it is fodder for these tests. Usually we get reports of what is broken. Great! That's the ideal testing scenario: we write a cuke test before we fix anything, assert that the cuke test fails, work on a fix, then assert that the cuke test passes. And we can run all our other cuke search tests to make sure the fix doesn't break anything else.

The staff are delighted to hear that we now have a way to know automatically if we break the behavior in the future, and that we'll fix it. They are also delighted to hear that they won't be asked to repeat the tests manually when we upgrade Solr or make any other changes.

Perhaps you are starting to see how we can do relevancy testing.  But that's for another post.
