Thursday, December 22, 2011

How to Configure Hudson to Monitor Test Coverage Stats

Goal:   configure a Hudson project so it will squawk if the test coverage stats drop below the current coverage levels.

I researched this a while ago, and perhaps this will spare a few folks some effort.

It turns out there are two separate conditions that are related:

1. Job state: successful / unstable / broken / disabled.
   This is displayed as the color of the dot next to an individual build.

2. Job stability (weather icon):
   "While a job may build to completion and generate the target artifacts without issue, Hudson will assign a stability score to the build (from 0-100) based on the post-processor tasks, implemented as plugins, that you have set up to implicitly evaluate stability."  These can include unit tests (JUnit, etc.), coverage (Cobertura, Rcov, etc.), and static code analysis (FindBugs). The higher the score, the more stable the build.

the score ranges for each icon:
 bright sun (80-100)
 partly cloudy (60-79)
 cloudy (40-59)
 raining (20-39)
 stormy (0-19)


Now for the details about coverage metric settings:

 If you go into "configure" on your project and have "Publish (coverage) report" turned on, you'll see there are rows (in Cobertura, for things like "classes," "methods," and "lines") and then three columns.  Here's what they mean:

bright sun (left column):
 the minimum coverage level required for the bright sunny weather indicator on the dashboard.

stormy (middle column):
 the minimum coverage level required to avoid the stormy icon.

plain sun (rightmost column):
 the minimum test coverage required for a stable build.
 So you should put your current coverage HERE, and your build will be marked unstable if you drop below your current coverage percentage.


My interpretation is the first two columns affect your weather icon (job stability), and the third column affects the job state (color of the dot by an individual build).
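
If you're curious where those three columns live, they end up in the job's config.xml as the Cobertura publisher's healthy/unhealthy/failing targets. Here's a rough sketch; the exact structure and the value scaling vary by plugin version, so treat this as illustrative rather than canonical:

  <hudson.plugins.cobertura.CoberturaPublisher>
    <coberturaReportFile>**/coverage.xml</coberturaReportFile>
    <healthyTarget>
      <!-- bright sun: e.g. LINE coverage at or above 90% -->
      <targets class="enum-map" enum-type="hudson.plugins.cobertura.targets.CoverageMetric">
        <entry>
          <hudson.plugins.cobertura.targets.CoverageMetric>LINE</hudson.plugins.cobertura.targets.CoverageMetric>
          <int>90</int>  <!-- some plugin versions store percent * 100000 -->
        </entry>
      </targets>
    </healthyTarget>
    <unhealthyTarget> <!-- stormy threshold --> </unhealthyTarget>
    <failingTarget> <!-- your current coverage: below this, the build is marked unstable --> </failingTarget>
  </hudson.plugins.cobertura.CoberturaPublisher>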


- Naomi

sources:

http://www.javaworld.com/javaworld/jw-12-2008/jw-12-hudson-ci.html?page=7

http://books.google.com/books?id=YoTvBpKEx5EC&pg=PA369&lpg=PA369&dq=hudson+setting+cobertura+coverage+metrics+targets&source=bl&ots=eJw1L5oit9&sig=6fnE54EDRICZsN6nNcYXKbF8cXQ&hl=en&ei=5wvCTOy3MYXEsAOn9dhB&sa=X&oi=book_result&ct=result&resnum=3&ved=0CCUQ6AEwAg#v=onepage&q&f=false

Friday, December 16, 2011

Stopwords in SearchWorks - to be or not to be?

We've been examining whether or not to restore stopwords to Stanford's SearchWorks index (http://searchworks.stanford.edu).

Stopwords are words ignored by a search engine when matching queries to results. Any list of terms can serve as a stopword list; most often the stopwords are the most commonly occurring words in a language, sometimes limited to certain word classes (articles and prepositions, say, rather than verbs and nouns).

The original usage of stopwords in search engines was to improve index performance (query matching time and disk usage) without degrading result relevancy (and possibly improving it!). It is common practice for search engines to employ stopwords; in fact Solr (http://lucene.apache.org/solr), the search engine behind SearchWorks, has English stopwords turned on as the default setting. We had no compelling reason to change most of the default Solr settings.  Thus, since SearchWorks's inception we have been using the following stopword list:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, s, such, t, that, the, their, then, there, these, they, this, to, was, will, with.
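
(That is Lucene's classic English stopword list, which ships with Solr. For context, in a Solr 1.4-era schema.xml the stopword filter sits in the text field type's analyzer chain, roughly like this; "restoring stopwords" just means removing the StopFilterFactory line from both the index and query analyzers:)

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- remove this line (index and query side) to make stopwords significant -->
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- word-delimiter, stemming, etc. omitted -->
    </analyzer>
  </fieldType>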

What follows is an analysis of how stopwords are currently affecting SearchWorks, and what might happen if we restore stopwords to SearchWorks, making every query term significant.

 

Executive Summary

We believe that restoring stopwords to SearchWorks could improve results in up to 18% of searches, while degrading results only in the small number of searches with more than 6 terms.

 

How Many Terms are there in User Queries?

Over 50% of the query strings for SearchWorks are 1 or 2 terms.
Over 75% of the query strings are 1, 2 or 3 terms.
Over 90% of the query strings for SearchWorks have 6 or fewer terms.

This is strictly query strings; it does not include facet values or other parameters.  Here is a histogram showing the number of terms in our queries for October 2011.  Note that single term queries are split into "alphanum" and "numeric".


Source: Google Analytics for October 2011, analyzed by Casey Mullin.

 

What Percentage of Query Strings have Stopwords?

In November 2011, there were 142,869 searches; stopwords appeared in 26,076 of them. Thus, stopwords appeared in roughly 18% of searches.



(Per analysis of November 2011 usage statistics by Casey Mullin, sent in email on Dec 14, 2011).

 

Do the Stopwords Currently Used in Queries Imply the Users are Trying Boolean Searches?

The 10 stopwords appearing most often in queries are (for November 2011):

  stopword     occurrences in queries
  the          7578
  of           6582
  and          4106
  in           2298
  a            1137
  to           1033
  for          695
  on           685
  an           289
  with         231

"or" and "not" do not appear in many queries, and "and" is neither the most frequent stopword nor close to it in occurrences. I interpret this to mean stopwords in queries are NOT intended as boolean operators.

(per analysis of November 2011 usage statistics by Casey Mullin, sent in email on Dec 14, 2011).

 

What About Minimum Must Match?

Restoring stopwords could hugely degrade precision, since stopwords occur so often.  Solr's mm setting ("minimum must match") gives us a way to mitigate this problem.  In our current index, which employs stopwords, our mm threshold is 4: queries with up to 4 terms must match all terms; for 5 or more query terms, 90% must match.  Given that over 90% of queries have 6 or fewer terms, 6 seems an appropriate threshold for an index that includes all words.

As it happens, increasing our mm threshold was proposed a while back, distinct from the idea of restoring stopwords to the index. 
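
In dismax terms this is a one-line change in solrconfig.xml. Our current threshold of 4 corresponds to an mm of "4<90%" (up to 4 clauses, all must match; more than 4, 90% must match); raising the threshold to 6 would look like this (a sketch, assuming a dismax-style request handler; note the < must be XML-escaped):

  <!-- in the request handler defaults in solrconfig.xml -->
  <!-- up to 6 query terms: all must match; 7 or more: 90% must match -->
  <str name="mm">6&lt;90%</str>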


What is Improved by Restoring Stopwords to the Index?

  1. Searches composed entirely of stopwords now retrieve results (improved recall) 
    • to be or not to be (with or without quotes) 
  2. Precision is greatly improved for short searches that include stopwords 
    • pearl vs. the pearl
    • the one
    • A Zukofsky (author Zukofsky, title "A")
    • there will be blood  (3 stopwords, so huge improvement)
    • OR spectrum (a periodical)
    • Jazz: an Introduction
  3. Subject links distinguish "in" from "and", etc. 
    • Archaeology in Literature is no longer conflated with Archaeology and Literature
  4. Improved results for languages having words overlapping English stopwords

 

What is Degraded by Restoring Stopwords to the Index?

  1. long queries (over 6 terms) with a lot of stopwords have reduced precision ...  BUT results in which the words occur as a phrase do float to the top. 
    • Lectures on the Calculus of Variations and Optimal Control Theory

 

What Else Have Testers Reported?

  • Known Item Searches: 
    • restoring stopwords tied or improved our testers' known item searches. 
    • one exception: 
      • a search for dorothy and the wizard OF oz did not retrieve the desired title, which was actually dorothy and the wizard IN oz. 
  • Series Searches, and Uniform Title:
    • "A potential problem of the stopword change is that title access points (aka uniform title) constructed according to AACR2 are without initial articles. So, for instance, the access point for the series "The NASA history series" is "NASA history series". A query that includes the initial article will not affect the search result in current production SW because "the" is eliminated as a stopword, but will affect the search result when stopwords are treated as significant words. On searchworks-test, a phrase title search for "The NASA history series" retrieves 76 records. The same search on production retrieves 125 records. The test search still retrieves some of the records that belong to this series because the transcribed series statement, which is in the 490 field, includes the initial article, but not all of them do. The series access points in the 830 field are all without the initial article. [Symphony browse series retrieves 94 results.]"
    • my reaction: in the metadata advisory group, many of the records we examined had the "wrong" information in the field (it included the initial article, and it shouldn't have). Sooo … our data is dirty -- shocking, but true. It would also be nice to know how often the affected searches are exercised, especially by end-users.

 

Additional Comments

Everything is Imperfect. 
  • SearchWorks employing stopwords gives imperfect search results. 
  • SearchWorks restoring stopwords, so that every term is significant, gives different imperfect search results.
  • Socrates (our OPAC from our ILS, Sirsi) gives yet different imperfect search results. 
The back end algorithms for determining what results match a query will always be fairly opaque to the end users - the algorithms are complicated. Moreover, users will have typos and other mistakes in their queries no matter what we do, and it seems unlikely we can consistently rescue them from themselves.

Everything Can be Changed.

Solr gives us incredible control over our search engine's algorithm. There are many many knobs we can twiddle in our quest to improve the relevancy of search results. A few of the possibilities include:
  • mm -- require a higher percentage of matching terms when there are more than 6 terms in the query
  • phrase boosting -- this floats results with the query terms occurring close together (and presumably in the same order) to the top.  Currently it seems high enough, but we have never performed any empirical tests.
  • phrase slop -- how close words must occur to each other in the results.  Our current setting is 3; it is not clear to me exactly how phrase boosting and phrase slop interact.
  • adjust the relative boosting of fields -- give even more weight to title field matches, etc.  Again, we've never performed any empirical tests.
  • indexed string length doesn't always have to matter -- adjust the situations where the length of the indexed string affects the score of matches.  E.g. query "my cat" can score higher for title "my cat" than for "my cat and dog."
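
Most of those knobs are just dismax parameters in solrconfig.xml. A sketch, with made-up field names and boost values (these are NOT our production settings):

  <!-- qf: fields searched, with relative boosts -->
  <str name="qf">title_t^10 author_t^5 subject_t^2 text</str>
  <!-- pf: extra boost when query terms appear near each other in these fields -->
  <str name="pf">title_t^50 text^10</str>
  <!-- ps: phrase slop -- how near "near each other" means; 3 is our current setting -->
  <str name="ps">3</str>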

 

So Where Are We Now?

The data is in, and a decision will be made soon.  I'm guessing stopwords are going to be left in our past.

Tuesday, September 27, 2011

Cucumber Step Definition with an Inline Comment

Have you ever wanted to put a comment on the same line as a cucumber step?

    And I should see "M666" # local_id
    And I should see "1977-1997" # create date


It just occurred to me that I could create a step definition to allow this:


  # 'I should see "text"' step with a comment at the end of the line
  Then /^I should see "([^"]*)"(?: +\#.*)$/ do |text|
    # delegate to the standard 'I should see' web step
    Given "I should see \"#{text}\""
  end

If your text could include escaped quotes, you can use this step definition:

   # 'I should see "text"' step with a comment at the end of the line,
   # where the text itself may contain escaped quotes
   Then /^I should see "(.*?)"(?: +\#.*)$/ do |text|
     text.gsub!(/\\"/, '"')   # un-escape \" back to plain "
     assert page.has_content?(text)
   end
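
With the second definition, even text containing escaped quotes can take a trailing comment (a contrived example):

    And I should see "a \"local\" id" # escaped quotes plus comment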




Tuesday, February 8, 2011

Expressing (Search Result) Expectations as Cucumber Scenarios

As many of you know, SearchWorks is Stanford's Blacklight instance providing a "next generation" User Interface for materials at the Stanford Library.  What follows is a document I wrote for internal use so that motivated staff could provide feedback in the form of cucumber scenarios.  This blog post might make more sense in the context of my presentation at Code4Lib 2011 ... but this seemed a worthy blog post nevertheless.  Don't be put off by the length - a lot of what follows is examples.   

   

How to Write SearchWorks Search Result Expectations as Cucumber Scenarios

Sometimes we ask folks to check something new in SearchWorks. (thanks for your help!)
Sometimes people notice problems and report them: via the feedback form (thanks!), or a direct JIRA ticket (thanks!) or via email (less optimal, thanks!)
Occasionally people tell us specifics of something that is working "correctly."
When you ask yourself questions like these, then you are doing a "manual" test:
  • "Is SearchWorks getting the right search results?"
    • "SearchWorks is getting the right results because ..."
    • "The results in SearchWorks aren't ordered correctly. They should be ..."
    • "I know SearchWorks is wrong because ..."
  • "Are things displaying correctly?"
  • "The vernacular title should be ..."
Whenever you test something ("manually") in SearchWorks, we would like to capture your expectations as a cucumber scenario so we can run the test repeatedly and automate it.
Benefits:
  1. we won't have to keep asking you to check the same things over and over. Imagine never having to perform a given test search again!
  2. we can ensure that applying a fix for one problem won't inadvertently break something we've already fixed.
  3. we can automate running a large suite of tests nightly so we keep checking that we haven't broken anything.
  4. as we add specific searches and expected results against our own (meta)data corpus, we are accruing relevancy tests for our own data, based on human review of search results.
Sadly, this does not mean we can make all tests pass – sometimes, we can't achieve the ideal. There may be unacceptable tradeoffs, or it might just be too difficult technically to warrant pursuit. We do have a way to hang on to tests that should pass but are not passing at the current time, so these sorts of tests are welcome as well.
We would still like JIRA issues filed for FAILING cuke tests, and the JIRA issue identifier put in the scenario description.
The tests are easy to write.
Here are some sample cucumber scenarios:
Scenario: Query for "cooking" should have exact word matches before stemmed ones (VUF-123)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "cooking"
  And I press "search"
  Then I should get ckey 4779910 in the first 1 result
  And I should get result titles that contain "cooking" as the first 20 results

Scenario: relevance sort explicitly selected: score, then pub date desc, then title asc (SW-175)
  Given a SOLR index with Stanford MARC data
  When I go to the home page
  And I follow "Newspaper"
  And I select "author" from "sort"
  And I press "sort_submit"
  And I select "relevance" from "sort"
  And I press "sort_submit"
  # alpha for 2007
  Then I should get ckey 7141368 before ckey 7097229
  # newer year (2007) before older year (2005)
  And I should get ckey 8214257 before ckey 5985299
There are more samples at the bottom of this page.

Basic concepts

We use Cucumber (http://cukes.info) to automatically test the behavior of SearchWorks. It matches specific language in the "scenario" (cucumber parlance) with actions to perform, like filling in the search box, and then hitting return. Or clicking a facet link. Or going to a particular record to ensure information is properly displayed.
Cucumber does string matching (using regular expressions) to turn the natural language expressing expected behavior into executable test code. But you don't need to worry about it - just follow the specific language rules and you'll be supplying tests to the grateful engineers.
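
For instance, a statement like "Then I should get ckey 12345 in the first 3 results" is backed by a step definition along these lines. This is a simplified sketch, not our actual code (the real step definitions are linked in the "Statements" section below), and the CSS selector is invented for illustration:

  Then /^I should get ckey (\d+) in the first (\d+) results?$/ do |ckey, num|
    # gather the record ids shown on the results page, in display order
    ids = page.all(".document .ckey").map(&:text)   # hypothetical selector
    assert ids.first(num.to_i).include?(ckey),
      "expected ckey #{ckey} within the first #{num} results"
  end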

Scenarios

Each cucumber test is called a "scenario." It can have multiple actions taken, and what is displayed after each step can be examined. The idea is to capture how you're interacting with the web page (clicking buttons, selecting from pull downs, typing in text boxes) and what you expect to be displayed.

Be as precise as possible, BUT

We want the tests to be as useful as possible. Relevancy of search results can sometimes be as clear as:
  • "record 666 should be the first result"
  • "the first 4 results should be ..."
  • "record 777 should be before record 999"
There are more possibilities given in the "Statements" section below.

Try to leave wiggle room for changes to our collection.

If a test is too rigid in its expectations, then small changes can make the test fail. These are the sorts of questions that help determine if the test is too brittle:
  • Are we likely to get more resources of the exact title you expect as the first result?
  • Are we likely to get more resources for the subject heading?
  • Are we likely to get resources that are a better match to the search terms?

Statements

  1. The quotes or absence of quotes in the statements below is important.
  2. When, Then, and And at the beginning of the statements are interchangeable.
  3. If you can't express your expectations with the statements below, please file a JIRA ticket telling us what you are writing a test for. We may be able to add more statements to enable the scenario.
(Step Definition Code for the following is available at
http://www.stanford.edu/~ndushay/code4lib2011/search_result_steps.rb)

All Scenarios must start:
Scenario: (free text description, keep it short) (JIRA issue identifier)
  Given a SOLR index with Stanford MARC data

Indicate Your Starting Point

  • When I go to the home page
    • Use this for searching scenarios.
  • When I go to the advanced search page
  • When I am on the show page for "________"
    • Use this when you are talking about a particular record.
    • Fill in the blank with an id (ckey).

You're At Your Starting Page; Now Do Something.

Fill in a Text Box
  • When I fill in "q" with "___________________"
    • Use: searches without quotes.
    • Fill in the blank with any string for the search text box (no quotes allowed).
      • When I fill in "q" with "gobblety gook"
      • When I fill in "q" with "under the sea-wind"
      • When I fill in "q" with "Shindy AND Delilah"
  • When I fill in the search box with "_________________"
    • Use: searches containing quotes.
    • Fill in the blank with any string for the search text box, and if there are quotes, escape them with a backslash
      • When I fill in the search box with "\"under the sea-wind\""
Pressing a Button
  • And I press "________"
    • Use: pressing a button
      • And I press "search"
      • And I press "per_page_submit"
Selecting from a Pulldown
  • And I select "_____" from "______"
    • Use: selecting a value from a pull-down. (If you don't know the official name of the pulldown, we'll figure it out.)
    • Fill in the first blank with the selected value; fill in the second blank with the name of the pulldown.
      • And I select "Title" from "search_field"
      • When I select "author" from "sort"
      • And I select "100" from "per_page"
Following a Link
  • And I follow "____________"
    • Fill in the blank with the link text (NOT the url it goes to)
    • This is how we select facets in testing.
      • And I follow "Journal/Periodical"
      • And I follow "Hoover Library"
      • And I follow "Chinese"
Checkbox Selection and Un-Selection
  • I check "____________"
  • I uncheck "____________"
    • Fill in the blank with label of the checkbox (the text displayed next to it)
Radio Buttons
  • I choose "____________"
    • Fill in the blank with label of the selected radio button (the text displayed next to it)

Look at What You Got Back

  • Then I should get results
  • Then I should not get results
    • Use this only if
      1. you can't provide at least one id (ckey) OR
      2. you can't provide a ballpark number of expected results
  • Then I should get (at least|at most) ___ results
    • Use when the number of results is less than the default number per page (currently 20)
    • Pick the appropriate qualifier; fill in the blank with a number
      • Then I should get at least 2 results
      • Then I should get at most 19 results
  • Then I should get (at least|at most) ___ total results
    • Use when the number of results is more than the default number per page (currently 20)
    • Pick the appropriate qualifier; fill in the blank with a number
      • Then I should get at least 250 total results
      • Then I should get at most 50 total results
  • Then I should get ckey _______ in the results
  • Then I should not get ckey _______ in the results
    • Fill in the blank with a single ckey expected (or not expected) in the first page of results.
    • The latter can be used to exclude false positives.
  • Then I should get ckey _______ in the first ___ results
  • Then I should not get ckey _______ in the first ___ results
    • These are good statements when a particular record should be "above the fold" or should clearly be the first result, or when you want to ensure a particular false positive isn't polluting the top search results.
    • Fill in the first blank with a single ckey, and the second blank with a number lower than the default number per page (currently 20). The last word may be result or results.
      • Then I should get ckey 12345 in the first 1 result
      • Then I should get ckey 12345 in the first 3 results
  • Then I should get ckey _______ before ckey _______
    • Use this to specify result ordering, such as after a particular sort.
  • Then I should get (the same number of|fewer|more) results (than|as) (a|an) ___ search for "_______"
    • Use: compare the number of results with a different search
    • "than" and "as" are interchangeable, as are "a" and "an"
    • "title", "author", or "subject" may fill the blank before "search" to indicate a specialized search; leave the blank empty for a regular search
    • the query string may contain quotes - but they must be escaped with a backslash
      • Then I should get fewer results than a search for "wonderbread"
      • Then I should get more results than an author search for "\"James Herriot\""
      • Then I should get the same number of results as a title search for "jack in the beanstalk"
  • Then I should get at least ____ of these ckeys in the first ___ results: "______________"
    • fill in the first two blanks with positive integers; fill in the last blank with a list of ckeys separated by comma-space: "1234, 23324, 1523"
      • Then I should get at least 4 of these ckeys in the first 4 results: "7637875, 336046, 6634054, 2130330"
  • Then I should get ckey _______ and ckey _______ within ___ positions of each other
    • Then I should get ckey 6974167 and ckey 5757985 within 2 positions of each other
  • Then I should get result titles that contain "______________" as the first ___ results
    • Use when you think a term or phrase in the title will be a less brittle test than ckeys. (originally used to detect if exact matches sort higher than stemmed matches.)
      • Then I should get result titles that contain "arabic" as the first 20 results
  • Then I should see "______________"
  • Then I should not see "______________"
  • Then I should see "______________" (at least|at most|exactly) ___ times
    • Use when you want to find visible text somewhere on the page. Generally too vague for search tests.
      • Then I should see "Carnoy, Martin"
      • Then I should see "Refine" exactly 2 times

Facet Expectations

  • Then the facet "______________" should display
  • Then the facet "______________" should not display
    • Then the facet "Russian" should display
    • Then the facet "Choctaw" should not display
  • Then I should get facet "_____________" before facet "_____________"
    • Then I should get facet "Croatian" before facet "Czech"

Call Number ordering in show view

  • Then I should get callnumber "_____________" before callnumber "_____________"
    • Then I should get callnumber "505 .S343 V.20 1972" before callnumber "505 .S343 V.21:1 1973"
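
(Behind the scenes these "before" steps all follow the same pattern: scrape the values off the page in display order and compare positions. A sketch, again with a hypothetical selector, not our actual code:)

  Then /^I should get callnumber "(.*?)" before callnumber "(.*?)"$/ do |first, second|
    callnumbers = page.all(".callnumber").map(&:text)   # hypothetical selector
    assert callnumbers.index(first), "#{first} not found on the page"
    assert callnumbers.index(second), "#{second} not found on the page"
    assert callnumbers.index(first) < callnumbers.index(second)
  end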

Example Scenarios

Examples: Simple Searches
Scenario: Query for "cooking" should have exact word matches before stemmed ones (VUF-321)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "cooking"
  And I press "search"
  Then I should get ckey 4779910 in the first 1 result
  And I should get result titles that contain "cooking" as the first 20 results

Scenario: Expect specific match and non-match for "french beans food scares" without quotes (VUF-123)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "french beans food scares"
  And I press "search"
  Then I should get ckey 7716344 in the first 1 result
  And I should not get ckey 6955556 in the results
Examples: Specialized Searches
Scenario: Single Author Title search matches Socrates results (SW-5)
  Given a SOLR index with Stanford MARC data
  When I go to the advanced search page
  And I fill in "author" with "McRae"
  And I fill in "title" with "Jazz"
  And I press "advanced_search_button"
  Then I should get at least 4 of these ckeys in the first 4 results: "7637875, 336046, 6634054, 2130330"

Scenario: Search for non-existent author should yield zero results (VUF-5)
  Given a SOLR index with Stanford MARC data
  When I go to the home page
  And I fill in "q" with "jill kerr conway"
  And I select "Author" from "search_field"
  And I press "search"
  Then I should get at most 0 results

Scenario: Stopwords in title searches should be ignored - 3 terms total (SW-14)
  Given a SOLR index with Stanford MARC data
  And I am on the home page
  When I fill in "q" with "alice in wonderland"
  And I select "Title" from "search_field"
  And I press "search"
  Then I should get at least 100 total results
  And I should get the same number of results as a title search for "alice wonderland"

Scenario: Thesis advisors (720 fields) should be included in author search (SW-3)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "Zare"
  And I select "Author" from "search_field"
  And I press "search"
  Then I should get at least 10 results
  And I should see "Thesis"
Example: Multi-Button Presses
Scenario: relevance sort explicitly selected: score, then pub date desc, then title asc (SW-666)
  Given a SOLR index with Stanford MARC data
  When I go to the home page
  And I follow "Newspaper"
  And I select "author" from "sort"
  And I press "sort_submit"
  And I select "relevance" from "sort"
  And I press "sort_submit"
  # alpha for 2007
  Then I should get ckey 7141368 before ckey 7097229
  # newer year (2007) before older year (2005)
  And I should get ckey 8214257 before ckey 5985299

Scenario: Call Number
  Given a SOLR index with Stanford MARC data
  When I am on the home page
  Then I should see "Archive of Recorded Sound"
  When I follow "Archive of Recorded Sound"
  Then I should see "[remove]"
  And I should get at least 10 results
Example: Non-Latin Script, Per Page selection
Scenario: Cyrillic (VUF-22)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "пушкин pushkin"
  And I select "Title" from "search_field"
  And I press "search"
  And I select "50" from "per_page"
  And I press "per_page_submit"
  Then I should get at least 12 results
  And I should get ckey 216398 in the results
  And I should get ckey 7898778 in the results
Example: Selecting Facet
Scenario: japanese journal of applied physics PAPERS - 780t, 785t indexed (VUF-11)
  Given a SOLR index with Stanford MARC data
  And I go to the home page
  When I fill in "q" with "japanese journal of applied physics papers"
  And I select "Title" from "search_field"
  And I press "search"
  Then I should get at least 7 of these ckeys in the first 8 results: "365562, 491322, 491323, 7519522, 7519487, 460630, 787934"
  When I follow "Journal/Periodical"
  Then I should get at least 5 of these ckeys in the first 5 results: "7519522, 365562, 491322, 491323, 7519522"
Examples: Call Number Sorting in Record
Scenario: The show view call numbers should be in volume reverse sort order for serials (VUF-666)
  Given a SOLR index with Stanford MARC data
  When I go to the show page for "370790"
  Then I should get callnumber "570.5 .N287 V.25-26 1935" before callnumber "570.5 .N287