Wednesday, February 15, 2012

stanford-solr-marc fork of SolrMarc

In the interests of reducing my ongoing work for Stanford's SearchWorks index, I have, with Bob Haschart's blessing, forked the SolrMarc code and made my fork available via the (new) SolrMarc space on github:

Specifics of how my fork diverges are below.

This is an experiment:  I believe my personal efforts will be reduced by using this pared-down derivative of SolrMarc.  I am NOT committing to supporting all the use cases that Bob supports with SolrMarc.  Bob is doing a great job of juggling VuFind needs, Blacklight needs, UVa needs, less savvy consumers' needs, and maintaining backward compatibility with earlier versions of Solr.  I cannot make those kinds of commitments on Stanford's dollar or on my own time.

One goal of the fork is to simplify the code and the build scripts for development purposes.  This creates a slightly higher expectation of users:  they will be presumed to have the expertise to do what they need downstream (e.g. edit the file, set up analogous directories for their local site code and/or their local versions of Solr, substitute their own java customizations, set their own version up for bean shell, etc.).

If anyone likes what I've done or any part of it, feel free to grab it, fork it, mimic it or whatever.   I am happy to add committers if they write test code for any changes they want to push up.

I have created hudson builds for the core code and the site specific code in stanford-solr-marc on the projectblacklight hudson server.  These builds will kick off after each commit to the stanford-solr-marc github repository, and they create javadoc and test coverage reports (see the hudson pages below for links to these).

I can add emails to the hudson build notifications, and can probably figure out how to have github send emails upon commits, if folks desire.

It would be awesome if the fork converges with SolrMarc future development to the point of re-combining the code base.  Meanwhile, as Bob and I have discussed, this fork may help Bob with some of his refactoring plans, and I can forge ahead with Stanford specific needs more easily.

Significant Differences between my fork and the SolrMarc on GoogleCode:
  1. git  
  2. reorg of the directory structures for clarity and to reduce nesting.
  3. complete rewrite of the ant builds.
    • a single build.xml file
      • no macros
    • a single file -- it should be straightforward to change as desired.
    • the build process does not result in a single jar, but instead creates a dist directory with all the files and folder structure as needed to execute the code.
  4. the wonderful scripts written by Bob are not "localized" by the build process
  5. strives to use "vanilla" versions of Solr and Marc4j, with version clearly indicated
  6. the utility class has been refactored into smaller pieces
  7. the only exemplar site code is Stanford SearchWorks
  8. functionality not used by Stanford is often stripped out, such as
    • bean shell scripting capability (it could be added back in easily, if desired)
    • notion of running under windows (could be added back in)
    • unused code placeholders, such as z39.50
  9. embedded solrj update options are not exercised - this code will be stripped out soon
  10. core tests have been largely rewritten to adhere to junit common practices:  ant calls a junit class which executes the java code and asserts the correct results.
  11. current intent is to move away from using java reflection to simultaneously support multiple versions of Solr -- I will create a tag/branch for a Solr version if a Solr upgrade isn't backwards compatible, and I make no promise to keep that branch up to date.

I have not written or rewritten the type of documentation available on the googlecode SolrMarc wiki - much of that documentation is directly applicable (settings for, settings for …).
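
For readers unfamiliar with the reflection approach mentioned in item 11, here is a minimal sketch of the general technique: probing the classpath at runtime for whichever class a given library version provides.  This is NOT code from SolrMarc or from my fork.  The class VersionProbe and the method firstAvailable are hypothetical names, and the candidate class names are JDK stand-ins rather than real Solr classes.

```java
import java.util.Optional;

// Hypothetical illustration only; not taken from SolrMarc or stanford-solr-marc.
class VersionProbe {

    /**
     * Returns the first candidate class name that can be loaded from the
     * current classpath, or an empty Optional if none can be loaded.
     */
    static Optional<String> firstAvailable(String... candidates) {
        ClassLoader loader = VersionProbe.class.getClassLoader();
        for (String name : candidates) {
            try {
                // 'false' = locate the class without running its static initializers
                Class.forName(name, false, loader);
                return Optional.of(name);
            } catch (ClassNotFoundException e) {
                // this candidate is not on the classpath; try the next one
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in names: the first imitates a class that exists only in some
        // other library version, the second a class that is present.
        Optional<String> chosen =
                firstAvailable("org.example.OnlyInOtherVersion", "java.util.ArrayList");
        System.out.println(chosen.orElse("no supported version found"));
    }
}
```

Code like this has to anticipate every supported version in one code base, which is the maintenance burden that a tag/branch per incompatible Solr version is meant to avoid.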

Note that the SITE code for Stanford SearchWorks will lag behind our actual production code, as the copy of record is *not* the github repository.  This arrangement:
a.  avoids commit messages for every commit for local work
b.  allows our copy-of-record to be behind the Stanford firewall.

I will update the github repository to the current Stanford production code from time to time.

Let me repeat:  I'm not promising to keep this project backwards compatible with older versions of Solr or of files, as those progress.  The main audience for this codebase is me.  Others are welcome to the code, and will probably be welcomed as committers … but consumers of this codebase will be presumed to have enough expertise to do what they need downstream.  (e.g. substitute their own java customizations, or set their own version up for bean shell, or for a different version of Solr).

There is plenty more work to do.  Just a few examples:
  • More tests of core code
  • More refactoring of core code
  • Documentation