Comments on Discovery Grindstone: Searching in Solr, Analyzing Results and CJK
Feed updated: 2022-11-28T09:27:23.808-08:00
Blog author: Naomi (http://www.blogger.com/profile/01110399946779003895)
Comments: 2

Comment posted 2015-05-09T23:09:03.725-07:00:

Dear Ms. Dushay,

I have been asked to research ways to create a searchable digital library in Tibetan for a nine-year curriculum used at a Tibetan Buddhist monastic college.

The main concern is that we must be able to index and search for words and phrases in Tibetan. Tibetan words are separated by the tsheg character (Unicode U+0F0B), not by an ASCII space character.

We are building on work being done by the developers of The Nitartha Digital Library (http://nitarthadigitallibrary.org), who are using XTF to create their text-only library.

This project needs to include audio, video and photos in addition to, and often along with, the searchable text, so we have been researching other possibilities. Islandora and the Islandora community look like a great match for our project and for us.

I wrote to the Islandora Google Groups asking if anyone had advice. Nick Ruest answered, “I'd assume most or all of the work would need to be done in Solr. Naomi Dushay from the Hydra/Blacklight community did a pretty great deep dive into working with Chinese, Japanese, and Korean in Solr. I know it isn't Tibetan, but there is probably a fair bit there that is related and could help out a great deal.” And he linked to your blog.

This is a link to the code changes that were made to allow the texts to be searched in XTF using the tsheg as a word breaker: https://github.com/cdlib/xtf/commit/41740b48fae930a8c29c3932221d5199d96b73c5

I can build a good website and get a book ready for publishing but, wow, I am in so far over my head.

Any advice, suggestions and help would be appreciated.

Warm regards,
Candia Ludy, Director
Pema Karpo Meditation Center
www.pemakarpo.org
pemakarpomeditation@gmail.com

Posted by: Anonymous (https://www.blogger.com/profile/11977002448134301078)

Comment posted 2014-07-10T02:36:11.999-07:00:

Hi Naomi,

Which version of Solr did you use to implement this CJK search? I am currently using 3.6.1 with the same field type you used for 'text_cjk', but after re-indexing my content only a few Chinese and Japanese words work. Korean does not work at all.

Please advise.

Thanks,
Poornima

Posted by: Poornima (https://www.blogger.com/profile/01266893112509900013)