Lucene relevance search
I want Lucene implemented to run relevance searches across an extensive article database.
What is Lucene?
Lucene is a free, open-source library for relevance searching. It searches your own database for relevant content and works much like a Google search, minus PageRank.
From Lucene Tutorial [url removed, login to view]
“Jakarta Lucene ([url removed, login to view]) is a high-performance, full-featured, Java, open-source, text search engine API written by Doug Cutting.
Note that Lucene is specifically an API, not an application. This means that all the hard parts have been done, but the easy programming has been left to you. The payoff for you is that, unlike normal search engine applications, you spend less time wading through tons of options and build a search application that is specifically suited to what you're doing. You can easily develop a custom search application, perfectly suited to your needs. Lucene is startlingly easy to develop with and use.”
Today, Google is used to find articles that are relevant to a certain topic. It works like this: when an archive is created or re-published with the option Google Search enabled, a query is sent to Google in the format site:[url removed, login to view] string. For example:
• This page is a “Topic” or “Archive” page listing articles related to the topic “Kundalini yoga”: [url removed, login to view]
• Google has searched all articles on the website with a search phrase like this: site:[url removed, login to view] kundalini yoga
• The articles on [url removed, login to view] are presented in a certain order; an important parameter for that order is the result of the relevance search.
• Lucene should be used for the relevance search instead of Google.
More information about Lucene can be found here:
Lucene Tutorial: [url removed, login to view]
Apache Lucene - Overview : [url removed, login to view]
Wikipedia: [url removed, login to view]
What should be done?
To get Lucene to work, functionality for the following has to be developed:
• Creating index in bulk
o The first step is to run a batch job that creates a "Lucene index" for all articles.
• Incremental indexing
o Every time a new article is added (or changed), the Lucene index should be updated.
• Relevance search
o Every time a new archive is created or an old archive is re-published with "New relevance search" enabled, Lucene should be used to find the most relevant articles for this archive.
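To make the three pieces above concrete, here is a minimal sketch in Python. It is not Lucene and not the requested implementation: it is a toy in-memory index with a simple TF-IDF score, and every name in it (ToyArticleIndex, bulk_index, upsert, search) as well as the sample articles are made up for illustration. In Lucene itself, the corresponding work is done through classes such as IndexWriter and IndexSearcher, and the scoring is more sophisticated.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyArticleIndex:
    """Hypothetical in-memory stand-in for a Lucene index (illustration only)."""

    def __init__(self):
        # article id -> Counter mapping each term to its frequency
        self.term_counts = {}

    def bulk_index(self, articles):
        """Batch step: index every (article_id, text) pair."""
        for article_id, text in articles:
            self.upsert(article_id, text)

    def upsert(self, article_id, text):
        """Incremental step: add a new article or replace a changed one."""
        self.term_counts[article_id] = Counter(tokenize(text))

    def search(self, query, limit=10):
        """Relevance step: rank article ids by a simple TF-IDF score."""
        total = len(self.term_counts)
        scores = Counter()
        for term in set(tokenize(query)):
            matching = [a for a, c in self.term_counts.items() if term in c]
            if not matching:
                continue
            # Terms that occur in fewer articles get a higher weight.
            idf = math.log(1 + total / len(matching))
            for article_id in matching:
                counts = self.term_counts[article_id]
                tf = counts[term] / sum(counts.values())
                scores[article_id] += tf * idf
        return [article_id for article_id, _ in scores.most_common(limit)]


# Usage with made-up articles.
index = ToyArticleIndex()
index.bulk_index([
    (1, "Kundalini yoga breathing exercises for beginners"),
    (2, "Cooking pasta at home"),
    (3, "Yoga mats and other equipment reviewed"),
])
print(index.search("kundalini yoga"))  # [1, 3]

# Article 2 is rewritten, so only its index entry is replaced.
index.upsert(2, "Kundalini yoga and kundalini meditation")
print(index.search("kundalini yoga"))  # [2, 1, 3]
```

The point of the sketch is the split of responsibilities: one batch pass over all articles, a cheap per-article update whenever an article is added or changed, and a ranked query that runs when an archive is created or re-published.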
The site is written in PHP.
Admin: PHP, SH, Crontab
OS: Linux 7.0