Open Source Search Solutions

Effective search is one of the most important applications on the Internet – just look at Google to see what I mean. With the explosion of information available on the web alone, how do you quickly find what you are looking for? Then there is all the data we collect on our own hard drives: the material that is not available on the Internet or other wide area networks (WANs).

Two open source search solutions are well worth mentioning: OpenSearchServer and Constellio. Recently, I implemented OpenSearchServer on the JC-J.IN blog site using its user-friendly WordPress plugin. It has been set up to provide a more powerful search facility for this blog site and has been giving good results. I am using a dedicated server to run OpenSearchServer, which spiders (crawls) the blog site running on a server on the JC-J.COM network and so keeps its index up to date. Whenever a search request is made on the blog site, the search server responds with the search results. You can give it a go now using the Search field on this page. As they say, OpenSearchServer is “An open source search engine and crawler based on best open source technologies: lucene, zkoss, tomcat, poi, tagsoup. A stable, high-performance piece of software. It is a modern search engine and a suite of high-powered full text search algorithms.”
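
To give a feel for what a search server does when it crawls pages, indexes them and answers queries, here is a minimal Python sketch of an inverted index. It is only an illustration of the general idea and is not how OpenSearchServer or Lucene is actually implemented; the TinyIndex class, the tokenize helper and the page URLs are all made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A toy inverted index: each word maps to the set of pages containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, url, text):
        # A crawler would call this for every page it fetches.
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query):
        # Return the pages that contain every word of the query.
        words = tokenize(query)
        if not words:
            return []
        hits = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            hits &= self.postings.get(word, set())
        return sorted(hits)


# Hypothetical pages standing in for crawled blog posts.
index = TinyIndex()
index.add_page("/post/search", "Open source search solutions for a blog")
index.add_page("/post/backup", "Backup solutions for a home server")

print(index.search("search solutions"))
print(index.search("solutions"))
```

A real engine adds relevance ranking, stemming, phrase matching and much more on top of this, but the basic split between building the index ahead of time and looking words up at query time is the same.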

Another open source search solution is Constellio. I quote, “Constellio is the first open source comprehensive suite of enterprise search. It is the result of more than 2 years of research and development and is based on best practices and standards of market research information. Based on the popular search engine Apache Solr and using the architecture of connectors of Google Search Appliance, Constellio provides the solution to index all sources of information in your business. Constellio is compatible with all connectors from Google Search Appliance and can import any index from Solr and Lucene.”

I have installed Constellio on a server and now use it to search my business and technical data, including IMAP email folders and specific websites of interest. It can perform a combined (federated) search across several different search indexes at once. According to Neilson Group, “The federated search reduces the time spent searching for information within an organization by 53%”.
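
The idea behind a federated search can be sketched in a few lines of Python: send the same query to each index, then merge the hits into a single ranked list. This is a simplified illustration and not Constellio's actual code or API; the federated_search function, the two stand-in search functions and their scores are all invented, and the sketch assumes every source already returns comparable scores between 0 and 1.

```python
def federated_search(query, sources, limit=5):
    """Query every source and merge the hits into one ranked list.

    `sources` maps a source name to a function that takes the query and
    returns a list of (score, title) pairs, with scores between 0 and 1.
    """
    merged = []
    for name, search_fn in sources.items():
        for score, title in search_fn(query):
            merged.append((score, name, title))
    # Highest score first; ties broken by source name, then title.
    merged.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return merged[:limit]


# Hypothetical stand-ins for an email index and a website index.
def search_email(query):
    return [(0.9, "Re: server quote"), (0.4, "Invoice for hosting")]


def search_web(query):
    return [(0.7, "Server setup guide")]


results = federated_search("server", {"email": search_email, "web": search_web})
for score, source, title in results:
    print(f"{score:.1f}  [{source}]  {title}")
```

In practice the hard part is that different indexes score their results differently, so a real federated search has to normalise the scores before merging them.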

Apache Solr

Apache Lucene