Hi
I have been working with Apache Lucene, Hadoop, Solr, and Elasticsearch for more than 3 years.
For a large dataset, I would suggest using SolrCloud or Elasticsearch.
I could help build it in 1 week (2 weeks at most).
I suggest building a middleware system that provides:
+ Index: indexing should go through an internal interface. Our middleware would listen on TCP and wait for index requests; I think Apache Thrift would be a good fit for this.
+ Search/Suggest/AutoCompletion: via REST and TCP, so that an app or website can run queries with paging and facets.
+ Admin monitor: a real-time monitoring system covering data, requests, and server info (RAM used, disk usage, cache info) ...
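
To make the three parts more concrete, here is a rough sketch in Python. It is only an in-memory stand-in under my own assumptions: the class name SearchMiddleware, its method names, and the "text" field are hypothetical, and a real version would forward these calls to SolrCloud or Elasticsearch and expose them over Thrift/REST instead of plain method calls.

```python
from collections import Counter, defaultdict


class SearchMiddleware:
    """Hypothetical in-memory stand-in for the proposed middleware."""

    def __init__(self):
        self.docs = {}                     # doc_id -> document dict
        self.inverted = defaultdict(set)   # term -> set of doc_ids
        self.request_counts = Counter()    # per-operation counters for the monitor

    def index(self, doc_id, doc):
        """Index request (assumed to arrive over TCP in the real system)."""
        self.request_counts["index"] += 1
        old = self.docs.get(doc_id)
        if old is not None:
            # Re-indexing: drop the postings of the previous version first.
            for term in old["text"].lower().split():
                self.inverted[term].discard(doc_id)
        self.docs[doc_id] = doc
        for term in doc["text"].lower().split():
            self.inverted[term].add(doc_id)

    def search(self, term, page=1, page_size=10, facet_field=None):
        """Single-term query with paging and an optional facet count."""
        self.request_counts["search"] += 1
        hits = sorted(self.inverted.get(term.lower(), set()))
        start = (page - 1) * page_size
        result = {
            "total": len(hits),
            "hits": [self.docs[i] for i in hits[start:start + page_size]],
        }
        if facet_field is not None:
            result["facets"] = dict(
                Counter(self.docs[i].get(facet_field) for i in hits)
            )
        return result

    def suggest(self, prefix, limit=5):
        """Prefix-based auto-completion over the indexed terms."""
        self.request_counts["suggest"] += 1
        prefix = prefix.lower()
        terms = [t for t, ids in self.inverted.items()
                 if ids and t.startswith(prefix)]
        return sorted(terms)[:limit]

    def stats(self):
        """Data a monitoring page could poll (server RAM/disk not covered here)."""
        return {
            "documents": len(self.docs),
            "requests": dict(self.request_counts),
        }


mw = SearchMiddleware()
mw.index("1", {"text": "Apache Lucene search library", "type": "library"})
mw.index("2", {"text": "Elasticsearch search server", "type": "server"})
print(mw.search("search", page=1, page_size=1, facet_field="type"))
print(mw.suggest("el"))
print(mw.stats())
```

Keeping index, search, and monitoring behind one small interface like this means the app and website never talk to the search cluster directly, so the backend can be changed later without touching the clients.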
If you have any questions, just let me know.
Best Regards.
Sang Dang.