I'm working with Solr on a project where we import roughly 40,000 rich documents, mainly MS Word, PowerPoint, Excel and PDF files.
Is there a best-practice schema.xml and/or solrconfig.xml to use in Solr with the ExtractingRequestHandler?
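For context, here is a minimal sketch of how a single file might be sent to the ExtractingRequestHandler, using only the Python standard library. It assumes the handler is mapped at /update/extract (as in the stock example solrconfig.xml) and that the schema defines a text field and an attr_* dynamic field. build_extract_url is a hypothetical helper, while literal.id, fmap.content, lowernames and uprefix are the handler's own parameters.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, commit=False):
    """Build an /update/extract URL for one rich document (hypothetical helper)."""
    params = {
        "literal.id": doc_id,    # supply the unique key ourselves; Tika does not
        "fmap.content": "text",  # map Tika's extracted body to the 'text' field (assumed to exist)
        "lowernames": "true",    # e.g. 'Last-Modified' becomes 'last_modified'
        "uprefix": "attr_",      # metadata not in the schema goes to attr_* (assumed dynamic field)
    }
    if commit:
        params["commit"] = "true"
    return f"{solr_base.rstrip('/')}/update/extract?{urlencode(params)}"


print(build_extract_url("http://localhost:8983/solr/", "doc-1"))
```

The file itself would then be POSTed to that URL as a multipart upload, for example with curl -F "myfile=@report.docx".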
I have been tweaking the default schema to try to get facets working on document modification dates, but even setting that aside, I suspect there is already a good example of how these files should look when Tika's default output is sufficient.
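For the date facets, a range facet over the extracted modification date could be requested roughly as sketched here. This assumes Solr 3.1 or later (Solr 1.4 uses the equivalent facet.date, facet.date.start, facet.date.end and facet.date.gap parameters instead) and assumes Tika's Last-Modified metadata lands in a date-typed last_modified field, which is what lowernames=true produces with the stock example schema. build_date_facet_query is a hypothetical helper, and the five-year window and one-year gap are only illustrative.

```python
from urllib.parse import urlencode


def build_date_facet_query(solr_base, field="last_modified"):
    """Build a /select URL requesting yearly range facets (hypothetical helper)."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                               # only the facet counts are needed
        ("facet", "true"),
        ("facet.range", field),                      # assumed date-typed field
        ("facet.range.start", "NOW/YEAR-5YEARS"),    # illustrative window start
        ("facet.range.end", "NOW"),
        ("facet.range.gap", "+1YEAR"),               # one bucket per year
        ("wt", "json"),
    ]
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


print(build_date_facet_query("http://localhost:8983/solr"))
```

Setting rows=0 skips the document list, so the response carries only the per-year counts.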
If there is no such thing as a best-practice schema.xml and/or solrconfig.xml, I'm also interested in good examples, preferably from existing open-source projects or even good blog posts.
Any pointers are welcome!