Index rich documents with Solr Cell
Solr Cell (the ExtractingRequestHandler) uses Apache Tika, a framework that wraps many different format parsers such as PDFBox, POI, and others.
Examples:
curl "http://localhost:9090/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@…"
Prefix fields not defined in the schema with attr_ and map the extracted content field to attr_content:
curl "http://localhost:9090/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@tutorial.html" (index HTML)
Capture <div> tags separately, and then map that field to a dynamic field named foo_t:
curl "http://localhost:9090/solr/update/extract?literal.id=doc2&captureAttr=true&defaultField=text&fmap.div=foo_t&capture=div" -F "…" (index PDF)
Confidential
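The curl commands above can also be reproduced from a program. The following is a minimal sketch using only the Python standard library, not an official Solr client. SOLR_BASE, build_extract_url, build_multipart, and extract are hypothetical names. The form field name myfile is copied from the curl examples. The sketch assumes the /solr/update/extract path shown above; newer Solr releases typically include a core or collection name in the path.

```python
# Hypothetical sketch: POST a file to Solr Cell's /update/extract handler,
# mirroring curl -F "myfile=@file". Names and the base URL are assumptions.
import os
import urllib.parse
import urllib.request
import uuid

# Assumption: same host, port, and path as the curl examples on the slide.
SOLR_BASE = "http://localhost:9090/solr"


def build_extract_url(base: str, params: dict[str, str]) -> str:
    """Return the extract URL with URL-encoded query parameters."""
    return f"{base}/update/extract?{urllib.parse.urlencode(params)}"


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file part as multipart/form-data."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        # Generic type; Tika is expected to detect the real format itself.
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail


def extract(path: str, params: dict[str, str], base: str = SOLR_BASE) -> bytes:
    """Send the file at `path` to Solr Cell and return the raw response body."""
    with open(path, "rb") as f:
        data = f.read()
    boundary = uuid.uuid4().hex  # random boundary unlikely to occur in the file
    body = build_multipart("myfile", os.path.basename(path), data, boundary)
    req = urllib.request.Request(
        build_extract_url(base, params),
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


# Usage (needs a running Solr), equivalent to the second curl example:
# extract("tutorial.html", {"literal.id": "doc1", "uprefix": "attr_",
#                           "fmap.content": "attr_content", "commit": "true"})
```

Keeping URL building and multipart encoding in separate pure functions makes it possible to check both without a server. Only extract performs network I/O.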