Intro to Apache Solr

shalinmangar 1,418 views 16 slides Jul 25, 2016

About This Presentation

Learn the capabilities of Apache Solr, including how to run it in standalone and cloud mode, as well as how to contribute.


Slide Content

Apache Solr
Introduction & Demo

•What is Apache Solr?
•Start/stop Solr
•Indexing data to Solr
•Searching data
•Running a SolrCloud cluster
•Hacking Solr
Agenda

•Lucene-based search server + other features
•Access Lucene over HTTP:
•Java, Python, Ruby, .NET, PHP over XML/JSON and
other formats
•Faceting (guided navigation), suggestions,
highlighting etc.
•Replication and distributed search
•Lucene best practices
What is Apache Solr?

•Extract:
•tar xvf solr-5.1.0.tgz (linux/mac)
•unzip solr-5.1.0.zip or click+extract (windows)
•Run:
•./bin/solr start -e schemaless
•./bin/solr start -e schemaless -p 8983
•./bin/solr -help
•./bin/solr start -help
•Stop:
•./bin/solr stop
Running Solr
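
After `./bin/solr start`, it can be handy to confirm from a script that the server is actually answering. The following is a minimal sketch, assuming Solr listens on the default `localhost:8983` and using the `/solr/admin/info/system` endpoint as a liveness probe; the function names are made up for this illustration.

```python
import time
import urllib.request

# Assumption: Solr listens on the default port 8983 on this machine.
SOLR_URL = "http://localhost:8983/solr"


def system_info_url(base_url=SOLR_URL):
    """URL of Solr's system info endpoint, used here as a liveness probe."""
    return f"{base_url}/admin/info/system?wt=json"


def http_status(url):
    """Return the HTTP status code for a GET request to url."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status


def wait_for_solr(fetch=http_status, attempts=10, delay=1.0):
    """Poll until Solr answers with HTTP 200, or give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            if fetch(system_info_url()) == 200:
                return True
        except OSError:  # covers URLError, connection refused, timeouts
            pass
        time.sleep(delay)
    return False
```

Calling `wait_for_solr()` right after starting Solr returns `True` once the server responds, and `False` if it never does within the allotted attempts.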

•./bin/post script
•Using curl directly
•Using the Admin UI
•SolrJ and other indexing clients
Indexing data
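
The "using curl directly" option amounts to an HTTP POST of documents to the update handler, so any HTTP client can do the same. Below is a rough sketch in Python, assuming a local Solr on the default port and a collection named `gettingstarted` (an assumed name; substitute your own). The sample document fields are illustrative only.

```python
import json
import urllib.request

# Assumptions: default local address and a collection called "gettingstarted".
SOLR_URL = "http://localhost:8983/solr"
COLLECTION = "gettingstarted"


def build_update_request(docs, collection=COLLECTION, base_url=SOLR_URL):
    """Build (but do not send) a JSON update request for a list of documents."""
    url = f"{base_url}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_docs(docs):
    """Send the documents to Solr; this needs a running server."""
    with urllib.request.urlopen(build_update_request(docs)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Illustrative document; the field names are not prescribed by Solr.
sample_docs = [{"id": "1", "title": "Intro to Apache Solr", "author": "shalin"}]
```

With Solr running, `index_docs(sample_docs)` posts the document and commits it so that it becomes searchable immediately. Committing on every request is convenient for a demo but is usually too expensive for bulk indexing.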

Demo time

Inverted index
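
As a rough illustration of the idea: an inverted index maps each term to the documents that contain it, so a query looks up terms instead of scanning documents. The sketch below is a toy in plain Python with whitespace tokenization; it is not how Lucene stores or analyzes its index, and the sample documents are invented.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of doc ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, *terms):
    """Return ids of documents that contain every term (an AND query)."""
    lists = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*lists)) if lists else []


# Invented sample documents keyed by id.
docs = {1: "red shoes", 2: "blue shoes", 3: "red android phone"}
index = build_inverted_index(docs)
```

Here `search_all(index, "red", "shoes")` intersects the posting lists for the two terms, which is the essence of evaluating `+red +shoes`.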

•+red +shoes = red AND shoes
•+shoes -red = shoes NOT red
•"android phone"
•"android phone" -samsung = "android phone" NOT samsung
•"android samsung"~4
•merced*
•createDate:[201301 TO 201401]
•author:shalin
•author:"shalin mangar"
•author:"shalin mangar" AND project:(lucene OR solr)
•title:samsung^5 category:phone
Lucene/Solr query syntax
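
When these queries are sent over HTTP, characters such as `+`, `"`, `:` and `[` have to be percent-encoded, otherwise a `+` arrives at Solr as a space. A small sketch of building a `/select` URL, assuming the default local address and a collection named `gettingstarted` (both assumptions):

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"   # assumed default local address
COLLECTION = "gettingstarted"             # assumed collection name


def build_select_url(query, rows=10, collection=COLLECTION, base_url=SOLR_URL):
    """Return a /select URL with the query string percent-encoded."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Queries taken from the slide.
examples = [
    "+red +shoes",
    '"android phone" -samsung',
    "createDate:[201301 TO 201401]",
    'author:"shalin mangar" AND project:(lucene OR solr)',
]
urls = [build_select_url(q) for q in examples]
```

The first URL ends in `select?q=%2Bred+%2Bshoes&rows=10&wt=json`: the literal plus signs become `%2B`, while the space is encoded as `+`.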

•DataImportHandler: Index databases, Email, RSS, XMLs etc.
•Rich document support: PDF, MS Office, Images etc.
•Faceting, stats, analytics
•Replication for high query volume
•Production systems with billions of documents
•Very extensible and customizable
•Embedded in commercial search products from Lucidworks,
DataStax, Cloudera, Hortonworks, Pivotal, Amazon
CloudSearch, Riak etc.
Other features of Solr
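
Faceting, for example, is switched on with ordinary request parameters, and in the default JSON response the counts for a field come back as a flat list of alternating values and counts. A hedged sketch of both halves follows; the `category` field and the sample response are hand-written for illustration, not real Solr output.

```python
def facet_params(field, query="*:*"):
    """Request parameters for counting documents per value of `field`."""
    return {
        "q": query,
        "rows": 0,             # only the counts are wanted, not the documents
        "facet": "true",
        "facet.field": field,
        "wt": "json",
    }


def parse_facet_counts(response, field):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape of a JSON facet response (not real output).
sample_response = {
    "facet_counts": {"facet_fields": {"category": ["phone", 12, "tablet", 5]}}
}
counts = parse_facet_counts(sample_response, "category")
```

The resulting dictionary is the kind of data a guided-navigation sidebar would render next to the search results.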

•A subset of optional features in Solr that enable and
simplify horizontal scaling of a search index using
sharding and replication
•Goals: scalability, performance, high-availability,
simplicity, and elasticity
What is SolrCloud?
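
To make sharding and replication concrete, here is a toy model: each document is routed to one shard by a stable hash of its id, and each shard exists as several replicas. This is a simplification that uses CRC32 modulo the shard count, not SolrCloud's actual routing (which assigns hash ranges to shards), and the replica names are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def replicas_for(shard, replication_factor):
    """Name the copies of a shard (hypothetical naming scheme)."""
    return [f"shard{shard + 1}_replica{r + 1}" for r in range(replication_factor)]


def partition(doc_ids, num_shards):
    """Group document ids by the shard they are routed to."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(shards)


layout = partition([f"doc-{i}" for i in range(100)], num_shards=2)
```

Sharding splits the index so that it can grow beyond one machine, while replication keeps extra copies of each shard for query throughput and availability.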

•./bin/solr -e cloud
•Yeah, it’s that simple!
Running SolrCloud
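
Once a cluster is up, further collections can be created through the Collections API over HTTP. A minimal sketch of building such a request, assuming a node at the default local address; `mycollection` is a placeholder name, and the cluster must have enough capacity to host `numShards × replicationFactor` cores.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumed address of a node in the cluster


def create_collection_url(name, num_shards=2, replication_factor=2,
                          base_url=SOLR_URL):
    """Build a Collections API URL that creates a sharded, replicated collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "wt": "json",
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = create_collection_url("mycollection")  # placeholder collection name
```

Fetching the resulting URL against a running cluster asks SolrCloud to place the shards and replicas across the available nodes.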

SolrCloud demo

•http://wiki.apache.org/solr/HowToContribute
•Pre-requisites:
•git: git clone http://git-wip-us.apache.org/repos/asf/lucene-solr.git
•github: fork and clone apache/lucene-solr
•ant 1.8.x or above
•Eclipse or IntelliJ IDEA (I recommend IDEA)
•Put svn/git and ant in your $PATH or %PATH%
Hacking Solr

•ant ivy-bootstrap (required only once)
•ant idea or ant eclipse (generates a complete project for you which
you can open in your favourite IDE)
•Find an existing Jira issue or open a new one at
http://issues.apache.org/jira/browse/SOLR
•Make changes, write tests, once finished:
•run ‘cd solr; ant server’ to build Solr and start via bin/solr scripts
•run ‘ant test’ (it can take a while), ensure all tests pass
•run ‘ant precommit’, (run from the checkout root) ensure it passes
•Generate a patch with ‘svn diff’ or ‘git diff’ and attach to Jira
Hacking Solr
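
The build-and-verify steps above can be chained so that a failure stops the sequence before a patch is generated. This is only a sketch: it assumes `ant` is on the `PATH`, that `ant test` is run from the checkout root (the slide does not say), and `run_steps` is a helper invented for this example.

```python
import subprocess
from pathlib import Path

# (directory relative to the checkout root, command), in the slide's order.
# Assumption: 'ant test' is run from the checkout root; the slide does not say.
STEPS = [
    ("solr", ["ant", "server"]),
    (".", ["ant", "test"]),
    (".", ["ant", "precommit"]),
]


def run_steps(checkout_root, steps=STEPS, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing command."""
    for subdir, command in steps:
        runner(command, cwd=str(Path(checkout_root) / subdir), check=True)
```

For example, `run_steps("lucene-solr")` would run the three commands against a checkout in the `lucene-solr` directory and raise `subprocess.CalledProcessError` at the first one that fails.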

•http://lucene.apache.org/solr
•https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
•https://issues.apache.org/jira/browse/SOLR
•Ask me: solr-help.slack.com
•Ask other users: [email protected]
•Ask developers: [email protected] (use
sparingly)
Resources

Thank you
Shalin Shekhar Mangar, [email protected]