Searching on Drupal with Search API and Typesense

lussoluca 173 views 52 slides Sep 26, 2024

About This Presentation

Search is usually one of the features of a website that is underestimated and left at the end of the project.

You can install the Search API module, and you're done with it, right?
Well, things are more complex than that, and nowadays, users are getting used to powerful search engines like the ...


Slide Content

Search API meets Typesense
Lusso Luca

About me
Drupal / PHP / Go developer @ SparkFabrik
Drupal contributor (WebProfiler, Monolog, Symfony Messenger, Search API Typesense) and speaker

Drupal.org: https://www.drupal.org/u/lussoluca
LinkedIn: www.linkedin.com/in/lussoluca
Slack (drupal.slack.com): lussoluca
Mastodon: @[email protected]



@lussoluca

WE ARE A TECH COMPANY OF ENGINEERS,
DEVELOPERS AND DESIGNERS WHO WILL
THINK, DESIGN AND BUILD YOUR CUSTOM APPLICATIONS,
MODERNIZE YOUR LEGACY
AND TAKE YOU TO THE CLOUD NATIVE ERA

PROUD OF OUR PARTNERSHIPS
We help Italian businesses bridge the gap with China thanks to our official partnership with Alibaba Cloud
We are a Google Cloud Platform Technology Partner
We are official AWS partners

PROUD OF OUR MEMBERSHIPS
We are a Silver Member of the Open Source Security Foundation
We are a supporter of the Cloud Transformation Observatory of the PoliMi
We are a Silver Member of the Cloud Native Computing Foundation
We are a Silver Member of the Linux Foundation Europe

Demo

Typesense is a modern, blazing-fast, developer-friendly, open source search engine. It uses cutting-edge algorithms that take advantage of the latest advances in hardware capabilities and machine learning.

https://typesense.org

Written in C++
20k+ stars on GitHub
Self-hosted or SaaS
Easy to set up

➔ Current version is v27
➔ A new release every 2-3 months

Public roadmap
https://github.com/orgs/typesense/projects/1/views/1

Typesense doesn’t provide a UI, only REST endpoints

If you need a UI:
● Typesense Cloud (paid product)
● https://github.com/bfritscher/typesense-dashboard

A lot of features
Source: https://typesense.org

API Libraries are available in many languages
Source: https://typesense.org

Integrations exist for many platforms
Source: https://typesense.org
Sadly, no Drupal here…

There’s a module for that™
Search API Typesense
https://www.drupal.org/project/search_api_typesense

Try Typesense on DDEV
https://github.com/kevinquillen/ddev-typesense

Connect to the server
➔ The admin API key is set when the Typesense server is started
➔ Typesense is distributed; here you can set all the available nodes
➔ The nearest node setting lets you leverage the Search Delivery Network in Typesense Cloud
➔ The configuration shown is the one used to connect to a DDEV-managed Typesense server

New tabs for Search API server

New tabs for Search API index

Fields configuration
➔ Search API Typesense defines a set of new field types; it’s mandatory to use them when indexing data to Typesense
➔ Supported field types are:
◆ string
◆ string[]
◆ int32
◆ int32[]
◆ int64
◆ int64[]
◆ float
◆ float[]
◆ bool
◆ bool[]

Fields configuration
➔ For every field added to Search API, we must configure how it will be exposed to Typesense
➔ A collection on the Typesense server will not be created until the schema form is submitted
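
To make the field types more concrete, here is a minimal sketch of a Typesense collection schema expressed as a plain object. The collection name and every field name are hypothetical, and the schema that the module actually sends may differ; the helper only checks that each field uses one of the types listed on the slide.

```javascript
// Field types listed on the slide as supported by Search API Typesense.
const SUPPORTED_TYPES = new Set([
  "string", "string[]",
  "int32", "int32[]",
  "int64", "int64[]",
  "float", "float[]",
  "bool", "bool[]",
]);

// Hypothetical collection schema: names are illustrative only.
const schema = {
  name: "standard_search",
  fields: [
    { name: "title", type: "string" },
    { name: "body", type: "string" },
    { name: "tags", type: "string[]", facet: true },
    { name: "created", type: "int64" },
    { name: "price", type: "float" },
    { name: "sticky", type: "bool" },
  ],
};

// Return the names of the fields whose type is not in the supported list.
function unsupportedFields(collectionSchema) {
  return collectionSchema.fields
    .filter((field) => !SUPPORTED_TYPES.has(field.type))
    .map((field) => field.name);
}

console.log(unsupportedFields(schema));
```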

Typo Tolerance
➔ Typesense handles typographical errors automatically, out of the box
➔ Different parameters can be added to the search query to fine-tune typo tolerance: https://typesense.org/docs/27.0/api/search.html#typo-tolerance-parameters
➔ One useful parameter is enable_typos_for_alpha_numerical_tokens, which enables or disables typo tolerance on alphanumeric tokens (like product codes or SKUs)
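
As a rough illustration, the sketch below builds the query string of a search request with typo tolerance switched off for alphanumeric tokens. The field names `title` and `sku` and the `exactCodes` option are assumptions made for the example; `num_typos` and `enable_typos_for_alpha_numerical_tokens` are search parameters from the Typesense documentation linked above.

```javascript
// Build the query string for a Typesense search request.
// "title" and "sku" are hypothetical field names.
function buildSearchQuery(q, { exactCodes = false } = {}) {
  const params = new URLSearchParams({
    q,
    query_by: "title,sku",
    // Maximum number of typos tolerated per query token.
    num_typos: "2",
    // "false" means tokens such as "AB1234" are matched without typo correction.
    enable_typos_for_alpha_numerical_tokens: String(!exactCodes),
  });
  return params.toString();
}

console.log(buildSearchQuery("ckeditr"));
console.log(buildSearchQuery("AB1234", { exactCodes: true }));
```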

Synonyms
➔ The synonyms feature allows you to define search terms that should be considered equivalent

Synonyms
➔ Can be configured directly in Drupal
➔ Typesense supports two types of synonyms:
◆ One-way synonyms: defining the words iphone and android as one-way synonyms of smart phone will cause searches for smart phone to return documents containing iphone or android or both.
◆ Multi-way synonyms: defining the words blazer, coat and jacket as multi-way synonyms will cause searches for any one of those words (e.g. coat) to return documents containing at least one of the words in the synonym set (e.g. records with blazer or coat or jacket are returned).
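
The two synonym types differ only in whether a `root` term is present. The sketch below builds the two payloads as plain objects, assuming the Typesense v27 synonyms endpoint (`PUT /collections/{collection}/synonyms/{id}`); the helper names are hypothetical and the Drupal module may build these differently.

```javascript
// One-way synonym: searching for `root` also matches the listed words.
function oneWaySynonym(root, synonyms) {
  return { root, synonyms };
}

// Multi-way synonym: every word in the set matches every other word.
function multiWaySynonym(synonyms) {
  return { synonyms };
}

const smartPhone = oneWaySynonym("smart phone", ["iphone", "android"]);
const outerwear = multiWaySynonym(["blazer", "coat", "jacket"]);

console.log(JSON.stringify(smartPhone));
console.log(JSON.stringify(outerwear));
```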

Faceting
➔ Well, it does what we all expect from a faceting feature…
➔ More info here: https://typesense.org/docs/27.0/api/search.html#facet-results
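
As a small sketch of how facet results can be consumed, the code below requests facets with the `facet_by` parameter and flattens the `facet_counts` section of a response into a lookup table. The field names and the sample response are made up for illustration; only the response shape (`field_name`, `counts`, `value`, `count`) follows the Typesense documentation linked above.

```javascript
// Search parameters asking Typesense to compute facets on two
// hypothetical fields.
const searchParameters = {
  q: "*",
  query_by: "title",
  facet_by: "type,tags",
  max_facet_values: 5,
};

// Turn the facet_counts array of a search response into
// { fieldName: { value: count, ... }, ... }.
function facetSummary(response) {
  const summary = {};
  for (const facet of response.facet_counts ?? []) {
    summary[facet.field_name] = Object.fromEntries(
      facet.counts.map((entry) => [entry.value, entry.count])
    );
  }
  return summary;
}

// Hand-written sample response, not real output.
const sampleResponse = {
  facet_counts: [
    {
      field_name: "type",
      counts: [
        { value: "article", count: 12 },
        { value: "page", count: 3 },
      ],
    },
  ],
};

console.log(searchParameters.facet_by, facetSummary(sampleResponse));
```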

Merchandising
Searching for ckeditor, the document with id entity:node-3456622:en is ranked third.

I want to promote this result to be the first one when users are looking for ckeditor results.

Merchandising
Let’s create a curation that says:

when the query is ckeditor, include the document entity:node-3456622:en as the first one.

Merchandising
Curations can be used to push desired results to fixed positions on the search result page.
Curations can also exclude one or more results from appearing when a user types a specific query.
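
The curation described in these slides can be sketched as the payload below, assuming the Typesense v27 curation endpoint (`PUT /collections/{collection}/overrides/{id}`). The `buildCuration` helper is hypothetical; the module's own form may produce a different structure.

```javascript
// Build a curation (override) payload for an exact query match.
// `includes` are document ids pinned to positions 1, 2, 3...
// `excludes` are document ids hidden for this query.
function buildCuration(query, { includes = [], excludes = [] } = {}) {
  const curation = { rule: { query, match: "exact" } };
  if (includes.length > 0) {
    curation.includes = includes.map((id, index) => ({ id, position: index + 1 }));
  }
  if (excludes.length > 0) {
    curation.excludes = excludes.map((id) => ({ id }));
  }
  return curation;
}

// Promote the document from the slides to the first position for "ckeditor".
const promoteCkeditor = buildCuration("ckeditor", {
  includes: ["entity:node-3456622:en"],
});

console.log(JSON.stringify(promoteCkeditor, null, 2));
```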

The search_api_typesense module doesn’t have an integration with Views…

…and probably never will

introducing

Instant search

InstantSearch.js is an open source UI library for vanilla JS that lets you build a search interface in your frontend app

https://www.algolia.com/doc/guides/building-search-ui/widgets/showcase/js/

instantsearch.js
➔ The Typesense team has developed an adapter that makes InstantSearch API calls compatible with a Typesense server
➔ With InstantSearch, a client (like a browser) talks directly to Typesense, without Drupal in the middle
➔ Important: the API key must be one that only allows the documents:search action on the desired collections

import instantsearch from "instantsearch.js";
import { searchBox, hits } from "instantsearch.js/es/widgets";
import TypesenseInstantSearchAdapter from "typesense-instantsearch-adapter";

// Use a search-only API key here: this code runs in the browser.
const typesenseInstantsearchAdapter = new TypesenseInstantSearchAdapter({
  server: {
    apiKey: "S6Af42wX9HmxF5Ar36ajNNawNywAYwGr",
    nodes: [
      {
        host: "search-api-typesense.ddev.site",
        port: "8108",
        protocol: "https",
      },
    ],
  },
  // Extra search parameters sent with every query.
  additionalSearchParameters: {
    query_by: "body",
  },
});
const searchClient = typesenseInstantsearchAdapter.searchClient;

instantsearch.js
➔ InstantSearch provides a set of widgets:
◆ configure
◆ searchBox
◆ hits
◆ pagination
◆ stats
◆ refinementList (for facets)
◆ clearRefinements
◆ currentRefinements
◆ sortBy
◆ and many more…

const search = instantsearch({
  searchClient,
  indexName: "standard_search",
});

search.addWidgets([
  searchBox({
    container: "#searchbox",
  }),
  hits({
    container: "#hits",
    templates: {
      item: `
        <div class="hit-name">
          {{#helpers.highlight}}{ "attribute": "name" }{{/helpers.highlight}}
        </div>
      `,
    },
  }),
]);

search.start();

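instantsearch.js
Complete widget showcase on: https://www.algolia.com/doc/guides/building-search-ui/widgets/showcase/js/

import { pagination, stats, refinementList } from "instantsearch.js/es/widgets";

// `facet` is assumed to hold the name of a facetable field, defined elsewhere.
search.addWidgets([
  pagination({
    container: "#pagination",
  }),
  stats({
    container: "#stats",
  }),
  refinementList({
    container: "#facet",
    attribute: facet,
    searchable: true,
  }),
]);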

Searching private content

Scoped Search Key

An API key that embeds a predefined set of search parameters that cannot be altered on the client side

Scoped Search Key
➔ Make sure that the parent search key you use to generate a scoped search key has no other permissions besides documents:search
➔ Generated on the backend and sent to the frontend
➔ The expires_at parameter is optional
➔ NEVER expose the parent API key!!

// $client is assumed to be a configured Typesense\Client instance.
$keyWithSearchPermissions = 'RN23GFr1s6jQ9kgSNg2O7fYcAUXU7127';
$scopedKey = $client
  ->keys
  ->generateScopedSearchKey(
    $keyWithSearchPermissions,
    ['filter_by' => 'company_id:124', 'expires_at' => 1906054106]
  );
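
To show why the embedded parameters cannot be altered on the client, here is a minimal Node.js sketch of how a scoped key is derived, following the procedure described in the Typesense API key documentation: the parameters are signed with an HMAC-SHA256 keyed by the parent search key, and the signature, the first four characters of the parent key and the parameters are then base64-encoded together. This is for illustration only; in real code, prefer the client library method shown above.

```javascript
const { createHmac } = require("node:crypto");

// Derive a scoped search key from a parent search-only key.
function generateScopedSearchKey(parentSearchKey, parameters) {
  const paramsJSON = JSON.stringify(parameters);
  // Signature over the embedded parameters, keyed by the parent key.
  const digest = createHmac("sha256", parentSearchKey)
    .update(paramsJSON)
    .digest("base64");
  // Only a 4-character prefix of the parent key is embedded, never the key itself.
  const keyPrefix = parentSearchKey.slice(0, 4);
  return Buffer.from(`${digest}${keyPrefix}${paramsJSON}`).toString("base64");
}

const scopedKey = generateScopedSearchKey(
  "RN23GFr1s6jQ9kgSNg2O7fYcAUXU7127",
  { filter_by: "company_id:124", expires_at: 1906054106 }
);

// Decoding shows the signature, the key prefix and the embedded parameters.
const decoded = Buffer.from(scopedKey, "base64").toString("utf8");
console.log(decoded.slice(44));
```

Changing the embedded `filter_by` on the client would invalidate the signature, because recomputing it requires the full parent key, which never leaves the backend.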

Scoped Search Key
➔ Quickly generate a scoped API key directly in the Drupal UI

Semantic search

Semantic search
➔ Embeddings are generated using an LLM (either locally or through a remote provider, like OpenAI)
➔ Embeddings are saved in a float[] field named embedding
➔ Content from embedding fields is concatenated and then split into chunks
➔ It’s possible to configure the size of the chunks and the overlap between consecutive chunks
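
The chunking step can be pictured with the sketch below. It splits by characters, which is an assumption made for simplicity: the module may count tokens or words instead, and the `chunkText` name and the sample field values are hypothetical.

```javascript
// Split text into chunks of `chunkSize` characters, where each chunk
// repeats the last `overlap` characters of the previous one.
function chunkText(text, chunkSize, overlap) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be an integer in [0, chunkSize)");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Concatenate the content of the fields, then split it.
const fieldValues = ["Title of the node", "Body text of the node"];
const chunks = chunkText(fieldValues.join(" "), 16, 4);
console.log(chunks);
```

The overlap keeps a sentence that straddles a chunk boundary at least partly present in both neighbouring chunks, at the cost of storing some text twice.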

Semantic search
➔ Typesense will generate the embedding for the search query and use cosine similarity to find similar documents
➔ Usually we want to exclude the embedding field from the results because it’s just a huge vector of floating-point numbers

curl 'http://localhost:8108/multi_search' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -X POST \
  -d '{
    "searches": [
      {
        "q": "device to type things on",
        "query_by": "embedding",
        "collection": "products",
        "prefix": "false",
        "exclude_fields": "embedding",
        "per_page": 1
      }
    ]
  }'
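
For reference, cosine similarity between two embedding vectors is the dot product divided by the product of their lengths. The sketch below is a plain implementation of that formula for illustration; it says nothing about how Typesense computes or indexes vectors internally.

```javascript
// Cosine similarity: 1 means same direction, 0 orthogonal, -1 opposite.
function cosineSimilarity(a, b) {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for zero vectors");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [0, 1]));
console.log(cosineSimilarity([1, 2], [2, 4]));
```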

Hybrid search
➔ Typesense will do a keyword search on all the regular fields and a semantic search on the embedding field, then combine the results using a rank fusion algorithm to arrive at a fusion score that is used to rank the hits

curl 'http://localhost:8108/multi_search' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -X POST \
  -d '{
    "searches": [
      {
        "q": "desktop copier",
        "query_by": "product_name,embedding",
        "collection": "products",
        "prefix": "false",
        "exclude_fields": "embedding",
        "per_page": 2
      }
    ]
  }'
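
To give an intuition for rank fusion, here is a generic weighted reciprocal-rank sketch. It is not claimed to be Typesense's exact formula: the `fuseRankings` name, the `alpha` weight of 0.7 and the document ids are all assumptions chosen for illustration.

```javascript
// Combine two ranked lists of document ids into a single ranking.
// Each list contributes weight / rank for every document it contains.
function fuseRankings(keywordIds, semanticIds, alpha = 0.7) {
  const scores = new Map();
  const accumulate = (ids, weight) => {
    ids.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (index + 1));
    });
  };
  accumulate(keywordIds, alpha);      // keyword search ranking
  accumulate(semanticIds, 1 - alpha); // semantic (vector) search ranking
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([id, score]) => ({ id, score }));
}

const fused = fuseRankings(["a", "b", "c"], ["c", "a", "d"]);
console.log(fused.map((hit) => hit.id));
```

A document that appears near the top of both lists ends up ahead of one that ranks first in only one of them, which is the behaviour hybrid search relies on.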

Conversation

RAG

Retrieval-Augmented Generation


How a RAG works (https://learnitnow.medium.com/building-rag-using-google-gemini-d11f8095e035)

Search API Typesense missing features
➔ Geosearch support
➔ Aliases
➔ Search preset
➔ Analytics
➔ Tests
➔ Detailed documentation

Join us for contribution opportunities!

Mentored Contribution
27 September: 09:00 - 18:00, Room 111

First Time Contributor Workshop
24 September: 16:30 - 17:15, Room BoF 4 (121)
25 September: 11:30 - 12:15, Room BoF 4 (121)
27 September: 09:00 - 12:30, Room 111

General Contribution
24-26 September: 09:00 - 18:00, Area 1
27 September: 09:00 - 18:00, Room 112

#DrupalContributions
