Search is usually one of the features of a website that is underestimated and left at the end of the project.
You can install the Search API module, and you're done with it, right?
Well, things are more complex than that, and nowadays, users are getting used to powerful search engines like the ones used to search the Web. We expect search results in milliseconds, enhanced by AI, and targeted to us.
In recent years, a new generation of platforms and tools has emerged to provide this set of features; some are paid SaaS, but others are open source.
Typesense is an open source search engine written in C++ that is fast, typo tolerant, and easy to integrate into a Drupal website. With features such as search-as-you-type, faceted navigation, semantic search, LLM augmentation, synonyms, and curations, Typesense has what your users are looking for, and the Search API Typesense module (https://www.drupal.org/project/search_api_typesense) covers most of it.
In this session, we'll discuss installing and configuring a Typesense instance, using Search API to index our content to Typesense, and writing JavaScript code to build a fully decoupled, state-of-the-art search experience.
Size: 3.9 MB
Language: en
Added: Sep 26, 2024
Slides: 52 pages
Slide Content
Search API
meets
Typesense
Lusso Luca
About me
Drupal / PHP / Go developer @ SparkFabrik
Drupal contributor (WebProfiler, Monolog, Symfony
Messenger, Search API Typesense) and speaker
WE ARE A TECH COMPANY OF ENGINEERS,
DEVELOPERS AND DESIGNERS WHO WILL
THINK, DESIGN AND BUILD YOUR CUSTOM APPLICATIONS,
MODERNIZE YOUR LEGACY
AND TAKE YOU TO THE CLOUD NATIVE ERA
4
PROUD OF OUR PARTNERSHIPS
We help Italian businesses to
bridge the gap with China
thanks to our
official partnership with Alibaba
Cloud
We are a Google Cloud Platform
Technology Partner
We are official AWS partners
5
PROUD OF OUR MEMBERSHIPS
We are a Silver Member of the
Open Source Security
Foundation
We are a supporter of the
Cloud Transformation
Observatory of the PoliMi
We are a Silver Member of the
Cloud Native Computing
Foundation
We are a Silver Member of the
Linux Foundation Europe
Demo
6
Typesense is a modern, blazing-fast,
developer-friendly, open source search engine.
It uses cutting-edge algorithms that take
advantage of the latest advances in Hardware
Capabilities & Machine Learning.
https://typesense.org
7
C++
20k+ stars on GitHub
Self-hosted or SaaS
Easy to set up
8
➔ Current version is v27
➔ A new release every 2-3
months
Public roadmap
https://github.com/orgs/typesense/projects/1/views/1
9
Typesense doesn’t provide a UI, only REST
endpoints
If you need a UI:
● Typesense Cloud (paid product)
● https://github.com/bfritscher/typesense-dashboard
10
A lot of features
11
Source: https://typesense.org
API Libraries are available in many languages
12
Source: https://typesense.org
Integrations exist for many platforms
13
Source: https://typesense.org
Sadly, no Drupal here…
There’s a module for that™
Search API Typesense
https://www.drupal.org/project/search_api_typesense
14
Try Typesense on DDEV
https://github.com/kevinquillen/ddev-typesense
15
16
➔ Admin API key is set when Typesense server is
started
➔ Typesense is distributed: here you can set all the
available nodes
➔ The nearest node setting leverages the Search Delivery Network
in Typesense Cloud
➔ The configuration shown is the one used to connect to a
DDEV-managed Typesense server
Connect to the server
17
New tabs for Search API server
18
19
20
21
New tabs for Search API index
22
➔ Search API Typesense defines a set of new
field types; using them is mandatory when
indexing data to Typesense
➔ Supported field types are:
◆ string
◆ string[]
◆ int32
◆ int32[]
◆ int64
◆ int64[]
◆ float
◆ float[]
◆ bool
◆ bool[]
Fields configuration
23
➔ For every field added to Search API, we must
configure how it will be exposed to Typesense
➔ A collection will not be created on the Typesense
server until the schema form is submitted
Fields configuration
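To make the mapping concrete, here is a minimal sketch of the kind of collection schema that could end up on the Typesense server once the schema form is submitted. The collection name and the field names are hypothetical; the `name`, `fields`, `type`, `facet` and `optional` keys follow the Typesense collection schema format, and the helper only checks field types against the list of supported types given earlier.

```javascript
// Field types supported by Search API Typesense, as listed on the slide.
const SUPPORTED_TYPES = [
  "string", "string[]", "int32", "int32[]", "int64", "int64[]",
  "float", "float[]", "bool", "bool[]",
];

// Hypothetical schema for a collection of Drupal nodes.
const schema = {
  name: "drupal_content", // assumed collection name
  fields: [
    { name: "title", type: "string" },
    { name: "tags", type: "string[]", facet: true },
    { name: "created", type: "int64" },
    { name: "rating", type: "float", optional: true },
    { name: "sticky", type: "bool" },
  ],
};

// Returns the names of fields whose type is not in the supported list.
function unsupportedFields(collectionSchema) {
  return collectionSchema.fields
    .filter((field) => !SUPPORTED_TYPES.includes(field.type))
    .map((field) => field.name);
}

console.log(unsupportedFields(schema));
```

A check like this is only a convenience for spotting mistakes early; the module's schema form remains the place where the real mapping is defined.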
24
➔ Typesense handles typographical errors
automatically, out of the box
➔ Different parameters can be added to the search
query to fine-tune typo tolerance:
https://typesense.org/docs/27.0/api/search.html#typo-tolerance-parameters
➔ One useful parameter is
enable_typos_for_alpha_numerical_tokens, which
enables or disables typo tolerance on alphanumeric
query tokens (like product codes or SKUs)
Typo Tolerance
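As an illustration of how these parameters travel with a query, the sketch below builds the URL for a search request that lowers the number of tolerated typos and switches typo tolerance off for alphanumeric tokens. The host, the `products` collection and the `sku`/`title` fields are assumptions made for the example; `num_typos` and `enable_typos_for_alpha_numerical_tokens` are Typesense search parameters.

```javascript
// Builds a URL for GET /collections/{collection}/documents/search.
// The request itself also needs the X-TYPESENSE-API-KEY header.
function buildSearchUrl(baseUrl, collection, params) {
  const url = new URL(
    `/collections/${encodeURIComponent(collection)}/documents/search`,
    baseUrl,
  );
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const url = buildSearchUrl("http://localhost:8108", "products", {
  q: "AB-1234",          // a SKU-like query where typos are unwanted
  query_by: "sku,title", // hypothetical field names
  num_typos: 1,
  enable_typos_for_alpha_numerical_tokens: false,
});
console.log(url);
```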
25
➔ The synonyms feature allows you to define search
terms that should be considered equivalent
Synonyms
26
➔ Can be configured directly on Drupal
➔ Typesense supports two types of synonyms:
◆ One-way synonyms: Defining the words
iphone and android as one-way synonyms of
smart phone will cause searches for smart
phone to return documents containing iphone
or android or both.
◆ Multi-way synonyms: Defining the words
blazer, coat and jacket as multi-way
synonyms will cause searches for any one of
those words (eg: coat) to return documents
containing at least one of the words in the
synonym set (eg: records with blazer or coat
or jacket are returned).
Synonyms
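The two synonym types differ only in whether a `root` term is present. The sketch below builds both payloads from the examples above and includes a small helper that would send them to the per-collection synonyms endpoint used by Typesense v27. The helper's arguments (base URL, API key, collection and synonym id) are placeholders supplied by the caller, and in a Drupal site the module's UI would normally do this for you.

```javascript
// One-way synonym: searching for `root` also matches each entry in `synonyms`.
function oneWaySynonym(root, synonyms) {
  return { root, synonyms };
}

// Multi-way synonym: every word in the set is equivalent to the others.
function multiWaySynonym(synonyms) {
  return { synonyms };
}

// Sends a synonym definition to Typesense. Not called here: it needs a
// running server and an API key allowed to manage synonyms.
async function upsertSynonym(baseUrl, apiKey, collection, id, body) {
  const response = await fetch(
    `${baseUrl}/collections/${collection}/synonyms/${id}`,
    {
      method: "PUT",
      headers: {
        "Content-Type": "application/json",
        "X-TYPESENSE-API-KEY": apiKey,
      },
      body: JSON.stringify(body),
    },
  );
  return response.json();
}

const smartPhone = oneWaySynonym("smart phone", ["iphone", "android"]);
const outerwear = multiWaySynonym(["blazer", "coat", "jacket"]);
console.log(JSON.stringify(smartPhone), JSON.stringify(outerwear));
```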
27
➔ Everything we all expect from a faceting feature…
➔ More info here:
https://typesense.org/docs/27.0/api/search.html#facet-results
Faceting
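For a rough idea of what faceting looks like from the client side, the sketch below shows search parameters that request facets with `facet_by` and a helper that flattens the `facet_counts` part of a response into a simple lookup table. The field names and the sample response are hand-written assumptions shaped like a Typesense response, not real data.

```javascript
// Search parameters asking Typesense to compute facet counts.
// The field names are hypothetical.
const searchParameters = {
  q: "*",
  query_by: "title",
  facet_by: "category,author",
};

// Converts the `facet_counts` array of a search response into
// { fieldName: { value: count } } for easier rendering.
function facetCountsToMap(facetCounts) {
  const result = {};
  for (const facet of facetCounts) {
    result[facet.field_name] = {};
    for (const entry of facet.counts) {
      result[facet.field_name][entry.value] = entry.count;
    }
  }
  return result;
}

// Hand-written sample, not output from a real server.
const sampleFacetCounts = [
  {
    field_name: "category",
    counts: [
      { value: "module", count: 12 },
      { value: "theme", count: 3 },
    ],
  },
];
console.log(searchParameters.facet_by, facetCountsToMap(sampleFacetCounts));
```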
28
Searching for ckeditor, the
document with id
entity:node-3456622:en is ranked
third.
I want to promote this result to be
the first one when users are looking
for ckeditor results.
Merchandising
29
Let’s create a curation that says:
when the query is ckeditor, include
the document
entity:node-3456622:en as the first
one.
Merchandising
30
Curations can be used to push
desired results to fixed positions on
the search result page.
Curations can also exclude one or
more results from appearing when a user
types a specific query.
Merchandising
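The curation described above could be expressed as the following payload. This is a sketch under assumptions: Typesense calls curations "overrides" in its API, and in v27 such a payload is sent with PUT to /collections/{collection}/overrides/{id}; the helper name and its argument shape are made up for the example, while the query and the document id come from the slides.

```javascript
// Builds a curation (an "override" in Typesense's API) for an exact query.
// `includes` pins documents to positions; `excludes` hides documents.
function buildCuration(query, { includes = [], excludes = [] } = {}) {
  const curation = { rule: { query, match: "exact" } };
  if (includes.length > 0) {
    curation.includes = includes.map(({ id, position }) => ({ id, position }));
  }
  if (excludes.length > 0) {
    curation.excludes = excludes.map((id) => ({ id }));
  }
  return curation;
}

// Promote the document from the slides to the first position for "ckeditor".
const promoteCkeditor = buildCuration("ckeditor", {
  includes: [{ id: "entity:node-3456622:en", position: 1 }],
});
console.log(JSON.stringify(promoteCkeditor, null, 2));
```

Passing `excludes: ["some-document-id"]` instead would produce a curation that hides that document for the same query.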
search_api_typesense module
doesn’t have an integration
with Views…
…and probably never will
31
introducing
Instant search
32
➔ InstantSearch.js is an open
source UI library for vanilla JS
that lets you build a search
interface in your frontend app
➔ The Typesense team has developed
an adapter to make
InstantSearch API calls
compatible with the Typesense
server
➔ With InstantSearch, a client (like
a browser) talks directly with
Typesense, without Drupal in the
middle
➔ Important: the API key must be
one that only allows the
documents:search action on the
desired collections
instantsearch.js
import instantsearch from "instantsearch.js";
import { searchBox, hits } from "instantsearch.js/es/widgets";
import TypesenseInstantSearchAdapter from "typesense-instantsearch-adapter";
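With those three imports in place, the wiring could look like the sketch below. It is not the code from the talk: the settings object, the `#searchbox` and `#hits` container ids and the function names are assumptions, and the libraries are passed in as arguments only so that the snippet stays self-contained outside a bundler. The adapter options (`server.apiKey`, `server.nodes`, `additionalSearchParameters.query_by`) and the `searchClient` property follow the adapter's documented usage.

```javascript
// Builds the options object for TypesenseInstantSearchAdapter.
function buildAdapterOptions({ apiKey, host, port, protocol, queryBy }) {
  return {
    server: {
      apiKey, // must be a search-only (ideally scoped) key, never the admin key
      nodes: [{ host, port, protocol }],
    },
    additionalSearchParameters: { query_by: queryBy },
  };
}

// Wires InstantSearch.js to Typesense through the adapter and starts it.
// `libs` carries the imported instantsearch, searchBox, hits and adapter class.
function startSearch(libs, settings) {
  const { instantsearch, searchBox, hits, TypesenseInstantSearchAdapter } = libs;
  const adapter = new TypesenseInstantSearchAdapter(buildAdapterOptions(settings));
  const search = instantsearch({
    searchClient: adapter.searchClient,
    indexName: settings.collection, // the Typesense collection to query
  });
  search.addWidgets([
    searchBox({ container: "#searchbox" }), // assumed element ids
    hits({ container: "#hits" }),
  ]);
  search.start();
  return search;
}
```

In a bundled frontend the call would be something like `startSearch({ instantsearch, searchBox, hits, TypesenseInstantSearchAdapter }, { apiKey, host, port, protocol, queryBy, collection })`, with the values coming from the Drupal side.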
An API key that embeds a predefined set of search parameters that
cannot be altered on the client side
38
Scoped Search Key
39
➔ Make sure that the parent search
key you use to generate a scoped
search key has no other
permissions besides
documents:search
➔ Generated on the backend and sent
to the frontend
➔ expires_at parameter is optional
➔ NEVER expose the parent API
key!!
$keyWithSearchPermissions = 'RN23GFr1s6jQ9kgSNg2O7fYcAUXU7127';
$client
  ->keys
  ->generateScopedSearchKey(
    $keyWithSearchPermissions,
    ['filter_by' => 'company_id:124', 'expires_at' => 1906054106]
  );
40
➔ Quickly generate a scoped API key directly on the
Drupal UI
Scoped Search Key
Semantic search
41
Semantic search
42
➔ Embeddings are generated using an LLM (either locally or
with a remote provider, like OpenAI)
➔ Embeddings are saved in a float[] field named
embedding
➔ Content from embedding fields is concatenated
and then split into chunks
➔ It’s possible to configure the size of chunks and the
overlap between consecutive chunks
➔ Typesense will generate the
embedding for the search query
and use cosine similarity to
find similar documents
➔ Usually we want to exclude the
embedding field from the results
because it’s just a huge vector of
floating-point numbers
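The chunking step mentioned above (concatenate, then split with a configurable size and overlap) can be pictured with the sketch below. It works on characters purely for illustration; the module may well measure chunks differently, and the function name, the sample fields and the sizes are all assumptions.

```javascript
// Splits `text` into chunks of at most `chunkSize` characters, where each
// chunk starts `chunkSize - overlap` characters after the previous one.
function chunkText(text, chunkSize, overlap) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Concatenate the (hypothetical) source fields, then chunk the result.
const sourceFields = { title: "Typesense", body: "fast typo tolerant search" };
const content = Object.values(sourceFields).join(" ");
console.log(chunkText(content, 16, 4));
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.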
Semantic search
curl 'http://localhost:8108/multi_search' \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-X POST \
-d '{
"searches": [
{
"q": "device to type things on",
"query_by": "embedding",
"collection": "products",
"prefix": "false",
"exclude_fields": "embedding",
"per_page": 1
}
]
}'
43
➔ Typesense will do a keyword
search on all the regular fields,
and a semantic search on the
embedding field and combine
the results using a rank fusion
algorithm to arrive at a fusion
score that is used to rank the hits
Hybrid search
curl 'http://localhost:8108/multi_search' \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-X POST \
-d '{
"searches": [
{
"q": "desktop copier",
"query_by": "product_name,embedding",
"collection": "products",
"prefix": "false",
"exclude_fields": "embedding",
"per_page": 2
}
]
}'
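To give an intuition for what a rank fusion step does, here is a generic weighted reciprocal-rank fusion of a keyword ranking and a semantic ranking. It is an illustration of the idea only and should not be read as Typesense's exact formula; the function name and the default weight are assumptions chosen for the example.

```javascript
// Weighted reciprocal-rank fusion of two ranked lists of document ids.
// A document at 1-based rank r contributes weight / r; a document missing
// from a list contributes nothing for that list.
function fuseRankings(keywordIds, semanticIds, keywordWeight = 0.7) {
  const semanticWeight = 1 - keywordWeight;
  const scores = new Map();
  const addScores = (ids, weight) => {
    ids.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (index + 1));
    });
  };
  addScores(keywordIds, keywordWeight);
  addScores(semanticIds, semanticWeight);
  // Sort by fused score, highest first, and return only the ids.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

console.log(fuseRankings(["a", "b", "c"], ["c", "a", "d"]));
```

Documents that rank well in both lists rise to the top, while a document found by only one of the two searches can still appear further down.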
44
Conversation
45
46
47
RAG
Retrieval-Augmented Generation
48
How a RAG works (https://learnitnow.medium.com/building-rag-using-google-gemini-d11f8095e035)
49
Search API Typesense missing features
50
➔ Geosearch support
➔ Aliases
➔ Search preset
➔ Analytics
➔ Tests
➔ Detailed documentation
Join us for contribution opportunities!
Mentored
Contribution
First Time
Contributor Workshop
General
Contribution
27 September:
09:00 – 18:00
Room 111
Please fill out the Individual session survey
(in the Mobile App using QR code)
What did you think?