To use semantic search, you must first generate semantic indexes. A semantic index is a searchable index built by converting your searchable documents into vectors, called embeddings, based on their content. Once a semantic index is set up, new documents passed to TellusR via the API are automatically added to it, as long as they contain the fields the index relies on.
By default, queries performed with the /tellusr/api/v1/{project}/query GET and POST operations target all semantic indexes, and their results are merged.
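As a minimal sketch of addressing the query endpoint, the helper below only builds the request URL and sends nothing. The host, the project name, and the query parameter name q are assumptions made for illustration and are not taken from the TellusR API reference; only the endpoint path comes from the text above.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url: str, project: str, params: dict) -> str:
    """Build a URL for the /tellusr/api/v1/{project}/query endpoint."""
    # Percent-encode the project name so it is safe as a path segment.
    path = f"/tellusr/api/v1/{quote(project, safe='')}/query"
    url = base_url.rstrip("/") + path
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Hypothetical host, project name, and parameter name, for illustration only.
url = build_query_url(
    "http://localhost:8080",
    "my-project",
    {"q": "how do I reset my password"},
)
print(url)
```

The resulting string can be passed to any HTTP client for a GET request. Check the actual parameter names against your TellusR installation before relying on them.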
When checking out queries in the dashboard, the normalized semantic scores are displayed as:
You can manage your semantic indexes under Admin -> Indexing. Here you can configure new semantic indexes and see an overview of ongoing and completed reindexing tasks.
Pressing START INDEXING reindexes all documents from scratch. Search remains available during indexing, but pressing START INDEXING immediately replaces the existing index.
For example, with the fields title and content_segment you can select both to create embeddings from their joined values.
Suppose your documents have a field category and you want to perform semantic searches filtered by category. Then category needs to be supplied here.
If you have uploaded data to TellusR using the file uploading endpoints, e.g. /tellusr/api/v1/{project}/upload-file,
then the recommended setting is to make semantic indexes use content_segment (and maybe a few other metadata fields). This field is parsed from the PDFs and Word documents so that it represents semantically relevant chunks of the document with respect to the document structure.
Do NOT select fields whose combined values become much longer than a hundred words. Avoid fields with large values and instead rely on smaller fields like content_segment, which is a chunked version of uploaded file content.
Only use fields whose text content is descriptive of the document. Avoid numeric fields and attributes that do not carry any semantically meaningful content.
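The field guidelines above can be checked on a sample document before configuring an index. The sketch below is a local helper, not part of TellusR: the function names, the page_count field, and the choice to treat 100 words as a hard cutoff are all assumptions made for illustration.

```python
from numbers import Number

MAX_WORDS = 100  # rough guideline from the text above, not an enforced limit


def embedding_text(doc: dict, fields: list[str]) -> str:
    """Join the selected field values into the text that would be embedded."""
    return " ".join(str(doc[f]) for f in fields if doc.get(f) not in (None, ""))


def check_fields(doc: dict, fields: list[str], max_words: int = MAX_WORDS) -> list[str]:
    """Return warnings for field selections that go against the guidelines."""
    warnings = []
    for f in fields:
        # Numeric values carry little semantically meaningful content.
        if isinstance(doc.get(f), Number):
            warnings.append(f"{f}: numeric field, avoid for embeddings")
    n_words = len(embedding_text(doc, fields).split())
    if n_words > max_words:
        warnings.append(f"combined text is {n_words} words, above the {max_words} word guideline")
    return warnings


# Sample document; page_count is a hypothetical metadata field.
doc = {
    "title": "Resetting your password",
    "content_segment": "Open the login page and choose Forgot password.",
    "page_count": 12,
}
print(check_fields(doc, ["title", "content_segment"]))
print(check_fields(doc, ["title", "page_count"]))
```

Running a check like this over a handful of representative documents gives a quick indication of whether a field combination is short and descriptive enough.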