Semantic

TellusR’s semantic search interprets user intent rather than matching exact words. This makes it tolerant of misspellings and synonyms and lets it match documents by general topic, whereas traditional keyword search relies strictly on exact wording.

At its core, TellusR integrates a powerful NLP module capable of handling multiple languages. By default, it includes pre-trained models optimized for common search-related use cases, eliminating the need for additional training. These models generate embeddings for both documents and queries, allowing searches to return the most relevant results based on vector similarity.

To enable semantic search, you must first generate semantic indexes.

A semantic index is a searchable index created by converting your documents into vector representations, known as embeddings. These embeddings capture the meaning and context of the content, allowing for more intuitive and context-aware search results.
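To illustrate how embeddings support ranking by vector similarity, here is a minimal sketch in plain Python. It assumes cosine similarity as the measure, which this page does not confirm for TellusR, and it uses invented three-dimensional toy vectors in place of real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings, invented for illustration only.
doc_vecs = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.7, 0.7, 0.0],
}
query_vec = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_vec, doc_vecs)
```

In this toy data, the document whose vector points in nearly the same direction as the query ranks first, regardless of which words either text contains.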

Once a semantic index is set up, any new documents sent to TellusR via the API will automatically be added—provided they contain the necessary fields for indexing.

By default, queries sent to the /tellusr/api/v1/{project}/query endpoint (GET or POST) target all semantic indexes in the project, and their results are merged.

When you inspect queries in the dashboard, the normalized semantic scores are displayed like this in the search result list:

[Image: boost_score.png]

You can manage your semantic indexes under Admin -> Indexing. Here you can configure new semantic indexes and see an overview of ongoing and completed reindexing tasks.

Configure new semantic indexes

  1. Create a semantic index:
    • Click the “+” sign in the Semantic Indexes section.
    • Configure the following settings:
      • Project: the project from which to retrieve documents.
      • Index tag: a short, descriptive tag for the index you are about to create. You will need this tag if you configure search components manually. See Advanced Configuration.
      • Fields to use in index: the fields that should be treated as the document's content. Their values are joined and converted to an embedding. For example, if your documents have title and content_segment fields, you can select both to create embeddings from the title joined with the description (content_segment).
      • Model: Select one of the provided language models unless you want to use a custom model.
        • Available language models: Select a language that best matches your documents.
        • Custom model: To use a custom sentence transformer model from Hugging Face, enter the model name into the widget as shown below.
[Image: Configuration]

If you have uploaded data to TellusR using the file upload endpoints, e.g. /tellusr/api/v1/{project}/upload-file, the recommended setting is to base semantic indexes on content_segment (and possibly a few other metadata fields). This field is parsed from the uploaded PDF and Word documents so that it represents semantically relevant chunks that follow the document structure. Do NOT select fields whose combined values become much longer than a hundred words. Avoid fields with large values and rely instead on smaller fields like content_segment, which is a chunked version of the uploaded file content. Only use fields whose text is descriptive of the document. Avoid numeric fields and attributes that do not carry semantically meaningful content.
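The field-selection advice above can be checked before you configure an index. The following sketch joins the selected fields of a sample document and counts the words. The helper names, the space separator, and the sample document are assumptions made for illustration; how TellusR joins fields internally is not described here.

```python
WORD_BUDGET = 100  # rough guideline from the text above, not a limit enforced by TellusR

def build_index_text(doc, fields):
    """Join the values of the selected fields, skipping missing or empty ones."""
    parts = [str(doc[name]).strip() for name in fields if doc.get(name)]
    return " ".join(parts)

def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())

# Hypothetical document, invented for illustration.
doc = {
    "title": "Installing the search server",
    "content_segment": "Download the package, unpack it, and start the service.",
    "page_count": 12,  # numeric metadata with no semantic meaning, so it is not selected
}
fields = ["title", "content_segment"]

text = build_index_text(doc, fields)
within_budget = word_count(text) <= WORD_BUDGET
```

Running a check like this over a sample of your documents gives a quick indication of whether a field combination stays near the recommended length.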

  2. Indexing documents
    • After creating a semantic index, initiate the first indexing process by clicking “START INDEXING”.
    • Once the index is set up, all new documents sent via the API will be automatically indexed according to the selected fields.

Assuming you have started indexing your document base, or have added documents after creating the semantic index, the /tellusr/api/v1/{project}/query endpoint returns semantic hits when semanticWeight is left at its default or set to a non-zero value. Use semanticWeight=1 if you want the search to return only semantic hits.
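As a rough sketch, a GET request URL for this endpoint could be assembled as follows. Only the path and the semanticWeight parameter come from this page. The host name, the project name, and the query-text parameter name "q" are placeholders assumed for illustration, so check the API reference for the actual names.

```python
from urllib.parse import quote, urlencode

def build_query_url(base_url, project, query_text, semantic_weight=None):
    """Build a GET URL for the query endpoint of one project."""
    path = f"/tellusr/api/v1/{quote(project, safe='')}/query"
    params = {"q": query_text}  # "q" is an assumed parameter name
    if semantic_weight is not None:
        # Omitting the parameter leaves semanticWeight at its default.
        params["semanticWeight"] = semantic_weight
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"

# Hypothetical host and project, invented for illustration.
semantic_only_url = build_query_url(
    "https://search.example.com", "my-project", "reset password", semantic_weight=1
)
default_url = build_query_url(
    "https://search.example.com", "my-project", "reset password"
)
```

The first URL requests only semantic hits, while the second leaves the weighting to the server default.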

  3. Querying with semantic search
    • By default, queries sent to /tellusr/api/v1/{project}/query target all semantic indexes within the specified project, merging their results.
    • To control query behavior, use semanticWeight:
      • semanticWeight=1 retrieves only semantic hits.
      • A non-zero semanticWeight blends semantic and traditional search results.
    • When checking results in the search result list in the administration interface, normalized semantic scores are displayed for better interpretability.
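To build intuition for how a weight can blend two kinds of results, the sketch below uses simple linear interpolation between keyword scores and semantic scores. This is one common approach and is not confirmed to be what TellusR does internally. The function name, the assumed 0 to 1 range, and the scores are all invented for illustration.

```python
def blend_scores(keyword_scores, semantic_scores, semantic_weight):
    """Linearly interpolate between keyword and semantic scores per document."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight is assumed to lie between 0 and 1")
    blended = {}
    for doc_id in set(keyword_scores) | set(semantic_scores):
        keyword = keyword_scores.get(doc_id, 0.0)    # no keyword hit counts as 0
        semantic = semantic_scores.get(doc_id, 0.0)  # no semantic hit counts as 0
        blended[doc_id] = (1.0 - semantic_weight) * keyword + semantic_weight * semantic
    return sorted(blended.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical normalized scores, invented for illustration.
keyword_scores = {"doc-a": 0.8, "doc-b": 0.5}
semantic_scores = {"doc-b": 0.75, "doc-c": 0.5}

only_semantic = blend_scores(keyword_scores, semantic_scores, 1.0)
half_and_half = blend_scores(keyword_scores, semantic_scores, 0.5)
```

With a weight of 1 the keyword scores contribute nothing, which mirrors the documented behavior of semanticWeight=1. With a weight of 0.5, a document found by both methods (doc-b here) outranks documents found by only one.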

Reindexing

If needed, you can reindex all documents from scratch by pressing START INDEXING. This operation replaces the existing index but does not interrupt ongoing searches.