Semantic Search

To use semantic search, you must first generate semantic indexes. A semantic index is a searchable index built by converting your searchable documents to vectors, called embeddings, based on their content. Once a semantic index is set up, new documents that are passed to TellusR are automatically added to it, and you can schedule tasks that reindex all documents.

Below is an overview of the query parameters for semantic search in TellusR. By default, the /select_a, /select_b, and /select_c search handlers include a semantic component as part of the search process, and this component uses all semantic indexes that are available. If a semantic component is not set to target an index, it falls back to regular search, and the parameters below have no effect.

  • semantic.q (Optional): The semantic query as text. If this parameter is not provided, the query is treated as a regular query.
  • semantic.ratio (Optional): A number between 0 and 1. When a regular query parameter is provided in addition to the semantic query parameter, this value determines the weight of semantic hits versus regular hits in the final score: 1 means full semantic search, 0 means full regular search, and anything in between is a linear mix of the normalized scores.
  • semantic.k (Optional): Determines k in kNN vector queries, that is, how many results are fetched from each semantic index. The default value is 100.
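The linear mix controlled by semantic.ratio can be illustrated with a small Python sketch. This is only an illustration of the weighting described above, not TellusR's internal code: the function name blend_scores is hypothetical, and the sketch assumes both input scores have already been normalized to a common range (how TellusR normalizes scores is not specified here).

```python
def blend_scores(regular_score, semantic_score, ratio):
    """Linearly mix two already-normalized scores.

    ratio = 1 keeps only the semantic score, ratio = 0 keeps only the
    regular score. Hypothetical helper, for illustration only.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return ratio * semantic_score + (1.0 - ratio) * regular_score


# A document with a weak keyword match but a strong semantic match.
print(blend_scores(0.25, 0.75, 0.5))  # equal weight to both scores
print(blend_scores(0.25, 0.75, 1.0))  # semantic score only
print(blend_scores(0.25, 0.75, 0.0))  # regular score only
```

With ratio = 0.5, a document that matches well semantically but poorly on keywords lands halfway between its two scores.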

Example query, search for “viking” with ratio = 0.5:

https://your-domain.com/solr/<your-core>/select_a?indent=true&q.op=OR&q=viking&semantic.q=viking&semantic.ratio=0.5
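If you build such URLs from code, a minimal Python sketch could look like the following. The helper name build_semantic_query_url and the core name "mycore" are made up for illustration; only the handler path and the q, semantic.q, semantic.ratio, and semantic.k parameters come from the description above.

```python
from urllib.parse import urlencode


def build_semantic_query_url(base_url, core, handler, q,
                             semantic_q=None, ratio=None, k=None):
    """Assemble a search URL with optional semantic parameters.

    Hypothetical helper; parameters left as None are omitted so the
    server-side defaults apply.
    """
    params = {"q": q}
    if semantic_q is not None:
        params["semantic.q"] = semantic_q
    if ratio is not None:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("semantic.ratio must be between 0 and 1")
        params["semantic.ratio"] = ratio
    if k is not None:
        params["semantic.k"] = k
    # urlencode percent-escapes values, so queries with spaces are safe.
    return f"{base_url}/solr/{core}/{handler}?{urlencode(params)}"


url = build_semantic_query_url(
    "https://your-domain.com", "mycore", "select_a",
    q="viking", semantic_q="viking", ratio=0.5,
)
print(url)
```

Leaving out semantic_q produces a plain regular query, which matches the fallback behavior described for semantic.q.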

You can manage your semantic indexes under Admin -> Indexing. Here you can:

  • Configure new semantic indexes
  • Schedule reindexing
  • See an overview of queued and completed reindexing tasks

Configure new semantic indexes

To configure semantic indexes, you must first decide which document fields should be made available for semantic indexing. Go to Configuration -> Solr Schema and mark the fields that you want to use.

  • Core: The core from which to take documents.
  • Index tag: Preferably a short, descriptive tag for the index you are about to create. You will need this tag if you configure search components manually. See Advanced Configuration.
  • Semantic model: Which model to use.
  • Language: Select a language. It is recommended to select the language that best matches your documents.
  • Fields to use in index: The fields that should be treated as the document's content. Their content is joined and converted to an embedding. For example, if your documents have title and description fields, you can select both to create embeddings from the title joined with the description.
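To make the joining of fields concrete, here is a rough Python sketch of what the text fed to the model might look like. It is an assumption-laden illustration: the helper join_fields is hypothetical, and the space separator and the handling of multi-valued fields are guesses, since the exact joining rules TellusR applies are not described here.

```python
def join_fields(document, fields, separator=" "):
    """Concatenate the selected fields of a document into one text.

    Hypothetical helper; the separator and the flattening of
    multi-valued fields are assumptions made for illustration.
    """
    parts = []
    for name in fields:
        value = document.get(name)
        if value is None:
            continue  # skip fields the document does not have
        if isinstance(value, (list, tuple)):
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    # Drop empty or whitespace-only pieces before joining.
    return separator.join(p for p in parts if p.strip())


doc = {
    "id": 7,
    "title": "Viking ships",
    "description": "Clinker-built longships.",
}
text = join_fields(doc, ["title", "description"])
print(text)
```

Only the selected fields contribute to the text, so fields such as id never influence the embedding.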

Tips & guidelines:

  • Only use fields whose content as text is descriptive of the document, and find the right balance between creating many indexes and joining fields.
  • Having trouble hitting keywords? Try combining short, keyword-like fields such as author, genre, and maybe even title, into one index with the multi-embedding model.
  • Keep in mind that adding too many indexes will add unnecessary overhead to your application. Usually you don’t need more than 2-3 indexes to get a good search.
  • If you have many documents and are running without a GPU, the multi-embedding model is recommended.

Schedule reindexing

Each index can be scheduled to be re-created from scratch daily or weekly. Why is this necessary when indexes are also updated live? Live updates only catch new documents, not changes to existing documents. In addition, changes or additions to documents can affect the way the selected model operates for the index. A full reindex ensures that the model is up to date with the latest information from all documents.

Tips & guidelines:

  • If possible, run reindexing during the night (make sure you select an appropriate timezone).
  • If you have few documents, daily reindexing is fine.
  • If you have many documents, weekly reindexing may be best. Try to reindex different indexes on different weekdays.

About sorting & paging

Regular Solr queries support sorting. Semantic search has limited support for sorting: you can sort ascending and descending on sortable fields, but sorting by functions of sortable fields is not yet officially supported.

Regular Solr queries support paging using the rows and start parameters. Paging is currently not officially supported for semantic queries.