Documents passed to TellusR via the API are automatically indexed for regular keyword search. The regular keyword search in TellusR is implemented with Solr.
When you create a TellusR project using
./tellusr.sh create <project>
a Solr collection of the same name is added to Solr.
When you post documents to TellusR, fields that are not specified in the Solr schema for your project are generated automatically based on their content. This also means that inconsistent data causes errors, for example sending a list for a field and later sending a string for the same field. Keep the data you pass consistent. If you need specific Solr types for your data, configure Solr before posting documents.
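Because auto-generated field types depend on the content, it can help to check a batch for conflicting value types before posting it. The following is a minimal sketch in plain Python. The helper name `find_type_conflicts` and the sample documents are hypothetical and not part of TellusR, and the check only mirrors the list-versus-string case described above, not Solr's actual type-guessing rules.

```python
from collections import defaultdict


def find_type_conflicts(docs):
    """Return {field: sorted type names} for fields whose values have mixed Python types."""
    seen = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items() if len(types) > 1}


# Hypothetical documents: "tags" is a list in the first and a string in the second.
docs = [
    {"id": "1", "title": "First", "tags": ["a", "b"]},
    {"id": "2", "title": "Second", "tags": "a"},
]

print(find_type_conflicts(docs))  # {'tags': ['list', 'str']}
```

An empty result means every field has a single value type across the batch. It does not guarantee that the batch matches types already present in an existing schema.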
You can modify the Solr configuration and schema to change the behavior of the regular search. See the official Solr documentation for how to make changes to Solr, or use the TellusR dashboard (Admin -> Configuration).
For example, suppose your data consists of Norwegian documents in PDF format. By default, the field content_segment, which contains the text of the chunks produced when parsing PDFs, has the type text_general, but you may want to configure Norwegian stopwords and so on. For such configuration changes, see Solr's official documentation.
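As an illustration of what such a change could look like, the sketch below builds a request body for Solr's Schema API that adds a Norwegian text type and points content_segment at it. Several things here are assumptions rather than TellusR facts: the Solr address, the collection name `myproject`, the type name `text_no`, the stopword file path, and the `indexed`/`stored` attributes of the field. If a type with that name already exists in your schema, Solr would reject the add command, so check the existing schema first.

```python
import json
import urllib.request

# Assumptions: Solr's default address and a project/collection named "myproject".
SOLR_URL = "http://localhost:8983/solr"
COLLECTION = "myproject"


def norwegian_schema_commands(field="content_segment", type_name="text_no"):
    """Build Schema API commands: add a Norwegian text type, then point the field at it."""
    field_type = {
        "name": type_name,
        "class": "solr.TextField",
        "positionIncrementGap": "100",
        "analyzer": {
            "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            "filters": [
                {"class": "solr.LowerCaseFilterFactory"},
                {
                    "class": "solr.StopFilterFactory",
                    "ignoreCase": "true",
                    # Assumed path; the file must exist in the collection's config set.
                    "words": "lang/stopwords_no.txt",
                    "format": "snowball",
                },
                {"class": "solr.SnowballPorterFilterFactory", "language": "Norwegian"},
            ],
        },
    }
    # replace-field needs the full field definition; these attributes are assumptions.
    new_field = {"name": field, "type": type_name, "indexed": True, "stored": True}
    # Solr runs multiple commands in the order given, so the type is added first.
    return {"add-field-type": field_type, "replace-field": new_field}


def post_schema(commands, solr_url=SOLR_URL, collection=COLLECTION):
    """Send the commands to Solr's Schema API. Defined for reference, not called here."""
    request = urllib.request.Request(
        f"{solr_url}/{collection}/schema",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Print the payload for inspection instead of sending it.
print(json.dumps(norwegian_schema_commands(), indent=2))
```

Printing the payload first makes it easy to compare against the existing definition of content_segment before anything is changed.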
After making the desired changes, you must reindex your documents for the changes to take effect.
If your data is already indexed but you need to change its Solr configuration, we advise that you first drop the Solr collection for the project, then create a new project using ./tellusr.sh create <project>, then modify the configuration of the new project, and finally push all data again.
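The order of these steps can be written down as a small dry-run plan. This is only a sketch under assumptions: it supposes that the collection is dropped through Solr's Collections API at Solr's default address, which may not match how your TellusR installation exposes Solr, and the helper names `delete_collection_url` and `reindex_plan` are hypothetical. Nothing is executed; the plan is only printed.

```python
import urllib.parse

SOLR_URL = "http://localhost:8983/solr"  # assumption: Solr's default address


def delete_collection_url(collection, solr_url=SOLR_URL):
    """Build the URL for the DELETE action of Solr's Collections API."""
    query = urllib.parse.urlencode({"action": "DELETE", "name": collection})
    return f"{solr_url}/admin/collections?{query}"


def reindex_plan(project):
    """Return the steps in the order they should be carried out."""
    return [
        ("drop collection", f"GET {delete_collection_url(project)}"),
        ("create project", f"./tellusr.sh create {project}"),
        ("modify configuration", "edit the Solr schema, or use Admin -> Configuration"),
        ("push data", "post all documents to TellusR again"),
    ]


for number, (step, action) in enumerate(reindex_plan("myproject"), start=1):
    print(f"{number}. {step}: {action}")
```

Changing the configuration before pushing data means the documents are indexed once, directly with the new schema.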