# Custom Indexes / Knowledgebase

This bot supports per-user custom indexes: users can upload files of their choosing, such as PDFs, and ask GPT to answer questions based on those files. URLs can also be used as index sources.

This feature consumes a large number of tokens and can be expensive, so you should restrict it to trusted users.

Supported filetypes/sources:

- All text- and data-based files (PDF, TXT, DOCX, PPTX, CSV, etc.)
- Images (JPG, PNG, etc.). Note: the bot runs OCR on images to extract text, which can require significant processing power.
- Video and audio files (MP4, MP3, etc.). Note: the bot uses OpenAI on the audio to extract text, which can also be processing-intensive.
- YouTube videos: for any YouTube video with an available transcript, the bot will index the entire transcription of the given video URL.
- GitHub repositories: to enable this, add an access token to GITHUB_TOKEN in your .env file. You can create a token at https://github.com/settings/tokens.
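As a minimal sketch, the GitHub entry in your .env file might look like the following (the token value is a placeholder, not a real token; substitute the one you generate on GitHub):

```shell
# .env
# Personal access token created at https://github.com/settings/tokens
# (placeholder value shown below; replace with your own token)
GITHUB_TOKEN=ghp_your_token_here
```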

## Index Compositions

Indexes can be combined with other indexes through a composition. To combine indexes, run the /index compose command and select the indexes that you want to combine. Only combine relevant indexes together; combining unrelated indexes will produce poor results (for example, don't upload a math textbook and a large set of poems and combine them). When creating a composition, you will be given the option to do a "Deep" composition. Deep compositions are more detailed and give better results, but they are very costly and can sometimes take multiple minutes to compose.

You can also compose a single index with itself using "Deep Compose". This produces a more detailed version of the index, but it is costly and can sometimes take multiple minutes to compose. Deep compositions are useless for very short documents!

Doing a deep composition also lets you use the child_branch_factor parameter with /index query. Increasing it past 1 will make queries take much longer and cost much more for large documents, so be wary.
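As an illustration, a query against a deep composition might look like the following (the question text is an arbitrary example, and the exact option layout in your Discord client may differ):

```
/index query query: What does chapter 3 say about calibration? child_branch_factor: 2
```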

When doing deep compositions, it's highly recommended to keep the document size small, or to only deep-compose single documents. This is because a deep composition reorganizes the simple index into a tree structure and uses GPT to summarize the different nodes of the tree, which leads to high costs. For example, a deep composition of a 300-page lab manual and the contents of my personal website at https://kaveenk.com cost me roughly $2 USD. To save on costs, you can limit the maximum price a deep composition can charge you by setting MAX_DEEP_COMPOSE_PRICE in your .env file.
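A minimal sketch of the corresponding .env entry follows; the dollar amount is an arbitrary example, not a recommended value, and the source does not state the expected units beyond a price, so USD is assumed:

```shell
# .env
# Cap the maximum price (assumed USD) a single deep composition may cost
# (example value; choose a limit appropriate for your budget)
MAX_DEEP_COMPOSE_PRICE=3
```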