FTSC RAG Search Tool

Guidelines for Use

Please read and understand this disclaimer before using this tool:

  1. This tool is offered solely as a research aid for investigating flight test methodologies, insights, best practices, and lessons learned. Its outputs are not validated and must be treated as purely informational.
  2. Your engagement with this tool is strictly at your own risk. The Flight Test Safety Committee disclaims all responsibility for any consequences or decisions based on the information provided by this tool.
  3. This tool uses AI that may create fake information (hallucinate) or fabricate sources. Always verify responses carefully.
  4. Conversely, responses from the tool may also be partial, obsolete, or contextually incorrect. The model's database also does not include confidential, classified, or program-specific information.
  5. Results from this tool do not replace regulatory requirements, engineering judgement, official guidance, or proven operational procedures.
  6. Qualified subject matter experts should review all outputs before they are used in flight test planning or execution.

FTSC Paper Database Search Tool

Search the FTSC paper database using retrieval-augmented generation

Welcome to the Flight Test Safety Committee's paper database RAG search tool! This tool uses Retrieval-Augmented Generation (RAG), a technique that pairs a large language model with document retrieval.

You can use this tool to search the FTSC paper database for sources about a specific topic.

Guidelines for using the tool:

  1. You can ask in complete sentences ("What do I need to know about high-altitude flight test?") or short phrases ("high-altitude flight testing").
  2. The LLM may create fake information (hallucinate), or even fabricate entire sources. Carefully check all responses.
  3. If the model returns "no relevant information found in the available sources", try submitting a slightly longer, more detailed prompt with more keywords.
  4. LLM performance tends to degrade as a conversation gets longer. We strongly recommend clearing the conversation history using the "Clear Chat" button when you switch topics, or when the conversation starts to get long.

How does this tool work?

This tool uses a technique called Retrieval-Augmented Generation, which uses a large language model (LLM) connected to a database of information.

Retrieval-Augmented Generation (RAG) is a technique for adapting an LLM to a specific database or use case without fine-tuning or retraining the model, which would be expensive and infeasible for small-scale use.

When you ask a question, the RAG tool first retrieves relevant information directly from the database and then uses that information to generate a grounded answer.

RAG does the following:

1. Database Creation: The files in the database are broken into text "chunks" which are converted into numerical vectors ("embeddings") that encode their semantic meaning. From this point on, the RAG tool only uses this vector database of chunk embeddings, and does not have access to the raw files (e.g. PDFs) in the original database.
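The chunking half of step 1 can be sketched in a few lines of Python. The `embed` function below is a toy bag-of-words stand-in for illustration only; a real RAG system uses a trained neural embedding model to produce the semantic vectors, and the chunk sizes and overlap shown are arbitrary example values, not the tool's actual settings.

```python
import math

def chunk_text(text, chunk_size=50, overlap=10):
    """Split a document into overlapping word-based chunks."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks

def embed(text, vocab):
    """Toy bag-of-words embedding, L2-normalized.
    A real system replaces this with a trained embedding model."""
    tokens = text.lower().split()
    counts = [tokens.count(w) for w in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]
```

The overlap between consecutive chunks helps ensure that a sentence falling on a chunk boundary still appears intact in at least one chunk.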

2. Document Retrieval: When you submit a query, it is converted into a numerical vector embedding in the same way as the database files. The retrieval system then searches through the embeddings of all corpus documents to find the document chunks that are most similar to the query (using Euclidean distance, cosine similarity, or another vector metric). These correspond to the most relevant documents.
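Step 2 reduces to a nearest-neighbor search over the chunk embeddings. A minimal sketch using Euclidean distance (production systems typically use an approximate nearest-neighbor index rather than this brute-force scan):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_vec, index, top_k=3):
    """Return the top_k chunk texts whose embeddings are closest to the query.
    `index` is a list of (chunk_text, embedding) pairs built in step 1."""
    ranked = sorted(index, key=lambda entry: euclidean(query_vec, entry[1]))
    return [text for text, _ in ranked[:top_k]]
```

Because the query and the chunks are embedded the same way, a small distance between vectors corresponds to a close semantic match between texts.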

3. Context Assembly: The most relevant document chunks are retrieved and combined with your original question and conversation history to create a comprehensive context.

4. Response Generation: A large language model (in our case, a lightweight variant of Gemini) uses the combined context from step 3 to generate a response to your query.
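Steps 3 and 4 amount to assembling one large prompt and handing it to the LLM. The sketch below shows one plausible way to build that prompt; the exact layout, labels, and system prompt used by this tool are assumptions for illustration.

```python
def build_context(system_prompt, history, retrieved_chunks, question):
    """Assemble the full prompt sent to the LLM in step 4.
    `history` is a list of (role, message) pairs from earlier turns."""
    sources = "\n\n".join(
        f"[Source {i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    turns = "\n".join(f"{role}: {msg}" for role, msg in history)
    return (
        f"{system_prompt}\n\n"
        f"Retrieved context:\n{sources}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"User question: {question}"
    )
```

The resulting string is what the model actually "sees": instructions first, then the retrieved evidence, then the conversation, then your question.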

Because the context from step 3 only contains the most relevant content from the database, the LLM's response is tailored to your query and is less likely to be distracted by irrelevant content.

Multi-message conversations: This tool now maintains conversation history. Each new message includes the previous conversation context, allowing for follow-up questions and coherent multi-turn discussions.

Like other LLM-based chat tools (e.g. ChatGPT, Claude, Gemini), this tool uses a system prompt to which your query is appended. This prompt shapes the model's behavior and tone and constrains what it is and is not allowed to say in response to your query.

💡 Example Queries:
What do I need to know about high-altitude flight test?
What does the database say about safety mishap accountability?
Aircraft certification requirements
(Follow up) What are the safety considerations for this?