
MetaLearner's Research on Web Search Optimization

Posted By: Tenzin Sim and Lim Ting Hui

MetaLearner demonstrates an innovative approach to optimizing web search retrieval-augmented generation pipelines using NVIDIA NIMs and Llama 3.1


We're excited to share MetaLearner's research on optimizing text-based data retrieval using NVIDIA NIMs and the newly introduced Llama 3.1. In this blog, we demonstrate our approach to optimizing traditional web search retrieval-augmented generation (RAG) pipelines. This methodology addresses common challenges such as speed, accuracy, and the risk of hallucination, ensuring a streamlined and reliable process.

Figure 1. Overview of the RAG approach

MetaLearner's Online Search Pipeline

Figure 2. MetaLearner's Online Search Pipeline

Our pipeline consists of the following key steps:

  1. Generate Data Sources: We leveraged the Step Back technique to generate multiple queries related to the user's original query, then performed parallel searches using the Google Search SDK (see the sketch after this list).
  2. Data Source Shortlisting Powered by LLMs: The LLM evaluates each website's basic description and headers to determine its relevance to the user's question.
  3. Web Scraping: Once relevant links are shortlisted, we scrape the text behind each link so the LLM can access precise information relevant to the domain in question.
  4. Relevant Summarization of Web Document Chunks with LLMs: The LLM processes lengthy web documents, leveraging techniques analogous to MemGPT to retain the important points.
  5. Parallelizing These Processes to Enhance Response Times: We performed the web scraping and summarization operations concurrently to minimize waiting time.
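To make steps 1 and 2 concrete, here is a minimal Python sketch of step-back query generation, parallel searches, and LLM-based shortlisting. The NIM endpoint URL, the model identifier, the environment variable names, and the use of Google's Custom Search JSON API as a stand-in for the Google Search SDK are illustrative assumptions, not our production code.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import httpx
from openai import OpenAI  # NVIDIA NIMs expose an OpenAI-compatible API

# Assumption: Llama 3.1 served through an NVIDIA NIM endpoint; the model name is illustrative.
client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key=os.environ["NVIDIA_API_KEY"])
MODEL = "meta/llama-3.1-70b-instruct"


def step_back_queries(user_query: str, n: int = 3) -> list[str]:
    """Step 1: ask the LLM for broader 'step back' queries derived from the user's question."""
    prompt = (f"Rewrite the question below into {n} broader web search queries, one per line.\n"
              f"Question: {user_query}")
    resp = client.chat.completions.create(model=MODEL,
                                          messages=[{"role": "user", "content": prompt}])
    return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]


def google_search(query: str, num: int = 5) -> list[dict]:
    """Illustrative stand-in for the Google Search SDK: Google's Custom Search JSON API."""
    resp = httpx.get("https://www.googleapis.com/customsearch/v1",
                     params={"key": os.environ["GOOGLE_API_KEY"],
                             "cx": os.environ["GOOGLE_CSE_ID"], "q": query, "num": num})
    resp.raise_for_status()
    return [{"title": i["title"], "snippet": i.get("snippet", ""), "link": i["link"]}
            for i in resp.json().get("items", [])]


def shortlist(user_query: str, results: list[dict]) -> list[str]:
    """Step 2: let the LLM keep only the links whose titles and snippets look relevant."""
    listing = "\n".join(f"{r['link']} | {r['title']} | {r['snippet']}" for r in results)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Question: {user_query}\nReturn only the relevant URLs:\n{listing}"}])
    return [u for u in resp.choices[0].message.content.split() if u.startswith("http")]


def gather_sources(user_query: str) -> list[str]:
    """Steps 1-2 end to end: step-back queries, parallel searches, LLM shortlisting."""
    queries = step_back_queries(user_query)
    with ThreadPoolExecutor() as pool:  # the search for each generated query runs in parallel
        hits = [r for batch in pool.map(google_search, queries) for r in batch]
    return shortlist(user_query, hits)
```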

Figure 3. Illustration of Parallelization

The waiting time for web scraping and summarization depends heavily on the length of the content being processed, so we performed these operations concurrently to minimize it. We also leveraged different models for different purposes to optimize processing time and cost without sacrificing output quality. A simplified sketch of this concurrent flow follows.
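The sketch below illustrates steps 3-5 with asyncio, running scraping and summarization for every shortlisted URL concurrently. It reuses the assumed `client` and `MODEL` from the previous snippet; `httpx` and `BeautifulSoup` are stand-ins for whatever scraping stack the production pipeline uses, and the simple truncation in `summarize` is a placeholder for the fuller chunked handling described below.

```python
import asyncio

import httpx
from bs4 import BeautifulSoup


async def scrape(http: httpx.AsyncClient, url: str) -> str:
    """Step 3: fetch the page and keep only its visible text."""
    html = (await http.get(url, timeout=20, follow_redirects=True)).text
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)


async def summarize(text: str, question: str) -> str:
    """Step 4 (simplified): condense a scraped document into the points relevant to the question.
    Reuses the blocking `client`/`MODEL` from the previous sketch via a worker thread."""
    resp = await asyncio.to_thread(
        client.chat.completions.create,
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Summarize the points relevant to: {question}\n\n{text[:12000]}"}])
    return resp.choices[0].message.content


async def scrape_and_summarize(urls: list[str], question: str) -> list[str]:
    """Step 5: run scrape + summarize for every shortlisted URL concurrently."""
    async with httpx.AsyncClient() as http:
        async def one(url: str) -> str:
            return await summarize(await scrape(http, url), question)
        return await asyncio.gather(*(one(u) for u in urls))


# Example usage (hypothetical query):
# summaries = asyncio.run(scrape_and_summarize(gather_sources("What is RAG?"), "What is RAG?"))
```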

Final Result

Figure 4. Final Result of Search

Why This Matters

Traditional systems often struggle with exceeding the model's context length or causing 'information overload,' which can make the model lose focus. Our method tackles these issues by first narrowing down the relevant sources and then summarizing the data into manageable chunks. This not only mitigates the risk of hallucination—where the AI might generate inaccurate or irrelevant information—but also ensures that the final output is comprehensive and directly relevant to the user's query. Our process builds on our previous step-back approach, enabling the LLM to search broadly across sources, providing a well-rounded and accurate response to complex search topics.
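To show what summarizing into manageable chunks can look like in practice, here is a deliberately simplified rolling-summary sketch; it again assumes the `client` and `MODEL` from the first snippet, and it is only a rough stand-in for the more involved MemGPT-like memory handling mentioned above.

```python
def rolling_summary(document: str, question: str, chunk_chars: int = 8000) -> str:
    """Summarize a long document chunk by chunk, carrying running notes forward so that
    no single prompt exceeds the model's context window."""
    notes = ""
    for start in range(0, len(document), chunk_chars):
        chunk = document[start:start + chunk_chars]
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user",
                       "content": (f"Question: {question}\n"
                                   f"Notes so far: {notes}\n"
                                   f"New text: {chunk}\n"
                                   "Update the notes with any points relevant to the question.")}])
        notes = resp.choices[0].message.content
    return notes
```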
