Google's DataGemma is the first large-scale Gen AI with RAG - why it matters

Monday, 16 September 2024

The increasingly popular generative artificial intelligence technique known as retrieval-augmented generation -- or RAG, for short -- has been a pet project of enterprises, but now it's coming to the AI main stage.

Google last week unveiled DataGemma, which is a combination of Google's Gemma open-source large language models (LLMs) and its Data Commons project for publicly available data. DataGemma uses RAG approaches to fetch the data before giving an answer to a query prompt.

The premise is to ground generative AI, to prevent "hallucinations," says Google, "by harnessing the knowledge of Data Commons to enhance LLM factuality and reasoning."

While RAG is becoming a popular approach for enabling enterprises to ground LLMs in their proprietary corporate data, using Data Commons represents the first implementation to date of RAG at the scale of cloud-based Gen AI.

Data Commons is an open-source development framework that lets one build publicly available databases. It also gathers actual data from institutions such as the United Nations that have made their data available to the public.

In connecting the two, Google notes, it is taking "two distinct approaches."

The first approach is to use the publicly available statistical data of Data Commons to fact-check specific questions entered into the prompt, such as, "Has the use of renewables increased in the world?" Google's Gemma will respond to the prompt with an assertion that cites particular stats. Google refers to this as "retrieval-interleaved generation," or RIG.
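To make the RIG idea concrete, here is a minimal Python sketch. It is not Google's implementation. It assumes the model emits a hypothetical inline marker of the form `[DC("query") -> "guess"]` wherever it states a statistic, and it uses a tiny in-memory table (`FAKE_STATS`, made-up placeholder values) in place of Data Commons. Each marker is swapped for the retrieved figure when one exists. Otherwise the model's own guess is kept and flagged.

```python
import re
from typing import Optional

# Hypothetical stand-in for Data Commons: a tiny in-memory lookup table.
# The query strings and values are made-up placeholders, not real statistics.
FAKE_STATS = {
    "value of example_stat in Place X": "42",
}

# Hypothetical inline marker the model is assumed to emit while drafting:
#   [DC("natural-language query") -> "value the model guessed"]
MARKER = re.compile(r'\[DC\("([^"]+)"\) -> "([^"]*)"\]')


def lookup(query: str) -> Optional[str]:
    """Return the stored value for an exact query match, or None."""
    return FAKE_STATS.get(query)


def interleave(draft: str) -> str:
    """Swap each marker for the retrieved value when one exists;
    otherwise keep the model's guess and flag it as unverified."""

    def substitute(match: "re.Match[str]") -> str:
        query, guess = match.group(1), match.group(2)
        found = lookup(query)
        if found is not None:
            return f"{found} [source: stats table]"
        return f"{guess} [unverified]"

    return MARKER.sub(substitute, draft)


draft = (
    'Place X reports [DC("value of example_stat in Place X") -> "40"] '
    'and Place Y reports [DC("value of example_stat in Place Y") -> "17"].'
)
print(interleave(draft))
# prints: Place X reports 42 [source: stats table] and Place Y reports 17 [unverified].
```

The retrieval step here is an exact-match dictionary lookup chosen for brevity. A real system would need to translate the model's natural-language query into a structured query against the statistics store.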

In the second approach, full-on RAG is used to cite sources of the data, "and enable more comprehensive and informative outputs," states Google. The Gemma AI model draws upon the "long-context window" of Google's closed-source model, Gemini 1.5. A context window is the amount of input, measured in tokens -- words or fragments of words -- that an AI model can hold in temporary memory and act on.
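The RAG pattern differs from RIG in that retrieval happens before generation: relevant records are fetched first and placed in the prompt alongside their sources. The Python sketch below illustrates that ordering under loose assumptions. The corpus (`CORPUS`) holds made-up placeholder records rather than real Data Commons tables, relevance is a naive shared-word count, and no model is actually called. The function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Record:
    source: str  # hypothetical source label
    text: str    # made-up placeholder statistic, not real data


# Toy corpus standing in for Data Commons tables (all placeholders).
CORPUS = [
    Record("table-1", "example_stat in Place X was 42 in year N"),
    Record("table-2", "example_stat in Place Y was 17 in year N"),
    Record("table-3", "other_stat in Place X was 5 in year N"),
]


def words(text: str) -> Set[str]:
    """Lowercased word set; underscores count as word characters."""
    return set(re.findall(r"\w+", text.lower()))


def score(query: str, record: Record) -> int:
    """Naive relevance: number of words shared by query and record."""
    return len(words(query) & words(record.text))


def retrieve(query: str, k: int = 2) -> List[Record]:
    """Return up to k records with the highest overlap, dropping zero scores."""
    ranked = sorted(CORPUS, key=lambda r: score(query, r), reverse=True)
    return [r for r in ranked[:k] if score(query, r) > 0]


def build_prompt(query: str, records: List[Record]) -> str:
    """Place the retrieved records, tagged with their sources, ahead of the
    question so the model sees the data before it starts generating."""
    context = "\n".join(f"[{r.source}] {r.text}" for r in records)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above and cite the sources in brackets."
    )


question = "What was example_stat in Place X?"
print(build_prompt(question, retrieve(question)))
```

Keeping the source label attached to each record is what lets the model cite where a figure came from, which is the point Google makes about this second approach.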

Google advertises Gemini 1.5 with a context window of 128,000 tokens, though some versions of it can juggle as many as a million tokens of input. A larger context window means that more data retrieved from Data Commons can be held in memory and perused by the model when it prepares a response to the query prompt.
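The effect of window size on how much retrieved data fits can be shown with simple arithmetic. This Python sketch is illustrative only: it assumes a rough rule of thumb of about four characters of English text per token (not a Gemini-specific figure), a hypothetical reserve of 8,000 tokens for the question and answer, and 500 placeholder records of roughly 1,000 tokens each.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Rough rule of thumb (an assumption, not a Gemini figure):
    # about 4 characters of English text per token.
    return max(1, len(text) // 4)


def pack_context(records: List[str], window: int, reserved: int) -> Tuple[List[str], int]:
    """Greedily add records, in ranked order, until the window minus the tokens
    reserved for the question and answer is full. Stops at the first record
    that does not fit, so the packed set is always a prefix of the ranking."""
    budget = window - reserved
    packed: List[str] = []
    used = 0
    for rec in records:
        cost = estimate_tokens(rec)
        if used + cost > budget:
            break
        packed.append(rec)
        used += cost
    return packed, used


# 500 hypothetical retrieved records of ~1,000 estimated tokens each.
records = ["x" * 4000] * 500
for window in (128_000, 1_000_000):
    packed, used = pack_context(records, window, reserved=8_000)
    print(window, len(packed), used)
# prints:
# 128000 120 120000
# 1000000 500 500000
```

Under these assumptions the 128,000-token window holds 120 of the records, while the million-token window holds all 500.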

"DataGemma retrieves relevant contextual information from Data Commons before the model initiates response generation," states Google, "thereby minimizing the risk of hallucinations and enhancing the accuracy of responses."

The research is still in development; you can dig into the details in the formal research paper by Google researcher Prashanth Radhakrishnan and colleagues.

Google says there's more testing and development to be done before DataGemma is made available publicly in Gemma and Google's closed-source model, Gemini.

Google claims that the RIG and RAG approaches have already improved output quality, such that "users will experience fewer hallucinations for use cases across research, decision-making or simply satisfying curiosity."

DataGemma is the latest example of how Google and other dominant AI firms are building out their offerings with things that go beyond LLMs.

OpenAI last week unveiled its project internally code-named "Strawberry" as two models that use a machine learning technique called "chain of thought," where the AI model is directed to spell out in statements the factors that go into a particular prediction it is making.

Original link