Last week, Cerulean attended The AI Conference in San Francisco, where engineers and researchers shared insights on AI’s rapid progress. Key topics included the increasing sophistication of AI agents, retrieval-augmented generation (RAG) techniques, and how companies are harnessing AI in their businesses.
In a conversation with an executive from a leading grocery delivery app, we learned two specific ways they are using AI:
1. AI-generated relevance labels: Search results are typically guided by human annotations for relevance. For example, a human knows ‘blueberry yogurt’ isn’t relevant when searching for ‘blueberries.’ An LLM can now make this distinction as well, so the company is using LLMs to classify the relevance of grocery items in response to search terms. This has reduced operational costs and sped up improvements to their recommendation models.
2. Identifying inaccuracies: The app provides product descriptions, but occasionally these don’t match the actual products, leading to customer confusion and returns. Manually finding these mismatches is labor-intensive. The company is using LLMs to identify inconsistencies and categorize them as urgent, medium, or minor issues, dramatically reducing the prevalence of inaccuracies.
Other recent applications of AI in the news:
1. Best Buy: Launched an AI-powered live delivery tracking system, offering customers real-time order updates, enhancing transparency and improving the overall delivery experience.
2. Accenture: Implementing Salesforce’s Agentforce to build autonomous sales agents capable of acting as sales reps, coaches, or service agents.
3. Amazon: Released Project Amelia, an AI assistant that helps third-party sellers resolve issues and manage their businesses more effectively.
Also last week, OpenAI introduced its new o1 model, the first in a series designed to ‘think’ before responding—enabling it to tackle more complex tasks.
Why This Matters
It’s worth taking a moment to understand the rationale behind this shift.
Since the launch of ChatGPT (built on GPT-3.5) in November 2022, which became the fastest-growing consumer product in history, each new AI model has built on the same core idea: predicting the next word. But these models are path-dependent: one early mistake can lead to compounding errors.
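A back-of-the-envelope calculation shows why compounding matters. If each generated token has some small chance of being wrong, the probability that a long sequence is entirely correct shrinks rapidly (assuming, as a simplification, that errors are independent):

```python
# Compounding errors in next-word prediction: even a 1% per-token error
# rate makes a fully correct 100-token answer unlikely.
# (Independence of errors is a simplifying assumption.)
per_token_error = 0.01        # 1% chance of a wrong token
tokens = 100                  # length of the generated sequence
p_all_correct = (1 - per_token_error) ** tokens
print(f"{p_all_correct:.2f}")  # prints 0.37
```

So a model that is 99% accurate per token produces a flawless 100-token answer only about a third of the time, which is one motivation for techniques that check and revise intermediate steps.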
To improve accuracy, prompting techniques like “Chain of Thought” have emerged, encouraging LLMs to break tasks down into steps—similar to guiding a junior employee through a large project. OpenAI’s o1 takes this further by generating multiple responses, evaluating them, and delivering the one it calculates to be the best.
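The “generate several answers, score them, keep the best” idea can be illustrated with a toy best-of-n selection loop. OpenAI has not published o1’s internals, so `sample_answer` and `score_answer` here are hypothetical stand-ins for model sampling and an answer verifier:

```python
# Toy best-of-n selection: draw several candidate answers, then keep the
# one the scorer rates highest. This illustrates the general idea only;
# it is not a description of how o1 works internally.

def best_of_n(question, sample_answer, score_answer, n=4):
    """sample_answer(question) -> candidate str;
    score_answer(candidate) -> comparable score (higher is better)."""
    candidates = [sample_answer(question) for _ in range(n)]
    return max(candidates, key=score_answer)
```

In practice the scorer is the expensive part: it might be a learned verifier, a self-consistency vote across candidates, or a check against an external tool, rather than the simple length heuristic a toy demo would use.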
For example, here is o1 taking 16 seconds to work through the puzzle “How can 8 + 8 = 4?”: