Case Study • RetailGPT

Building Conversational Commerce Infrastructure.

Ionio partnered with a Saudi-based retail analytics company to build the future of retail before the tools existed. We implemented function calling and digital twins months before they became industry standards.

2M+ Monthly Active Users
1M+ Products Enriched
42 Mobile Screens
8 Mo. Time to Production

The Expensive Problem

A mid-market retail analytics platform serving 2M+ monthly active users was bleeding value at the discovery layer. Shoppers searched. They didn't find. They left.

The platform had built a successful business connecting shoppers to physical retail locations across the region. But the experience was manual, fragmented, and increasingly outdated. Users searching through a catalog of 1M+ products had to cross-reference inventory across mall locations themselves.

[Diagram: Legacy data layers (Inventory, Loyalty, Profile, History, PIM) unified into a single data intelligence layer.]

The deeper problem was data. Nine years of behavioral signals—purchase histories, browse patterns, loyalty interactions—sat scattered across 35+ microservices and 150+ API routes. A shopper could buy baby products for three years, and the platform would still show them generic recommendations.

The client's vision: build a conversational AI layer that could finally unlock this data. Not a chatbot. Infrastructure for prescriptive commerce.

Constraint Analysis: Late 2023 Environment

The Catch: In late 2023, the tools to build this didn't exist yet. LLMs couldn't reliably call external functions. Voice-to-voice AI was experimental. Building persistent customer memory—storing preferences, relationships, dietary restrictions—required custom architecture that no off-the-shelf solution could provide.

Why This Was Hard

This wasn't a wrapper around ChatGPT. The project required solving five problems that the industry hadn't standardized solutions for. We started in early 2024—months before the tooling would catch up.

Function Calling Before It Existed
Frontier

The AI needed to do more than generate text. It needed to query inventory APIs, check loyalty balances, search products by attribute, and calculate shipping estimates—all within a natural conversation. But when we started, OpenAI's function calling was either non-existent or unreliable for production. The open-source models we needed (for cost and compliance) had no native tool-use capabilities at all.

A Nine-Year-Old Codebase
Complex

The platform's backend had evolved through multiple technology generations. 35+ microservices. 150+ API routes. Authentication patterns that changed depending on which era the service was built. The backend team was fully occupied keeping production stable—they couldn't dedicate resources to building new integration layers for an experimental AI project. Whatever we built had to work around their constraints, not require them to change.

A Million SKUs of Dirty Data
Scale

Product catalogs in retail are notoriously messy. Missing descriptions. Inconsistent categorization. Nutritional data that exists for some products but not others. When you're building semantic search and personalized recommendations, data quality determines your ceiling. The catalog had a million products. Most of them had incomplete or inconsistent metadata. You can't recommend 'gluten-free snacks for kids' if your snack products don't have allergen tags.
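As a toy illustration (hypothetical data, not the production catalog), a 'gluten-free snacks for kids' filter can only ever surface products whose allergen metadata exists; untagged products are indistinguishable from unsafe ones and must be excluded:

```python
# Hypothetical catalog slice: one tagged with gluten, one tagged
# gluten-free, one never enriched at all.
CATALOG = [
    {"sku": "A1", "category": "snacks", "allergens": ["gluten"]},
    {"sku": "A2", "category": "snacks", "allergens": []},
    {"sku": "A3", "category": "snacks"},  # untagged: safety unknown
]

def gluten_free_snacks(catalog):
    """Return snack SKUs explicitly tagged as containing no gluten.

    Untagged products are skipped: without enrichment the system
    cannot tell 'safe' from 'unknown', so it must exclude them.
    """
    return [
        p["sku"]
        for p in catalog
        if p["category"] == "snacks"
        and "allergens" in p              # enrichment happened
        and "gluten" not in p["allergens"]
    ]

print(gluten_free_snacks(CATALOG))  # only A2 qualifies
```

The untagged SKU is silently invisible to the filter, which is exactly how dirty metadata caps recommendation quality.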

Personalization Without Exposing PII
Critical

The platform served multiple retail brands. Shoppers expected personalization—but brands couldn't see each other's customer data. The AI needed to know 'this shopper prefers organic products, has a child with a nut allergy, and typically shops at Mall X on weekends' without that context ever exposing raw PII to inference pipelines or leaking across brand boundaries.

Brand Voice at Multi-Tenant Scale
Novel

Every brand on the platform had a distinct voice. The luxury cosmetics brand communicates differently than the budget grocery chain. The platform's own chatbot had its own personality. The AI needed to shift between these voices seamlessly based on context—and do it consistently across millions of conversations without requiring separate fine-tuned models for each tenant.

"We weren't building features. We were building the infrastructure that would make those features possible—months before the rest of the industry had the tools to even attempt it."

The Solution

What We Built

The solution architecture operated across two layers—a data intelligence layer and an experience layer—connected through a real-time API mesh we built to bridge the legacy backend.

System Architecture Map
Experience Layer
Native iOS/Android apps (Flutter), Voice-to-voice, App Clips.
AI Engine
Dual-model architecture, Tool-routing, Custom RAG pipeline.
Data Layer
SKU enrichment (1M+ products), Persistent Memory, Vector Embeddings.
Integration Layer
Custom REST API mesh wrapping 35+ legacy microservices.

The Dual-Model Architecture

To solve function calling without native support, we built a two-model system. The first model handled natural conversation, while the second specialized in tool routing. The conversation model would generate intent; the routing model would execute it.

Intent vs Execution Flow
Model 1 (Chat) → Model 2 (Router) → Inventory API · Loyalty Check · Semantic Search

This architecture shipped 8-12 months before function calling became an industry standard.
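A minimal sketch of the pattern, with both models stubbed as plain functions and every tool name, schema, and SKU invented for illustration (in production each stub would be an LLM call):

```python
import json

# Hypothetical tool registry the routing model can dispatch to.
TOOLS = {
    "inventory_lookup": lambda args: {"sku": args["sku"], "in_stock": True},
    "loyalty_balance": lambda args: {"user": args["user_id"], "points": 1200},
}

def conversation_model(utterance: str) -> str:
    """Stub for Model 1: emits intent as JSON text, the way an LLM
    constrained by prompting would."""
    if "stock" in utterance:
        return json.dumps({"intent": "inventory_lookup",
                           "args": {"sku": "SKU-4412"}})
    return json.dumps({"intent": "loyalty_balance",
                       "args": {"user_id": "u1"}})

def routing_model(intent_json: str):
    """Stub for Model 2: validates the intent and executes the tool."""
    intent = json.loads(intent_json)
    tool = TOOLS.get(intent["intent"])
    if tool is None:
        raise ValueError(f"unknown tool: {intent['intent']}")
    return tool(intent["args"])

result = routing_model(conversation_model("Is the night cream in stock?"))
print(result)  # {'sku': 'SKU-4412', 'in_stock': True}
```

Separating intent generation from execution is what made the pattern reliable before native function calling: the router only ever sees structured intents, never free-form text.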

"We built function calling before function calling existed. By the time OpenAI shipped it as a feature, we'd already processed millions of tool-routed conversations."

The Persistent Memory Layer

Instead of passing raw customer data through inference, we built an abstracted context layer. The system stored preferences, family relationships, and behavioral patterns as anonymized attributes.

PII Abstraction Pipeline
Raw: Rohan S. → Safe: Male 25+
Raw: Card **** → Safe: High Value

We shipped this in early 2024. ChatGPT launched its memory feature in November 2024—eight months later.
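The abstraction step can be sketched as a one-way mapping from raw fields to derived attributes; the field names and bucketing rules below are illustrative assumptions, not the production schema:

```python
def abstract_profile(raw: dict) -> dict:
    """Map raw customer PII to safe, anonymized attributes.

    Only derived buckets are emitted; name, card number, and exact
    location are deliberately never copied into the output.
    """
    safe = {}
    if "age" in raw:
        safe["age_band"] = "25+" if raw["age"] >= 25 else "under 25"
    if "lifetime_spend" in raw:
        safe["value_tier"] = "high" if raw["lifetime_spend"] > 5000 else "standard"
    return safe

raw = {"name": "Rohan S.", "age": 31,
       "card": "****1234", "lifetime_spend": 8200}
print(abstract_profile(raw))  # {'age_band': '25+', 'value_tier': 'high'}
```

Because inference only ever receives the safe dictionary, no prompt, log, or model provider can leak the raw identity.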

The SKU Enrichment Pipeline

We built a data pipeline that processed the full 1M+ product catalog. For each SKU, we generated AI descriptions, sourced nutritional data, and applied hierarchical tagging.

Continuous Processing (1M+ SKUs)
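A simplified sketch of one enrichment pass, with rule-based stand-ins for the AI description generation and external data sourcing (field names are assumptions):

```python
def enrich_sku(product: dict) -> dict:
    """Return a copy of the product with a generated description and
    hierarchical category tags filled in where they were missing."""
    p = dict(product)  # never mutate the source record
    if not p.get("description"):
        # Stand-in for the AI-generated description step.
        p["description"] = f"{p['name']} in {p['category']}"
    # Hierarchical tagging: category always leads the tag list.
    p["tags"] = [p["category"], *p.get("tags", [])]
    return p

sparse = {"name": "Organic Oat Milk", "category": "grocery"}
print(enrich_sku(sparse))
```

Running this as a continuous queue consumer, rather than a one-off script, is what kept quality from decaying as new SKUs entered the catalog.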

Brand Voice System

We created brand voice profiles that captured tone, vocabulary constraints, and formality levels. This handled multi-tenant personality switching through structured prompting.

Luxury cosmetics: "Indulge in our restorative night cream, crafted with rare botanical extracts to rejuvenate your skin's natural luminosity."
Budget grocery: "Check out this night cream! It's got great ingredients to help your skin look fresh, and it's on sale this week."
Platform assistant: "I found a night cream that matches your search. It contains botanical extracts and is rated 4.5 stars."
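The voice-switching mechanism can be sketched as per-brand profiles compiled into a system prompt at request time; the profile fields and brand keys below are illustrative assumptions, not the production configuration:

```python
# Hypothetical per-tenant voice profiles.
VOICE_PROFILES = {
    "luxury_cosmetics": {
        "tone": "refined, evocative",
        "formality": "high",
        "banned_words": ["cheap", "deal"],
    },
    "budget_grocery": {
        "tone": "friendly, upbeat",
        "formality": "low",
        "banned_words": [],
    },
}

def system_prompt(brand: str) -> str:
    """Compile one brand's profile into a system prompt for the shared
    base model, so no per-tenant fine-tuning is needed."""
    p = VOICE_PROFILES[brand]
    banned = ", ".join(p["banned_words"]) or "none"
    return (
        f"You are the voice of {brand}. Tone: {p['tone']}. "
        f"Formality: {p['formality']}. Never use these words: {banned}."
    )

print(system_prompt("luxury_cosmetics"))
```

One base model plus structured profiles scales to any number of tenants, where separate fine-tunes would not.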
Project Roadmap

Our Approach

We structured the engagement in four phases, designed to deliver incremental value while managing technical risk.

Execution Timeline (32 Weeks): Data, AI, and Mobile tracks
Phase 1: Foundation Weeks 1-6
  • Legacy system audit and API mapping across 35+ microservices
  • Data architecture design for the Persistent Memory Layer
  • SKU enrichment pipeline MVP processing initial product batch
  • LLM evaluation and selection (Mistral for initial deployment)
Phase 2: AI Development Weeks 7-16
  • Dual-model architecture implementation
  • Custom RAG pipeline for product search and recommendations
  • Brand voice personality system development
  • Self-hosted vLLM deployment for inference at scale
"Phases 2 and 3 ran in parallel. While the AI team built the inference layer, the mobile team was already implementing screens against mocked endpoints. This overlap cut months off the timeline."
Phase 3: Mobile Impl. Weeks 12-24
  • Flutter application development (iOS and Android, 42 screens)
  • Voice-to-voice conversation integration
  • App Clips for instant demos (scan QR, experience AI immediately)
  • Payment provider integration (regional retail payment flows)
  • Loyalty system unification across brands
Phase 4: Integration Weeks 20-32
  • Full catalog enrichment (1M+ SKUs processed)
  • Model migration (Mistral → Llama) as capabilities improved
  • Offer generation engine pilot (personalized promotions)
  • SOC2 and GDPR compliance validation
  • Production deployment and performance optimization
Internal Infrastructure

Custom Tooling We Had to Build

Standard tooling didn't exist for what we needed. We built three internal systems that made the project possible—and that we now use across all our retail AI engagements.

These aren't throwaway scripts. They're production infrastructure that now accelerates every project we take on. When clients ask how we move fast, this is the answer.

1. Ionio Conversation Data Collection Platform™

To build effective prompts and calibrate brand voice personalities, we needed high-quality training data. We built a platform where team members could have conversations with the model, annotate responses, flag quality issues, and generate synthetic variations. This data fed directly into prompt optimization.

Prompt Engineering Interface
I need a moisturizer for sensitive skin.
I recommend the 'Calm & Restore' gel. It's oat-based and fragrance-free.
Does it have SPF?
Quality Annotation
★★★★☆
🚩 Flag Hallucination
Actions
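The kind of record such a platform might store per annotated turn can be sketched as follows (fields and the review rule are assumptions for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One annotated conversation turn from the collection platform."""
    user_msg: str
    model_reply: str
    rating: int                                  # 1-5 stars
    flags: list = field(default_factory=list)    # e.g. ["hallucination"]

    def needs_review(self) -> bool:
        """Low ratings or a hallucination flag route the turn to the
        prompt-optimization queue."""
        return self.rating <= 2 or "hallucination" in self.flags

a = Annotation("Does it have SPF?", "Yes, SPF 30.",
               rating=4, flags=["hallucination"])
print(a.needs_review())  # True: flagged despite a decent rating
```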

2. Ionio Conversational Test Bench™

Testing conversational AI at scale is fundamentally different from testing traditional software. Unit tests don't catch the ways conversations fail. We built a test bench that simulates buyers with different intents, preferences, and conversation styles—running automated quality assessments across model updates and catching regressions before they hit production.

Automated Persona Simulation
Persona: Frugal Parent
Persona: Luxury
Persona: Dietary Restr.
⚠ REGRESSION DETECTED: Latency Spike in "Inventory Check" VIEW LOGS
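The regression gate can be sketched as a metric comparison against a stored baseline; the flow names, latencies, and tolerance below are invented for illustration:

```python
# Hypothetical baseline latencies (ms) captured from a known-good run.
BASELINE_LATENCY_MS = {"inventory_check": 420, "loyalty_check": 310}

def detect_regressions(run_latency_ms: dict, tolerance: float = 1.25) -> list:
    """Flag every flow whose latency exceeds its baseline by more than
    the tolerance factor; unknown flows are never flagged."""
    return [
        flow
        for flow, ms in run_latency_ms.items()
        if ms > BASELINE_LATENCY_MS.get(flow, float("inf")) * tolerance
    ]

latest = {"inventory_check": 980, "loyalty_check": 300}
print(detect_regressions(latest))  # ['inventory_check']
```

The same comparison runs over quality scores from the simulated personas, so a model update that degrades one buyer type is caught before deployment.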

3. SKU Enrichment Platform

Not a one-time data cleaning script—an ongoing pipeline. The platform processed the full catalog with AI-generated descriptions, auto-sourced nutritional data, and hierarchical tagging. It ran continuously as new products entered the system, maintaining data quality at scale.

Real-Time Enrichment Queue
Category: Beauty · 84% Complete
Category: Grocery · 42% Complete
SKU-9021: Organic Oat Milk · Enriched (+Nutri-Score)
SKU-4412: Night Cream · Enriched (+Semantic Tags)
SKU-1102: Kids Vitamin C · Enriched (+Safety Warning)
SKU-3329: Gluten-Free Pasta · Enriched (+Allergen Data)

Technology Stack

LLMs
Mistral · Llama 3 · vLLM (Self-Hosted)
Mobile
Flutter · iOS / Android · App Clips
Backend
Node.js · Express · Microservices · Custom REST API
Infra
Azure · Vector DB · Docker
Compliance
SOC2 · GDPR
The ROI of Infrastructure

Business & Technical Outcomes

Technical Outcomes

  • Seamless integration with legacy codebase via custom API mesh.
  • Sub-second AI response times for 2M+ Monthly Active Users.
  • 1M+ products enriched with AI-generated metadata and tagging.
  • Fully SOC2-ready and GDPR-compliant data pipelines.

Operational Impact

  • Unified loyalty experience across differing brand tenants.
  • Location-aware inventory driving foot traffic to physical stores.
  • Persistent Memory enabling true personalization across sessions.
  • Discovery friction significantly reduced, increasing conversion velocity.

What We Learned

Building at the frontier means building your own tools. We were 8-12 months ahead of industry standard on function calling, persistent memory, and production-grade conversational commerce infrastructure.

The data layer determines the ceiling.

No amount of prompt engineering compensates for dirty product data. The enrichment pipeline wasn't optional—it was foundational.

Legacy integration is a feature.

Retail companies have years of valuable data locked in old systems. The ability to unlock that data without requiring platform rewrites is the actual value proposition.

Test infrastructure is non-negotiable.

Conversational AI fails in ways unit tests don't catch. The test bench—simulating real buyer conversations at scale—was essential for maintaining quality.

Self-hosted inference changes economics.

At 2M+ MAU, API costs would have been prohibitive. vLLM deployment gave us the latency and cost structure the business required.

Innovation Gap Analysis
Function Calling (2024): Ionio shipped roughly 8 months before it became an industry standard.
Persistent Memory (2024): Ionio shipped roughly 8 months before OpenAI launched it as a feature.

The internal tools we built—the data collection platform, the test bench, the enrichment pipeline—now form the foundation of how we approach every retail AI engagement. These aren't one-off solutions; they're reusable infrastructure that accelerates every project that follows.

Most retail and e-commerce platforms are at a decision point: continue adding AI features to legacy architecture, or rebuild core systems to be AI-native from the foundation.

The difference matters. Bolted-on AI feels like an add-on. AI-native platforms use intelligence as the organizing principle—the data layer, the experience layer, and the business logic are all designed for machine reasoning, not retrofitted for it.

Ionio partners with platforms in the $5M-$100M ARR range to make that transition. We don't build chatbots. We build the data infrastructure, AI engines, and experience layers that make intelligence central to how your platform creates value.

How We Operate
01 // Integration
We embed with your technical team. The work you just read represents how we operate—we build production systems that integrate with legacy architecture, and transfer the knowledge so you own what we build.
02 // Acceleration
We don't start from zero. The tooling we've developed through our engagements—test benches, data pipelines, personalization engines—now accelerates every retail AI project we take on. We start from battle-tested systems that have already processed thousands of production interactions.
03 // Experience
We know what works. We've been building AI systems for the last decade. We shipped architectures before they became mainstream. We deployed features before ChatGPT had them. We know the pitfalls because we've made the mistakes.
When to talk to us

You're facing a version of what this client faced—valuable data locked in fragmented systems, the need to compete with AI-native experiences, or the gap between your technical vision and the specialized talent required to execute it.

You need a strategic partner who understands both the technology and the business model—not a dev shop that builds what you specify.

When we're not a fit

You want a chatbot for your dashboard, AI for the press release, or features that OpenAI will commoditize in six months.

We'll tell you that directly.

The Bottom Line

Building Infrastructure vs. Adding Features

If you're building the infrastructure that makes your platform AI-native—not just AI-enabled—let's talk.

hello@ionio.ai