Case Study • RetailGPT

Building Conversational Commerce Infrastructure.

Ionio partnered with a Saudi-based retail analytics company to build the future of retail before the tools existed. We implemented function calling and digital twins months before they became industry standards.

2M+ Monthly Active Users
1M+ Products Enriched
42 Mobile Screens
8 Mo. Time to Production

The Expensive Problem

A mid-market retail analytics platform serving 2M+ monthly active users was bleeding value at the discovery layer. Shoppers searched. They didn't find. They left.

The platform had built a successful business connecting shoppers to physical retail locations across the region. But the experience was manual, fragmented, and increasingly outdated. Users searching through a catalog of 1M+ products had to cross-reference inventory across mall locations themselves.

[Diagram: Legacy data layers (Inventory, Loyalty, Profile, History, PIM) unified into a single data intelligence layer.]

The deeper problem was data. Nine years of behavioral signals—purchase histories, browse patterns, loyalty interactions—sat scattered across 35+ microservices and 150+ API routes. A shopper could buy baby products for three years, and the platform would still show them generic recommendations.

The client's vision: build a conversational AI layer that could finally unlock this data. Not a chatbot. Infrastructure for prescriptive commerce.

Constraint Analysis: Late 2023 Environment

The Catch: In late 2023, the tools to build this didn't exist yet. LLMs couldn't reliably call external functions. Voice-to-voice AI was experimental. Building persistent customer memory—storing preferences, relationships, dietary restrictions—required custom architecture that no off-the-shelf solution could provide.

Why This Was Hard

This wasn't a wrapper around ChatGPT. The project required solving five problems that the industry hadn't standardized solutions for. We started in early 2024—months before the tooling would catch up.

Function Calling Before It Existed
Frontier

The AI needed to do more than generate text. It needed to query inventory APIs, check loyalty balances, search products by attribute, and calculate shipping estimates—all within a natural conversation. But when we started, OpenAI's function calling was either non-existent or unreliable for production. The open-source models we needed (for cost and compliance) had no native tool-use capabilities at all.

A Nine-Year-Old Codebase
Complex

The platform's backend had evolved through multiple technology generations. 35+ microservices. 150+ API routes. Authentication patterns that changed depending on which era the service was built. The backend team was fully occupied keeping production stable—they couldn't dedicate resources to building new integration layers for an experimental AI project. Whatever we built had to work around their constraints, not require them to change.

A Million SKUs of Dirty Data
Scale

Product catalogs in retail are notoriously messy. Missing descriptions. Inconsistent categorization. Nutritional data that exists for some products but not others. When you're building semantic search and personalized recommendations, data quality determines your ceiling. The catalog had a million products. Most of them had incomplete or inconsistent metadata. You can't recommend 'gluten-free snacks for kids' if your snack products don't have allergen tags.
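As a toy illustration (hypothetical data, not the production catalog), a 'gluten-free snacks for kids' filter can only ever surface products whose allergen metadata exists; untagged products are indistinguishable from unsafe ones and must be excluded:

```python
# Hypothetical catalog slice: one tagged with gluten, one tagged
# gluten-free, one never enriched at all.
CATALOG = [
    {"sku": "A1", "category": "snacks", "allergens": ["gluten"]},
    {"sku": "A2", "category": "snacks", "allergens": []},
    {"sku": "A3", "category": "snacks"},  # untagged: safety unknown
]

def gluten_free_snacks(catalog):
    """Return snack SKUs explicitly tagged as containing no gluten.

    Untagged products are skipped: without enrichment the system
    cannot tell 'safe' from 'unknown', so it must exclude them.
    """
    return [
        p["sku"]
        for p in catalog
        if p["category"] == "snacks"
        and "allergens" in p              # enrichment happened
        and "gluten" not in p["allergens"]
    ]

print(gluten_free_snacks(CATALOG))  # only A2 qualifies
```

The untagged SKU is silently invisible to the filter, which is exactly how dirty metadata caps recommendation quality.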

Personalization Without Exposing PII
Critical

The platform served multiple retail brands. Shoppers expected personalization—but brands couldn't see each other's customer data. The AI needed to know 'this shopper prefers organic products, has a child with a nut allergy, and typically shops at Mall X on weekends' without that context ever exposing raw PII to inference pipelines or leaking across brand boundaries.

Brand Voice at Multi-Tenant Scale
Novel

Every brand on the platform had a distinct voice. The luxury cosmetics brand communicates differently than the budget grocery chain. The platform's own chatbot had its own personality. The AI needed to shift between these voices seamlessly based on context—and do it consistently across millions of conversations without requiring separate fine-tuned models for each tenant.

"We weren't building features. We were building the infrastructure that would make those features possible—months before the rest of the industry had the tools to even attempt it."

The Solution

What We Built

The solution architecture operated across two layers—a data intelligence layer and an experience layer—connected through a real-time API mesh we built to bridge the legacy backend.

System Architecture Map
Experience Layer
Native iOS/Android apps (Flutter), Voice-to-voice, App Clips.
AI Engine
Dual-model architecture, Tool-routing, Custom RAG pipeline.
Data Layer
SKU enrichment (1M+ products), Persistent Memory, Vector Embeddings.
Integration Layer
Custom REST API mesh wrapping 35+ legacy microservices.

The Dual-Model Architecture

To solve function calling without native support, we built a two-model system. The first model handled natural conversation, while the second specialized in tool routing. The conversation model would generate intent; the routing model would execute it.

Intent vs Execution Flow
Model 1 (Chat) → Model 2 (Router) → Inventory API · Loyalty Check · Semantic Search

This architecture shipped 8-12 months before function calling became an industry standard.
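A minimal sketch of the pattern, with both models stubbed as plain functions and every tool name, schema, and SKU invented for illustration (in production each stub would be an LLM call):

```python
import json

# Hypothetical tool registry the routing model can dispatch to.
TOOLS = {
    "inventory_lookup": lambda args: {"sku": args["sku"], "in_stock": True},
    "loyalty_balance": lambda args: {"user": args["user_id"], "points": 1200},
}

def conversation_model(utterance: str) -> str:
    """Stub for Model 1: emits intent as JSON text, the way an LLM
    constrained by prompting would."""
    if "stock" in utterance:
        return json.dumps({"intent": "inventory_lookup",
                           "args": {"sku": "SKU-4412"}})
    return json.dumps({"intent": "loyalty_balance",
                       "args": {"user_id": "u1"}})

def routing_model(intent_json: str):
    """Stub for Model 2: validates the intent and executes the tool."""
    intent = json.loads(intent_json)
    tool = TOOLS.get(intent["intent"])
    if tool is None:
        raise ValueError(f"unknown tool: {intent['intent']}")
    return tool(intent["args"])

result = routing_model(conversation_model("Is the night cream in stock?"))
print(result)  # {'sku': 'SKU-4412', 'in_stock': True}
```

Separating intent generation from execution is what made the pattern reliable before native function calling: the router only ever sees structured intents, never free-form text.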

"We built function calling before function calling existed. By the time OpenAI shipped it as a feature, we'd already processed millions of tool-routed conversations."

The Persistent Memory Layer

Instead of passing raw customer data through inference, we built an abstracted context layer. The system stored preferences, family relationships, and behavioral patterns as anonymized attributes.

PII Abstraction Pipeline
Raw: Rohan S. → Safe: Male 25+
Raw: Card **** → Safe: High Value

We shipped this in early 2024. ChatGPT launched its memory feature in November 2024—eight months later.
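The abstraction step can be sketched as a one-way mapping from raw fields to derived attributes; the field names and bucketing rules below are illustrative assumptions, not the production schema:

```python
def abstract_profile(raw: dict) -> dict:
    """Map raw customer PII to safe, anonymized attributes.

    Only derived buckets are emitted; name, card number, and exact
    location are deliberately never copied into the output.
    """
    safe = {}
    if "age" in raw:
        safe["age_band"] = "25+" if raw["age"] >= 25 else "under 25"
    if "lifetime_spend" in raw:
        safe["value_tier"] = "high" if raw["lifetime_spend"] > 5000 else "standard"
    return safe

raw = {"name": "Rohan S.", "age": 31,
       "card": "****1234", "lifetime_spend": 8200}
print(abstract_profile(raw))  # {'age_band': '25+', 'value_tier': 'high'}
```

Because inference only ever receives the safe dictionary, no prompt, log, or model provider can leak the raw identity.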

The SKU Enrichment Pipeline

We built a data pipeline that processed the full 1M+ product catalog. For each SKU, we generated AI descriptions, sourced nutritional data, and applied hierarchical tagging.

Continuous Processing (1M+ SKUs)
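A simplified sketch of one enrichment pass, with rule-based stand-ins for the AI description generation and external data sourcing (field names are assumptions):

```python
def enrich_sku(product: dict) -> dict:
    """Return a copy of the product with a generated description and
    hierarchical category tags filled in where they were missing."""
    p = dict(product)  # never mutate the source record
    if not p.get("description"):
        # Stand-in for the AI-generated description step.
        p["description"] = f"{p['name']} in {p['category']}"
    # Hierarchical tagging: category always leads the tag list.
    p["tags"] = [p["category"], *p.get("tags", [])]
    return p

sparse = {"name": "Organic Oat Milk", "category": "grocery"}
print(enrich_sku(sparse))
```

Running this as a continuous queue consumer, rather than a one-off script, is what kept quality from decaying as new SKUs entered the catalog.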

Brand Voice System

We created brand voice profiles that captured tone, vocabulary constraints, and formality levels. This handled multi-tenant personality switching through structured prompting.

Luxury cosmetics: "Indulge in our restorative night cream, crafted with rare botanical extracts to rejuvenate your skin's natural luminosity."
Budget grocery: "Check out this night cream! It's got great ingredients to help your skin look fresh, and it's on sale this week."
Platform assistant: "I found a night cream that matches your search. It contains botanical extracts and is rated 4.5 stars."
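The voice-switching mechanism can be sketched as per-brand profiles compiled into a system prompt at request time; the profile fields and brand keys below are illustrative assumptions, not the production configuration:

```python
# Hypothetical per-tenant voice profiles.
VOICE_PROFILES = {
    "luxury_cosmetics": {
        "tone": "refined, evocative",
        "formality": "high",
        "banned_words": ["cheap", "deal"],
    },
    "budget_grocery": {
        "tone": "friendly, upbeat",
        "formality": "low",
        "banned_words": [],
    },
}

def system_prompt(brand: str) -> str:
    """Compile one brand's profile into a system prompt for the shared
    base model, so no per-tenant fine-tuning is needed."""
    p = VOICE_PROFILES[brand]
    banned = ", ".join(p["banned_words"]) or "none"
    return (
        f"You are the voice of {brand}. Tone: {p['tone']}. "
        f"Formality: {p['formality']}. Never use these words: {banned}."
    )

print(system_prompt("luxury_cosmetics"))
```

One base model plus structured profiles scales to any number of tenants, where separate fine-tunes would not.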
Project Roadmap

Our Approach

We structured the engagement in four phases, designed to deliver incremental value while managing technical risk.

Execution Timeline (32 Weeks): Data, AI, and Mobile tracks
Phase 1: Foundation Weeks 1-6
  • Legacy system audit and API mapping across 35+ microservices
  • Data architecture design for the Persistent Memory Layer
  • SKU enrichment pipeline MVP processing initial product batch
  • LLM evaluation and selection (Mistral for initial deployment)
Phase 2: AI Development Weeks 7-16
  • Dual-model architecture implementation
  • Custom RAG pipeline for product search and recommendations
  • Brand voice personality system development
  • Self-hosted vLLM deployment for inference at scale
"Phases 2 and 3 ran in parallel. While the AI team built the inference layer, the mobile team was already implementing screens against mocked endpoints. This overlap cut months off the timeline."
Phase 3: Mobile Impl. Weeks 12-24
  • Flutter application development (iOS and Android, 42 screens)
  • Voice-to-voice conversation integration
  • App Clips for instant demos (scan QR, experience AI immediately)
  • Payment provider integration (regional retail payment flows)
  • Loyalty system unification across brands
Phase 4: Integration Weeks 20-32
  • Full catalog enrichment (1M+ SKUs processed)
  • Model migration (Mistral → Llama) as capabilities improved
  • Offer generation engine pilot (personalized promotions)
  • SOC2 and GDPR compliance validation
  • Production deployment and performance optimization
Internal Infrastructure

Custom Tooling We Had to Build

Standard tooling didn't exist for what we needed. We built three internal systems that made the project possible—and that we now use across all our retail AI engagements.

These aren't throwaway scripts. They're production infrastructure that now accelerates every project we take on. When clients ask how we move fast, this is the answer.

1. Ionio Conversation Data Collection Platform™

To build effective prompts and calibrate brand voice personalities, we needed high-quality training data. We built a platform where team members could have conversations with the model, annotate responses, flag quality issues, and generate synthetic variations. This data fed directly into prompt optimization.

Prompt Engineering Interface
I need a moisturizer for sensitive skin.
I recommend the 'Calm & Restore' gel. It's oat-based and fragrance-free.
Does it have SPF?
Quality Annotation
★★★★☆
🚩 Flag Hallucination
Actions
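The kind of record such a platform might store per annotated turn can be sketched as follows (fields and the review rule are assumptions for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One annotated conversation turn from the collection platform."""
    user_msg: str
    model_reply: str
    rating: int                                  # 1-5 stars
    flags: list = field(default_factory=list)    # e.g. ["hallucination"]

    def needs_review(self) -> bool:
        """Low ratings or a hallucination flag route the turn to the
        prompt-optimization queue."""
        return self.rating <= 2 or "hallucination" in self.flags

a = Annotation("Does it have SPF?", "Yes, SPF 30.",
               rating=4, flags=["hallucination"])
print(a.needs_review())  # True: flagged despite a decent rating
```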

2. Ionio Conversational Test Bench™

Testing conversational AI at scale is fundamentally different from testing traditional software. Unit tests don't catch the ways conversations fail. We built a test bench that simulates buyers with different intents, preferences, and conversation styles—running automated quality assessments across model updates and catching regressions before they hit production.

Automated Persona Simulation
Persona: Frugal Parent
Persona: Luxury
Persona: Dietary Restr.
⚠ REGRESSION DETECTED: Latency Spike in "Inventory Check" VIEW LOGS
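The regression gate can be sketched as a metric comparison against a stored baseline; the flow names, latencies, and tolerance below are invented for illustration:

```python
# Hypothetical baseline latencies (ms) captured from a known-good run.
BASELINE_LATENCY_MS = {"inventory_check": 420, "loyalty_check": 310}

def detect_regressions(run_latency_ms: dict, tolerance: float = 1.25) -> list:
    """Flag every flow whose latency exceeds its baseline by more than
    the tolerance factor; unknown flows are never flagged."""
    return [
        flow
        for flow, ms in run_latency_ms.items()
        if ms > BASELINE_LATENCY_MS.get(flow, float("inf")) * tolerance
    ]

latest = {"inventory_check": 980, "loyalty_check": 300}
print(detect_regressions(latest))  # ['inventory_check']
```

The same comparison runs over quality scores from the simulated personas, so a model update that degrades one buyer type is caught before deployment.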

3. SKU Enrichment Platform

Not a one-time data cleaning script—an ongoing pipeline. The platform processed the full catalog with AI-generated descriptions, auto-sourced nutritional data, and hierarchical tagging. It ran continuously as new products entered the system, maintaining data quality at scale.

Real-Time Enrichment Queue
Category: Beauty · 84% Complete
Category: Grocery · 42% Complete
SKU-9021: Organic Oat Milk · Enriched (+Nutri-Score)
SKU-4412: Night Cream · Enriched (+Semantic Tags)
SKU-1102: Kids Vitamin C · Enriched (+Safety Warning)
SKU-3329: Gluten-Free Pasta · Enriched (+Allergen Data)

Technology Stack

LLMs
Mistral · Llama 3 · vLLM (Self-Hosted)
Mobile
Flutter · iOS / Android · App Clips
Backend
Node.js · Express · Microservices · Custom REST API
Infra
Azure · Vector DB · Docker
Compliance
SOC2 · GDPR
The ROI of Infrastructure

Business & Technical Outcomes

Technical Outcomes

  • Seamless integration with legacy codebase via custom API mesh.
  • Sub-second AI response times for 2M+ Monthly Active Users.
  • 1M+ products enriched with AI-generated metadata and tagging.
  • Fully SOC2-ready and GDPR-compliant data pipelines.

Operational Impact

  • Unified loyalty experience across differing brand tenants.
  • Location-aware inventory driving foot traffic to physical stores.
  • Persistent Memory enabling true personalization across sessions.
  • Discovery friction significantly reduced, increasing conversion velocity.

What We Learned

Building at the frontier means building your own tools. We were 8-12 months ahead of industry standard on function calling, persistent memory, and production-grade conversational commerce infrastructure.

The data layer determines the ceiling.

No amount of prompt engineering compensates for dirty product data. The enrichment pipeline wasn't optional—it was foundational.

Legacy integration is a feature.

Retail companies have years of valuable data locked in old systems. The ability to unlock that data without requiring platform rewrites is the actual value proposition.

Test infrastructure is non-negotiable.

Conversational AI fails in ways unit tests don't catch. The test bench—simulating real buyer conversations at scale—was essential for maintaining quality.

Self-hosted inference changes economics.

At 2M+ MAU, API costs would have been prohibitive. vLLM deployment gave us the latency and cost structure the business required.

Innovation Gap Analysis
Function Calling (2024): Ionio shipped roughly 8 months before it became an industry standard.
Persistent Memory (2024): Ionio shipped roughly 8 months before OpenAI launched it as a feature.

The internal tools we built—the data collection platform, the test bench, the enrichment pipeline—now form the foundation of how we approach every retail AI engagement. These aren't one-off solutions; they're reusable infrastructure that accelerates every project that follows.

Most retail and e-commerce platforms are at a decision point: continue adding AI features to legacy architecture, or rebuild core systems to be AI-native from the foundation.

The difference matters. Bolted-on AI feels like an add-on. AI-native platforms use intelligence as the organizing principle—the data layer, the experience layer, and the business logic are all designed for machine reasoning, not retrofitted for it.

Ionio partners with platforms in the $5M-$100M ARR range to make that transition. We don't build chatbots. We build the data infrastructure, AI engines, and experience layers that make intelligence central to how your platform creates value.

How We Operate
01 // Integration
We embed with your technical team. The work you just read represents how we operate—we build production systems that integrate with legacy architecture, and transfer the knowledge so you own what we build.
02 // Acceleration
We don't start from zero. The tooling we've developed through our engagements—test benches, data pipelines, personalization engines—now accelerates every retail AI project we take on. We start from battle-tested systems that have already processed thousands of production interactions.
03 // Experience
We know what works. We've been building AI systems for the last decade. We shipped architectures before they became mainstream. We deployed features before ChatGPT had them. We know the pitfalls because we've made the mistakes.
When to talk to us

You're facing a version of what this client faced—valuable data locked in fragmented systems, the need to compete with AI-native experiences, or the gap between your technical vision and the specialized talent required to execute it.

You need a strategic partner who understands both the technology and the business model—not a dev shop that builds what you specify.

When we're not a fit

You want a chatbot for your dashboard, AI for the press release, or features that OpenAI will commoditize in six months.

We'll tell you that directly.

The Bottom Line

Building Infrastructure vs. Adding Features

If you're building the infrastructure that makes your platform AI-native—not just AI-enabled—let's talk.

hello@ionio.ai