CustomGPT.ai Blog

How to Implement RAG API in Production: Complete Developer Walkthrough

TL;DR

Production RAG API implementation requires battle-tested patterns for security, scalability, and reliability.

This guide uses real production code from the CustomGPT Starter Kit—a complete implementation currently serving thousands of users. Instead of theoretical examples, you’ll work with actual code patterns proven in production deployments.

We’ll explore the multi-deployment architecture, API proxy security model, performance monitoring, and container deployment configurations that power starterkit.customgpt.ai.

Perfect for engineering teams moving from prototype to production-scale RAG applications using proven, source-verified implementations.

Moving from a working RAG prototype to a production system requires fundamental architectural changes. The CustomGPT Starter Kit provides a complete reference implementation that demonstrates enterprise-ready patterns used in real production deployments.

You can see this implementation running live at https://starterkit.customgpt.ai/ and examine the complete source code to understand how production RAG applications are built.

Production Architecture: Real Implementation Patterns

The starter kit demonstrates a multi-deployment architecture supporting three production modes. Let’s examine the actual implementation:

Multi-Deployment Architecture

From the starter kit’s architecture documentation:

src/
├── app/                  # Next.js App Router pages
│   ├── api/proxy/       # API proxy routes (adds auth headers server-side)
│   └── dashboard/       # Dashboard pages
├── components/          # React components
│   ├── chat/           # Chat UI components
│   ├── dashboard/      # Dashboard components
│   └── ui/             # Reusable UI primitives
├── lib/                # Core utilities
│   ├── api/            # API client and proxy handler
│   └── streaming/      # SSE message streaming
├── store/              # Zustand state stores
│   └── widget-stores/  # Isolated stores for widget mode
└── widget/             # Widget-specific entry points

API Proxy Security Pattern

The starter kit implements a comprehensive security model. From the actual src/lib/api/proxy-handler.ts:

// Real implementation from CustomGPT Starter Kit
export async function handleProxyRequest(request: Request, path: string) {
  const url = new URL(request.url);
  const config = getConfig();
  
  // Construct target URL
  const targetUrl = new URL(path, config.customgpt.baseUrl);
  
  // Copy search params
  url.searchParams.forEach((value, key) => {
    targetUrl.searchParams.set(key, value);
  });

  // Prepare headers with server-side authentication
  const headers = new Headers();
  headers.set('Authorization', `Bearer ${config.customgpt.apiKey}`);
  headers.set('Content-Type', 'application/json');
  headers.set('User-Agent', 'CustomGPT-Starter-Kit/1.0');
  
  // Add request ID for tracing
  const requestId = crypto.randomUUID();
  headers.set('X-Request-ID', requestId);

  try {
    // Forward request with timeout
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), config.customgpt.timeout);
    
    const response = await fetch(targetUrl.toString(), {
      method: request.method,
      headers,
      body: request.method !== 'GET' && request.method !== 'HEAD' ? await request.text() : undefined,
      signal: controller.signal,
    });

    clearTimeout(timeoutId);
    
    if (!response.ok) {
      // Clone before reading so the body can still be returned to the caller
      console.error(`[PROXY] API Error ${response.status}:`, await response.clone().text());
    }

    return response;
  } catch (error) {
    console.error('[PROXY] Request failed:', error);
    throw error;
  }
}

This pattern ensures API keys never reach client-side code while providing comprehensive error handling and request tracing.
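In the App Router, a catch-all route wires this handler to every HTTP method. The file path, helper, and method stubs below are an illustrative sketch based on the kit's layout, not verbatim source:

```typescript
// Hypothetical catch-all route, e.g. src/app/api/proxy/[...path]/route.ts.
// Assumed signature of the handler shown above (imported in the real kit):
declare function handleProxyRequest(request: Request, path: string): Promise<Response>;

// Rebuild the upstream path from the catch-all route segments.
export function buildProxyPath(segments: string[]): string {
  return segments.map(encodeURIComponent).join('/');
}

type RouteContext = { params: { path: string[] } };

export async function GET(request: Request, ctx: RouteContext): Promise<Response> {
  return handleProxyRequest(request, buildProxyPath(ctx.params.path));
}

export async function POST(request: Request, ctx: RouteContext): Promise<Response> {
  return handleProxyRequest(request, buildProxyPath(ctx.params.path));
}
```

Because every method delegates to the same handler, adding PUT or DELETE support is one more three-line export.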

Real Environment Configuration

The starter kit uses a sophisticated configuration system. From the actual .env.example:

# Required - Server-side only (never exposed to client)
CUSTOMGPT_API_KEY=your-api-key-here

# Optional - For voice features (speech-to-text, voice chat)
OPENAI_API_KEY=your-openai-api-key-here

# Optional - Custom API base URL
CUSTOMGPT_API_BASE_URL=https://app.customgpt.ai/api/v1

# Optional - CORS origins for widget deployments
ALLOWED_ORIGINS=https://yourdomain.com,https://anotherdomain.com

# Optional - Metrics and monitoring
METRICS_ENDPOINT=https://your-monitoring-service.com/metrics

Multi-Environment Setup

From src/lib/config/environment.ts (actual file):

export const getConfig = (): EnvironmentConfig => {
  const env = process.env.NODE_ENV || 'development';
  
  return {
    customgpt: {
      apiKey: process.env.CUSTOMGPT_API_KEY!,
      baseUrl: process.env.CUSTOMGPT_API_BASE_URL || 'https://app.customgpt.ai/api/v1',
      agentId: process.env.CUSTOMGPT_AGENT_ID!,
      timeout: env === 'production' ? 15000 : 30000,
      retries: 3
    },
    security: {
      allowedOrigins: process.env.ALLOWED_ORIGINS?.split(',') || 
        (env === 'development' ? ['http://localhost:3000'] : []),
      corsEnabled: env !== 'development'
    },
    monitoring: {
      logLevel: env === 'production' ? 'warn' : 'debug',
      metricsEndpoint: process.env.METRICS_ENDPOINT
    }
  };
};
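The allowedOrigins list typically feeds a CORS check in the proxy layer. A minimal sketch of such a check (the function name is ours, not from the kit):

```typescript
// Hypothetical CORS check built on the allowedOrigins config above.
export function isOriginAllowed(origin: string | null, allowedOrigins: string[]): boolean {
  // Require an Origin header for cross-origin widget requests
  if (!origin) return false;
  if (allowedOrigins.includes('*')) return true;
  // Trim entries so "a.com, b.com" in ALLOWED_ORIGINS still matches
  return allowedOrigins.some((allowed) => allowed.trim() === origin);
}
```

In the proxy route, a failed check would return a 403 before any upstream request is made.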

Production Chat Implementation

Instead of theoretical examples, let’s examine the actual chat implementation from the starter kit:

Core Chat Component

From src/components/chat/ChatContainer.tsx (actual component):

// Real implementation serving production traffic
export function ChatContainer({ agentId, className }: ChatContainerProps) {
  const messages = useMessages();
  const { sendMessage, isLoading } = useConversation();
  const { currentAgent } = useAgents();
  
  const handleSendMessage = useCallback(async (content: string) => {
    if (!content.trim() || isLoading) return;
    
    try {
      await sendMessage({
        content,
        agentId: agentId || currentAgent?.id,
        stream: true // Enable real-time streaming
      });
    } catch (error) {
      console.error('Failed to send message:', error);
      // Graceful error handling with user feedback
      toast.error('Failed to send message. Please try again.');
    }
  }, [sendMessage, isLoading, agentId, currentAgent]);

  return (
    <div className={cn("flex flex-col h-full", className)}>
      <ChatMessages messages={messages} />
      <ChatInput 
        onSendMessage={handleSendMessage}
        disabled={isLoading}
        placeholder="Ask me anything..."
      />
    </div>
  );
}

Message Streaming Implementation

The starter kit includes production-ready streaming. From src/lib/streaming/handler.ts:

// Real streaming implementation used in production
export class StreamingHandler {
  private activeStreams = new Map<string, AbortController>();
  
  async handleStreamingResponse(
    url: string,
    options: RequestInit,
    onChunk: (chunk: string) => void,
    onComplete: () => void,
    onError: (error: Error) => void
  ) {
    const streamId = crypto.randomUUID();
    const controller = new AbortController();
    this.activeStreams.set(streamId, controller);

    try {
      const response = await fetch(url, {
        ...options,
        signal: controller.signal,
      });

      if (!response.ok) {
        throw new Error(`Stream request failed: ${response.status}`);
      }

      const reader = response.body?.getReader();
      if (!reader) throw new Error('No response body');

      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // Buffer partial lines: a chunk boundary may split an SSE line in two
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = line.slice(6);
            if (data === '[DONE]') continue;

            try {
              const parsed = JSON.parse(data);
              if (parsed.choices?.[0]?.delta?.content) {
                onChunk(parsed.choices[0].delta.content);
              }
            } catch (e) {
              console.warn('Failed to parse streaming chunk:', e);
            }
          }
        }
      }

      onComplete();
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error));
      if (err.name !== 'AbortError') {
        onError(err);
      }
    } finally {
      this.activeStreams.delete(streamId);
    }
  }

  abortStream(streamId: string) {
    const controller = this.activeStreams.get(streamId);
    if (controller) {
      controller.abort();
      this.activeStreams.delete(streamId);
    }
  }
}
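The per-line SSE parsing inside the read loop can be factored into a pure function and unit-tested without any network access. A sketch mirroring the handler's logic (the function name is ours):

```typescript
// Extract the delta text from one SSE line, or null if the line carries none.
// Mirrors the per-line parsing inside StreamingHandler above.
export function parseSSELine(line: string): string | null {
  if (!line.startsWith('data: ')) return null;
  const data = line.slice(6);
  if (data === '[DONE]') return null;
  try {
    const parsed = JSON.parse(data);
    return parsed.choices?.[0]?.delta?.content ?? null;
  } catch {
    // Malformed JSON chunks are skipped, as in the handler
    return null;
  }
}
```

Keeping the parsing pure makes it easy to cover the edge cases ([DONE] sentinels, non-data lines, malformed JSON) in a plain unit test.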

Widget Deployment Patterns

The starter kit supports multiple deployment modes with real implementation examples:

Embedded Widget

From examples/widget-example.html (actual file):

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>CustomGPT Widget Example</title>
    <link rel="stylesheet" href="../dist/widget/customgpt-widget.css">
</head>
<body>
    <div class="container">
        <h1>CustomGPT Widget Integration</h1>
        <p>This example shows how to embed the CustomGPT widget in your website.</p>
        
        <!-- Widget container -->
        <div id="customgpt-widget-container" style="height: 600px; border: 1px solid #ddd;"></div>
    </div>

    <!-- Load widget script -->
    <script src="../dist/widget/customgpt-widget.js"></script>
    
    <script>
        // Initialize widget with real configuration
        if (typeof CustomGPTWidget !== 'undefined') {
            const widget = CustomGPTWidget.init({
                agentId: 'your-agent-id-here',
                containerId: 'customgpt-widget-container',
                mode: 'embedded',
                theme: 'light',
                enableCitations: true,
                enableFeedback: true,
                
                // Event handlers
                onReady: () => {
                    console.log('Widget is ready');
                },
                onMessage: (data) => {
                    console.log('New message:', data);
                },
                onError: (error) => {
                    console.error('Widget error:', error);
                }
            });
        }
    </script>
</body>
</html>

Iframe Integration

From examples/iframe-embed-example.html:

<!DOCTYPE html>
<html>
<head>
    <title>CustomGPT Iframe Integration</title>
    <style>
        .chat-iframe {
            width: 100%;
            height: 600px;
            border: 1px solid #ddd;
            border-radius: 8px;
        }
    </style>
</head>
<body>
    <h1>CustomGPT Iframe Integration</h1>
    
    <!-- Direct iframe embedding -->
    <iframe 
        src="https://starterkit.customgpt.ai/widget/"
        class="chat-iframe"
        frameborder="0"
        allow="microphone">
    </iframe>
    
    <script>
        // Listen for messages from iframe
        window.addEventListener('message', function(event) {
            if (event.origin !== 'https://starterkit.customgpt.ai') return;
            
            console.log('Received message from widget:', event.data);
            
            // Handle different message types
            switch (event.data.type) {
                case 'WIDGET_READY':
                    console.log('Widget is ready');
                    break;
                case 'MESSAGE_SENT':
                    console.log('User sent message:', event.data.message);
                    break;
                case 'RESPONSE_RECEIVED':
                    console.log('Bot responded:', event.data.response);
                    break;
            }
        });
    </script>
</body>
</html>

Production Deployment Configurations

Docker Production Setup

The starter kit includes comprehensive Docker support. From the actual Dockerfile:

# Multi-stage Dockerfile from the starter kit
FROM node:18-alpine AS base
WORKDIR /app

# Install dependencies needed for both dev and prod
RUN apk add --no-cache libc6-compat curl

# Copy package files
COPY package*.json ./
COPY pnpm-lock.yaml* ./

# Install dependencies
RUN npm ci --omit=dev

# ============================================
# Stage 2: Builder - All Assets
# ============================================
FROM node:18-alpine AS builder
WORKDIR /app

# Copy package files
COPY package*.json ./
COPY pnpm-lock.yaml* ./

# Install all dependencies (including dev)
RUN npm ci

# Copy source code
COPY . .

# Build application
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

RUN npm run build:all

# ============================================
# Stage 3: Standalone App Runner
# ============================================
FROM node:18-alpine AS standalone
WORKDIR /app

ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

# Copy built application
COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static

USER nextjs

EXPOSE 3000

ENV PORT=3000
ENV HOSTNAME=0.0.0.0

CMD ["node", "server.js"]

Docker Compose Configuration

From docker-compose.yml (actual configuration):

# Production Docker Compose from starter kit
version: '3.8'

services:
  # Main Application (Full Next.js App)
  app:
    build:
      context: .
      dockerfile: Dockerfile
      target: standalone
    container_name: customgpt-ui-app
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - PORT=3000
      - HOSTNAME=0.0.0.0
      - CUSTOMGPT_API_KEY=${CUSTOMGPT_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CUSTOMGPT_API_BASE_URL=${CUSTOMGPT_API_BASE_URL:-https://app.customgpt.ai/api/v1}
      - ALLOWED_ORIGINS=${ALLOWED_ORIGINS:-*}
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    networks:
      - customgpt-network
    labels:
      - "com.customgpt.service=main-app"

  # Widget Only (Static Bundle)
  widget:
    build:
      context: .
      dockerfile: Dockerfile
      target: widget
    container_name: customgpt-ui-widget
    ports:
      - "8080:80"
    restart: unless-stopped
    profiles:
      - widget
      - all

networks:
  customgpt-network:
    driver: bridge

Real Performance Monitoring

Built-in Analytics

The starter kit includes actual performance monitoring. From src/lib/analytics/tracker.ts:

// Real analytics implementation from the starter kit
export class AnalyticsTracker {
  private events: AnalyticsEvent[] = [];
  
  track(event: AnalyticsEvent) {
    const enriched = {
      ...event,
      timestamp: Date.now(),
      sessionId: this.getSessionId(),
      userId: this.getUserId()
    };
    this.events.push(enriched);

    // Send the enriched event (with timestamp/session) to the analytics service
    if (process.env.NEXT_PUBLIC_ANALYTICS_ENDPOINT) {
      this.sendEvent(enriched);
    }

    // Log in development
    if (process.env.NODE_ENV === 'development') {
      console.log('Analytics Event:', enriched);
    }
  }
  
  trackConversationStart(agentId: string) {
    this.track({
      type: 'conversation_start',
      properties: { agentId }
    });
  }
  
  trackMessageSent(messageLength: number, agentId: string) {
    this.track({
      type: 'message_sent',
      properties: { messageLength, agentId }
    });
  }
  
  trackResponseReceived(responseTime: number, citationsCount: number) {
    this.track({
      type: 'response_received',
      properties: { responseTime, citationsCount }
    });
  }
  
  private async sendEvent(event: AnalyticsEvent) {
    try {
      await fetch(process.env.NEXT_PUBLIC_ANALYTICS_ENDPOINT!, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(event)
      });
    } catch (error) {
      console.error('Failed to send analytics event:', error);
    }
  }
}
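The responseTime passed to trackResponseReceived has to be measured around the request itself. A hypothetical helper for that (not part of the kit):

```typescript
// Run an async operation and return its result plus elapsed milliseconds.
export async function withTiming<T>(op: () => Promise<T>): Promise<[T, number]> {
  const start = Date.now();
  const result = await op();
  return [result, Date.now() - start];
}
```

A call site could then do `const [reply, ms] = await withTiming(() => sendMessage(text))` and pass `ms` to `trackResponseReceived`.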

Security Implementation Patterns

API Key Validation

From src/lib/api/key-validation.ts (actual security implementation):

// Real security validation from the starter kit
export async function validateCustomGPTKey(): Promise<KeyValidationResult> {
  const apiKey = process.env.CUSTOMGPT_API_KEY;
  
  if (!apiKey) {
    return {
      isValid: false,
      error: 'CustomGPT API key not found in environment variables'
    };
  }

  try {
    const response = await fetch('https://app.customgpt.ai/api/v1/projects', {
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'User-Agent': 'CustomGPT-Starter-Kit/1.0'
      }
    });

    if (response.status === 401) {
      return {
        isValid: false,
        error: 'Invalid CustomGPT API key. Please check your CUSTOMGPT_API_KEY.'
      };
    }

    if (response.status === 429) {
      return {
        isValid: false,
        error: 'Rate limited. Please check your API usage.'
      };
    }

    if (!response.ok) {
      return {
        isValid: false,
        error: `API validation failed: ${response.status}`
      };
    }

    return { isValid: true };
  } catch (error) {
    return {
      isValid: false,
      error: `Network error during API validation: ${error instanceof Error ? error.message : String(error)}`
    };
  }
}
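The branch logic above maps HTTP status codes to validation results; factoring it out makes the mapping testable without hitting the live API. A sketch (the function name is ours):

```typescript
// Assumed result shape, matching the validator above.
type KeyValidationResult = { isValid: boolean; error?: string };

// Map an HTTP status from the /projects probe to a validation result.
export function classifyKeyCheck(status: number): KeyValidationResult {
  if (status === 401) {
    return { isValid: false, error: 'Invalid CustomGPT API key. Please check your CUSTOMGPT_API_KEY.' };
  }
  if (status === 429) {
    return { isValid: false, error: 'Rate limited. Please check your API usage.' };
  }
  if (status < 200 || status >= 300) {
    return { isValid: false, error: `API validation failed: ${status}` };
  }
  return { isValid: true };
}
```

With this in place, validateCustomGPTKey reduces to the fetch call plus `classifyKeyCheck(response.status)`.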

Working Examples from the Cookbook

The CustomGPT Cookbook provides real API usage examples covering:

  • Project management
  • Conversation handling
  • Analytics and monitoring

Production Deployment Guide

Vercel Deployment

The starter kit includes one-click Vercel deployment; if you prefer full control, follow the manual steps below.

Manual Setup Steps

# Clone the production-ready starter kit
git clone https://github.com/Poll-The-People/customgpt-starter-kit.git
cd customgpt-starter-kit

# Install dependencies
npm install

# Configure environment
cp .env.example .env.local
# Edit .env.local with your API keys

# Build for production
npm run build

# Start production server
npm start

Environment Variable Setup

For production deployment, configure these variables in your deployment platform:

CUSTOMGPT_API_KEY=your_customgpt_api_key_here
OPENAI_API_KEY=your_openai_key_here  # Optional, for voice features
ALLOWED_ORIGINS=https://yourdomain.com

Testing the Implementation

Health Check Endpoints

The starter kit includes production health checks. From src/app/api/health/route.ts:

// Real health check implementation
export async function GET() {
  const checks = {
    timestamp: new Date().toISOString(),
    status: 'ok',
    services: {
      database: 'ok',
      customgpt_api: 'checking...',
      redis: 'ok'
    }
  };

  try {
    // Validate CustomGPT API connection
    const apiCheck = await validateCustomGPTKey();
    checks.services.customgpt_api = apiCheck.isValid ? 'ok' : 'error';
    
    const overallStatus = Object.values(checks.services).every(s => s === 'ok') ? 'ok' : 'error';
    
    return Response.json(checks, {
      status: overallStatus === 'ok' ? 200 : 503
    });
  } catch (error) {
    return Response.json({
      ...checks,
      status: 'error',
      error: error instanceof Error ? error.message : String(error)
    }, { status: 503 });
  }
}
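The aggregation step (every service must report 'ok' for the route to return 200) can likewise be isolated and tested. A sketch:

```typescript
// Collapse per-service statuses into the overall health status,
// as computed inline in the health route above.
export function overallStatus(services: Record<string, string>): 'ok' | 'error' {
  return Object.values(services).every((s) => s === 'ok') ? 'ok' : 'error';
}
```

The route body then becomes `Response.json(checks, { status: overallStatus(checks.services) === 'ok' ? 200 : 503 })`.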

Learning from the Live Demo

You can explore the complete implementation at https://starterkit.customgpt.ai/:

  • Test the embedded chat interface
  • Try the voice features (requires microphone permission)
  • Examine the responsive design across devices
  • See real conversation management in action
  • Test citation functionality with actual responses

The demo runs the same code available in the repository, so you can see exactly how production patterns perform with real users.

Frequently Asked Questions

Where can I see this implementation running in production?

The complete implementation is live at https://starterkit.customgpt.ai/. This isn’t a demo—it’s the actual starter kit code running in production, serving real users and demonstrating all the patterns discussed in this guide.

How do I know these patterns work at scale?

The CustomGPT Starter Kit is used by multiple production applications serving thousands of users. You can examine the codebase, see the deployment configurations, and test the live implementation to validate the patterns yourself.

What’s the difference between using the starter kit versus building from scratch?

The starter kit provides 6+ months of production-hardened development work immediately. It includes security patterns, performance optimizations, multiple deployment modes, comprehensive error handling, and testing configurations that most teams don’t think to implement initially. All code is production-tested rather than theoretical.

Can I customize the starter kit for my specific requirements?

Absolutely. The starter kit is designed for customization with modular components, comprehensive documentation, and clear separation of concerns. Check the repository’s README.md and CLAUDE.md files for detailed customization guides and architectural explanations.

How do I validate that these security patterns are sufficient for my use case?

The starter kit implements industry-standard security patterns: server-side API key management, CORS controls, input validation, and audit logging. For specific compliance requirements (HIPAA, SOC 2, etc.), you can extend these patterns. CustomGPT itself maintains SOC 2 Type II certification.

What production monitoring should I implement beyond what’s included?

The starter kit includes basic performance tracking and health checks. For production scale, consider adding: application performance monitoring (APM), error tracking (Sentry), log aggregation (ELK stack), and infrastructure monitoring (Prometheus/Grafana). The codebase includes hooks for these integrations.

How do I handle scaling beyond what the starter kit supports?

The starter kit architecture supports horizontal scaling through container orchestration (Kubernetes), load balancing, and CDN integration. The Docker configurations and API proxy patterns are designed for multi-instance deployment. CustomGPT’s infrastructure handles backend scaling automatically.

Are there working examples for specific integration scenarios?

Yes, the examples/ directory contains working implementations for iframe embedding, widget integration, React components, and floating chat buttons. Each example is tested and includes complete HTML/JavaScript/React code you can use immediately.

How do I contribute improvements to the starter kit?

The starter kit repository accepts contributions through standard GitHub workflows. Common contributions include new deployment configurations, UI improvements, integration examples, and performance optimizations.

What’s the relationship between the starter kit and CustomGPT’s official documentation?

The starter kit demonstrates practical implementation of CustomGPT’s API documentation. While the docs explain API capabilities, the starter kit shows how to build complete applications using those APIs with proper architecture, security, and deployment patterns.

For more RAG API related information:

  1. CustomGPT.ai’s open-source UI starter kit (custom chat screens, an embeddable chat window, and a floating website chatbot), with 9 social AI integration bots and related setup tutorials
  2. Find our API sample usage code snippets here
  3. Our RAG API’s hosted Postman collection – test the APIs in Postman with one click
  4. Our Developer API documentation
  5. API explainer videos on YouTube, plus a dev-focused playlist
  6. Join our bi-weekly developer office hours, or watch past recordings of the Dev Office Hours

P.S. Our API endpoints are OpenAI-compatible: just replace the API key and endpoint, and any OpenAI-compatible project works with your RAG data. Find more here.

Want to try our hosted MCPs? Check out the docs.
