Provides DNS resolution for the retail chat domain (e.g., chat.retailbrand.my) with latency-based routing and health checks for failover between AZs.
Route 53 resolves the chat domain to CloudFront distribution endpoints. Health checks monitor ALB and API Gateway endpoints, automatically removing unhealthy targets. Supports geolocation routing for Malaysia-specific traffic.
Hosted zone with alias records to CloudFront. Health checks on /health endpoint every 10s. DNSSEC enabled. TTL: 60s for A records.
| Field | Value | Unit | Description |
|---|---|---|---|
| Route 53 | |||
| Hosted Zones | 1 | zone | Single hosted zone for chat.retailbrand.my domain |
| DNS Queries | 10 | M/month | Estimated monthly DNS lookups from all chat channels |
| Health Checks | 4 | endpoints | ALB, API GW, CloudFront origin, and failover endpoint |
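The alias records above can be expressed as a Route 53 change batch. A minimal sketch, assuming a hypothetical record name and CloudFront domain; `Z2FDTNDATAQYW2` is the fixed hosted-zone ID that CloudFront alias targets use:

```python
def alias_change_batch(record_name, cf_domain):
    """Build the ChangeBatch payload that Route 53's
    change_resource_record_sets accepts for an alias A record.
    record_name and cf_domain are placeholders for this sketch."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "AliasTarget": {
                    # Fixed hosted-zone ID for CloudFront alias targets
                    "HostedZoneId": "Z2FDTNDATAQYW2",
                    "DNSName": cf_domain,
                    # Pull unhealthy targets out of DNS automatically
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    }

batch = alias_change_batch("chat.retailbrand.my.", "d111111abcdef8.cloudfront.net.")
```

Passing this dict to boto3's `route53` client wires the failover behavior described above; `EvaluateTargetHealth` is what removes unhealthy targets from resolution.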
Accelerates static chat widget assets (JS, CSS, images) and provides WebSocket support for real-time chat delivery to Malaysian end-users with edge locations in Kuala Lumpur.
CloudFront serves the chat widget bundle from S3 origin with compression enabled. WebSocket connections are passed through to API Gateway. Origin shield enabled in ap-southeast-5 to reduce origin load. Custom error pages for graceful degradation.
Price class: Asia Pacific. Origin shield: ap-southeast-5. Cache policy: Managed-CachingOptimized for static. WebSocket protocol enabled. TLS 1.3 enforced. Custom domain with ACM certificate.
| Field | Value | Unit | Description |
|---|---|---|---|
| CloudFront | |||
| Data Transfer Out | 500 | GB/month | Chat widget assets + cached responses served to Malaysian users |
| Requests | 50 | M/month | HTTP/HTTPS requests including WebSocket upgrades and static assets |
| WebSocket Connections | 100K | concurrent | Peak simultaneous real-time chat sessions via WebSocket |
Protects the chat API from SQL injection, XSS, bot attacks, and rate-limits abusive users. Essential for public-facing retail chat endpoints.
WAF is attached to CloudFront and API Gateway. Rules include: AWS Managed Rules (Core, SQLi, Known Bad Inputs), rate limiting (2000 req/5min per IP), an optional geographic restriction blocking non-ASEAN traffic, and custom rules for chat-specific abuse patterns such as prompt injection attempts.
Web ACL with 6 rule groups: AWSManagedRulesCommonRuleSet, AWSManagedRulesSQLiRuleSet, AWSManagedRulesKnownBadInputsRuleSet, Bot Control, Rate Limit (2000/5min), Custom prompt injection detection.
| Field | Value | Unit | Description |
|---|---|---|---|
| WAF | |||
| Web ACLs | 2 | ACLs | One for CloudFront distribution, one for API Gateway |
| Rules | 12 | rules | 6 AWS Managed + 3 rate-limit + 3 custom prompt injection rules |
| Requests Inspected | 50 | M/month | All chat API and CDN requests pass through WAF inspection |
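The per-IP rate limit above maps to a wafv2 rate-based rule. A sketch of the rule dict passed inside a CreateWebACL call; the rule name, priority, and metric name are placeholders, and WAF always evaluates rate limits over a rolling 5-minute window:

```python
def rate_limit_rule(limit=2000, priority=10):
    """Rate-based rule in the JSON shape the wafv2 API accepts.
    The 2000 req / 5 min per-IP limit mirrors the Web ACL spec above."""
    return {
        "Name": "rate-limit-per-ip",       # placeholder name
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {"Limit": limit, "AggregateKeyType": "IP"}
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "RateLimitPerIP",
        },
    }

rule = rate_limit_rule()
```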
Provides the unified entry point for all chat channels — REST API for WhatsApp webhooks and WebSocket API for real-time web/mobile chat. Handles authentication, throttling, and request routing.
Two APIs: (1) a WebSocket API for persistent chat connections from the web widget and mobile SDK, handling the connect/disconnect/message routes; (2) a REST API for WhatsApp webhook verification and message ingestion. Both integrate with a VPC Link to reach the internal ALB. API keys and usage plans enforce per-client rate limits.
WebSocket API with $connect, $disconnect, $default routes. REST API with /webhook/whatsapp endpoint. VPC Link to private ALB. Throttle: 10,000 req/s burst, 5,000 sustained. CloudWatch access logging enabled.
| Field | Value | Unit | Description |
|---|---|---|---|
| API Gateway | |||
| WebSocket Connections | 100K | concurrent | Persistent chat connections from web widget and mobile SDK |
| REST API Calls | 10 | M/month | WhatsApp webhooks + admin API + health check endpoints |
| Data Transfer | 100 | GB/month | Chat message payloads + product card rich media responses |
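On the backend side, a Lambda integration sees the WebSocket route key in the proxy event. A minimal dispatcher sketch for the `$connect`/`$disconnect`/`$default` routes above; the handler bodies are stubs, not the production logic:

```python
def route_key(event):
    """Extract the API Gateway WebSocket route key from a Lambda proxy event."""
    return event.get("requestContext", {}).get("routeKey", "$default")

# Stub handlers; real ones would register/clean up connection IDs
# and forward messages to the chat backend.
HANDLERS = {
    "$connect": lambda e: {"statusCode": 200},
    "$disconnect": lambda e: {"statusCode": 200},
    "$default": lambda e: {"statusCode": 200, "body": "routed to chat backend"},
}

def dispatch(event):
    # Unknown route keys fall through to $default, matching the route config.
    return HANDLERS.get(route_key(event), HANDLERS["$default"])(event)
```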
Manages retail customer authentication for the chat interface — social login (Google, Facebook, Apple), email/password, and guest sessions. Provides JWT tokens for API Gateway authorization.
Cognito User Pool stores customer profiles with custom attributes (loyalty tier, preferred language, store location). Identity Pool enables fine-grained IAM access for mobile SDK. Triggers Lambda for post-authentication flows (sync profile to Aurora, update OpenViking context).
User Pool with email + social providers (Google, Facebook, Apple). MFA optional. Custom domain: auth.retailbrand.my. Token validity: access 1hr, refresh 30 days. Advanced security: adaptive authentication enabled.
| Field | Value | Unit | Description |
|---|---|---|---|
| Cognito | |||
| Monthly Active Users | 500K | MAUs | Unique retail customers authenticating to chat per month |
| Advanced Security | 500K | MAUs | Adaptive authentication, compromised credential detection |
| Social Federation | 3 | providers | Google, Facebook, Apple ID — common for Malaysian retail users |
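The JWTs Cognito issues carry the claims API Gateway authorizers consume. A sketch that decodes a token payload without signature verification, for illustration only; production code must validate against the Cognito JWKS, and the toy token below is fabricated to show the claim layout:

```python
import base64
import json

def jwt_claims(token):
    """Decode a JWT payload WITHOUT verifying the signature.
    Illustration of the claim layout only; never use unverified claims."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# Craft a toy header.payload.signature token ("e30" is base64 for "{}"):
claims = {"sub": "customer-123", "token_use": "access", "exp": 1700003600}
seg = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
token = f"e30.{seg}.sig"
```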
Distributes incoming chat traffic across NemoClaw and OpenViking pods running on EKS across both AZs. Performs health checks and TLS termination for internal service-to-service communication.
Internal ALB with target groups for: (1) NemoClaw agent pods on port 8080, (2) OpenViking context API on port 9090, (3) Channel adapter service on port 8000. Path-based routing: /api/chat → NemoClaw, /api/context → OpenViking. Sticky sessions enabled for WebSocket connection affinity.
Internal ALB, not internet-facing. Target groups with health check /health every 15s. Deregistration delay: 30s. Cross-zone load balancing enabled. Access logs to S3.
| Field | Value | Unit | Description |
|---|---|---|---|
| ALB | |||
| LCU Hours | 720 | hours/month | Load balancer capacity units — scales with active connections and throughput |
| New Connections | 100K | /second | Peak new TCP connections during sale events or flash promotions |
| Processed Bytes | 500 | GB/month | Chat request/response payloads including product media |
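The LCU-hours row above follows from ALB's billing model: the highest of four dimensions per hour. A sketch using AWS's published per-LCU capacities (25 new connections/s, 3,000 active connections, 1 GB/h processed, 1,000 rule evaluations/s):

```python
def alb_lcus(new_conn_per_s, active_conn, gb_per_hour, rule_evals_per_s=0):
    """Estimate ALB LCUs for one hour: billed on the max of four dimensions."""
    dims = [
        new_conn_per_s / 25.0,      # new connections per second
        active_conn / 3000.0,       # active connections
        gb_per_hour / 1.0,          # processed bytes, GB per hour
        rule_evals_per_s / 1000.0,  # rule evaluations per second
    ]
    return max(dims)
```

With the 100K concurrent connections from the sizing table, the active-connection dimension alone dominates at roughly 33 LCUs during peak.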
NVIDIA NemoClaw is the core AI agent orchestration platform. It runs on EKS with OpenShell runtime, managing the full agent stack: secure sandboxing, inference routing to Bedrock, safety guardrails, and multi-step task execution for retail chat workflows.
NemoClaw deploys as a Kubernetes deployment on EKS Fargate profiles. The nemoclaw CLI orchestrates: (1) OpenShell gateway for secure agent execution, (2) Sandbox environment isolating each chat session, (3) Inference provider routing prompts to Amazon Bedrock, (4) Network policies restricting agent access. Each retail chat request spawns an agent that loads context from OpenViking, constructs a RAG-enriched prompt, calls Bedrock, and applies response safety filters before delivery.
EKS 1.29 with Fargate profiles. NemoClaw v1.0 deployed via Helm chart. 4 replicas min, 20 max (HPA on CPU 70%). Pod resources: 2 vCPU, 4Gi memory. Service mesh: AWS App Mesh for mTLS. Namespace isolation per environment.
| Field | Value | Unit | Description |
|---|---|---|---|
| EKS + Fargate | |||
| Fargate vCPU Hours | 5760 | hours/month | NemoClaw agent pods: 4 pods x 2 vCPU x 720hrs base |
| Fargate Memory | 11520 | GB-hours/month | 4 pods x 4Gi memory x 720hrs — scales with HPA to 20 pods peak |
| Pods (min/max) | 4/20 | replicas | HPA scales on CPU 70% — burst to 20 during retail peak hours |
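The 4/20 replica range above is driven by the standard Kubernetes HPA formula: desired replicas equal the ceiling of current replicas times observed utilization over target, clamped to the min/max. A sketch with the configured 70% CPU target:

```python
import math

def hpa_desired(current_replicas, current_cpu_pct, target_cpu_pct=70,
                min_replicas=4, max_replicas=20):
    """Core HPA scaling rule: ceil(current * utilization / target), clamped."""
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))
```

So 4 pods running at 140% CPU scale to 8, and a sustained retail-peak burst saturates at the 20-pod ceiling.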
OpenViking (by Volcengine) serves as the unified context database for the AI chat agents. It replaces fragmented vector storage with a file-system paradigm, managing agent memory (conversation history), resources (product catalogs, policies), and skills (retail-specific functions) through hierarchical L0/L1/L2 tiered context loading.
OpenViking runs as a StatefulSet on EKS with persistent volumes (EBS gp3). Its three-tier context loading: L0 (always loaded — system prompt, retail brand identity), L1 (session loaded — customer profile, active cart, recent orders), L2 (on-demand — full product catalog search, policy documents via OpenSearch vector retrieval). OpenViking's file-system paradigm organizes retail knowledge as: /memory/ (chat history), /resources/ (catalogs, FAQs), /skills/ (order lookup, inventory check). Self-evolving: learns from successful interactions to improve future context selection.
StatefulSet with 3 replicas. EBS gp3 volumes: 100Gi per pod. Context tiers: L0 (5MB system), L1 (50MB session), L2 (unlimited on-demand). File-system mount: /openviking/retail/. Backup: hourly snapshots to S3.
| Field | Value | Unit | Description |
|---|---|---|---|
| EKS + EBS | |||
| Fargate vCPU Hours | 4320 | hours/month | OpenViking StatefulSet: 3 pods x 2 vCPU x 720hrs for context serving |
| EBS gp3 Storage | 300 | GB | 100Gi per pod — stores L0/L1 context tiers and file-system index |
| EBS IOPS | 3000 | IOPS | Baseline gp3 IOPS — sufficient for context read-heavy workload |
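The L0/L1/L2 tiering described above can be sketched as a context assembler. A toy version: tier contents are placeholders for the real OpenViking file-system paths, `l2_search` stands in for the OpenSearch retrieval call, and the trigger heuristic is invented for illustration:

```python
def needs_l2(query):
    # Toy heuristic: only catalog/policy questions trigger on-demand retrieval.
    return any(kw in query.lower() for kw in ("product", "policy", "return"))

def load_context(session, query, l2_search):
    """Assemble the three context tiers for one chat turn."""
    context = {
        "L0": ["system_prompt", "brand_identity"],            # always loaded
        "L1": [f"profile:{session['customer_id']}", "cart"],  # session loaded
        "L2": [],                                             # on demand
    }
    if needs_l2(query):
        context["L2"] = l2_search(query)
    return context
```

The point of the split is cost: L0 stays small and constant, L1 is bounded per session, and the unbounded L2 corpus is only touched when the query warrants it.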
Stores structured retail data: customer profiles, chat transcripts, order history, product catalog metadata, loyalty program data, and analytics events. Aurora PostgreSQL provides the relational backbone with read replicas for analytics queries.
Aurora PostgreSQL 15.4 with Multi-AZ deployment. Primary instance handles writes (chat history, profile updates). Read replica serves analytics dashboards and reporting queries. pgvector extension installed for hybrid vector search alongside OpenSearch. Schemas: chat_history, customer_profiles, product_catalog, analytics_events, loyalty_program.
Aurora PostgreSQL 15.4 Serverless v2. Min ACU: 2, Max ACU: 32. Storage: 200GB with auto-scaling. Multi-AZ enabled. Backup retention: 14 days. Performance Insights enabled. pgvector extension for hybrid search.
| Field | Value | Unit | Description |
|---|---|---|---|
| Aurora PostgreSQL | |||
| ACU Hours | 1440 | hours/month | Serverless v2: avg 2 ACU x 720hrs — bursts to 32 ACU peak |
| Storage | 200 | GB | Chat transcripts, customer profiles, product metadata, analytics |
| I/O Operations | 100 | M/month | Read-heavy: session lookups, profile queries, analytics writes |
| Backup Storage | 200 | GB | 14-day automated backups with point-in-time recovery |
Provides sub-millisecond caching for: (1) Active chat session state, (2) OpenViking L1 context cache (customer profiles, recent orders), (3) Rate limiting counters, (4) Bedrock response cache for frequently asked retail questions. Dramatically reduces latency and Bedrock API costs.
ElastiCache Redis 7.x cluster with 2 shards, each with 1 replica. Data structures: Hash for session state, Sorted Set for conversation ordering, String for cached Bedrock responses (TTL 1hr for product info, 24hr for policies). Pub/Sub for real-time notification delivery across chat instances.
Redis 7.x, r7g.large nodes. 2 shards, 1 replica per shard (4 nodes total). Memory: 13.07GB per node. Encryption at-rest (KMS) and in-transit (TLS). Automatic failover enabled. Backup: daily snapshot retention 7 days.
| Field | Value | Unit | Description |
|---|---|---|---|
| ElastiCache Redis | |||
| Node Type | r7g.large | - | Graviton3-based — optimized for memory-intensive caching workloads |
| Nodes | 4 | nodes | 2 shards x 2 nodes (1 primary + 1 replica per shard) for HA |
| Data Transfer | 100 | GB/month | Cross-AZ replication traffic + cache read/write from EKS pods |
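The per-content-type TTLs above can be centralized in one small policy table. A sketch that returns the namespaced key, value, and expiry to pass to redis-py's `client.set(name, value, ex=ttl)`; the session TTL is an assumption not stated in the spec:

```python
TTL_SECONDS = {
    "product_info": 3600,  # 1 h, per the cache spec above
    "policy": 86400,       # 24 h, per the cache spec above
    "session": 1800,       # assumed 30-min sliding session window (not in spec)
}

def cache_entry(kind, key, value):
    """Return (namespaced_key, value, ttl) for redis-py's set(..., ex=ttl)."""
    return f"{kind}:{key}", value, TTL_SECONDS[kind]
```

Keeping TTLs in one table makes the Bedrock-response cache policy auditable: a stale product price can cost real money, so `product_info` deliberately expires fastest.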
Enables EKS pods (NemoClaw, OpenViking) in private subnets to reach external APIs: WhatsApp Business API callbacks, NemoClaw registry updates, and OpenViking package updates — without exposing pods to inbound internet traffic.
NAT Gateway in public subnet provides outbound internet access for private subnet workloads. All egress traffic from EKS pods routes through NAT GW. VPC Flow Logs capture all traffic for security monitoring.
NAT Gateway in each AZ for HA. Elastic IP attached. VPC Flow Logs enabled (CloudWatch Logs). Data processing: ~200GB/month outbound.
| Field | Value | Unit | Description |
|---|---|---|---|
| NAT Gateway | |||
| NAT Gateway Hours | 1440 | hours/month | Always-on NAT GW in each of the two AZs: 2 x 720hrs |
| Data Processed | 200 | GB/month | EKS pod egress: WhatsApp API calls, NemoClaw updates, telemetry |
High-availability replica of NemoClaw agent pods in AZ2. EKS pod anti-affinity rules ensure agents are distributed across both AZs. If AZ1 fails, all traffic automatically shifts to AZ2 replicas.
Same Helm chart deployment as AZ1. Pod topology spread constraints ensure even distribution. NemoClaw agents are stateless — session state lives in ElastiCache, context in OpenViking. Seamless failover with zero chat disruption.
Mirror of AZ1 configuration. Pod anti-affinity: requiredDuringSchedulingIgnoredDuringExecution. Topology spread: maxSkew 1, topologyKey topology.kubernetes.io/zone.
High-availability replica of OpenViking context database pods in AZ2. Persistent volumes replicate across AZs via EBS snapshots. Ensures context retrieval remains available during AZ failures.
OpenViking StatefulSet replicas in AZ2 with independent EBS gp3 volumes. Cross-AZ data sync handled by OpenViking's built-in replication protocol. Read-heavy operations can be load-balanced across both AZs.
StatefulSet replicas with volumeClaimTemplates. EBS gp3 cross-AZ replication via OpenViking sync. Pod disruption budget: minAvailable 2.
Amazon Bedrock provides fully managed access to Claude Sonnet 4.6 (Anthropic) as the primary LLM for the retail chatbot. NemoClaw routes all inference requests to Bedrock, eliminating the need to self-host GPU instances while maintaining enterprise-grade security and compliance.
Bedrock invocation flow: NemoClaw constructs the prompt (system prompt + OpenViking context + customer query) → calls Bedrock InvokeModel API with Claude Sonnet 4.6 → receives streaming response → applies NemoClaw safety guardrails → delivers to customer. Provisioned throughput ensures consistent latency for peak retail hours. Guardrails for Bedrock add additional content filtering for retail-appropriate responses.
Model: anthropic.claude-sonnet-4-6-20250514. Provisioned Throughput: 2 model units. Max tokens: 4096 output. Temperature: 0.3 for factual retail responses. Guardrails: PII redaction, retail content filter, no competitor mentions. CloudWatch metrics for latency and token usage.
| Field | Value | Unit | Description |
|---|---|---|---|
| Amazon Bedrock | |||
| Model | Claude Sonnet 4.6 | - | Anthropic Claude Sonnet 4.6 — optimal balance of quality, speed and cost for retail chat |
| Provisioned Throughput | 2 | model units | Guaranteed throughput for consistent low-latency during peak retail hours |
| Input Tokens | 50 | M/month | System prompt + OpenViking context + customer query per request (~2.5K tokens avg) |
| Output Tokens | 15 | M/month | Chat responses avg 750 tokens — product recommendations, order status, FAQs |
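The invocation flow above produces a JSON body for Bedrock's InvokeModel API. A sketch of that body in the Anthropic-on-Bedrock messages format (`"bedrock-2023-05-31"` is the API version tag that format requires); the prompt-joining convention here is an assumption, not NemoClaw's actual template:

```python
import json

def bedrock_body(system_prompt, context_blocks, customer_query,
                 max_tokens=4096, temperature=0.3):
    """Build the InvokeModel request body: system prompt + OpenViking
    context + customer query, with the configured sampling settings."""
    prompt = "\n\n".join(context_blocks + [customer_query])
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,  # 0.3 for factual retail responses
        "system": system_prompt,
        "messages": [{"role": "user", "content": prompt}],
    })

sample = json.loads(bedrock_body("You are a retail assistant.",
                                 ["loyalty tier: gold"], "Where is my order?"))
```

The same body works for `invoke_model_with_response_stream`, which is what enables the streaming delivery described in the flow.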
Provides hybrid vector + keyword search for OpenViking's L2 on-demand context retrieval. Indexes retail product catalogs, policy documents, FAQs, and training manuals with both dense vector embeddings and BM25 text scoring for optimal RAG retrieval quality.
OpenSearch Serverless with vector search collection. Documents are embedded using Amazon Titan Embeddings v2 (via Bedrock) and indexed with k-NN (HNSW algorithm). Hybrid search combines vector similarity (cosine) with BM25 keyword matching. OpenViking queries OpenSearch for L2 context: product descriptions, return policies, promotion details, supplier information.
OpenSearch Serverless, vector search collection. Index: retail-knowledge (2M+ documents). Embedding dimensions: 1024 (Titan v2). k-NN: HNSW, ef_search: 512. Hybrid search: 0.7 vector + 0.3 BM25 weighting. OCU: 4 search, 2 indexing.
| Field | Value | Unit | Description |
|---|---|---|---|
| OpenSearch Serverless | |||
| Search OCUs | 4 | OCUs | OpenSearch Compute Units for vector + keyword hybrid search queries |
| Indexing OCUs | 2 | OCUs | Handles real-time re-indexing from inventory/product updates |
| Storage | 50 | GB | Vector embeddings (1024-dim) + text index for 2M+ retail documents |
| Documents | 2 | M documents | Product catalog, FAQs, policies, SOPs, supplier docs, promotions |
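The 0.7/0.3 weighting above blends two incomparable scales, so BM25 must be normalized before mixing. A sketch using min-max normalization against the best BM25 score in the result set; OpenSearch's own hybrid-search normalization pipeline differs in detail:

```python
def hybrid_score(vector_sim, bm25_score, bm25_max, w_vector=0.7, w_bm25=0.3):
    """Blend cosine similarity with normalized BM25 at the configured weights."""
    bm25_norm = bm25_score / bm25_max if bm25_max else 0.0
    return w_vector * vector_sim + w_bm25 * bm25_norm

def rerank(hits):
    """hits: list of (doc_id, cosine_sim, bm25_score). Returns ids best-first."""
    bm25_max = max((h[2] for h in hits), default=0.0)
    scored = [(h[0], hybrid_score(h[1], h[2], bm25_max)) for h in hits]
    return [doc for doc, _ in sorted(scored, key=lambda s: -s[1])]
```

With these weights a document that matches semantically but shares no keywords still outranks a pure keyword hit, which is the desired behavior for conversational retail queries.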
Central repository for all retail knowledge documents: product catalogs (JSON/CSV), policy PDFs, FAQ documents, training manuals, supplier docs, promotional materials, and store location data. Also stores chat widget static assets, OpenViking snapshots, and analytics exports.
Multi-bucket strategy: (1) s3://retail-knowledge-base/ — structured product data and documents for RAG ingestion, (2) s3://chat-assets/ — CloudFront origin for widget JS/CSS, (3) s3://openviking-snapshots/ — hourly context DB backups, (4) s3://analytics-exports/ — QuickSight data. S3 Event Notifications trigger Lambda for automatic re-indexing when new documents are uploaded.
S3 Standard for active data. Intelligent-Tiering for knowledge base (access patterns vary). Versioning enabled. Server-side encryption (SSE-KMS). Lifecycle: transition to IA after 90 days, Glacier after 365 days. Cross-region replication disabled (single region).
| Field | Value | Unit | Description |
|---|---|---|---|
| Amazon S3 | |||
| Storage (Standard) | 500 | GB | Knowledge base docs, chat widget assets, OpenViking snapshots, exports |
| PUT/POST Requests | 1 | M/month | Document uploads, OpenViking hourly snapshots, analytics exports |
| GET Requests | 10 | M/month | OpenViking L2 retrieval, CloudFront origin pulls, Lambda reads |
| Data Transfer Out | 200 | GB/month | CloudFront origin fetch + cross-service data reads within region |
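The lifecycle policy above (IA at 90 days, Glacier at 365) translates directly to the configuration dict boto3's `put_bucket_lifecycle_configuration` accepts. A sketch; the rule ID is a placeholder:

```python
def lifecycle_config(prefix=""):
    """Lifecycle rules matching the spec: Standard-IA at 90 days,
    Glacier at 365 days, applied to keys under the given prefix."""
    return {
        "Rules": [{
            "ID": "tier-down-knowledge-base",  # placeholder rule ID
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }]
    }

cfg = lifecycle_config("knowledge/")
```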
Central event bus for retail data integration. Receives real-time events from POS/ERP systems (inventory changes, price updates, new products) and internal events (chat completed, feedback received, knowledge base updated). Routes events to appropriate processors.
Custom event bus: retail-chat-events. Event patterns: (1) inventory.updated → Lambda for OpenSearch re-index, (2) product.created → Lambda for knowledge base update, (3) chat.completed → Lambda for analytics pipeline, (4) feedback.received → Lambda for OpenViking learning update. Schema registry for event validation.
Custom event bus with 6 rules. Schema registry enabled. Archive: 30-day retention for replay. Dead-letter queue: SQS. CloudWatch metrics for failed invocations. Event throughput: ~10K events/hour peak.
| Field | Value | Unit | Description |
|---|---|---|---|
| EventBridge | |||
| Custom Events | 5 | M/month | Inventory updates, price changes, product creation, chat events, feedback |
| Rules | 6 | rules | Pattern matching rules routing events to Lambda, SQS, and CloudWatch |
| Archive Storage | 10 | GB | 30-day event archive for replay and debugging failed processing |
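The routing rules above work by EventBridge pattern matching. A simplified sketch of the core rule: every field in the pattern must list a value the event carries (real patterns also support prefix, numeric, and anything-but operators, omitted here); the event source name is a placeholder:

```python
def matches(pattern, event):
    """Simplified EventBridge matching: each pattern field lists allowed values."""
    return all(event.get(field) in allowed for field, allowed in pattern.items())

# Placeholder pattern for the inventory.updated -> re-index Lambda rule above.
INVENTORY_RULE = {"source": ["retail.pos"], "detail-type": ["inventory.updated"]}
```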
Serverless compute for event-driven data processing: document ingestion, embedding generation, knowledge base indexing, analytics ETL, and WhatsApp webhook processing. Handles all async workloads without managing infrastructure.
Lambda functions: (1) document-ingester — processes new S3 uploads, extracts text (Textract for PDFs), generates embeddings via Bedrock Titan, indexes to OpenSearch. (2) whatsapp-adapter — processes WhatsApp webhook events, translates to internal chat format. (3) analytics-processor — aggregates chat metrics, writes to Aurora. (4) inventory-sync — processes EventBridge inventory events, updates OpenViking resources. (5) feedback-processor — processes customer ratings, triggers OpenViking self-evolution.
Runtime: Python 3.12. Memory: 512MB-3008MB depending on function. Timeout: 15min for document processing, 30s for webhooks. Provisioned concurrency: 50 for whatsapp-adapter. VPC-attached for Aurora/OpenSearch access. Layers: shared utilities, boto3 latest.
| Field | Value | Unit | Description |
|---|---|---|---|
| AWS Lambda | |||
| Invocations | 10 | M/month | Document ingestion + WhatsApp adapter + analytics + inventory sync + feedback |
| Duration (avg) | 2 | seconds | Weighted average: sub-second webhooks, up to 2min document processing, ~5s analytics |
| Memory | 1024 | MB avg | 512MB for webhooks, 3008MB for Textract/embedding generation |
| Provisioned Concurrency | 50 | units | WhatsApp adapter always warm — prevents cold start latency on webhooks |
Decouples async processing from real-time chat flow. Queues bulk document ingestion jobs, analytics events, notification delivery, and retry logic for failed Bedrock calls. Ensures no data loss during traffic spikes.
Queues: (1) document-ingestion-queue (Standard) — bulk uploads from admin team, (2) analytics-events-queue (Standard) — chat completion events for batch processing, (3) notification-queue (FIFO) — ordered delivery of WhatsApp outbound messages, (4) dead-letter-queue — failed messages for investigation. Visibility timeout tuned per consumer.
3 Standard queues + 1 FIFO queue. Message retention: 7 days. Visibility timeout: 300s (document processing), 30s (analytics). DLQ: maxReceiveCount 3. Encryption: SSE-SQS. Long polling: 20s.
| Field | Value | Unit | Description |
|---|---|---|---|
| Amazon SQS | |||
| Standard Queue Messages | 5 | M/month | Bulk document ingestion, analytics batches, retry queues |
| FIFO Queue Messages | 1 | M/month | WhatsApp outbound — must maintain message ordering per conversation |
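Per-conversation ordering on the FIFO queue comes from the message group ID. A sketch of the `send_message` kwargs: the group ID pins ordering to one conversation, and a content-hash deduplication ID collapses retried publishes within SQS's 5-minute dedup window. The queue URL is a placeholder:

```python
import hashlib
import json

def fifo_send_params(queue_url, conversation_id, payload):
    """Build send_message kwargs for the WhatsApp outbound FIFO queue."""
    body = json.dumps(payload, sort_keys=True)  # canonical form for hashing
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": conversation_id,  # ordering scope = one conversation
        "MessageDeduplicationId": hashlib.sha256(body.encode()).hexdigest(),
    }

params = fifo_send_params("https://sqs.example/outbound.fifo", "conv-1", {"text": "hi"})
```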
Publishes alerts and notifications: operational alerts (high latency, error rates), business notifications (daily chat summary to managers), and customer notifications (order status updates via chat). Fan-out pattern for multi-channel delivery.
Topics: (1) ops-alerts — CloudWatch alarm notifications to PagerDuty/Slack, (2) business-reports — daily/weekly retail chat analytics summaries, (3) customer-notifications — fan-out to SQS for WhatsApp/push delivery. Filter policies route messages to correct subscribers.
3 Standard topics. Subscriptions: Lambda, SQS, Email, HTTPS. Message filtering enabled. Encryption: SSE-SNS (KMS). Delivery retry policy: 3 attempts with exponential backoff.
| Field | Value | Unit | Description |
|---|---|---|---|
| Amazon SNS | |||
| Notifications Published | 500K | /month | Ops alerts, business report digests, customer notification fan-out |
| Subscriptions | 10 | endpoints | Lambda, SQS, email, HTTPS webhook targets across 3 topics |
| Step | Source | Target | Description |
|---|---|---|---|
| 1 | End Users (Web/Mobile/WhatsApp) | Amazon Route 53 | DNS resolution — customers access chat via web, mobile, or WhatsApp |
| 2 | WhatsApp Business API | Amazon API Gateway | WhatsApp Business API sends webhooks to API Gateway |
| 3 | Amazon Route 53 | Amazon CloudFront | Route 53 resolves the chat domain to CloudFront CDN for static assets & WebSocket upgrade |
| 4 | Amazon CloudFront | AWS WAF | WAF inspects all incoming requests for threats and bot protection |
| 5 | AWS WAF | Amazon API Gateway | Clean traffic forwarded to API Gateway (REST + WebSocket) |
| 6 | Amazon API Gateway | Application Load Balancer | API Gateway routes chat requests to internal ALB |
| 7 | Application Load Balancer | NemoClaw on EKS | ALB distributes to NemoClaw agent framework on EKS |
| 8 | NemoClaw on EKS | OpenViking on EKS | NemoClaw queries OpenViking context DB for agent memory, skills & resources |
| 9 | OpenViking on EKS | Amazon S3 | OpenViking retrieves retail documents from S3 knowledge store |
| 10 | NemoClaw on EKS | Amazon Bedrock (Claude Sonnet 4.6) | NemoClaw sends enriched prompt to Amazon Bedrock (Claude Sonnet 4.6) |
| 11 | Amazon Bedrock | NemoClaw on EKS | Bedrock returns LLM response — NemoClaw applies safety guardrails |
| 12 | NemoClaw on EKS | Amazon ElastiCache Redis | Session state & conversation cache stored in ElastiCache Redis |
| 13 | NemoClaw on EKS | Amazon Aurora PostgreSQL | Chat history, user profiles & analytics persisted to Aurora PostgreSQL |
| 14 | POS / ERP Systems | Amazon EventBridge | POS/ERP systems push real-time inventory & pricing events |
| 15 | Amazon EventBridge | AWS Lambda | EventBridge triggers Lambda for data ingestion & transformation |
| 16 | AWS Lambda | Amazon S3 | Lambda processes and stores documents in S3 knowledge base |
| 17 | Amazon OpenSearch Serverless | OpenViking on EKS | OpenSearch provides vector search for OpenViking hybrid retrieval |
| 18 | Amazon SQS | AWS Lambda | SQS queues async tasks — bulk ingestion, analytics, notifications |
NVIDIA NemoClaw is an open-source AI agent platform that adds enterprise-grade privacy and security controls to OpenClaw. Announced at GTC 2026, NemoClaw simplifies running always-on autonomous AI assistants with a single command. As part of the NVIDIA Agent Toolkit, it installs the NVIDIA OpenShell runtime — a secure environment for running autonomous agents.
The NemoClaw agent is configured with retail-specific skills and tools, such as the order lookup and inventory check functions exposed through OpenViking's /skills/ tree.
Traditional RAG architectures fragment context across multiple vector stores, creating management complexity and inconsistent retrieval quality. OpenViking (by Volcengine/ByteDance) introduces a file-system paradigm that unifies all agent context — memory, resources, and skills — into a single hierarchical system. This dramatically simplifies the retail knowledge management pipeline.
OpenViking organizes retail knowledge in a familiar file-system structure: /memory/ for chat history, /resources/ for catalogs and FAQs, and /skills/ for retail functions such as order lookup and inventory check.
OpenViking learns from successful interactions. When a customer rates a chat session positively, OpenViking analyzes the context selection that led to that success and adjusts its retrieval weightings. Over time, the system becomes increasingly accurate at selecting the right context for each type of retail query — whether it's a product recommendation, order tracking, or policy clarification.
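The adjustment direction can be sketched as a tiny weight update. OpenViking's actual learning mechanism is not public, so everything here is illustrative: the rating threshold, learning rate, and weight store are all assumptions:

```python
def update_weights(weights, used_paths, rating, lr=0.1):
    """Nudge the retrieval weight of each context path used in a rated
    session toward the outcome (assumes 1-5 star ratings)."""
    signal = 1.0 if rating >= 4 else -1.0
    for path in used_paths:
        w = weights.get(path, 1.0)              # unseen paths start neutral
        weights[path] = max(0.0, w + lr * signal)  # never go negative
    return weights
```

Over many sessions, paths that keep appearing in well-rated conversations accumulate weight and are preferred at retrieval time, which is the "self-evolving" behavior described above.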
The AI Smart Chat platform supports three primary channels, all converging to the same NemoClaw agent backend. This ensures consistent responses regardless of channel while respecting channel-specific UI constraints.
The retail AI chat ingests data from multiple enterprise sources to maintain a comprehensive, up-to-date knowledge base that OpenViking serves to NemoClaw agents.
The Personal Data Protection Act 2010 (PDPA) governs how personal data is collected, processed, and stored in Malaysia. This architecture implements comprehensive controls to ensure full compliance.
Based on enterprise retail deployment: ~500K monthly active customers, ~2M chat sessions/month, ~10M messages/month, average 5 messages per session. Prices are estimated for ap-southeast-5 region.