Gateway Layer
AI Gateway (Orchestration Layer)
The AI Gateway acts as the central brain of the iG3 Edge Network. It manages all device interactions, workload assignments, and dynamic routing. Every edge device connects through this gateway to receive tasks, report results, and maintain a verified identity.
Key Responsibilities:
Device Registration & DID Verification: Each device is authenticated using Decentralized Identifiers (DID) on the peaq network.
Workload Distribution: Distributes AI inference tasks to the most suitable nodes, preferring edge devices and falling back to the cloud when needed.
Task Queueing & Retry: Ensures fault tolerance with automatic re-queuing of failed jobs.
Topology-Aware Load Balancing: Routes tasks based on proximity, device capability, and current load to optimize performance.
Edge Model Discovery: Detects which models are available on which edge nodes for efficient dispatching.
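The responsibilities above can be sketched as a single scheduling decision: filter registered edge nodes by which models they host and how loaded they are, score the remainder by proximity and load, and fall back to the cloud when no edge node qualifies. The node fields, the load cutoff, and the scoring formula below are illustrative assumptions, not the gateway's actual policy.

```python
from dataclasses import dataclass

@dataclass
class EdgeNode:
    node_id: str
    models: set        # models this node can serve (edge model discovery)
    latency_ms: float  # proximity proxy: measured round-trip time
    load: float        # current utilization, 0.0 - 1.0

# Assumed always-available cloud fallback target.
CLOUD_NODE = EdgeNode("cloud-fallback", {"*"}, 80.0, 0.0)

def assign(task_model: str, nodes: list) -> EdgeNode:
    """Pick the best edge node for a task; fall back to the cloud if none fits."""
    # Keep only edge nodes that host the model and are not saturated.
    candidates = [n for n in nodes if task_model in n.models and n.load < 0.9]
    if not candidates:
        return CLOUD_NODE
    # Topology-aware score: prefer low latency, penalized by current load.
    return min(candidates, key=lambda n: n.latency_ms * (1 + n.load))
```

For example, a nearby but busier node can still win over a distant idle one, because latency dominates the score; a model hosted on no edge node routes straight to the cloud.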
Tech Stack:
Kubernetes for service orchestration and mesh networking.
Kafka as an event bus for task scheduling and status updates.
gRPC APIs for high-performance communication between services.
Pub/Sub System to stream results and updates back to users and dashboards.
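The task queueing and retry behavior described above can be illustrated with a minimal in-memory sketch (a stand-in for the Kafka event bus): failed jobs are automatically re-queued until a retry budget is exhausted, then set aside as dead letters. The `MAX_ATTEMPTS` limit and dead-letter handling are assumptions for illustration.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry policy

def run_with_retries(tasks, execute):
    """Drain a task queue, re-queuing failed jobs until they exceed MAX_ATTEMPTS."""
    q = queue.Queue()
    for t in tasks:
        q.put((t, 1))  # (task, attempt number)
    completed, dead = [], []
    while not q.empty():
        task, attempt = q.get()
        try:
            completed.append(execute(task))
        except Exception:
            if attempt < MAX_ATTEMPTS:
                q.put((task, attempt + 1))  # automatic re-queue on failure
            else:
                dead.append(task)           # retries exhausted: dead-letter
    return completed, dead
```

In production this loop would be driven by Kafka consumer groups rather than a local queue, but the fault-tolerance contract is the same: a transient failure never silently drops a job.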
LLM Gateway
The LLM Gateway specializes in handling natural language tasks and large model interactions. It abstracts away model complexity and provides a streamlined interface for edge users to interact with LLMs in real time, whether from a desktop client or a Telegram bot.
Key Responsibilities:
Token-Based Authentication: Secures access using wallet signatures or DID verification.
Text Generation & Chat Handling: Receives prompts and routes them to the most appropriate model.
Model Sharding & Orchestration: Supports parallelism across multiple GPUs or devices for scalability.
Local Edge or Cloud Inference: Automatically selects between lightweight edge LLMs or cloud-hosted large models (e.g., H100-backed) depending on complexity and urgency.
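The edge-versus-cloud selection above comes down to a routing predicate over complexity and urgency. A minimal sketch, assuming prompt length as a rough complexity proxy and two hypothetical backend names (the real gateway's thresholds and signals are not specified here):

```python
def pick_backend(prompt: str, urgent: bool) -> str:
    """Choose an inference target for a prompt.

    Thresholds and backend names below are illustrative assumptions,
    not the gateway's actual policy.
    """
    # Rough complexity proxy: prompt length in words.
    complexity = len(prompt.split())
    if urgent and complexity < 200:
        return "edge-slm"       # lightweight edge model: lowest latency
    if complexity < 50:
        return "edge-slm"       # simple prompts stay on the edge
    return "cloud-h100-llm"     # complex prompts go to the large cloud model
```

A production router would also weigh current edge load and model availability (the same signals the AI Gateway tracks), but the shape of the decision is the same.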
Tech Stack:
LiteLLM for API compatibility with major LLMs and simplified routing.
vLLM / NVIDIA Triton / text-generation-webui for efficient model hosting and scaling.
Faiss / Qdrant to support retrieval-augmented generation (RAG) pipelines using vector search.
Redis for caching frequent responses and reducing redundant compute.
Billing & Metering Hooks to track usage and apply token-based pricing models.
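The Redis caching item above can be sketched with an in-memory stand-in: identical (model, prompt) requests are hashed to one cache key, and a hit within the TTL skips inference entirely. The TTL value and key scheme are assumptions for illustration; in production the store would be Redis with native key expiry.

```python
import hashlib
import time

class ResponseCache:
    """In-memory stand-in for the Redis response cache (TTL is an assumption)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expiry time, cached response)

    @staticmethod
    def key(model: str, prompt: str) -> str:
        # Hash model + prompt so identical requests share one cache entry.
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_compute(self, model: str, prompt: str, infer):
        k = self.key(model, prompt)
        now = time.monotonic()
        hit = self.store.get(k)
        if hit and hit[0] > now:
            return hit[1]             # cache hit: skip redundant compute
        response = infer(prompt)      # cache miss: run inference once
        self.store[k] = (now + self.ttl, response)
        return response
```

The same `get_or_compute` call site is also a natural place for the billing and metering hooks, since every non-cached inference passes through it.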