English-language deck presented at the 100% IA event held at Iguane Solutions' Paris offices on Tuesday, 2 July 2024:
- Presentation of our plug-and-play AI platform: its advanced features, such as its intuitive user interface, its powerful copilot, and its high-performance monitoring tools.
- Customer feedback: Cyril Janssens, CTO of easybourse, shares his experience using our plug-and-play AI platform.
Plug n Play Gen AI Platform
Layered Architecture for Gen AI Infrastructure
1. Hardware & Cloud: Infrastructure
2. Model Foundation: LLM & RAG usage
3. Integration, Orchestration & Deployment tooling
4. Gen AI Applications

Layered Architecture for Gen AI Infrastructure / Layer 01: Hardware & Cloud
Base System
Operating System Installation
Install the OS
Install IG1 AI OS, an operating system specially tailored for AI services, leveraging our deep expertise in managing plug-and-play AI platforms.
Purpose: Provides the underlying operating system for all software and services.
Update the System
Run system updates to ensure all packages are up to date.
Purpose: Ensures the system has the latest security patches and features.
GPU Drivers and CUDA Installation
NVIDIA Drivers
Install the latest NVIDIA drivers for the GPUs.
Purpose: Enables the operating system to communicate with the GPUs.
CUDA Toolkit
The CUDA Toolkit is embedded in IG1 AI OS.
Purpose: Provides the necessary libraries and tools for developing and running GPU-accelerated applications.
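A quick way to confirm that the drivers and CUDA runtime are correctly installed is to query the GPUs from Python. This is a minimal sketch assuming PyTorch with CUDA support is present on the node; PyTorch itself is not described as part of IG1 AI OS.

    import torch  # assumes PyTorch with CUDA support is installed

    # Confirm that the OS, NVIDIA driver and CUDA runtime can see the GPUs.
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA version built against:", torch.version.cuda)
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
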
KUBE by IG1 for AI
Installation and Configuration
Install KUBE by IG1
Follow the installation guide for KUBE by IG1 to set up the virtualization layer.
Purpose: Provides a platform for managing virtual machines and containers.
Configure Networking
Set up networking within KUBE to ensure communication between nodes and external access.
Purpose: Ensures seamless communication and data transfer within the cluster and with external clients.
Cluster Installation
Initialize KUBE Cluster
Initialize the KUBE cluster to create a control plane and add worker nodes.
Purpose: Establishes the core infrastructure for managing containerized applications.
Verify Cluster Health
Check the health and status of the KUBE cluster to ensure all components are functioning correctly.
Purpose: Identifies and resolves any issues before proceeding with further setup.
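As an illustration of the cluster health check, the sketch below lists node readiness through the Kubernetes Python client. It assumes KUBE by IG1 exposes a standard Kubernetes API and that a kubeconfig is available on the machine running the check.

    from kubernetes import client, config  # pip install kubernetes

    # Load credentials for the KUBE cluster (assumes a standard kubeconfig).
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Report the Ready condition of every node before deploying workloads.
    for node in v1.list_node().items:
        ready = next(
            (c.status for c in node.status.conditions if c.type == "Ready"),
            "Unknown",
        )
        print(f"{node.metadata.name}: Ready={ready}")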

Layered Architecture for Gen AI Infrastructure / Layer 02: Model Foundation
Download LLM
Obtain the LLM from the appropriate source.
Purpose: Provides the base AI model for various applications.
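For models hosted on the Hugging Face Hub, the download step can look like the sketch below; the model identifier and target directory are placeholders, and gated models additionally require an access token.

    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    # Fetch all model files (weights, tokenizer, config) into a local directory.
    local_dir = snapshot_download(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model id
        local_dir="/models/mistral-7b-instruct",       # placeholder target path
    )
    print("Model downloaded to", local_dir)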
LLM Optimization
Optimization reduces resource usage by preparing LLMs through a process called quantization, which increases inference performance without significantly compromising accuracy. Our quantization management services rely on the AWQ project, which offers an excellent balance of speed and accuracy.
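A minimal quantization sketch using the AutoAWQ library (one implementation of AWQ) is shown below; the model paths and quantization settings are illustrative defaults, not IG1's production configuration.

    from awq import AutoAWQForCausalLM          # pip install autoawq
    from transformers import AutoTokenizer

    model_path = "/models/mistral-7b-instruct"       # placeholder FP16 model
    quant_path = "/models/mistral-7b-instruct-awq"   # placeholder output path

    # Typical 4-bit AWQ settings; adjust to the target model and hardware.
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    model.quantize(tokenizer, quant_config=quant_config)  # runs calibration and quantizes weights
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)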
LLM Inference Servers
Much like database engines, LLM inference servers run LLMs for inference or embedding. IG1 installs and manages all the services required for LLM models to operate properly. For this, we rely on several instances of the following servers (a client-side sketch follows the list):
- vLLM, ideal for non-quantized FP16 models
- NVIDIA Triton Inference Server, for models optimized with NVIDIA TensorRT-LLM
- TGI (Text Generation Inference), for Hugging Face models
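As an illustration, the sketch below sends a chat completion request to a vLLM instance, which serves an OpenAI-compatible HTTP API; the endpoint URL, API key, and model name are placeholders for a deployment-specific configuration.

    from openai import OpenAI  # pip install openai

    # vLLM exposes an OpenAI-compatible endpoint (placeholder URL and model name).
    client = OpenAI(base_url="http://vllm.ai.svc.cluster.local:8000/v1", api_key="unused")

    response = client.chat.completions.create(
        model="mistral-7b-instruct-awq",
        messages=[{"role": "user", "content": "Summarize what a RAG pipeline does."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)
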
Integrate RAG Components
Set up the necessary RAG components (example using the LlamaIndex framework; a minimal sketch follows this list):
- Retriever: Finds the most relevant information from the data.
- Generator: Uses the retrieved information to generate accurate responses.
- Embedding: Transforms data into vector representations to improve retrieval accuracy.
- Reranking: Organizes and prioritizes the retrieved results based on relevance.
Purpose: Enhances the LLM with retrieval-augmented capabilities for more accurate and relevant responses.
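The sketch below wires these components together with LlamaIndex; it assumes the llama_index.core package layout (v0.10+), a ./data directory of source documents as a placeholder, and an embedding model and LLM already configured in Settings (otherwise LlamaIndex falls back to OpenAI's hosted models). Reranking can be added as a node post-processor on the query engine.

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Load source documents and build a vector index (embedding step).
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)

    # Retriever + generator: fetch the most relevant chunks, then answer with the LLM.
    query_engine = index.as_query_engine(similarity_top_k=4)
    response = query_engine.query("What does the onboarding guide say about API keys?")
    print(response)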
Deploy RAG Pipeline
Deploy the RAG pipeline within the KUBE environment.
Purpose: Ensures the RAG system is operational and integrated with the LLM model.
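Deployment details depend on how the pipeline is packaged; as one hypothetical example, the sketch below creates a Deployment for a containerized RAG API in the KUBE cluster using the Kubernetes Python client. The image name, namespace, and port are placeholders.

    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    container = client.V1Container(
        name="rag-pipeline",
        image="registry.example.com/rag-pipeline:latest",  # placeholder image
        ports=[client.V1ContainerPort(container_port=8000)],
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="rag-pipeline"),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(match_labels={"app": "rag-pipeline"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "rag-pipeline"}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )
    apps.create_namespaced_deployment(namespace="ai", body=deployment)  # placeholder namespace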

Layered Architecture for Gen AI Infrastructure / Layer 03: Integration, Orchestration & Deployment Tooling
Integration AI Services
Integrate various AI services seamlessly to ensure efficient communication and operation. This includes:
API Integrations
Connect your AI models to various APIs for extended functionalities, including data retrieval, processing, and user interface interactions.
Data Pipelines
Establish data pipelines to ensure smooth data flow between different components, facilitating real-time data processing and analysis.
The API Core acts as an LLM proxy, balancing load across the LLM inference server instances. LiteLLM, deployed in high availability, is used for this purpose: it supports a wide range of LLM servers, is robust, and stores usage information and API keys in PostgreSQL. LiteLLM also keeps its instances synchronized and sends LLM usage information to our observability tools.
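From an application's point of view, the LiteLLM proxy looks like a single OpenAI-compatible endpoint. The sketch below shows a client call routed through it; the proxy URL, virtual API key, and model alias are placeholders defined in the LiteLLM configuration.

    from openai import OpenAI

    # The LiteLLM proxy exposes an OpenAI-compatible API and load-balances
    # requests across the underlying inference servers (vLLM, Triton, TGI).
    client = OpenAI(
        base_url="http://litellm.ai.svc.cluster.local:4000/v1",  # placeholder proxy URL
        api_key="sk-my-virtual-key",                             # placeholder virtual key
    )

    response = client.chat.completions.create(
        model="mistral-7b-instruct",  # model alias defined in the proxy config
        messages=[{"role": "user", "content": "Which documents mention GPU quotas?"}],
    )
    print(response.choices[0].message.content)
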
Implement observability tools to gain insights into the behavior and performance of your AI applications:
Centralized Logging
Aggregate logs from different services and applications in a central location for easier analysis and troubleshooting.
Metrics Collection
Collect metrics on various aspects of your applications' performance, such as response times, error rates, and resource usage (an instrumentation sketch follows the tracing item below).
Distributed Tracing
Use distributed tracing to track requests as they flow through different services, helping to identify bottlenecks and optimize performance.
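As a small illustration of the metrics-collection idea, the sketch below instruments an LLM call with Prometheus-style counters and histograms. Prometheus is an assumption for this example; the deck itself only names Lago, OpenMeter, and Sismology for the LLM usage pipeline.

    import time
    from prometheus_client import Counter, Histogram, start_http_server  # pip install prometheus-client

    REQUESTS = Counter("llm_requests_total", "Total LLM requests", ["model", "status"])
    LATENCY = Histogram("llm_request_seconds", "LLM request latency in seconds", ["model"])

    def timed_completion(call, model):
        """Run an LLM call while recording latency, success and error counts."""
        start = time.time()
        try:
            result = call()
            REQUESTS.labels(model=model, status="ok").inc()
            return result
        except Exception:
            REQUESTS.labels(model=model, status="error").inc()
            raise
        finally:
            LATENCY.labels(model=model).observe(time.time() - start)

    # Expose metrics on :9100/metrics for the monitoring stack to scrape.
    start_http_server(9100)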
The LLM observability layer collects usage data and execution traces, ensuring proper LLM management. IG1 manages LLM usage efficiently through a monitoring stack connected to the LLM orchestrator: Lago and OpenMeter collect the information, which is then forwarded to our central observability system, Sismology.

Our Offers
Benefit from our expert-led AI services that deliver tailored solutions, from infrastructure design to ongoing support, ensuring seamless integration and immediate usability. With advanced security, data integrity oversight, and personalized interfaces, our services enhance your operations with integrated AI tools for efficient DevOps, MLOps, and AIOps, supporting scalable and effective AI management.
Hosting
Elevate your AI applications with our cutting-edge hardware and cloud infrastructure, featuring NVIDIA GPU-equipped servers, the optimized Linux-based IG1 AI OS, and KUBE by IG1 for efficient virtual machine and container management. Our comprehensive solutions cover everything from initial server setup to seamless deployment, ensuring exceptional performance and flexibility for your AI workloads.
Software
Enhance your AI projects with our all-inclusive software solutions designed for deploying and managing Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems. Our packages offer powerful tools for API integration, data pipeline management, containerized deployment, and comprehensive observability, ensuring smooth operations and insightful performance metrics for optimal resource management.
Pricing
Our Offers, starting July 2024