Local AI

Self-hosted AI platform running on Kubernetes (K3s) with an NVIDIA RTX 3060. Powers a RAG chatbot for my portfolio website and serves as an AI experimentation playground.

Stack — vLLM serves Gemma 3 4B with OpenAI-compatible API. LangGraph orchestrates the chat flow (intent classification → retrieval → generation → validation). Qdrant stores vector embeddings, TEI generates them at inference time. Langflow handles automated content indexing via webhook.

Observability & Security — Prometheus + Grafana for GPU stats and request metrics, LangSmith for LLM tracing, DCGM Exporter for GPU telemetry. Keycloak provides SSO/OIDC for all AI services. The chatbot implements OWASP-aligned security: prompt injection detection, rate limiting, CSRF protection.