Marcus Pichler

Local AI

Self-hosted AI platform running on Kubernetes (K3s) with an NVIDIA RTX 3060. Powers a RAG chatbot for my portfolio website and serves as an AI experimentation playground.

Stack — vLLM serves Gemma 3 4B behind an OpenAI-compatible API. LangGraph orchestrates the chat flow (intent classification → retrieval → generation → validation). Qdrant stores vector embeddings; TEI generates them at inference time. Langflow handles automated content indexing via webhook.
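The retrieval → generation path above can be sketched with only the standard library. Service hostnames, ports, the collection name (`portfolio_docs`), and the payload field `text` are illustrative assumptions, and the intent-classification and validation steps that LangGraph adds are omitted here:

```python
import json
import urllib.request

TEI_URL = "http://tei:8080/embed"                  # TEI embedding endpoint (assumed host/port)
QDRANT_URL = "http://qdrant:6333"                  # Qdrant REST API (assumed host/port)
VLLM_URL = "http://vllm:8000/v1/chat/completions"  # vLLM's OpenAI-compatible endpoint
COLLECTION = "portfolio_docs"                      # hypothetical collection name

def post_json(url: str, payload: dict) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def embed(text: str) -> list[float]:
    # TEI returns one embedding vector per input string
    return post_json(TEI_URL, {"inputs": [text]})[0]

def retrieve(vector: list[float], k: int = 4) -> list[str]:
    # Vector similarity search against the indexed site content
    hits = post_json(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
        {"vector": vector, "limit": k, "with_payload": True},
    )["result"]
    return [h["payload"]["text"] for h in hits]

def build_prompt(question: str, chunks: list[str]) -> list[dict]:
    # Ground the model in the retrieved chunks only
    context = "\n\n".join(chunks)
    return [
        {"role": "system",
         "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ]

def answer(question: str) -> str:
    messages = build_prompt(question, retrieve(embed(question)))
    out = post_json(VLLM_URL, {"model": "google/gemma-3-4b-it",
                               "messages": messages})
    return out["choices"][0]["message"]["content"]
```

In the real setup these steps run as LangGraph nodes, which makes each stage independently traceable in LangSmith.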

Observability & Security — Prometheus + Grafana for request metrics and GPU telemetry (collected via DCGM Exporter), LangSmith for LLM tracing. Keycloak provides SSO/OIDC for all AI services. The chatbot implements OWASP-aligned security: prompt injection detection, rate limiting, and CSRF protection.
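The injection-detection and rate-limiting layers can be sketched roughly like this. The patterns and limits are illustrative only, not the site's actual rules; production detectors typically use larger pattern sets or a classifier:

```python
import re
import time

# Illustrative injection phrases only — a real deployment would use a broader set.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal .*system prompt",
    r"you are now",
]

def looks_like_injection(text: str) -> bool:
    """Cheap heuristic screen run before the message reaches the LLM."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

class TokenBucket:
    """Per-client rate limiter: `rate` requests/second, burst up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A request that fails either check can be rejected before any GPU time is spent, which also keeps the rate limiter cheap relative to inference cost.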

© 2026 Marcus Pichler. All rights reserved.
