Sovereign On-Premises AI Platform for Real Estate

Project Details

Customer: Immobilien Treuhand Software (ITS)
Service: Kubernetes Engineering, Platform Engineering, AI Platform, GitOps
Technologies: Rancher RKE2, NVIDIA H100 + GPU Operator, MIG, KubeAI, vLLM, Envoy Gateway, Gateway API, Rook/Ceph, Keycloak (Azure AD IdP), External Secrets Operator, Azure Key Vault, GitLab, ArgoCD, Prometheus, Grafana, Hyper-V
Timespan: 2025 – ongoing

Challenge

Immobilien Treuhand Software (ITS) wanted the productivity and quality of modern AI assistants - ChatGPT-class Large Language Models - without sending sensitive customer, real-estate and accounting data to OpenAI, Azure OpenAI or any other third-party provider. Data sovereignty, GDPR compliance and data residency had to be guaranteed by design, while costs needed to stay predictable and the platform had to integrate cleanly with the existing Microsoft Azure AD and on-premises Hyper-V landscape.

The first iteration - two Ollama instances on Kubernetes, each with one full GPU - quickly hit its limits: users queued and waited, only a single model could run at a time, there was no autoscaling, and the cluster could not serve more than ~20 concurrent users. A production-grade, multi-tenant AI platform was needed.

Solution

Together with ITS, WhizUs designed and implemented a self-hosted, GitOps-driven Hybrid AI Platform on Kubernetes - combining bare-metal NVIDIA H100 GPU servers with virtualized control-plane and worker nodes, all managed through Rancher and ArgoCD:

  • Hybrid Rancher RKE2 cluster: 3 control-plane and 3 worker VMs on Hyper-V, plus 2 bare-metal HP ProLiant DL380 Gen11 GPU workers with NVIDIA H100 96GB and 100GbE networking - no virtualization overhead for GPU workloads.
  • NVIDIA GPU Operator: Automated driver lifecycle, RKE2 containerd integration and RuntimeClass: nvidia. MIG partitioning with mixed profiles (1g.12gb / 2g.24gb / 3g.47gb) lets a single H100 host multiple models with hardware-level isolation.
  • Self-hosted GitLab Enterprise Edition on the cluster: Crunchy PGO PostgreSQL, Redis HA, Praefect/Gitaly cluster, Container Registry with HPA, GitLab Runner with the Kubernetes executor, GitLab Pages - all object storage, artifacts and backups backed by Rook/Ceph.
  • Rook/Ceph storage: Unified storage layer providing block (RWO) for databases, CephFS (RWX) for shared model caches, and S3-compatible object storage for GitLab artifacts, LFS, packages, registry and backups.
  • Rancher Manager as the multi-cluster control plane - a single pane of glass for cluster lifecycle, RBAC, project isolation and day-2 operations across the on-prem and burst clusters.
  • Prometheus, Grafana and Alertmanager (kube-prometheus-stack) for cluster-wide metrics, dashboards and alerting - including dedicated NVIDIA DCGM dashboards for live GPU temperature, power draw and utilization across all H100 workers, plus centralized logging with Fluentd/Fluentbit.
  • Keycloak as identity broker (deployed via the EDP Keycloak Operator) federated with Azure AD as the upstream Identity Provider via SAML/OIDC. One single sign-on login spans Rancher, GitLab, Open WebUI, Grafana - no more local user silos.
  • External Secrets Operator with Azure Key Vault: a central ClusterSecretStore federates Kubernetes Secrets to Azure Key Vault - TLS certificates, GitLab and Keycloak credentials, HuggingFace tokens and other sensitive material live in Azure Key Vault and are projected into the cluster on a short refresh interval. No long-lived secrets in Git, no ad-hoc kubectl edits.
  • Migration from NGINX Ingress to Envoy Gateway on the Kubernetes Gateway API: dual gateways (internal and external), centralized TLS termination at the Gateway, HTTPRoute-based routing for 14+ services, native CRDs (SecurityPolicy, ClientTrafficPolicy, BackendTrafficPolicy) instead of brittle NGINX annotations.
  • KubeAI + vLLM as the AI inference engine: an OpenAI-compatible API in front of multiple models (Llama 3.1, Qwen 2.5, Mistral, Gemma, …), prefix-aware load balancing, request queueing, scale-to-zero by default and "Very Important Models" pinned to one replica for low-latency responses.
  • End-to-end GitOps with ArgoCD: ApplicationSets and sync waves (CRDs → operators → gateway → models). Every cluster change - infrastructure, apps and AI models - is a Git commit, fully auditable and reproducible.
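The MIG-based GPU sharing described above boils down to labeling a GPU node with the desired layout and requesting MIG slices as extended resources. A minimal, illustrative sketch (the node name, pod name and exact profile set are assumptions; available profiles depend on the cluster's mig-parted configuration):

```yaml
# Label a GPU node with the desired MIG layout; the GPU Operator's
# mig-manager then repartitions the H100 (MIG strategy: mixed):
#   kubectl label node gpu-worker-1 nvidia.com/mig.config=custom-mixed --overwrite
---
apiVersion: v1
kind: Pod
metadata:
  name: mig-demo            # hypothetical name
spec:
  runtimeClassName: nvidia  # RuntimeClass installed by the GPU Operator
  containers:
    - name: inference
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]   # lists the single MIG device visible to the pod
      resources:
        limits:
          nvidia.com/mig-3g.47gb: 1   # one 3g.47gb slice of an H100
```

With the mixed strategy, each profile is advertised as its own resource (nvidia.com/mig-1g.12gb, nvidia.com/mig-2g.24gb, …), so several isolated model servers can be scheduled onto one physical GPU.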
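The External Secrets Operator flow can be sketched with two resources: a ClusterSecretStore pointing at Azure Key Vault and an ExternalSecret that projects a vault entry into the cluster. All names, the vault URL and the service-principal auth method below are illustrative assumptions:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: azure-kv                      # hypothetical name
spec:
  provider:
    azurekv:
      tenantId: "00000000-0000-0000-0000-000000000000"  # placeholder
      vaultUrl: "https://example-vault.vault.azure.net" # placeholder vault
      authSecretRef:                  # service-principal auth (one of several options)
        clientId:
          name: azure-sp-creds
          namespace: external-secrets
          key: client-id
        clientSecret:
          name: azure-sp-creds
          namespace: external-secrets
          key: client-secret
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: huggingface-token
  namespace: kubeai
spec:
  refreshInterval: 1h                 # short refresh interval, as described above
  secretStoreRef:
    kind: ClusterSecretStore
    name: azure-kv
  target:
    name: huggingface-token           # resulting Kubernetes Secret
  data:
    - secretKey: token
      remoteRef:
        key: hf-token                 # entry name in Azure Key Vault (assumed)
```

The Secret is reconciled on every refresh interval, so rotating the value in Azure Key Vault propagates to the cluster without a Git commit or kubectl edit.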
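The Gateway API migration replaces NGINX annotations with typed resources: a Gateway terminating TLS centrally and per-service HTTPRoutes. A minimal sketch (GatewayClass name, namespaces, hostname and certificate name are assumptions):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: internal                  # one of the dual gateways (internal/external)
  namespace: envoy-gateway-system
spec:
  gatewayClassName: envoy-gateway # hypothetical GatewayClass backed by Envoy Gateway
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate           # centralized TLS termination at the Gateway
        certificateRefs:
          - name: wildcard-tls    # placeholder certificate Secret
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: open-webui
  namespace: ai
spec:
  parentRefs:
    - name: internal
      namespace: envoy-gateway-system
  hostnames:
    - "chat.example.internal"     # placeholder hostname
  rules:
    - backendRefs:
        - name: open-webui
          port: 8080
```

Cross-cutting concerns such as authentication or rate limiting attach via Envoy Gateway's own CRDs (SecurityPolicy, BackendTrafficPolicy) referencing these routes, instead of annotation strings.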
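A KubeAI model definition ties the pieces together: one CRD per model, served by vLLM behind the OpenAI-compatible API. A sketch under assumptions (field names follow the kubeai.org/v1 Model API as we understand it; the resource-profile name is cluster-specific):

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
  namespace: kubeai
spec:
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct
  engine: VLLM
  resourceProfile: nvidia-gpu-h100:1  # assumed profile name mapping to a MIG slice or full GPU
  minReplicas: 1                      # "Very Important Model": pinned replica, no cold start
  maxReplicas: 3                      # scales out under load; other models use minReplicas: 0
```

Models with minReplicas: 0 scale to zero when idle and are started on the first request, while pinned models answer immediately at low latency.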
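The ArgoCD sync waves mentioned above are plain annotations on Applications (or the manifests within them); lower waves sync first, so CRDs land before the operators that consume them. An illustrative Application (project name, repo URL and paths are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gpu-operator
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"  # operators in wave 1, after CRDs in wave 0
spec:
  project: platform                    # hypothetical ArgoCD project
  source:
    repoURL: https://gitlab.example.internal/platform/gitops.git  # placeholder
    targetRevision: main
    path: operators/gpu-operator
  destination:
    server: https://kubernetes.default.svc
    namespace: gpu-operator
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual drift - every change is a Git commit
```

An ApplicationSet generates one such Application per component or model, keeping the whole platform reproducible from the repository.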

Impact

Sensitive real-estate and customer data never leaves ITS infrastructure - GDPR and data residency guaranteed by design.

From a single-model, queueing Ollama setup to multi-model concurrent inference with autoscaling and MIG-based GPU sharing on the same hardware.

Standards-based, vendor-neutral ingress on the Kubernetes Gateway API, replacing brittle NGINX annotation-driven configuration.

A clean, GitOps-managed AI platform that is reproducible, observable and ready to scale with future models, customers and use cases.

Testimonials

"As a developer on the ITS side who deploys on this platform every day, I can assess the implementation first-hand. The clean GitOps structure with ArgoCD including sync waves, the MIG partitioning of the H100 nodes, and the migration from NGINX Ingress to Envoy Gateway have made operations noticeably more maintainable. The move from the initial Ollama setup to KubeAI/vLLM with multi-model inferencing was a real turning point for us. Thanks to the WhizUs team for the thorough and professional work."
Soheil Mahvi

Developer

Immobilien Treuhand Software (ITS)