Hello friends,
Two months after launching our local AI setup – based on Ollama running on Ubuntu with a RAG (Retrieval-Augmented Generation) extension – it’s clear: our daily work with documents has significantly changed for the better.
Our Document Management System (DMS) follows a deliberately simple structure: HTML files, JSON metadata, and organized folder hierarchies. Nothing fancy or overly complex. The key difference now is the new AI integration: all contents are automatically transferred into a vector database (Chroma), where they’re indexed and stored with context. The locally installed AI can access this data – and provides clear, accurate answers.
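For anyone curious how that ingestion step can look in practice: below is a minimal sketch, not our production code. The folder layout, the collection name and the tag-stripping helper are placeholders for illustration; Chroma's default embedding function is used.

Code: Select all | Expand

import json
import re
from pathlib import Path

import chromadb

# Assumed layout: one .html file plus an optional .json metadata file per document
DOCS_DIR = Path.home() / "dms" / "documents"

client = chromadb.PersistentClient(path=str(Path.home() / "dms" / "chroma"))
collection = client.get_or_create_collection(name="dms_wissen")

def html_to_text(html: str) -> str:
    # Very rough tag stripping - good enough for a sketch
    return re.sub(r"<[^>]+>", " ", html)

for html_file in DOCS_DIR.glob("**/*.html"):
    text = html_to_text(html_file.read_text(encoding="utf-8", errors="ignore"))
    meta_file = html_file.with_suffix(".json")
    metadata = json.loads(meta_file.read_text(encoding="utf-8")) if meta_file.exists() else {}
    collection.add(
        ids=[str(html_file.relative_to(DOCS_DIR))],   # stable ID derived from the relative path
        documents=[text],                             # Chroma embeds and indexes the plain text
        metadatas=[metadata or {"source": html_file.name}],
    )
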
What has changed in practice:
The DMS became a knowledge source: Instead of just storing documents, the AI “understands” their content and can respond meaningfully.
Context-based answers: The AI doesn’t just look for keywords – it intelligently searches across documents and combines relevant information.
Fully automated integration: Every new document automatically becomes part of the knowledge network – without extra effort.
High-quality responses: The answers are precise, easy to understand, and far superior to traditional full-text search.
100% local operation: No cloud, no API fees, no internet dependency – and full control over data privacy.
A surprising bonus:
What really stands out is the practical side effect: Everyone in the company – whether at the front desk, in the kitchen, or in maintenance – contributes to the AI knowledge base simply by writing into the DMS. Without even realizing it. It’s what we’d call effortless knowledge management.
For developers:
Integration is refreshingly simple. You send a text request via the local HTTP API and receive a JSON response. That’s it. Existing applications can easily tap into the system without major changes – but with a huge gain in functionality.
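To give a rough idea, a request from Python could look like this; the /ask route and the field names are only placeholders here, so adapt them to whatever your own endpoint expects:

Code: Select all | Expand

import requests

# Placeholder endpoint and field names - adjust to your own local RAG service
resp = requests.post(
    "http://127.0.0.1:8000/ask",
    json={"frage": "Until when is breakfast served?"},
    timeout=60,
)
print(resp.json())   # e.g. {"antwort": "..."}
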
Conclusion:
For us, combining a local DMS with AI is far from a hype. It’s a practical, forward-thinking step – especially for small businesses that want to actively use their internal knowledge instead of just storing it.
Best regards,
Otto
Experience Report: Two Months with Our Local AI System
- Antonio Linares
- Site Admin
- Posts: 42844
- Joined: Thu Oct 06, 2005 5:47 pm
- Location: Spain
- Has thanked: 181 times
- Been thanked: 124 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
Dear Otto,
Would you be so kind as to explain to us how to use Chroma, with some examples?
Congratulations!

- cnavarro
- Posts: 6661
- Joined: Wed Feb 15, 2012 8:25 pm
- Location: España
- Has thanked: 9 times
- Been thanked: 12 times
Re: Experience Report: Two Months with Our Local AI System
Dear Otto,
Reading comments like yours helps encourage us all to delve deeper into the topic of AI.
Thank you and congratulations.
Cristobal Navarro
There are two kinds of people: those who make you waste your time and those who make you lose track of time.
The secret of happiness is not doing what you like, but liking what you do.
- leandro
- Posts: 1779
- Joined: Wed Oct 26, 2005 2:49 pm
- Location: Colombia
- Has thanked: 53 times
- Been thanked: 16 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
That's great, Otto!
Which version of Ollama are you using? And what machine configuration do you have to achieve this? With the machine you have, does it respond quickly?
Thanks in advance for your reply.

Regards
LEANDRO AREVALO
Bogotá (Colombia)
https://hymlyma.com
https://hymplus.com/
leandroalfonso111@gmail.com
leandroalfonso111@hotmail.com
[ Turbo Incremental Link64 6.98 Embarcadero 7.70 ] [ FiveWin 24.09 ] [ xHarbour 64 bits) ]
- Otto
- Posts: 6430
- Joined: Fri Oct 07, 2005 7:07 pm
- Has thanked: 48 times
- Been thanked: 15 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
Dear Antonio, dear Cristobal,
thank you for the kind words.
It's not so easy for me – I'm a beginner – to explain this here, as I don't fully understand it myself yet.
But I have documented all the points.
I think, in principle, this one Python function is sufficient.
Best regards,
Otto

Code: Select all | Expand
my raglogic.py

nano ~/ollama-rag/ragserver/raglogic.py

import json

import chromadb
from chromadb.utils import embedding_functions
import requests

# Prepare Chroma: in-memory client and a collection for the hotel knowledge
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection(
    name="hotelwissen",
    embedding_function=embedding_functions.DefaultEmbeddingFunction()
)

def frage_beantworten(frage: str) -> str:
    # Fetch the three most relevant documents for the question
    result = collection.query(query_texts=[frage], n_results=3)
    kontext = " ".join(result["documents"][0])

    payload = {
        "model": "mistral",
        "prompt": f"Beantworte folgende Frage basierend auf diesem Kontext:\n\n{kontext}\n\nFrage: {frage}"
    }

    # Ollama streams its answer as one JSON object per line
    response = requests.post("http://localhost:11434/api/generate", json=payload, stream=True)
    antwort = ""
    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            antwort += data.get("response", "")
    return antwort.strip() or "Keine Antwort erhalten"
start.sh
cd ~/ollama-rag/ragserver
source venv/bin/activate
uvicorn main:app --host 127.0.0.1 --port 8000
otto-atzwanger@KI-Server:~$ nano ~/ollama-rag/ragserver/raglogic.py
otto-atzwanger@KI-Server:~$ cd ~/ollama-rag/ragserver
source venv/bin/activate
uvicorn main:app --host 127.0.0.1 --port 8000
INFO: Started server process [6402]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
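main.py itself is not shown above; here is a minimal sketch of what such a FastAPI wrapper around frage_beantworten could look like (the /ask route and the field names are placeholders):

Code: Select all | Expand

# A possible ~/ollama-rag/ragserver/main.py (sketch only - the original file is not shown in this post)
from fastapi import FastAPI
from pydantic import BaseModel

from raglogic import frage_beantworten

app = FastAPI()

class FrageRequest(BaseModel):
    frage: str

@app.post("/ask")
def ask(req: FrageRequest):
    # Delegate to the RAG logic and return the answer as JSON
    return {"antwort": frage_beantworten(req.frage)}
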

- Otto
- Posts: 6430
- Joined: Fri Oct 07, 2005 7:07 pm
- Has thanked: 48 times
- Been thanked: 15 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
Dear Leandro,
I use the following hardware. The responses are really fast. I also asked the AI some questions, because we want to deploy the system directly at customer sites – including how it performs when everything is installed in a VM on existing customer systems.
**CPU**: AMD Ryzen 9 7950X (16 cores / 32 threads)
- Ideal for parallel AI inference (e.g., handling multiple hotel queries simultaneously).
**GPU**: RTX 3090 (24 GB VRAM)
- **Large VRAM** – suitable for 7B–13B models (e.g., `mistral` 7B or `llama3` 8B, 4-bit quantized where needed).
- *Tip:* With `mistral`/`llama3`-7B, multiple models can even fit in memory simultaneously.
**RAM**: 64 GB DDR5-6000
- Sufficient for CPU inference in case the GPU is under load.
**SSD**: 2 TB PCIe 4.0
- Fast loading of large models (e.g., `nougat` for invoice recognition).
Since the **RTX 4090** was not available, I went with the RTX 3090.
Here’s my question to the AI:
How do you install VirtualBox for Windows on a Linux PC? I need Windows Server.

And the AI's answer:
Sure, I'll explain it step by step – so you want to run a virtual machine with Windows Server on a Linux computer (e.g., Ubuntu), for testing or as an RDP target. Here's the guide:
Requirements
A Linux PC with sufficient RAM (at least 8 GB recommended)
Installed Linux operating system (e.g., Ubuntu, Debian, Fedora …)
Windows Server ISO file (e.g., Windows Server 2019, 2022)
VirtualBox + Extension Pack
Step 1: Install VirtualBox
For Ubuntu/Debian:
Code: Select all | Expand
sudo apt update
sudo apt install virtualbox
For Fedora/RHEL-based:
Code: Select all | Expand
sudo dnf install virtualbox
Alternatively, you can get the latest version directly from Oracle’s website.
Step 2: Install Extension Pack
The Extension Pack is needed for USB 2.0/3.0, RDP, etc.
Download here: https://www.virtualbox.org/wiki/Downloads
Then install:
Code: Select all | Expand
sudo VBoxManage extpack install <path-to-.vbox-extpack-file>
Or via the GUI: Open VirtualBox → File → Preferences → Extensions.
Step 3: Create a VM for Windows Server
Start VirtualBox
Create new VM:
Name: e.g., WinServer2022
Type: Microsoft Windows
Version: Windows 2022 (64-bit), or depending on ISO
RAM: e.g., 4096 MB (or more, depending on system)
Hard disk: new virtual disk (e.g., 50 GB, VDI, dynamically allocated)
Then: Select VM → Settings → Storage → Insert ISO file as virtual CD/DVD
Step 4: Install Windows Server
Start the VM
Install Windows Server as you would on a real PC
At the end, activate Windows Server, set up RDP, etc.
Optional: Set up RDP access to the VM
If you want to access the VM via RDP:
Enable RDP in Windows Server
Set the network mode in VM settings to Bridged Adapter (so the VM gets its own IP address on the network)
Question: What is the minimum requirement for the customer's system to work reasonably well?
Answer: For reasonably smooth operation of a VirtualBox VM with Windows Server (e.g., for testing or RDP access), the host system (i.e., the customer’s Linux PC) should meet at least the following requirements:
Minimum Requirements for Host PC
Recommendations for productive use, by component:
- CPU: Quad-core processor, e.g., Intel i5 / AMD Ryzen 5 or better
- RAM: At least 16 GB (so that 4–8 GB can be allocated to the VM)
- Storage: SSD with at least 100 GB free, ideally NVMe
- GPU (optional): Not required unless for graphics-heavy applications
- Network: Gigabit LAN or stable Wi-Fi for RDP connections
- BIOS/UEFI: Virtualization enabled (Intel VT-x / AMD-V)
Virtual Machine Configuration
Minimum resources (for Windows Server 2019/2022):
- RAM (VM): 4 GB (better: 6–8 GB)
- CPU (VM): 2 virtual CPUs
- HDD (VM): 50–80 GB, dynamically allocated
- Network mode: Bridged (for LAN-based RDP access)
What else is important?
SSD is mandatory – HDDs are too slow for virtual systems.
Cooling system: Ensure good airflow for continuous operation.
Power consumption: Mini-PCs (e.g., Intel NUC, Ryzen Mini PCs) are great if quiet and energy-efficient operation is desired.
Practical Tip
For small companies with 1–2 simultaneous RDP users, a low-cost mini-PC with a Ryzen 5, 16 GB RAM, and a 1 TB SSD is often sufficient. Windows Server runs stably in VirtualBox on such a system.

Best regards,
Otto
- Otto
- Posts: 6430
- Joined: Fri Oct 07, 2005 7:07 pm
- Has thanked: 48 times
- Been thanked: 15 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
Hello friends,
There is a large .dbf file containing around 200,000 forum posts. The posts are stored in memo fields and have so far been used for a traditional full-text search.
Now these contents are to be fed into a local RAG system (Retrieval-Augmented Generation).
Each post has been exported as a separate .txt file – one file per post to allow independent processing of the texts.
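Just to illustrate how such memo fields can be pulled out with Python, here is a rough sketch using the dbfread package; the file name and field name are placeholders, and the actual export was done separately:

Code: Select all | Expand

from pathlib import Path
from dbfread import DBF

OUT_DIR = Path("rag_export")
OUT_DIR.mkdir(exist_ok=True)

# Placeholder file and field names - the real .dbf structure differs
table = DBF("forum_posts.dbf", encoding="cp850")

for i, record in enumerate(table):
    text = record.get("MEMOTEXT") or ""               # memo field containing the post body
    (OUT_DIR / f"post_{i:06d}.txt").write_text(str(text), encoding="utf-8")
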
Planned steps:
Automatic cleanup of the texts: removing HTML markup and BBCode elements ([code], <span>, etc.) and saving the cleaned content as plain text files (a rough sketch follows below)
Transferring the cleaned files into the RAG document folder
Running load_docs.py to index the content using Chroma
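What the cleanup step could look like, as a rough sketch (the regex patterns and folder names are placeholders, not the final load_docs.py pipeline):

Code: Select all | Expand

import re
from pathlib import Path

RAW_DIR = Path("rag_export")     # exported .txt files, one per post
CLEAN_DIR = Path("rag_docs")     # folder that load_docs.py indexes (placeholder name)
CLEAN_DIR.mkdir(exist_ok=True)

def clean_post(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)                  # HTML tags such as <span>
    text = re.sub(r"\[/?[a-zA-Z*]+[^\]]*\]", " ", text)    # BBCode such as [code], [b], [quote=...]
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace

for txt_file in RAW_DIR.glob("*.txt"):
    cleaned = clean_post(txt_file.read_text(encoding="utf-8", errors="ignore"))
    if cleaned:
        (CLEAN_DIR / txt_file.name).write_text(cleaned, encoding="utf-8")
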
I'm really curious to see the quality of the responses and how fast it will work.
Best regards,
Otto