Hello friends,
Two months after launching our local AI setup – based on Ollama running on Ubuntu with a RAG (Retrieval-Augmented Generation) extension – it’s clear: our daily work with documents has significantly changed for the better.
Our Document Management System (DMS) follows a deliberately simple structure: HTML files, JSON metadata, and organized folder hierarchies. Nothing fancy or overly complex. The key difference now is the new AI integration: all contents are automatically transferred into a vector database (Chroma), where they’re indexed and stored with context. The locally installed AI can access this data – and provides clear, accurate answers.
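For anyone curious how that ingestion step can look in practice: below is a minimal sketch, not our production code. The folder layout, the collection name and the tag-stripping helper are placeholders for illustration; Chroma's default embedding function is used.

Code: Select all | Expand

import json
import re
from pathlib import Path

import chromadb

# Assumed layout: one .html file plus an optional .json metadata file per document
DOCS_DIR = Path.home() / "dms" / "documents"

client = chromadb.PersistentClient(path=str(Path.home() / "dms" / "chroma"))
collection = client.get_or_create_collection(name="dms_wissen")

def html_to_text(html: str) -> str:
    # Very rough tag stripping - good enough for a sketch
    return re.sub(r"<[^>]+>", " ", html)

for html_file in DOCS_DIR.glob("**/*.html"):
    text = html_to_text(html_file.read_text(encoding="utf-8", errors="ignore"))
    meta_file = html_file.with_suffix(".json")
    metadata = json.loads(meta_file.read_text(encoding="utf-8")) if meta_file.exists() else {}
    collection.add(
        ids=[str(html_file.relative_to(DOCS_DIR))],   # stable ID derived from the relative path
        documents=[text],                             # Chroma embeds and indexes the plain text
        metadatas=[metadata or {"source": html_file.name}],
    )
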
What has changed in practice:
The DMS became a knowledge source: Instead of just storing documents, the AI “understands” their content and can respond meaningfully.
Context-based answers: The AI doesn’t just look for keywords – it intelligently searches across documents and combines relevant information.
Fully automated integration: Every new document automatically becomes part of the knowledge network – without extra effort.
High-quality responses: The answers are precise, easy to understand, and far superior to traditional full-text search.
100% local operation: No cloud, no API fees, no internet dependency – and full control over data privacy.
A surprising bonus:
What really stands out is the practical side effect: Everyone in the company – whether at the front desk, in the kitchen, or in maintenance – contributes to the AI knowledge base simply by writing into the DMS. Without even realizing it. It’s what we’d call effortless knowledge management.
For developers:
Integration is refreshingly simple. You send a text request via the local HTTP API and receive a JSON response. That’s it. Existing applications can easily tap into the system without major changes – but with a huge gain in functionality.
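To give a rough idea, a request from Python could look like this; the /ask route and the field names are only placeholders here, so adapt them to whatever your own endpoint expects:

Code: Select all | Expand

import requests

# Placeholder endpoint and field names - adjust to your own local RAG service
resp = requests.post(
    "http://127.0.0.1:8000/ask",
    json={"frage": "Until when is breakfast served?"},
    timeout=60,
)
print(resp.json())   # e.g. {"antwort": "..."}
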
Conclusion:
For us, combining a local DMS with AI is far from a hype. It’s a practical, forward-thinking step – especially for small businesses that want to actively use their internal knowledge instead of just storing it.
Best regards,
Otto
Experience Report: Two Months with Our Local AI System
- Antonio Linares
- Site Admin
- Posts: 42844
- Joined: Thu Oct 06, 2005 5:47 pm
- Location: Spain
- Has thanked: 181 times
- Been thanked: 124 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
Dear Otto,
Would you be so kind as to explain to us how to use Chroma, with some examples?
Congratulations!

- cnavarro
- Posts: 6661
- Joined: Wed Feb 15, 2012 8:25 pm
- Location: España
- Has thanked: 9 times
- Been thanked: 12 times
Re: Experience Report: Two Months with Our Local AI System
Dear Otto,
Reading comments like yours helps encourage us all to delve deeper into the topic of AI.
Thank you and congratulations.
Cristobal Navarro
There are two kinds of people: those who make you waste your time and those who make you lose track of time.
The secret of happiness is not doing what you like, but liking what you do.
- leandro
- Posts: 1779
- Joined: Wed Oct 26, 2005 2:49 pm
- Location: Colombia
- Has thanked: 53 times
- Been thanked: 16 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
That's great, Otto!
Which version of Ollama are you using? And what machine configuration do you have to achieve this? With the machine you have, does it respond quickly?
Thanks in advance for your reply.

Regards
LEANDRO AREVALO
Bogotá (Colombia)
https://hymlyma.com
https://hymplus.com/
leandroalfonso111@gmail.com
leandroalfonso111@hotmail.com
[ Turbo Incremental Link64 6.98 Embarcadero 7.70 ] [ FiveWin 24.09 ] [ xHarbour 64 bits) ]
- Otto
- Posts: 6430
- Joined: Fri Oct 07, 2005 7:07 pm
- Has thanked: 48 times
- Been thanked: 15 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
Dear Antonio, dear Cristobal,
thank you for the kind words.
It's not so easy for me – I'm a beginner – to explain this here, as I don't fully understand it myself yet.
But I have documented all the points.
I think, in principle, this one Python function is sufficient.
Best regards,
Otto

Code: Select all | Expand
my raglogic.py

nano ~/ollama-rag/ragserver/raglogic.py

import json

import chromadb
from chromadb.utils import embedding_functions
import requests

# Prepare Chroma: in-memory client and a collection for the hotel knowledge
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection(
    name="hotelwissen",
    embedding_function=embedding_functions.DefaultEmbeddingFunction()
)

def frage_beantworten(frage: str) -> str:
    # Fetch the three most relevant documents for the question
    result = collection.query(query_texts=[frage], n_results=3)
    kontext = " ".join(result["documents"][0])

    payload = {
        "model": "mistral",
        "prompt": f"Beantworte folgende Frage basierend auf diesem Kontext:\n\n{kontext}\n\nFrage: {frage}"
    }

    # Ollama streams its answer as one JSON object per line
    response = requests.post("http://localhost:11434/api/generate", json=payload, stream=True)
    antwort = ""
    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            antwort += data.get("response", "")
    return antwort.strip() or "Keine Antwort erhalten"
start.sh
cd ~/ollama-rag/ragserver
source venv/bin/activate
uvicorn main:app --host 127.0.0.1 --port 8000
otto-atzwanger@KI-Server:~$ nano ~/ollama-rag/ragserver/raglogic.py
otto-atzwanger@KI-Server:~$ cd ~/ollama-rag/ragserver
source venv/bin/activate
uvicorn main:app --host 127.0.0.1 --port 8000
INFO: Started server process [6402]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
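main.py itself is not shown above; here is a minimal sketch of what such a FastAPI wrapper around frage_beantworten could look like (the /ask route and the field names are placeholders):

Code: Select all | Expand

# A possible ~/ollama-rag/ragserver/main.py (sketch only - the original file is not shown in this post)
from fastapi import FastAPI
from pydantic import BaseModel

from raglogic import frage_beantworten

app = FastAPI()

class FrageRequest(BaseModel):
    frage: str

@app.post("/ask")
def ask(req: FrageRequest):
    # Delegate to the RAG logic and return the answer as JSON
    return {"antwort": frage_beantworten(req.frage)}
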

- Otto
- Posts: 6430
- Joined: Fri Oct 07, 2005 7:07 pm
- Has thanked: 48 times
- Been thanked: 15 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
Dear Leandro,
I use the following hardware. The responses are really fast. I also asked the AI some questions, because we want to deploy the system directly at customer sites – including how it performs when everything is installed in a VM on existing customer systems.
**CPU**: AMD Ryzen 9 7950X (16 cores / 32 threads)
- Ideal for parallel AI inference (e.g., handling multiple hotel queries simultaneously).
**GPU**: RTX 3090 (24 GB VRAM)
- **Large VRAM** – suitable for 7B–13B models (e.g., `mistral` 7B or `llama3` 8B, 4-bit quantized where needed).
- *Tip:* With `mistral`/`llama3`-7B, multiple models can even fit in memory simultaneously.
**RAM**: 64 GB DDR5-6000
- Sufficient for CPU inference in case the GPU is under load.
**SSD**: 2 TB PCIe 4.0
- Fast loading of large models (e.g., `nougat` for invoice recognition).
Since the **RTX 4090** was not available, I went with the RTX 3090.
Here’s my question to the AI:
How do you install VirtualBox for Windows on a Linux PC? I need Windows Server.

And the AI's answer:
Sure, I'll explain it step by step – so you want to run a virtual machine with Windows Server on a Linux computer (e.g., Ubuntu), for testing or as an RDP target. Here's the guide:
Requirements
A Linux PC with sufficient RAM (at least 8 GB recommended)
Installed Linux operating system (e.g., Ubuntu, Debian, Fedora …)
Windows Server ISO file (e.g., Windows Server 2019, 2022)
VirtualBox + Extension Pack
Step 1: Install VirtualBox
For Ubuntu/Debian:
Code: Select all | Expand
sudo apt update
sudo apt install virtualbox
For Fedora/RHEL-based:
Code: Select all | Expand
sudo dnf install virtualbox
Alternatively, you can get the latest version directly from Oracle’s website.
Step 2: Install Extension Pack
The Extension Pack is needed for USB 2.0/3.0, RDP, etc.
Download here: https://www.virtualbox.org/wiki/Downloads
Then install:
Code: Select all | Expand
sudo VBoxManage extpack install <path-to-.vbox-extpack-file>
Or via the GUI: Open VirtualBox → File → Preferences → Extensions.
Step 3: Create a VM for Windows Server
Start VirtualBox
Create new VM:
Name: e.g., WinServer2022
Type: Microsoft Windows
Version: Windows 2022 (64-bit), or depending on ISO
RAM: e.g., 4096 MB (or more, depending on system)
Hard disk: new virtual disk (e.g., 50 GB, VDI, dynamically allocated)
Then: Select VM → Settings → Storage → Insert ISO file as virtual CD/DVD
Step 4: Install Windows Server
Start the VM
Install Windows Server as you would on a real PC
At the end, activate Windows Server, set up RDP, etc.
Optional: Set up RDP access to the VM
If you want to access the VM via RDP:
Enable RDP in Windows Server
Set the network mode in VM settings to Bridged Adapter (so the VM gets its own IP address on the network)
Question: What is the minimum requirement for the customer's system to work reasonably well?
Answer: For reasonably smooth operation of a VirtualBox VM with Windows Server (e.g., for testing or RDP access), the host system (i.e., the customer’s Linux PC) should meet at least the following requirements:
Minimum Requirements for Host PC
Recommendations for productive use, by component:
- CPU: Quad-core processor, e.g., Intel i5 / AMD Ryzen 5 or better
- RAM: At least 16 GB (so that 4–8 GB can be allocated to the VM)
- Storage: SSD with at least 100 GB free, ideally NVMe
- GPU (optional): Not required unless for graphics-heavy applications
- Network: Gigabit LAN or stable Wi-Fi for RDP connections
- BIOS/UEFI: Virtualization enabled (Intel VT-x / AMD-V)
Virtual Machine Configuration
Minimum resources (for Windows Server 2019/2022):
- RAM (VM): 4 GB (better: 6–8 GB)
- CPU (VM): 2 virtual CPUs
- HDD (VM): 50–80 GB, dynamically allocated
- Network mode: Bridged (for LAN-based RDP access)
What else is important?
SSD is mandatory – HDDs are too slow for virtual systems.
Cooling system: Ensure good airflow for continuous operation.
Power consumption: Mini-PCs (e.g., Intel NUC, Ryzen Mini PCs) are great if quiet and energy-efficient operation is desired.
Practical Tip
For small companies with 1–2 simultaneous RDP users, a low-cost mini-PC with a Ryzen 5, 16 GB RAM, and a 1 TB SSD is often sufficient. Windows Server runs stably in VirtualBox on such a system.

Best regards,
Otto
- Otto
- Posts: 6430
- Joined: Fri Oct 07, 2005 7:07 pm
- Has thanked: 48 times
- Been thanked: 15 times
- Contact:
Re: Experience Report: Two Months with Our Local AI System
Hello friends,
There is a large .dbf file containing around 200,000 forum posts. The posts are stored in memo fields and have so far been used for a traditional full-text search.
Now these contents are to be fed into a local RAG system (Retrieval-Augmented Generation).
Each post has been exported as a separate .txt file – one file per post to allow independent processing of the texts.
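Just to illustrate how such memo fields can be pulled out with Python, here is a rough sketch using the dbfread package; the file name and field name are placeholders, and the actual export was done separately:

Code: Select all | Expand

from pathlib import Path
from dbfread import DBF

OUT_DIR = Path("rag_export")
OUT_DIR.mkdir(exist_ok=True)

# Placeholder file and field names - the real .dbf structure differs
table = DBF("forum_posts.dbf", encoding="cp850")

for i, record in enumerate(table):
    text = record.get("MEMOTEXT") or ""               # memo field containing the post body
    (OUT_DIR / f"post_{i:06d}.txt").write_text(str(text), encoding="utf-8")
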
Planned steps:
Automatic cleanup of the texts: removing HTML markup and BBCode elements ([code], <span>, etc.) and saving the cleaned content as plain text files (a rough sketch follows below)
Transferring the cleaned files into the RAG document folder
Running load_docs.py to index the content using Chroma
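What the cleanup step could look like, as a rough sketch (the regex patterns and folder names are placeholders, not the final load_docs.py pipeline):

Code: Select all | Expand

import re
from pathlib import Path

RAW_DIR = Path("rag_export")     # exported .txt files, one per post
CLEAN_DIR = Path("rag_docs")     # folder that load_docs.py indexes (placeholder name)
CLEAN_DIR.mkdir(exist_ok=True)

def clean_post(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)                  # HTML tags such as <span>
    text = re.sub(r"\[/?[a-zA-Z*]+[^\]]*\]", " ", text)    # BBCode such as [code], [b], [quote=...]
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace

for txt_file in RAW_DIR.glob("*.txt"):
    cleaned = clean_post(txt_file.read_text(encoding="utf-8", errors="ignore"))
    if cleaned:
        (CLEAN_DIR / txt_file.name).write_text(cleaned, encoding="utf-8")
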
I'm really curious to see the quality of the responses and how fast it will work.
Best regards,
Otto