I have been slow hopping on the AI hype train, for several reasons.

I don’t like to use things I don’t understand, at least at a basic level. We’re getting more and more specialized and more and more complex. AI is a step above computing, and I’m still struggling to understand all the different aspects of computing. AI also makes me sketched out because it inverts basic tenets of programs - it’s non-deterministic, can’t be bug-fixed or troubleshot in the traditional ways, and so much data goes into it that it’s humanly impossible to know what the model has ingested.
I’m cheap and don’t want to pay for a subscription to one of these companies. I also can’t afford local hardware - I don’t have a firstborn child to sell to Nvidia to get a GPU right now.
I don’t trust these companies with the level of personal data that I would be handing over to them if I were to jump into testing the way I really want to.
General hesitation about the hype. What impact are these datacenters really having, when is the bubble going to burst, and so on.
I don’t want to become reliant and delegate my understanding.
In early testing I’ve found them wanting (in particular Copilot). They tend to be “yes-men”, and will hallucinate an answer if none exists.

I’m not vehemently anti-AI. I’ve been slowly utilizing some models, testing Claude, GPT, and Copilot at my workplace to see how they perform on various tasks, and understanding what use-cases they are helpful to me in. I’ve found a few: saving time on converting formats; drafting compliance documents that are bullshit anyway and no one is going to read; searching the web when I don’t know the right search term(s); searching, summarizing, and parsing laws that I can’t begin to manually parse or comprehend; creating marketing flyers; being a helpful assistant whilst stuyding; drafting shell scripts…

I’ve also found a few use-cases where I don’t want to use them at all. Particularly any sort of professional or creative writing that I expect another human to read - emails and messages, blog posts, technical documentation. (It drives me nuts the extent to which people use AI for this purpose and just copy paste with no understanding.) Automation with full access to my files - OpenClaw gives me the heebie jeebies. Chatbot assistants. Half-baked solutions that are shoved into anything and everything.

Back in December, I tried out Ollama on a mini PC, but it just wasn’t there. My PC couldn’t handle it - it didn’t have any sort of GPU power. I didn’t get the benefits of using a model because the model sucked. And running a medium-sized model was like watching a boomer type an email in real-time.

I’ve known my options are as follows:

Get my hands on some sort of decent machine for running a model, like a Mac Mini, and run it locally.
Rent a GPU in the cloud - prohibitively expensive, from my basic research.
Bite the bullet and sign up for one of these big-name services, like Claude or ChatGPT.

I knew option 3 was not something I was going to do, and I don’t have the money for a Mac Mini. I’d done minimal research on cloud GPUs and I thought that was a dead-end, too.

I had a lightbulb moment the other day, though. I was looking into GPU pricing and realized it was priced by the hour instead of listed as a monthly price. Why was that? I wondered if the pricing was different because you aren’t running it 24/7 like you would be paying for a VPS. Maybe that was a key misunderstanding. Running a GPU 24/7 would cost me an arm and a leg, but running one for ~3h a day while I did my prompting was much more feasible. I did a lot of searching, mostly to no avail. I gave up and went to Claude - a perfect example of a use-case that I described. I didn’t know how to search for something, so I asked an LLM.

Claude gave me the rundown, and I was off to the races. From what I could tell I had two options: spin up a dedicated instance on something like RunPod and deploy my own model, or use HuggingFace with its inference providers. In either case, you only pay by usage, not by reservation. I went with HuggingFace to start, as I had no idea what model to deploy, and wanted to experiment.

Now, to be fair, this isn’t completely on my own server. My prompts and context (like uploaded documents or web searches) are being sent off to some other server somewhere, and the responses are generated on that server and being sent back to me. However, I’m much more okay with this than I am OpenAI storing all my personal data and uploaded documents alongside my chats, tied to my email and identity. My chat history and documents live on my own server. I have my choice among many different models to experiment with. I can see under the hood in many respects, and adjust advanced options. Sure, I don’t have access to the frontier models, only open weight models, but that’s a more than fair trade in my eyes.

Deployment

Stack:

Running VPS with wildcard certificate, Docker, and Nginx reverse proxy infrastructure already in place
Open WebUI Docker container
HuggingFace account and API key

VPS

See my post here regarding my current infrastructure setup.

Open WebUI Docker container

Docker compose stack

Create docker-compose.yml and .env file

services:
  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    env_file:
      - .env
    ports:
      - 127.0.0.1:${WEBUI_PORT}:8080
    volumes:
      - ${DATA_DIR}:/app/backend/data
    networks:
      - npm_network
 
networks:
  npm_network:
    external: true

# .env
WEBUI_PORT=3002
DATA_DIR=./data
OPENAI_API_BASE_URL=https://router.huggingface.co/v1
OPENAI_API_KEY=<your token here>

Nginx reverse proxy

server {
 
  server_name openwebui.domain.com;
 
  location / {
    proxy_pass http://127.0.0.1:3002;
  }
 
  listen 443 ssl;
  include snippets/domain-com.conf;
  include snippets/ssl-params.conf;
 
  proxy_set_header Host $host;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";
  proxy_http_version 1.1;
  proxy_read_timeout 900s;
  
}

Web UI configuration

Add models
Turn off suggestions

HuggingFace account and API key

Sign up at https://huggingface.co/join
Create an access token https://huggingface.co/docs/hub/en/security-tokens
- Type: fine-grained
- Permissions: inference

Conclusion

This is the bare minimum to get you started. Tools come next, like web search and knowledge bases, as well as figuring out what model(s) work for my use-cases

I was (and still am) overwhelmed at the sheer number of models to deploy and try out. This is where I’m going to have to do some research and study to understand how models work, how they are trained, what aspects of a model are important for my desired use-case, fine-tuning, and so forth.

One thing I have not tried out much with AI models is RAG, because that involves adding context with potentially very personal documents. However, I see this being one of the most useful aspects of a model, and it’s high on my priority list to try out.

I have two immediate places to go from here:

Get web search up and running. I have an idea to deploy SearXNG on the same server.
Get knowledge bases up and running - I have an idea to tie my Obsidian vault into my server with headless sync and a Docker read-only volume mount for full-text search of my vault files.

EOF

the vimoire

recent

Failing GPU passthrough on Proxmox to Debian host on MacPro 6,1

Obsidian vault restructure to Steph Ango's template

Running open weight AI models with Open WebUI and HuggingFace Inference Providers

Setting up SearXNG behind a VPN as a search plugin for Open WebUI

Migrating to NFS shares (from SMB) on TrueNAS