Xentoo blog

Self-hosted coding assistant with llamafile, continue.dev and docker

There was a recent dramatic improvement in the speed of LLMs on CPU thanks to llamafile’s author. She covers it extensively on her blog, but the short version is: expect 7-billion-parameter models to be usable on consumer-grade CPUs, even in Q8. It is now certainly possible to self-host a coding assistant with llamafile, continue.dev and Docker on a VPS. Let’s see how to achieve that. I’ll use Docker + Traefik but you can easily convert it to anything else (native + nginx for example). ...

Europe GPU prices update - March 28 2024

With all the buzz about AI these days, let’s have a look at the GPU prices in Europe and check which one gives the best “bang for the buck” as YouTubers like to say. YouTube is filled with people telling you how cheap GPUs are or that this model is the best value but unfortunately most of those people are living in the USA. Here in Europe, the story is usually different. I checked the cheapest model of each chip, and sorted them by the price per GB VRAM. The full table is available below. ...
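The ranking described above (cheapest model per chip, sorted by price per GB of VRAM) can be sketched in a few lines of Python. The card names and prices here are placeholders I made up for illustration, not real market data.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    price_eur: float
    vram_gb: int

    @property
    def eur_per_gb(self) -> float:
        # Price divided by VRAM size: lower means better value.
        return self.price_eur / self.vram_gb


def rank_by_price_per_gb(gpus):
    """Return the cards sorted from best to worst price per GB of VRAM."""
    return sorted(gpus, key=lambda gpu: gpu.eur_per_gb)


# Placeholder entries, NOT real prices.
SAMPLE = [
    Gpu("card-a", 300.0, 8),
    Gpu("card-b", 450.0, 16),
    Gpu("card-c", 800.0, 24),
]

ranked = rank_by_price_per_gb(SAMPLE)
for gpu in ranked:
    print(f"{gpu.name}: {gpu.eur_per_gb:.2f} EUR/GB")
```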

Ollama, open-webui, mitmproxy in a docker compose stack, behind traefik

Reading the Ollama Discord channel, I noticed many people want to self-host their own ChatGPT with Docker and don’t know how to do it. Here’s how to host the whole stack with docker compose. Here’s my docker-compose.yml, including the mitmproxy from the previous article.

    version: "3"

    services:
      ollama:
        build: ollama
        user: 1001:1001
        environment:
          - OLLAMA_HOST=0.0.0.0
          - OLLAMA_DEBUG=1
          - OLLAMA_KEEP_ALIVE=60m
        volumes:
          - /etc/localtime:/etc/localtime:ro
          - ollama_models:/home/ollama/.ollama/models

      mitmproxy:
        image: mitmproxy/mitmproxy
        command: mitmweb --web-host 0.0.0.0 --web-port 8080 --mode reverse:http://ollama:11434@11434 --verbose --anticache --anticomp
        depends_on:
          - ollama
        labels:
          - "traefik.enable=true"
          # ollama endpoint
          - "traefik.http.routers.ollama.rule=Host(`llm.example.com`)"
          - "traefik.http.routers.ollama.tls=true"
          - "traefik.http.routers.ollama.entrypoints=websecure"
          - "traefik.http.routers.ollama.tls.certresolver=le"
          - "traefik.http.routers.ollama.service=ollama"
          - "traefik.http.services.ollama.loadbalancer.server.port=11434"
          - "traefik.http.services.ollama.loadbalancer.server.scheme=http"
          # mitmweb endpoint
          - "traefik.http.routers.ollama-mitm.rule=Host(`inspector.example.com`)"
          - "traefik.http.routers.ollama-mitm.tls=true"
          - "traefik.http.routers.ollama-mitm.entrypoints=websecure"
          - "traefik.http.routers.ollama-mitm.tls.certresolver=le"
          - "traefik.http.routers.ollama-mitm.service=ollama-mitm"
          - "traefik.http.services.ollama-mitm.loadbalancer.server.port=8080"
          - "traefik.http.services.ollama-mitm.loadbalancer.server.scheme=http"
          - "traefik.http.middlewares.ollama-mitm-headers.headers.customrequestheaders.Host=0.0.0.0"
          - "traefik.http.middlewares.ollama-mitm-headers.headers.customrequestheaders.Origin="

      open-webui:
        build:
          context: .
          args:
            OLLAMA_API_BASE_URL: '/ollama/api'
          dockerfile: Dockerfile
        image: ghcr.io/open-webui/open-webui:main
        volumes:
          - /etc/localtime:/etc/localtime:ro
          - open-webui:/app/backend/data
        depends_on:
        environment:
          - 'OLLAMA_API_BASE_URL=http://mitmproxy:11434/api'
          - 'WEBUI_SECRET_KEY='
        labels:
          - "traefik.enable=true"
          - "traefik.http.routers.open-webui.rule=Host(`chatgpt.example.com`)"
          - "traefik.http.routers.open-webui.tls=true"
          - "traefik.http.routers.open-webui.entrypoints=websecure"
          - "traefik.http.routers.open-webui.tls.certresolver=le"
          - "traefik.http.routers.open-webui.service=open-webui"
          - "traefik.http.services.open-webui.loadbalancer.server.port=8080"
          - "traefik.http.services.open-webui.loadbalancer.server.scheme=http"

    volumes:
      ollama_models:
      open-webui:

This exposes 3 different endpoints: ...

Troubleshoot HTTP API requests with mitmproxy

Sometimes you connect a new tool to one of your servers and it doesn’t work as expected. You are sure you followed the documentation or tutorials but you don’t get the expected results. Before you throw everything away, you should check what’s actually going on between the two applications. And if neither of them supports logging requests and responses, you can use mitmproxy for troubleshooting. As the name implies (MITM = Man In The Middle), mitmproxy sits between both applications and intercepts all the traffic. You can use it to log the traffic but also to modify the content of the requests and/or responses on the fly. I will not cover that use-case here. ...
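To make the logging idea concrete, here is a minimal sketch of a mitmproxy addon script that prints one line per request and per response. The class name and the file name used below are my own choices for illustration; the `request` and `response` hooks and the module-level `addons` list are how mitmproxy loads scripts.

```python
class TrafficLogger:
    """Minimal mitmproxy addon: record one line per request and response."""

    def __init__(self):
        self.lines = []

    def request(self, flow):
        # mitmproxy calls this hook once the client request has been read.
        line = f"> {flow.request.method} {flow.request.pretty_url}"
        self.lines.append(line)
        print(line)

    def response(self, flow):
        # mitmproxy calls this hook once the server response has been read.
        line = f"< {flow.response.status_code} {flow.request.pretty_url}"
        self.lines.append(line)
        print(line)


# mitmproxy looks for a module-level "addons" list when loading a script.
addons = [TrafficLogger()]
```

Saved as traffic_logger.py (a hypothetical name), it could be loaded with something like `mitmdump -s traffic_logger.py --mode reverse:http://localhost:11434`, where the upstream address is only an example.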

Ollama system prompt

I have recently started to use Ollama and I was unimpressed by some models as they did not follow instructions, especially regarding their output format. I knew about the model system prompt but I thought it was fixed in the model. Then I found out you could change the system prompt at run time with the /set system command and immediately, most models responded as expected. That was so much better! ...
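The same override is also possible outside the interactive prompt. As a rough sketch, Ollama's HTTP API accepts a `system` field on `/api/generate`; the helper names, the local URL and the model name below are assumptions for illustration.

```python
import json
import urllib.request

# Assumed address of a local Ollama instance; adjust to your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, system):
    """Build a POST request that overrides the model's system prompt."""
    payload = {
        "model": model,
        "prompt": prompt,
        "system": system,  # replaces the system prompt for this request
        "stream": False,   # ask for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, system):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt, system)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate("llama2", "List three fruits.", "Answer with a JSON array only.")` would send the custom system prompt along with the question, with "llama2" standing in for whatever model you have pulled.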

Intel N100 CPU performance review

I have just bought a mini PC based on the Intel N100 CPU. Initially, I was going to buy another Raspberry Pi or a used “TinyMiniMicro” PC, but I decided to have a look at the current mini PC offering. I am glad I did. On a major Chinese reseller website, I saw a lot of similar products with the Intel N100 CPU so I had a look at reviews and boy, this thing is powerful (for its size). I’ll talk about the mini PC in another post. ...

Restrict docker container resource usage with docker compose

By default, resources available to containers are not limited. However, sometimes, you want to make sure a container is not going to use too much processing power or memory. To achieve that, in the docker-compose.yml file, add the following sections to the service you want to restrict:

    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 100M
    memswap_limit: 100M

This will effectively limit the container to at most one CPU and 100 megabytes of memory. ...
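To see what a value such as 100M amounts to, here is a small sketch that converts Compose-style byte values into bytes. It assumes Docker's binary interpretation of the suffixes (1M = 1024 * 1024 bytes); the function name is mine. Since memswap_limit is the total of memory plus swap, setting it to the same value as the memory limit should leave the container no swap.

```python
# Binary multipliers for the suffixes accepted in Compose byte values.
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def memory_to_bytes(value: str) -> int:
    """Convert a byte value such as "100M", "1g" or "512" to a number of bytes."""
    text = value.strip().lower()
    # Accept the alternative two-letter notation: "100mb" is read as "100m".
    if len(text) > 1 and text.endswith("b") and text[-2] in "kmg":
        text = text[:-1]
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    # No suffix: the value is already expressed in bytes.
    return int(text)


print(memory_to_bytes("100M"))
```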

OpenSSH CVE-2023-48795 mitigation

If you cannot upgrade your OpenSSH client and/or server to fix CVE-2023-48795, also known as the Terrapin attack, the way to mitigate it is to disable the vulnerable ciphers, as Red Hat explains very well. If you have a recent OpenSSH version, you can disable the ciphers by adding “-” before them in the Ciphers and MACs options. This works for both the ssh client config (/etc/ssh/ssh_config by default) and the ssh server config (/etc/ssh/sshd_config). ...
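As a rough sketch of how to check a config file for the mitigation, the snippet below looks for two removal lines. The algorithm names are the ones commonly cited for Terrapin (the ChaCha20-Poly1305 cipher and the encrypt-then-MAC MACs) and are an assumption on my part, so check them against the Red Hat advisory. The matching is purely literal: it does not understand other spellings or Match blocks.

```python
# Assumed mitigation lines; verify the algorithm names against the advisory.
MITIGATION_LINES = (
    "Ciphers -chacha20-poly1305@openssh.com",
    "MACs -*-etm@openssh.com",
)


def missing_mitigations(config_text):
    """Return the mitigation lines that are absent from an ssh/sshd config."""
    active = set()
    for raw in config_text.splitlines():
        # Drop comments, then normalise whitespace and case for comparison.
        line = raw.split("#", 1)[0].strip()
        if line:
            active.add(" ".join(line.split()).lower())
    return [line for line in MITIGATION_LINES if line.lower() not in active]


print(missing_mitigations("PermitRootLogin no\n"))
```

An empty list means both lines were found. To check a real file, pass its content, for instance `missing_mitigations(open("/etc/ssh/sshd_config").read())`.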

Using generative AI to learn vocabulary

I wanted to help a friend who is learning English and has trouble learning new vocabulary. She often gets new lists of words at school and it’s difficult for her to know how to use them, or remember what they mean. She usually gets one exercise about the topic where she must fill blanks with words from a list. Why not use generative AI for that? I could not achieve good results using a single large prompt, so I decided to explicitly break it into different steps and refer to the whole process later, with “OK” results. ...

Contabo: A great cloud for personal use

I’m a personal user of Contabo’s cloud services, and I’ve been delighted with them. They offer a wide range of services to choose from, including VPS, dedicated servers, and cloud storage. I’m currently using a VPS to host my personal website and email, and I have also used their Storage VPS and object storage in the past. I have had no issue with my VPS over the years. I’ve also been impressed with Contabo’s customer support. They’ve always been quick to respond to my questions. ...