Mistral Vibe VS Code extension with local models

Mistral Vibe VS Code extension stores its configuration in ~/.vibe/config.toml and it shares its settings with Mistral Vibe CLI. On Microsoft Windows, Terminal understands ~/.vibe, and if it does not, go to c:\users\<your user>\.vibe. Note that if you work with a remote, the configuration of the extension is on the remote computer. Providers configuration To use the VS Code extension with local models, or any 3rd party for that matter (OpenAI, Anthropic, …), edit ~/.vibe/config.toml and add a providers section like so: ...

June 1, 2026 · 2 min

Mistral retiring models during summer 2026

On May 29, 2026, Mistral sent an email to its customers indicating it will retire models on May 31, June 30 and July 31 2026. For each model, there is a newer, better and most of the times, more expensive alternative. Devstral Small 1.1, Mistral Small 3.2 and Magistral Small 1.2 are replaced by Mistral Small 4 at $0.15/$0.6 per million tokens. For those who were using the first two, it’s a 50% increase on input tokens and 100% increase on output tokens; ouch! For those using Magistral Small 1.2, it’s actually a 70%/60% decrease; nice! ...

May 30, 2026 · 2 min

Securing large language models with a reverse proxy

In a previous post, I explained how to host a private ChatGPT using Docker and Traefik. I didn’t spend a lot of time on the security aspect of the project. I see many people asking how to expose their large language model on Internet and ask how to secure it. Since most (all?) open-source projects have adopted the OpenAI API, it uses standard HTTP. Therefore you can use all the traditional techniques to secure your large language model with a reverse proxy. ...

April 5, 2024 · 4 min

Self-hosted coding assistant with llamafile, continue.dev and docker

There was a recent dramatic improvement on the speed of LLM’s on CPU thanks to llamafile’s author. She goes on extensively about it on her blog but the short version is: expect 7-billion parameters to be usable on consumer-grade CPU even in Q8. Now it’s certainly possible to self-host a coding assistant with llamafile, continue.dev and Docker on a VPS. Let’s see how to achieve that. I’ll use Docker + Traefik but you can easily convert it to anything else (native + nginx for example). ...

April 1, 2024 · 2 min

Europe GPU prices update - March 28 2024

With all the buzz about AI these days, let’s have a look at the GPU prices in Europe and check which one gives the best “bang for the buck” as YouTubers like to say. YouTube is filled with people telling you how cheap GPUs are or that this model is the best value but unfortunately most of those people are living in the USA. Here in Europe, the story is usually different. I checked the cheapest model of each chip, and sorted them by the price per GB VRAM. The full table is available below. ...

March 28, 2024 · 3 min

Ollama, open-webui, mitmproxy in a docker compose stack, behind traefik

Reading Ollama discord channel, I notice many people want to self-host their chatGPT with Docker and don’t know how to do it. Here’s how to host the whole stack with docker compose. Here’s my docker-compose.yml including the mitmproxy from the previous article. version: "3" services: ollama: build: ollama user: 1001:1001 environment: - OLLAMA_HOST=0.0.0.0 - OLLAMA_DEBUG=1 - OLLAMA_KEEP_ALIVE=60m volumes: - /etc/localtime:/etc/localtime:ro - ollama_models:/home/ollama/.ollama/models mitmproxy: image: mitmproxy/mitmproxy command: mitmweb --web-host 0.0.0.0 --web-port 8080 --mode reverse:http://ollama:11434@11434 --verbose --anticache --anticomp depends_on: - ollama labels: - "traefik.enable=true" # ollama endpoint - "traefik.http.routers.ollama.rule=Host(`llm.example.com`)" - "traefik.http.routers.ollama.tls=true" - "traefik.http.routers.ollama.entrypoints=websecure" - "traefik.http.routers.ollama.tls.certresolver=le" - "traefik.http.routers.ollama.service=ollama" - "traefik.http.services.ollama.loadbalancer.server.port=11434" - "traefik.http.services.ollama.loadbalancer.server.scheme=http" # mitmweb endpoint - "traefik.http.routers.ollama-mitm.rule=Host(`inspector.example.com`)" - "traefik.http.routers.ollama-mitm.tls=true" - "traefik.http.routers.ollama-mitm.entrypoints=websecure" - "traefik.http.routers.ollama-mitm.tls.certresolver=le" - "traefik.http.routers.ollama-mitm.service=ollama-mitm" - "traefik.http.services.ollama-mitm.loadbalancer.server.port=8080" - "traefik.http.services.ollama-mitm.loadbalancer.server.scheme=http" - "traefik.http.middlewares.ollama-mitm-headers.headers.customrequestheaders.Host=0.0.0.0" - "traefik.http.middlewares.ollama-mitm-headers.headers.customrequestheaders.Origin=" open-webui: build: context: . args: OLLAMA_API_BASE_URL: '/ollama/api' dockerfile: Dockerfile image: ghcr.io/open-webui/open-webui:main volumes: - /etc/localtime:/etc/localtime:ro - open-webui:/app/backend/data depends_on: environment: - 'OLLAMA_API_BASE_URL=http://mitmproxy:11434/api' - 'WEBUI_SECRET_KEY=' labels: - "traefik.enable=true" - "traefik.http.routers.open-webui.rule=Host(`chatgpt.example.com`)" - "traefik.http.routers.open-webui.tls=true" - "traefik.http.routers.open-webui.entrypoints=websecure" - "traefik.http.routers.open-webui.tls.certresolver=le" - "traefik.http.routers.open-webui.service=open-webui" - "traefik.http.services.open-webui.loadbalancer.server.port=8080" - "traefik.http.services.open-webui.loadbalancer.server.scheme=http" volumes: ollama_models: open-webui: This exposes 3 different endpoints: ...

March 23, 2024 · 1 min

Ollama system prompt

Ollama I have recently started to use Ollama and I was unimpressed by some models as they did not follow instructions, especially in their output format. I knew about model system prompt but I thought it was fixed in the model. Then I found out you could change the system prompt at run time with the /set system command and immediately, most models responded as expected. That was so much better! ...

March 18, 2024 · 1 min

Using generative AI to learn vocabulary

I wanted to help a friend learning English who has trouble learning new vocabulary. She often gets new list of words at school and it’s difficult for her to know how to use them, or remember what they mean. She usually gets one exercise about the topic where she must fill blanks with words from a list. Why not use generative AI for that? I could not achieve good results using a single large prompt, so I decided to explicitly break it into different steps and refer to the whole process later, with “OK” results. ...

November 21, 2023 · 7 min

Stable Diffusion: samplers comparison

I ran the same prompt using many samplers at different steps counts to evaluate which one(s) give a decent quality at a low step count. I have not used the “restore faces” option. Here are my observations related to image quality (artifacts) and convergence. Quality at lower steps At 10 steps, a few samplers are unusable: DPM++ 2M and its variants, DDIM. At 15 steps, all samplers are OK except DPM++ 2M SDE and its Karras variant are unusable. ...

July 29, 2023 · 2 min

SDXL 1.0 is out!

And voilà! SDXL 1.0 is out. After tinkering a bit, I think it’s working pretty well. As with SDXL 0.9, I must use both base and refiner models to get good pictures, but they are of excellent quality. Use the pipeline from ComfyUI and put the models at the right place: https://comfyanonymous.github.io/ComfyUI_examples/sdxl/ Note that it’s really slow with an AMD Radeon RX 6700 XT, especially because of the 2 models. A few links: ...

July 28, 2023 · 1 min