Securing large language models with a reverse proxy

In a previous post, I explained how to host a private ChatGPT using Docker and Traefik. I didn’t spend a lot of time on the security aspect of the project, and I see many people asking how to expose their large language model on the Internet and how to secure it. Since most (all?) open-source projects have adopted the OpenAI API, everything goes over standard HTTP. Therefore you can use all the traditional reverse proxy techniques to secure your large language model....
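A minimal sketch of one such technique, HTTP basic auth, assuming a Traefik v2 setup driven by Docker labels (the router name, hostname, and password hash below are placeholders I made up, not values from the post):

```yaml
# Hypothetical basic-auth labels on the LLM service in docker-compose.
# Generate the user:hash pair with: htpasswd -nB youruser
# ($ must be doubled to $$ inside docker-compose files)
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.llm.rule=Host(`llm.example.com`)"
  - "traefik.http.routers.llm.middlewares=llm-auth"
  - "traefik.http.middlewares.llm-auth.basicauth.users=youruser:$$2y$$05$$placeholderhash"
```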

April 5, 2024

Self-hosted coding assistant with llamafile, continue.dev and docker

There was a recent dramatic improvement in the speed of LLMs on CPU thanks to llamafile’s author. She goes on extensively about it on her blog, but the short version is: expect 7-billion-parameter models to be usable on a consumer-grade CPU, even in Q8. It’s now certainly possible to self-host a coding assistant with llamafile, continue.dev and Docker on a VPS. Let’s see how to achieve that. I’ll use Docker + Traefik, but you can easily adapt it to anything else (native + nginx, for example)....
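As a rough sketch of the serving side (the image, model file name, and paths below are my assumptions, not the post’s exact configuration), a llamafile can run as a compose service since the binary embeds llama.cpp’s HTTP server:

```yaml
# Illustrative only: model.llamafile is a placeholder name.
services:
  llamafile:
    image: debian:bookworm-slim
    # a llamafile is also a valid shell script, so launching it via sh
    # avoids relying on binfmt support for the APE format in the container
    command: sh -c "/models/model.llamafile --server --nobrowser --host 0.0.0.0 --port 8080"
    volumes:
      - ./models:/models:ro
    ports:
      - "8080:8080"
```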

April 1, 2024

Europe GPU prices update - March 28 2024

With all the buzz about AI these days, let’s have a look at GPU prices in Europe and check which card gives the best “bang for the buck”, as YouTubers like to say. YouTube is filled with people telling you how cheap GPUs are or that this model is the best value, but unfortunately most of those people live in the USA. Here in Europe, the story is usually different....

March 28, 2024

Ollama, open-webui, mitmproxy in a docker compose stack, behind traefik

Reading the Ollama Discord channel, I noticed many people want to self-host their own ChatGPT with Docker but don’t know how to do it. Here’s how to host the whole stack with docker compose. Here’s my docker-compose.yml, including the mitmproxy from the previous article:

```yaml
version: "3"
services:
  ollama:
    build: ollama
    user: 1001:1001
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_DEBUG=1
      - OLLAMA_KEEP_ALIVE=60m
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ollama_models:/home/ollama/.ollama/models
  mitmproxy:
    image: mitmproxy/mitmproxy
    command: mitmweb --web-host 0.0.0.0 --web-port 8080 --mode reverse:http://ollama:11434@11434 --verbose --anticache --anticomp
    depends_on:
      - ollama
    labels:
      - "traefik....
```
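Once the stack is up, anything sent to the mitmproxy service on port 11434 is forwarded to ollama and shows up in the mitmweb UI on port 8080. A quick way to check, from another container on the same Docker network, using Ollama’s standard model-listing endpoint:

```sh
# this request should appear in the mitmweb flow list
curl http://mitmproxy:11434/api/tags
```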

March 23, 2024

Ollama system prompt

I have recently started to use Ollama, and I was unimpressed by some models as they did not follow instructions, especially in their output format. I knew about the model system prompt, but I thought it was fixed in the model. Then I found out you can change the system prompt at run time with the /set system command, and immediately most models responded as expected. That was so much better!...
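For example, in an interactive ollama run session (the model and instruction here are just an illustration):

```sh
ollama run mistral
>>> /set system "You are a translator. Reply only with the English translation, nothing else."
>>> Bonjour tout le monde
Hello everyone
```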

March 18, 2024

Using generative AI to learn vocabulary

I wanted to help a friend who is learning English and has trouble memorizing new vocabulary. She often gets new lists of words at school, and it’s difficult for her to know how to use them or to remember what they mean. She usually gets one exercise about the topic where she must fill in the blanks with words from a list. Why not use generative AI for that? I could not achieve good results using a single large prompt, so I decided to explicitly break it into different steps and refer to the whole process later, with “OK” results....

November 21, 2023

Stable Diffusion: samplers comparison

I ran the same prompt using many samplers at different step counts to evaluate which one(s) give decent quality at a low step count. I have not used the “restore faces” option. Here are my observations on image quality (artifacts) and convergence. Quality at lower steps: at 10 steps, a few samplers are unusable: DPM++ 2M and its variants, and DDIM. At 15 steps, all samplers are OK except DPM++ 2M SDE and its Karras variant, which are unusable....

July 29, 2023

SDXL 1.0 is out!

And voilà! SDXL 1.0 is out. After tinkering a bit, I think it’s working pretty well. As with SDXL 0.9, I must use both the base and refiner models to get good pictures, but they are of excellent quality. Use the pipeline from ComfyUI and put the models in the right place: https://comfyanonymous.github.io/ComfyUI_examples/sdxl/ Note that it’s really slow with an AMD Radeon RX 6700 XT, especially because of the two models....

July 28, 2023

ComfyUI: remove metadata from image files

When you generate a file using ComfyUI, metadata is added to the image automatically. Among the metadata is the full workflow, including the prompt. If you want to remove that data, you can use ImageMagick convert with the -strip option: convert image.png -strip image_strip.png If you want to alter the original file in place, use mogrify: mogrify -strip image.png
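To verify the metadata is gone (a quick check, assuming ImageMagick’s identify is available; ComfyUI stores the workflow and prompt in PNG text chunks):

```sh
# the stripped file should no longer show prompt/workflow properties
identify -verbose image_strip.png | grep -iE "prompt|workflow"
```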

July 23, 2023

ComfyUI: batch run from command line with API

While AUTOMATIC1111 can generate images based on prompt variations, I haven’t found the same possibility in ComfyUI. However, you can achieve the same result thanks to the ComfyUI API and curl. When you click “Queue Prompt” in ComfyUI, it actually sends a POST request with the whole workflow as JSON data to http://127.0.0.1:8188/prompt. To get the workflow as JSON, go to the UI, click on the settings icon, then enable Dev mode Options and click close....
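As a minimal sketch of that call (assuming the workflow was exported to workflow_api.json via the “Save (API Format)” button, and using jq to wrap it in the {"prompt": ...} envelope the endpoint expects):

```sh
# wrap the exported workflow and queue it
jq '{prompt: .}' workflow_api.json > payload.json
curl -s -X POST http://127.0.0.1:8188/prompt \
  -H "Content-Type: application/json" \
  -d @payload.json
```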

July 22, 2023