Securing large language models with a reverse proxy

In a previous post, I explained how to host a private ChatGPT using Docker and Traefik. I didn’t spend a lot of time on the security aspect of the project.

I see many people asking how to expose their large language model on the Internet and how to secure it. Since most (all?) open-source projects have adopted the OpenAI API, they communicate over standard HTTP. Therefore, you can use all the traditional techniques to secure your large language model with a reverse proxy.
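
As an illustration of one such traditional technique, here is a minimal Python sketch of a bearer-token check that a proxy layer could run before forwarding a request to the model. The function name `is_authorized` and the idea of a single shared token are assumptions made for this example; they do not come from the post or from any particular reverse proxy.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_token: str) -> bool:
    """Return True when the Authorization header carries the expected bearer token."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalised.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking, through timing, how much of the token matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A request without the header, or with a different scheme such as Basic, is rejected before it ever reaches the model.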

Posted in artificial intelligence, Computer, Generative AI, Large Language Models, Linux, Networking, Security

Self-hosted coding assistant with llamafile and Docker

There was a recent dramatic improvement in the speed of LLMs on CPU thanks to llamafile’s author. She goes into it extensively on her blog, but the short version is: expect 7-billion-parameter models to be usable on consumer-grade CPUs, even in Q8. It is now certainly possible to self-host a coding assistant with llamafile and Docker on a VPS. Let’s see how to achieve that.

Posted in artificial intelligence, Computer, Docker, Generative AI, Large Language Models, Linux

Europe GPU prices update – March 28 2024

With all the buzz about AI these days, let’s have a look at the GPU prices in Europe and check which one gives the best “bang for the buck” as YouTubers like to say.

YouTube is filled with people telling you how cheap GPUs are or that this model is the best value, but unfortunately most of those people live in the USA. Here in Europe, the story is usually different. I checked the cheapest model of each chip and sorted them by price per GB of VRAM. The full table is available below.
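
The ranking by price per GB of VRAM can be sketched in a few lines of Python. The `Gpu` class, the `rank_by_price_per_gb` helper and the three catalogue entries are placeholders invented for this example; they are not prices or models taken from the table.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    price_eur: float
    vram_gb: int

    @property
    def price_per_gb(self) -> float:
        # Cheapest listing price divided by the amount of VRAM on the card.
        return self.price_eur / self.vram_gb


def rank_by_price_per_gb(gpus):
    """Return the GPUs sorted from the lowest to the highest price per GB of VRAM."""
    return sorted(gpus, key=lambda gpu: gpu.price_per_gb)


# Placeholder entries for illustration only, not real market prices.
catalogue = [
    Gpu("card-a", 300.0, 8),
    Gpu("card-b", 450.0, 16),
    Gpu("card-c", 900.0, 24),
]

for gpu in rank_by_price_per_gb(catalogue):
    print(f"{gpu.name}: {gpu.price_per_gb:.2f} EUR/GB")
```

With these made-up numbers, the mid-priced card comes first because it spreads its price over twice the VRAM of the cheapest one.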

Posted in artificial intelligence, Computer, Generative AI, Hardware, Large Language Models, Stable Diffusion

Ollama, open-webui, mitmproxy in a docker compose stack, behind traefik

Reading the Ollama Discord channel, I noticed many people want to self-host their own ChatGPT with Docker and don’t know how to do it. Here’s how to host the whole stack with Docker Compose.

Posted in artificial intelligence, Computer, Docker, Generative AI, Large Language Models, Linux, Networking, Software

Troubleshoot HTTP API requests with mitmproxy

Sometimes you connect a new tool to one of your servers and it doesn’t work as expected. You are sure you followed the documentation or tutorials, but you don’t get the expected results. Before you throw everything away, you should check what’s actually going on between the two applications. If neither of them supports logging requests and responses, you can use mitmproxy for troubleshooting.

Posted in Computer, Docker, Linux, Networking, Security, Software

Ollama system prompt


I have recently started to use Ollama and I was unimpressed by some models as they did not follow instructions, especially regarding their output format. I knew models had a system prompt, but I thought it was fixed in the model.

Then I found out you could change the system prompt at run time with the /set system command and immediately, most models responded as expected. That was so much better!

To set the system prompt with an API call, add the parameter “system” as described in the documentation.

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "system": "This is your new system prompt"
}'

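
The same call can be sketched in Python with only the standard library. The helper names `build_generate_request` and `generate` are made up for this example, and the `"stream": false` field is an addition of mine so that the server answers with a single JSON object instead of a stream; the endpoint and the other fields are the ones from the curl call.

```python
import json
import urllib.request

# Same endpoint as the curl call; adjust it if Ollama listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, system: str) -> urllib.request.Request:
    """Build the POST request, overriding the model's default system prompt."""
    payload = {
        "model": model,
        "prompt": prompt,
        "system": system,
        "stream": False,  # assumption: ask for one JSON object, not a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, system: str) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt, system)) as resp:
        return json.loads(resp.read())["response"]
```

With an Ollama server running locally, `generate("llama2", "Why is the sky blue?", "This is your new system prompt")` should return the answer produced under the new system prompt.
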
Posted in artificial intelligence, Computer, Generative AI, Large Language Models, Linux

Intel N100 CPU performance review

I have just bought a mini PC based on the Intel N100 CPU. Initially, I was going to buy another Raspberry Pi or a used “TinyMiniMicro” PC, but I decided to have a look at the current mini PC offerings. I am glad I did.

Posted in Computer, Hardware, Linux, Software

Restrict docker container resource usage with docker compose

By default, resources available to containers are not limited. However, sometimes, you want to make sure a container is not going to use too much processing power or memory.

To achieve such a thing, in the docker-compose.yml file, add the following sections to the service you want to restrict:

    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 100M
    memswap_limit: 100M

This will effectively limit the container to use at most one CPU and 100 megabytes of memory.

You can specify a float for cpus. You can specify gigabytes (use G) for memory. Always use the same value for memswap_limit and memory, so that the container cannot use any swap on top of its memory limit.

Note that if the container tries to use more memory than the limit, it will get killed and will be restarted according to its restart policy. This can cause some issues if your app always requires this much memory (I’m looking at you, clamd).

Posted in Computer, Docker, Linux, Software

OpenSSH CVE-2023-48795 mitigation

If you cannot upgrade your OpenSSH client and/or server to fix CVE-2023-48795, also known as the Terrapin attack, the way to mitigate it is to disable the vulnerable ciphers as Red Hat explains very well.

If you have a recent OpenSSH version, you can disable the ciphers by prefixing the list with “-” in the Ciphers and MACs options, which removes the listed algorithms from the defaults. This works for both the ssh client config (/etc/ssh/ssh_config by default) and the ssh server config (/etc/ssh/sshd_config).

If you have an older OpenSSH version, you may not be able to use the “-” prefix. In that case, you must explicitly list all the allowed ciphers: simply remove the vulnerable ciphers and MACs from the respective lists.
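
For the explicit-list approach, here is a small Python sketch that filters the lists for you. It assumes the algorithms to drop are chacha20-poly1305@openssh.com and every MAC ending in -etm@openssh.com; check this against your vendor advisory before relying on it. The function names are made up for this example, and the input lists would typically come from the output of `ssh -Q cipher` and `ssh -Q mac`.

```python
# Assumption: these are the algorithms to remove for CVE-2023-48795.
TERRAPIN_CIPHER = "chacha20-poly1305@openssh.com"
ETM_SUFFIX = "-etm@openssh.com"


def strip_terrapin_algorithms(ciphers, macs):
    """Return copies of both lists without the algorithms assumed vulnerable."""
    safe_ciphers = [cipher for cipher in ciphers if cipher != TERRAPIN_CIPHER]
    safe_macs = [mac for mac in macs if not mac.endswith(ETM_SUFFIX)]
    return safe_ciphers, safe_macs


def render_config(ciphers, macs):
    """Format the two lists as lines for ssh_config or sshd_config."""
    return f"Ciphers {','.join(ciphers)}\nMACs {','.join(macs)}"
```

The order of the remaining entries is preserved, which matters because OpenSSH treats these lists as a preference order.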

For reference, in January 2023, Germany’s BSI (Federal Office for Information Security) recommended the following SSH settings for use beyond 2023 (2029+).

Ciphers aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512,hmac-sha2-256
KexAlgorithms diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256
HostKeyAlgorithms ecdsa-sha2-nistp521,ecdsa-sha2-nistp384,ecdsa-sha2-nistp256

A few things to be aware of:

  • be sure to check that the mentioned options are available on your systems before you restart your ssh daemons
  • make sure you have host keys matching the HostKeyAlgorithms you allow
  • verify you can connect to your servers after restarting sshd and before you disconnect
  • monitor for connection failures from your clients.

Be careful: the Mozilla OpenSSH guidelines have not been updated for a long time and still recommend vulnerable algorithms.

The team behind Terrapin published a scanner on GitHub to check whether your servers are vulnerable.

Posted in Computer, Linux, Networking, Security, Software

Using generative AI to learn vocabulary

I wanted to help a friend who is learning English and has trouble learning new vocabulary. She often gets new lists of words at school, and it’s difficult for her to know how to use them or remember what they mean.

She usually gets one exercise about the topic where she must fill blanks with words from a list.

Why not use generative AI for that?

I could not achieve good results using a single large prompt, so I decided to explicitly break it into different steps and refer to the whole process later, with “OK” results.

Posted in artificial intelligence, Computer, Generative AI, Software