Create a customized LLM with Ollama in 2 minutes
Generative AI: A Strategic Concern
Generative artificial intelligence has become a strategic issue for businesses and government entities alike.
Since the arrival of ChatGPT, numerous competing solutions have emerged, such as Gemini and Mistral.
The problem lies in the fact that:
- Accessing these services can become very expensive under intensive use;
- We do not know what these services do with our personal data;
- Our data necessarily travels over an internet connection;
- We have to endure whatever outages occur on these services.
Fortunately, there’s an alternative to address these problems: Ollama.

What is Ollama?
Ollama is a relatively recent project born out of the desire to simplify the use of language models (often called LLMs for Large Language Models) by allowing them to run directly on local machines rather than hosting them on cloud servers.
The idea behind Ollama is to make language models more accessible to developers while respecting growing concerns about data privacy.
If you know Docker, think of Ollama as the Docker of LLMs. If that doesn’t ring a bell, picture it as an isolated space on your computer: nothing gets in or out.
With Ollama, the commands are reminiscent of Docker’s: pull and run, for example.
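To make the parallel concrete, here’s a minimal sketch (llama3.1 is just an example model name; any model from Ollama’s catalog works):
ollama pull llama3.1   # download a model locally, like docker pull
ollama run llama3.1    # start an interactive session with it, like docker run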
With Ollama, all your data is stored locally on your computer. You can also easily fine-tune language models to your needs.
Goodbye monthly subscriptions to online services! 😁
Installing Ollama
To install Ollama, start by visiting the official website.
Just click on the big "Download" button. You shouldn’t have trouble finding it. 😉
Select your operating system and off you go!

If you are on Mac or Windows
A file should now download to your computer: open it and follow the installation prompts.
If all goes well, you should see something like this:

A message will ask you to install the ollama command-line tool. Just click Install.
Congratulations! You have installed Ollama!
If you are on Linux
Open your terminal and type:
curl -fsSL https://ollama.com/install.sh | sh
You should now be able to use the Ollama command!
Launching an Ollama Instance
Now that Ollama is installed, let’s start a first instance!
Before going any further, verify that the following command returns the version number of Ollama.
ollama --version
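If everything is in place, you should see output along these lines (the exact version number will of course differ):
ollama version is 0.3.6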
Before launching our first instance, let’s understand what we’re going to do.
We will begin by using Meta’s language model: Llama, in version 3.1 (the latest at the time of writing).
If this doesn’t ring a bell: this language model is among the very best available, ranked above OpenAI’s ChatGPT on several benchmarks depending on the configuration. Moreover, it’s open source.

Now we have your attention. 🙃
Launching the Llama 3.1 model
To launch a language model, we will use the following command:
ollama run <model name>
For <model name>, you can choose from the massive list of language models supported by Ollama.
We’ll work with llama3.1. Here’s what we’ll type in the terminal:
ollama run llama3.1
There are multiple versions of Llama 3.1:
- llama3.1 - the base version of the model (8B parameters)
- llama3.1:70b - the 70B-parameter version
- llama3.1:405b - the 405B-parameter version 🤪
Do not download the 405B-parameter version unless you have a very powerful machine, with 231GB of RAM and about 4 x 4090 GPUs.
Downloading the Llama 3.1 model should take only a few minutes if you have a fiber connection: the model weighs about 4.7GB.

Using the Llama 3.1 model
Now that the model is ready, let’s ask a question!
>>> What is the answer to the mystery of life?
(The model responds with a philosophical explanation instead of just “42”.)
To stop the model, you have two options:
- type /bye
- press Ctrl + d
Using Llama 3.1 via a REST API
You can also use Llama 3.1 with an API. To do this, we first need to run our local server, which will also allow you to set up your model directly on a production server.
Starting the Ollama server
To start our server, run:
ollama serve
Your server should now be running, printing various log messages you can safely ignore.
If you see this error:
Error: listen tcp 127.0.0.1:11434: bind: address already in use
It means your server is already running. If you installed Ollama on Mac or Windows, this is normal: the installer launches Ollama automatically! Just go to the taskbar, find the Ollama icon, and click "Stop ollama".
If you’re on Linux, you can stop Ollama via terminal:
systemctl stop ollama
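One more note before using the API: we mentioned setting up your model on a production server. By default, Ollama only listens on 127.0.0.1:11434. Here is a minimal sketch to expose it on other interfaces, via the OLLAMA_HOST environment variable (be sure to restrict access with a firewall or reverse proxy before doing this on a real server):
OLLAMA_HOST=0.0.0.0:11434 ollama serve   # listen on all interfaces instead of localhost only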
Using the REST API for Ollama
Now open a new terminal and use the curl tool to send a request to our new Ollama server:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Salut comment que ça va ?",
"stream": false
}'
We send the request to an address exposed by our Ollama server, with three specific parameters:
- model - the model to use
- prompt - our query
- stream - when set to false, we receive a single object containing our response along with additional info; if set to true, we would receive a streamed response.
You should get a JSON object you can use in your projects.
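For example, here is a minimal sketch that extracts just the generated text from that JSON object, assuming you have the jq tool installed (the generated text is in the response field):
curl -s http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Hi, how are you?",
"stream": false
}' | jq -r '.response'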

Customizing Llama 3.1 to Your Needs
Did you think that was all? Not with Believemy. 😗
We will now learn to customize the Llama model to our needs. If you know Docker, this will feel familiar.
Creating a Modelfile
To customize your language model, create a Modelfile
(similar to a Dockerfile
). Open your favorite code editor or if you don’t know what that is, use Notepad (I never thought I’d say that on Believemy 👀).
Create a Modelfile
without extension:
FROM llama3.1
PARAMETER temperature 1
SYSTEM """
You respond in French. Behave as if you were Steve Jobs and always end your messages with "Steve Jobs."
"""
Explanations:
- FROM - our base model
- PARAMETER temperature - the higher the value, the more creative and unpredictable the model; the lower the value, the more focused and coherent its answers.
- SYSTEM - here we define instructions that the language model should always follow. It’s like giving it an initial personality or rules.
For more details on customizing a language model with Ollama, check out this documentation.
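Temperature is only one of the available knobs. As a sketch of what else you can tune, here is a Modelfile with two other documented parameters (the values below are arbitrary examples, not recommendations):
FROM llama3.1
PARAMETER temperature 1
# Context window size, in tokens
PARAMETER num_ctx 4096
# Nucleus sampling: lower values give more focused answers
PARAMETER top_p 0.9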
Creating a new Ollama model
We can now add our own model (our custom version of Llama):
ollama create steve -f ./Modelfile
Here’s what this does:
- create - creates a new model
- steve - our chosen name for the model
- -f - indicates that we’ll specify a path to a file
- ./Modelfile - the file path
Use your own chosen name and path. The command may take a moment: Ollama builds the customized model, and first downloads the base model if it isn’t already present.
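To check that the model was created, you can list the models available locally; your new model should appear alongside the base one:
ollama list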

Using a customized Ollama model
Now we can finally use our new model! Remember the command? 😉
ollama run steve
Here my model is named steve. For your model, use your chosen name. Don’t confuse it with "llama".

Isn’t that incredible? 🥲
You can also use the REST API:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "steve",
"prompt": "Salut comment que ça va ?",
"stream": false
}'
Conclusion
The possibilities are huge! Have fun with Ollama! This mini-tutorial was a pleasure to create. If you enjoyed it, feel free to share it on your networks and tag Believemy or me, so we can comment on your share!
In the meantime, if you want to discuss LLMs, come visit our private Discord channel.
Peace! 😉
Steve Jobs. (🙃)