
Create a customized LLM with Ollama in 2 minutes

Discover Ollama, run large language models (LLMs) locally in total confidentiality, and customize Llama easily on your device!
Updated on December 9, 2024

Generative AI: A Strategic Concern

Generative artificial intelligence has become a real strategic issue, for businesses and governments alike.

Since the arrival of ChatGPT, numerous solutions have emerged: Gemini and Mistral, for example.

The problem lies in the fact that:

  • Accessing these services can incur very high costs under intensive use;
  • We do not know what these services are doing with our personal data;
  • Our data necessarily passes through an internet connection;
  • We are forced to endure the outages that occur on these services.

Fortunately, there’s an alternative to address these problems: Ollama.

Ollama’s homepage

 

What is Ollama?

Ollama is a relatively recent project born out of the desire to simplify the use of language models (often called LLMs for Large Language Models) by allowing them to run directly on local machines rather than hosting them on cloud servers.

The idea behind Ollama is to make language models more accessible to developers while respecting growing concerns about data privacy.

If you know Docker, think of Ollama as the Docker of LLMs. If that doesn’t ring a bell, picture it as an isolated space on your computer: your prompts and data never leave your machine.

With Ollama, there are commands reminiscent of Docker’s commands: pull and run, for example.

With Ollama, all your data is stored locally on your computer. You can also easily fine-tune language models to your needs.

Goodbye monthly subscriptions to online services! 😁

 

Installing Ollama

To install Ollama, start by visiting the official website.

Just click on the big "Download" button. You shouldn’t have trouble finding it. 😉

Select your operating system and off you go!

Click on your operating system

 

If you are on Mac or Windows

A file should now download to your computer: open it and follow the installation prompts.

If all goes well, you should see something like this:

Screenshot of Ollama’s welcome message on Mac

A message will ask you to install the ollama command-line tool. Just click Install.

Congratulations! You have installed Ollama!

 

If you are on Linux

Open your terminal and type:

CONSOLE
curl -fsSL https://ollama.com/install.sh | sh

You should now be able to use the Ollama command!
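
On most Linux distributions, the official install script also registers Ollama as a background systemd service. If you want to confirm it is running (a quick sanity check, assuming a systemd-based distro), you can try:

CONSOLE
systemctl status ollama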

 

Launching an Ollama Instance

Now that Ollama is installed, let’s start a first instance!

Before going any further, verify that the following command returns the version number of Ollama.

CONSOLE
ollama --version

Before starting our first instance, let’s take a second to understand what we’re about to do.

We will begin with Meta’s language model: Llama, in version 3.1 (the latest at the time of writing).

If this doesn’t ring a bell, know that this language model is among the very best available: in its largest configuration, it even ranks above OpenAI’s GPT-4 on several benchmarks. Moreover, it’s open source.

Performance of Llama 3.1 (405 billion parameters) vs. GPT-4 (source)

Now we have your attention. 🙃

Launching the Llama 3.1 model

To launch a language model, we will use the following command:

CONSOLE
ollama run <model name>

For <model name>, you can choose from the extensive list of language models supported by Ollama.

We’ll work with llama3.1. Here’s what we’ll type in the terminal:

CONSOLE
ollama run llama3.1

There are multiple versions of Llama 3.1:

  • llama3.1 - the base version (8B parameters)
  • llama3.1:70b - the 70B-parameter version
  • llama3.1:405b - the 405B-parameter version 🤪

Do not download the 405B version unless you have a very powerful machine: think around 231 GB of RAM and roughly four RTX 4090 GPUs.

The Llama 3.1 download should finish after a few minutes on a fast (fiber) connection: the base model weighs about 4.7 GB.
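
To see which models are already downloaded on your machine (and how much disk space they take), Ollama ships a list command:

CONSOLE
ollama list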

Our model is now ready to respond

 

Using the Llama 3.1 model

Now that the model is ready, let’s ask a question!

CONSOLE
>>> What is the answer to the mystery of life?

(The model responds with a philosophical explanation instead of just “42”.)

To stop the model, you have two options:

  • type /bye
  • press Ctrl + d
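
As an aside, the interactive prompt accepts a few other slash commands. For instance, /? lists the available commands and /show info prints details about the loaded model (the exact list may vary between Ollama versions):

CONSOLE
>>> /?
>>> /show info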

 

Using Llama 3.1 via a REST API

You can also use Llama 3.1 through an API. To do this, we first need to run a local server; the same approach will also let you set up your model directly on a production server.

Starting the Ollama server

To start our server, run:

CONSOLE
ollama serve

Your server should now be running, printing various log messages that you can safely ignore.

If you see this error:

CONSOLE
Error: listen tcp 127.0.0.1:11434: bind: address already in use

It means your server is already running. If you installed Ollama on Mac or Windows, this is normal: the installer launches Ollama automatically! Just go to the menu bar (Mac) or taskbar (Windows), find the Ollama icon, and click "Stop ollama".

If you’re on Linux, you can stop Ollama via terminal:

CONSOLE
systemctl stop ollama
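
Alternatively, instead of stopping the existing instance, you can start a second server on a different address with the OLLAMA_HOST environment variable (the port 11435 below is just an arbitrary free port):

CONSOLE
OLLAMA_HOST=127.0.0.1:11435 ollama serve

Requests then need to target that port, e.g. http://localhost:11435/api/generate.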

 

Using the REST API for Ollama

Now open a new terminal and use curl to send a request to our new Ollama server:

CONSOLE
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Salut comment que ça va ?",
  "stream": false
}'

We send the request to the endpoint exposed by our Ollama server, with three parameters:

  • model - The model to use
  • prompt - Our query
  • stream - When set to false, we receive a single JSON object containing the full response along with additional metadata. If set to true, the response arrives as a stream of partial objects.

You should get a JSON object you can use in your projects.
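
Note that generate is not the only endpoint: Ollama also exposes a chat endpoint that takes a list of messages with roles, which is more convenient for multi-turn conversations. A minimal sketch:

CONSOLE
curl -X POST http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    { "role": "user", "content": "Hello, how are you?" }
  ],
  "stream": false
}'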

Example of a response from our brand new Ollama server

 

Customizing Llama 3.1 to Your Needs

Did you think that was all? Not with Believemy. 😗

We will now learn how to customize the Llama model to our needs. If you know Docker, this will feel familiar.

Creating a Modelfile

To customize your language model, create a Modelfile (similar to a Dockerfile). Open your favorite code editor, or, if you don’t have one, use Notepad (I never thought I’d say that on Believemy 👀).

Create a file named Modelfile, without any extension:

CONSOLE
FROM llama3.1

PARAMETER temperature 1

SYSTEM """
You respond in French. Behave as if you were Steve Jobs and always end your messages with "Steve Jobs."
"""

Explanations:

  • FROM - Our base model
  • PARAMETER temperature - The higher the value, the more creative and unpredictable the model. The lower the value, the more focused and coherent it is.
  • SYSTEM - Here we define instructions that the language model should always follow. It’s like giving it an initial personality or rules.

For more details on customizing a language model with Ollama, check out this documentation.
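
To give you an idea of what else a Modelfile can carry, here is a slightly fuller sketch. num_ctx (context window size) and top_p (nucleus sampling) are standard Modelfile parameters; the values below are purely illustrative:

CONSOLE
FROM llama3.1

PARAMETER temperature 1
PARAMETER num_ctx 4096
PARAMETER top_p 0.9

SYSTEM """
You respond in French. Behave as if you were Steve Jobs and always end your messages with "Steve Jobs."
"""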

 

Creating a new Ollama model

We can now create our own model (a customized version of Llama):

CONSOLE
ollama create steve -f ./Modelfile

Here’s what this does:

  • create - Creates a new model
  • steve - Our chosen name for the model
  • -f - Indicates we’ll specify a path to a file
  • ./Modelfile - The file path

Use your own chosen name and path. The command might take a while, because Ollama downloads the base model (if needed) and applies our customizations.

Our new Ollama model is ready
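
Once the build finishes, you can double-check what went into your model: ollama show with the --modelfile flag prints back the Modelfile the model was created from:

CONSOLE
ollama show steve --modelfile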

 

Using a customized Ollama model

Now we can finally use our new model! Remember the command? 😉

CONSOLE
ollama run steve

Here my model is named steve. Use the name you chose for yours, and don’t confuse it with the base model "llama3.1".

Example response with our new Ollama model based on Llama 3.1

Isn’t that incredible? 🥲

You can also use the REST API:

CONSOLE
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "steve",
  "prompt": "Salut comment que ça va ?",
  "stream": false
}'
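
One last tip: if you set "stream" to true (or simply omit it, since streaming is the default for this endpoint), the server answers with one JSON object per line as the tokens are generated, which you can watch live in the terminal:

CONSOLE
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "steve",
  "prompt": "Salut comment que ça va ?",
  "stream": true
}'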

 

Conclusion

The possibilities are huge! Have fun with Ollama! This mini-tutorial was a pleasure to create. If you enjoyed it, feel free to share it on your networks and tag Believemy or me, so we can comment on your share!

In the meantime, if you want to discuss LLMs, come visit our private Discord channel.

Peace! 😉

Steve Jobs. (🙃)
