How to cache dependencies in GitLab


Hi everybody!
Today I want to share my experience with dependency caching in GitLab CI.

Why is it needed

I have a small pet project where I usually experiment with new technologies and approaches. The project's repository is hosted on GitLab, where I configured CI/CD jobs for testing and deploying the project.

The CI job that runs the tests usually completed in 2 minutes. But every time, I wondered what exactly was being done in that time. One example is installing the Python dependencies.

On the one hand, installing everything from scratch guarantees reproducible builds (let’s say hello to left-pad and mimemagic 😄).

But on the other hand, these actions are performed every time I push changes to the repository. And that’s just a pet project.

Let’s try to enable caching 🤟

Here is the official GitLab documentation on CI caching, with examples: https://docs.gitlab.com/ee/ci/caching

The project I tested CI caching on is built with Django and uses Poetry to manage dependencies and virtual environments.

What .gitlab-ci.yml looked like before the changes

stages:
  - tests
  - deploy

tests:
  stage: tests
  image: python:3.7-slim
  script:
    - apt-get update -qy && apt-get install -y build-essential
    - pip --no-cache-dir install poetry
    - poetry config virtualenvs.create false && poetry install --no-root
    - sed 's/#DATABASE_URL/DATABASE_URL/g' telega/.env.example > telega/.env
    - coverage run --source='.' manage.py test && coverage report -m

Here we install the Debian packages, then install Poetry through pip, and then install the project dependencies with Poetry.

What .gitlab-ci.yml looks like after the changes

stages:
  - tests
  - deploy

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  key:
    files:
      - poetry.lock
      - .gitlab-ci.yml
    prefix: ${CI_JOB_NAME}
  paths:
    - .venv
    - .cache/pip

tests:
  stage: tests
  image: python:3.7-slim
  script:
    - apt-get update -qy && apt-get install -y build-essential
    - pip install poetry
    - poetry config virtualenvs.in-project true
    - poetry install --no-root
    - sed 's/#DATABASE_URL/DATABASE_URL/g' telega/.env.example > telega/.env
    - poetry run coverage run manage.py test && poetry run coverage report -m

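I added settings that tell pip and Poetry where to store packages: PIP_CACHE_DIR points pip's cache at .cache/pip, and virtualenvs.in-project makes Poetry create the virtual environment in .venv. Both locations are inside the project directory, which is where GitLab looks for cache paths. Because the dependencies now live in a virtual environment, the test commands go through poetry run. Then I added a cache section and used the poetry.lock and .gitlab-ci.yml files as the cache key.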

This means that if at least one of these files changes, the key changes too, the old cache no longer matches, and the packages are installed from PyPI again. Otherwise, the job reuses the cached directories with the already installed packages.
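The idea of a file-based cache key can be illustrated with a few lines of Python. This is only a sketch of the principle, not how GitLab does it: as I read the documentation, GitLab derives the real key from the commits that last changed the listed files, while the hypothetical cache_key helper below simply hashes the file contents. The effect is the same in spirit: touch either file and the key changes.

```python
import hashlib
from pathlib import Path
from typing import Sequence


def cache_key(prefix: str, files: Sequence[str], root: Path = Path(".")) -> str:
    """Hypothetical helper: build a key that changes when any listed file changes."""
    digest = hashlib.sha256()
    for name in files:
        path = root / name
        # Mix in the file name so that swapping contents between files matters.
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        # A missing file contributes only its name (an assumption of this sketch).
        if path.is_file():
            digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()}"


if __name__ == "__main__":
    # "tests" stands in for ${CI_JOB_NAME} from the config above.
    print(cache_key("tests", ["poetry.lock", ".gitlab-ci.yml"]))
```

Running it twice without touching the files prints the same key; editing poetry.lock or .gitlab-ci.yml gives a new one, which is exactly the moment when the cache should be rebuilt.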

Results

The running time of the CI job dropped from 2 minutes to 1 minute. Of course, checking and unpacking the cache adds a step of its own, but it’s still faster than installing the dependencies from PyPI.

In the screenshot of the job logs, we can see how pip uses the cache.

GitLab CI job

And here we can see that Poetry did not install anything new.

GitLab CI job
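The same check can also be done from inside the job instead of reading the logs. Here is a minimal sketch, assuming the two cache paths from the config above (.venv and .cache/pip); cache_status is a hypothetical helper name, and the script is not part of the original pipeline. It reports, for each path, whether a directory was restored and how many bytes it holds.

```python
import os
from pathlib import Path

# Paths taken from the cache section of .gitlab-ci.yml above.
CACHE_PATHS = [".venv", ".cache/pip"]


def cache_status(project_dir: Path) -> dict:
    """Hypothetical helper: map each cache path to its size in bytes, or None if missing."""
    status = {}
    for rel in CACHE_PATHS:
        path = project_dir / rel
        if path.is_dir():
            # Sum the sizes of all regular files below the directory.
            status[rel] = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
        else:
            status[rel] = None
    return status


if __name__ == "__main__":
    # CI_PROJECT_DIR is set by GitLab; fall back to the current directory locally.
    project_dir = Path(os.environ.get("CI_PROJECT_DIR", "."))
    for rel, size in cache_status(project_dir).items():
        label = "missing (cold cache)" if size is None else f"{size} bytes restored"
        print(f"{rel}: {label}")
```

Run before poetry install, it would show missing directories on the first pipeline after a key change and non-empty ones on the following runs.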

Dependency caching in GitLab CI/CD is a powerful tool for speeding up jobs and saving resources.