A Standard & Complete CI/CD Pipeline for Most Python Projects
Streamline Python project CI/CD with a GitHub Actions workflow. Overcome Poetry and virtual environment challenges. Achieve linting, testing, dependency caching, and CodeCov integration with minimal setup. Boost productivity and maintain code quality effortlessly.
Have you ever spent ages tinkering with CI/CD tools rather than writing code for your Python project?
I sure have! There were times Poetry couldn’t install dependencies because of virtual environments. Other times, the dependencies just wouldn’t cache for some reason. On top of that, CI/CD tools are difficult to debug due to obscure error messages.
Hence, I’m sharing this GitHub Actions workflow which I use with most of my Python projects. It works right out-of-the-box without any tinkering & sets you on the right path to publishing your project. The workflow is very minimal yet doesn’t compromise on the major CI/CD principles required for maintaining good coding standards. Keeping it minimal also means you’re free to build upon it with further changes & improvements.
That said, here’s what you get with this workflow, out-of-the-box without any changes:
- Linting & code formatting with `Pylint`, `Black` & `isort` on all PRs & pushes to the remote repository.
- Running integrated test suites for catching any breaking changes before merging the PR.
- Caching dependencies for faster workflow execution times.
- Uploading coverage reports to CodeCov for tracking test coverage over time.
So, as you can see, the workflow doesn’t do much but ensure the bare minimum CI/CD principles are taken care of. And, best of all, you can build upon it as you’ll soon see.
About the Workflow
Python’s package management scene isn’t praiseworthy (sources: [1] & [2]). Coupled with those packaging issues, virtualenv requirements make setting up CI/CD tools quite complicated as well (on GitHub Actions at least). So, I scoured the Internet to come up with the most optimal CI/CD setup for Python projects. While Poetry is a great CLI tool for local development out-of-the-box, it doesn’t work well with CI/CD platforms. With Poetry, you can manage local virtualenvs as easily as publishing your project on PyPI right from your terminal!
But that’s manual labour. And as developers, we commit often & push to remote repositories at regular intervals. Repeated manual tasks are prone to mistakes, which increases the chances of a bug or breaking change creeping into the project. Hence, I set out to resolve this issue without spending too much time setting up CI/CD tools.
The goal was to make the setup as simple & minimal as possible while still meeting the modern standards of CI/CD principles.
In other words, the setup should be able to perform linting and/or formatting tasks, run the test suites, generate coverage reports & upload the reports to CodeCov. Those are the tasks the setup should perform at the minimum, so the principles of minimalism were kept in mind throughout.
I also assume most projects are hosted on GitHub repositories so the setup works ONLY with GitHub Actions. And in case you’re looking to use other CI/CD platforms like Travis CI/CircleCI, then you might want to look elsewhere.
That said, you can copy the code snippets shared below into an aptly named `<NAME-OF-THE-WORKFLOW>.yml` file under the `.github/workflows` directory of your project. For example, I usually name the file `test_suite.yml`. GitHub identifies your workflow files from there automatically. Once you push your commits to the remote repository, the workflow should start. You can access it at `https://github.com/<GITHUB-USERNAME>/<PROJECT-NAME>/actions?query=workflow%3A%22Test+Suite%22`.
That said, here’s the code snippet for the CI/CD pipeline. Feel free to copy+paste it. 😉
A Standard CI/CD Pipeline Suitable for Most Python Projects
Brief Overview of What the Workflow Does
If you’re impatient like me & would like to skim through the article, here’s what you should know:
- The workflow executes on PR & push events. As in, when someone makes a PR, the `Test Suite` workflow will run. The same happens when you push your local commits to the remote repository.
- The workflow consists of two jobs: `linter` & `test`. The latter depends on the former, so if `linter` fails, execution of `test` will be skipped.
- `linter` runs on an Ubuntu VM & installs `Pylint`, `Black` & `isort` for linting & formatting the code. They’re also cached to decrease execution times.
- `test` runs on a macOS, an Ubuntu & a Windows VM, each with Python versions `3.8` & `3.9`. Do note, these runs happen in parallel, although the `fail-fast` setting described later cancels the remaining runs once one of them fails.
- The `test` job will also install & cache the virtualenv stored under the `.venv` directory, and then run the test suites with pytest, which generates a `coverage.xml` report to be uploaded to CodeCov.
So, as you can see, even though the workflow is kept as minimal as possible, it still accomplishes a lot of tasks. In fact, most of these tasks are indispensable for maintaining the minimum quality standards for your projects.
Anyway, with that brief overview out of the way, let’s take a deeper look at what each line of code was written for. The next section describes it in as much detail as possible.
In-depth Explanation of the Workflow
Right at the top of the file is the `name: Test Suite` key-value pair. It describes the name of the workflow which GitHub shows in its web UI. The succeeding line, the `on: [pull_request, push]` pair, describes the events that should trigger the workflow.
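As a minimal sketch, the two top-level keys described above would look like this at the head of the workflow file. Both values are taken from the description; nothing else is assumed.

```yaml
# Name shown for this workflow in the GitHub Actions web UI.
name: Test Suite

# Events that trigger the workflow: every opened/updated PR and every push.
on: [pull_request, push]
```

One thing to be aware of with this trigger list: pushing to a branch of the same repository that already has an open PR fires both events, so the workflow runs twice for that commit.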
The `jobs:` section describes the different jobs that should run in parallel (not necessarily, more on that later). Being minimalist, this workflow describes two jobs: a `linter` & a `test`. The names of the jobs are kept self-descriptive intentionally. As mentioned in the previous section, `linter` performs linting actions when some code is pushed to the repository or a PR is created, while the `test` job runs the array of tests on the code pushed or in a PR.
That said, each job has to be assigned an operating system to run on, which is done with the `runs-on:` keyword. While jobs run in parallel by default, they can be made dependent on one another. Hence, a job can also be skipped entirely if a job it depends on failed earlier for some reason.
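The job layout described so far can be sketched as the skeleton below. The job names come from the article; the `ubuntu-latest` runner label and the placeholder `echo` steps are assumptions used only to keep the fragment valid.

```yaml
jobs:
  linter:
    runs-on: ubuntu-latest        # assumed label for "an Ubuntu VM"
    steps:
      - run: echo "linting steps go here"    # placeholder step

  test:
    needs: linter                 # makes this job wait for (and depend on) linter
    runs-on: ubuntu-latest        # placeholder; the article uses an OS matrix here
    steps:
      - run: echo "test steps go here"       # placeholder step
```

Without the `needs:` key both jobs would start at the same time; with it, `test` is skipped whenever `linter` fails.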
Now for the interesting part. The `steps:` key describes which Actions or commands to execute. So, the `linter` job executes a Git Checkout first, then sets up an appropriate version of Python in the succeeding step.
The next couple of steps involve caching dependencies for decreased workflow execution time. The actions/cache Action loads the dependencies if they’ve been cached earlier. It identifies the correct cache by its key.
If the dependencies aren’t loaded from the cache, then `pip` installs `Black`, `Pylint` & `isort` for linting purposes.
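A hedged sketch of these `linter` steps follows. The Action version tags, the Python version, the cache path and the cache key are all assumptions, not the article's exact values. This variant caches pip's download directory, so the install step always runs but reuses previously downloaded wheels on a cache hit.

```yaml
steps:
  - name: Check out the repository
    uses: actions/checkout@v4

  - name: Set up Python
    uses: actions/setup-python@v5
    with:
      python-version: "3.9"       # assumed; quote it so YAML keeps it a string

  - name: Cache pip downloads
    uses: actions/cache@v4
    with:
      path: ~/.cache/pip          # pip's default cache directory on Linux
      key: ${{ runner.os }}-pip-linters   # hypothetical cache key

  - name: Install the linters
    run: python -m pip install black pylint isort
```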
The final step for the `linter` job is to execute the aforementioned linting & formatting tools. `Pylint`, `Black` & `isort` have sensible defaults, hence they’re run without any additional arguments.
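The lint step could look roughly like the sketch below. Two things here are assumptions that go beyond the article: `my_package` is a hypothetical package name (Pylint needs a target), and the `--check`/`--check-only` flags are added so that Black and isort fail the job on unformatted code instead of rewriting files inside the throwaway CI checkout.

```yaml
- name: Lint and check formatting
  run: |
    # my_package is a hypothetical name; point Pylint at your own package.
    pylint my_package
    # Report formatting problems without modifying files.
    black --check .
    isort --check-only .
```

Since the default shell on Linux runners stops at the first failing command, a Pylint failure ends the step before Black and isort run.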
And finally, coming to the `test` job. This job mirrors the previous `linter` job to an extent, as you’ll see soon enough.
Right off the bat, using the `needs:` key, the job is declared dependent on the completion of the `linter` job. Thus, `test` won’t execute in parallel with `linter`, and it won’t execute at all if `linter` fails. The fast-fail option is enabled with the `fail-fast: true` pair, which cancels the remaining matrix runs as soon as one of them fails.
In addition to the above strategy, this job is set to run on multiple OS platforms with multiple versions of Python. This is set with the `matrix:` key, which has `os` & `python-version` as its keys. The `os` & `python-version` keys accept an array each, of the OSes & the Python versions respectively.
The next line sets the default shell in which the job’s commands should run.
And as mentioned earlier, each job has to be assigned an OS to run on with the `runs-on` keyword. The `runs-on` key for the `test` job accepts a variable which takes each of the values set under `matrix.os` in turn, thus allowing the workflow to run multiple instances of the ensuing steps for different OSes & Python versions!
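Put together, the head of the `test` job might be sketched as follows. The runner labels and the choice of `bash` as the default shell are assumptions (the article only says a default shell is set); the Python versions are the ones named earlier.

```yaml
test:
  needs: linter                   # run only after linter succeeds
  strategy:
    fail-fast: true               # cancel the other matrix runs once one fails
    matrix:
      os: [ubuntu-latest, macos-latest, windows-latest]   # assumed labels
      python-version: ["3.8", "3.9"]   # quoted so YAML keeps them as strings
  defaults:
    run:
      shell: bash                 # assumed; gives the same shell on all three OSes
  runs-on: ${{ matrix.os }}       # expands once per matrix combination
```

This matrix expands to six runs (three OSes times two Python versions). Older Python versions may not be available on every current runner image, so the lists may need adjusting.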
The next two steps are pretty similar to how `linter` started its execution process, but with a caveat. Based on the values set up under `matrix.python-version`, each OS instance will get one Python version per run.
Now, instead of installing tools with `pip` as was the case in `linter`, the workflow installs Poetry using the [snok/install-poetry](https://github.com/snok/install-poetry) Action. It configures Poetry to set up virtualenvs inside the project directory, which can then easily be cached in the next step.
The Cache action caches the whole virtualenv instead of the individual dependencies. Hence, Poetry installs the dependencies only if the cached `.venv` wasn’t restored.
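The Poetry and caching steps might be sketched like this. The step `id`, the cache key layout and the version tags are assumptions; the `virtualenvs-*` inputs are the snok/install-poetry settings that place the virtualenv in `.venv` inside the project.

```yaml
steps:
  - name: Install Poetry
    uses: snok/install-poetry@v1
    with:
      virtualenvs-create: true
      virtualenvs-in-project: true    # creates .venv inside the project directory

  - name: Restore the cached virtualenv
    id: cached-venv                   # hypothetical id, referenced by the next step
    uses: actions/cache@v4
    with:
      path: .venv
      # A new lock file, OS or Python version produces a new cache entry.
      key: venv-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/poetry.lock') }}

  - name: Install dependencies
    if: steps.cached-venv.outputs.cache-hit != 'true'   # skip on an exact cache hit
    run: poetry install --no-interaction
```

Including the lock file hash in the key means the virtualenv is rebuilt only when the dependencies actually change.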
Following that, the `.venv` is activated & `pytest` then runs the test suite.
The arguments passed to `pytest` ensure maximum verbosity for debugging & write the coverage report in `.xml` format to the root directory. The generated report is then uploaded to CodeCov using the [codecov/codecov-action](https://github.com/codecov/codecov-action).
The CodeCov Action accepts an API token that you’ll have to copy & store as an encrypted Secret in your repository, from where the workflow passes it to the Action. The CodeCov token can be found at `https://codecov.io/gh/<GITHUB-USERNAME>/<PROJECT-NAME>` (for projects hosted on GitHub). And finally, the CodeCov Action is set to fail the workflow if it errors out.
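The final two steps could be sketched as below. Several names are assumptions: `my_package` is a hypothetical package name, `CODECOV_TOKEN` is a secret name you would choose yourself, the `--cov` options require the `pytest-cov` plugin among your dev dependencies, and `$VENV` is the activation-script variable exported by snok/install-poetry.

```yaml
steps:
  - name: Run the test suite
    run: |
      # Activate the in-project virtualenv on any OS.
      source $VENV
      # -vv for maximum verbosity; writes coverage.xml to the working directory.
      pytest -vv --cov=my_package --cov-report=xml

  - name: Upload the coverage report
    uses: codecov/codecov-action@v4
    with:
      token: ${{ secrets.CODECOV_TOKEN }}   # hypothetical secret name
      files: ./coverage.xml
      fail_ci_if_error: true                # fail the job if the upload errors out
```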
The workflow in its full glory isn’t as minimal as it sounds. Its complexity comes from the fact that production-grade software should be thoroughly tested & formatted according to standards, especially if your project is open-source. Even then, there’s still a lot of room for further improvements & changes. The next section looks into how you can build further upon this workflow.
Room for Further Improvements
As mentioned countless other times, the pipeline is kept minimalist with an intention: Keep room for further changes and/or improvements.
There are a ton more changes/improvements that can be made as per one’s requirements. Some improvements that I can think of off the top of my head are:
- Enable a `release` event wherein the package is tested, formatted, linted, built & then uploaded to PyPI with Poetry.
- Considering scalability, the linters & code formatters can be run in parallel instead of sequentially.
- Tag & update a `CHANGELOG.md` file upon `release`.
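As one possible starting point for the first idea, a separate release workflow might look like the sketch below. It is not part of the workflow described in this article: the workflow name, job name and the `PYPI_TOKEN` secret are hypothetical, while `POETRY_PYPI_TOKEN_PYPI` is the environment variable Poetry reads for a PyPI API token.

```yaml
name: Release

on:
  release:
    types: [published]            # runs when a GitHub release is published

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
      - uses: snok/install-poetry@v1
      - name: Build and publish to PyPI
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}   # hypothetical secret name
        run: poetry publish --build
```

The linting and test steps from the main workflow would still need to be added before the publish step to match the idea in the list above.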
And many more. The possibilities are endless & only limited by the project & individual maintainer’s requirements.
But all said & done, the code shared here should suffice for most open-sourced Python projects on GitHub.
And if you feel I missed something out then reach out to me on Twitter, Email and/or Ask Me Anything.