At Amplify, Python is one of the languages we use to build AI-driven data pipelines. Python is the natural choice for data processing and for interfacing with LLMs through frameworks like LangChain. However, the tooling around Python for things like dependency management and deployment frequently leaves developers frustrated. If Python is the natural choice for large parts of our data pipelines, how can we make working with it, well, better? Let’s dive into it.
We’ve recently set out to redesign and improve our main data pipeline and AI agent architecture, which gave us a chance to implement a more modern, standard Python stack. A typical Python application usually consists of a requirements file to use with pip and the large, official Python Docker image (a whopping 1.02GB!). We’ve found that this setup often causes developer and deployment headaches, though an exhaustive list of those issues is outside the scope of this blog. Before we discuss how we attempted to improve upon the usual Python stack, we should at least lay out the goals we had at Amplify for our latest projects:
- Dependency resolution should be reliable and reproducible.
- Production Docker images should be as small as possible.
- A debugger should be available during local development.
- The same Dockerfile should be utilized in local development and production.

The first goal of our new Python stack was to find a solution to avoid the dependency resolution issues that can arise during builds with pip. Poetry, a packaging tool for Python, is a project that has been on my radar for some time and has been around for even longer. We finally got a chance to integrate Poetry into our Python stack at Amplify, and we were not disappointed! Poetry provides a better way to manage Python dependencies, with lock files for synchronized dependencies and optional dependency groups. As you’ll see later, optional dependency groups help keep Docker image sizes down for production while allowing us to enable the Python debugger during local development.
The easiest way to install Poetry is with pipx, so go ahead and do that. Once Poetry is installed, we can initialize our example project by running poetry init:
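A typical poetry init session looks roughly like the following; the prompts are abbreviated here, and the exact output may differ depending on your Poetry version and answers:

$ pipx install poetry
$ poetry init

This command will guide you through creating your pyproject.toml config.

Package name [demo]:
Version [0.1.0]:
Description []:
Author [Michael Fox <mfox@amplify.security>, n to skip]:
License []:
Compatible Python versions [^3.12]:

Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Do you confirm generation? (yes/no) [yes] yes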
Let's add uvicorn and fastapi to our demo project. You can do this by running poetry add uvicorn and poetry add fastapi.
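For each package, Poetry resolves a compatible version, records the constraint in pyproject.toml, and pins the exact version in poetry.lock. The output should look something like this (abbreviated; resolved versions will vary):

$ poetry add uvicorn
Using version ^0.29.0 for uvicorn

Updating dependencies
Resolving dependencies...

$ poetry add fastapi
Using version ^0.111.0 for fastapi

Updating dependencies
Resolving dependencies...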
Lastly, we set package-mode = false in Poetry's config, as we will only be using it for dependency management, and this will result in a smaller overall image size later. After running the commands above, your pyproject.toml should be similar to the following (with a corresponding poetry.lock file generated alongside it):
# pyproject.toml
[tool.poetry]
name = "demo"
version = "0.1.0"
description = ""
authors = ["Michael Fox <mfox@amplify.security>"]
readme = "README.md"
package-mode = false

[tool.poetry.dependencies]
python = "^3.12"
uvicorn = "^0.29.0"
fastapi = "^0.111.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
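At this point you can sanity-check the environment, an optional step, by letting Poetry build the virtual environment and importing one of the new dependencies:

$ poetry install
$ poetry run python -c "import fastapi; print(fastapi.__version__)"
0.111.0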
At Amplify, we use FastAPI ASGI apps with Uvicorn to build event workers utilizing our open source messaging adapter, Carrier. For the purposes of this demo, we’re going to expose a simple ping endpoint to verify our Python environment and, later, our Docker image. Create a file, server.py, with a FastAPI endpoint like the one below. We will also add functionality to run uvicorn programmatically, which will be necessary later.
# server.py
import os

import uvicorn
from fastapi import FastAPI

UVICORN_RELOAD = os.getenv("UVICORN_RELOAD", "False").lower() in ("true", "1")
UVICORN_HOST = os.getenv("UVICORN_HOST", "0.0.0.0")
UVICORN_PORT = int(os.getenv("UVICORN_PORT", "8000"))

app = FastAPI()

@app.get("/ping")
def ping():
    return {"ping": "pong"}

if __name__ == "__main__":
    uvicorn.run("server:app", host=UVICORN_HOST, port=UVICORN_PORT, reload=UVICORN_RELOAD)
Running server.py with python server.py should start Uvicorn, load the FastAPI ASGI app, and expose a /ping endpoint on port 8000. One thing to note here is that we specify 0.0.0.0 as the default for UVICORN_HOST because we will be running in a Docker container for local testing later, but you can configure this however works best in your environment. For example, if we were deploying this application as a Kubernetes pod and only pod-local networking was necessary, we could configure UVICORN_HOST as 127.0.0.1 to reduce exposure. The UVICORN_RELOAD environment variable, used here to enable dynamic reloads on code updates, will be passed to uvicorn later in our Docker Compose stack.
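As a quick check, start the server and hit the endpoint from another terminal. The log lines below are representative Uvicorn output and may differ slightly in your environment:

$ python server.py
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

$ curl http://localhost:8000/ping
{"ping":"pong"}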
A debugger should be a non-negotiable feature of any functional development environment. When it comes to debuggers, don’t take my word for it: John Carmack himself is one of the most outspoken proponents of debuggers. The debugpy package offers excellent Python debugger support for VS Code (and hopefully PyCharm soon), so we will install it in our project within a Poetry dependency group using poetry add debugpy --group debug:
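The output should look something like the following (abbreviated; your resolved version may differ):

$ poetry add debugpy --group debug
Using version ^1.8.1 for debugpy

Updating dependencies
Resolving dependencies...

Package operations: 1 install, 0 updates, 0 removals

  - Installing debugpy (1.8.1)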
Once we’ve added debugpy, we want to make the debug dependency group optional so that it will not be installed in our production Docker image. Create a [tool.poetry.group.debug] table in pyproject.toml and include optional = true. Once done, your pyproject.toml should look similar to the following:
# pyproject.toml
[tool.poetry]
name = "demo"
version = "0.1.0"
description = ""
authors = ["Michael Fox <mfox@amplify.security>"]
readme = "README.md"
package-mode = false

[tool.poetry.dependencies]
python = "^3.12"
uvicorn = "^0.29.0"
fastapi = "^0.111.0"

[tool.poetry.group.debug]
optional = true

[tool.poetry.group.debug.dependencies]
debugpy = "^1.8.1"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
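With the group marked optional, a plain poetry install now skips debugpy entirely; you opt in with the --with flag. For example:

# production-style install: no debugger
$ poetry install

# local development: include the optional debug group
$ poetry install --with debug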
As mentioned previously, one of our goals was to use the same Dockerfile for both production and local development. However, we also want to be able to run the Python debugger when developing locally, and given that debugpy is over 20MB, that conflicts with our goal of keeping production images small! We will use Docker build arguments in conjunction with Poetry's optional dependency groups to satisfy both of these goals with the same Dockerfile. Let’s take a look:
# Dockerfile
FROM python:3.12-alpine AS builder

ARG INSTALL_DEBUGPY

# Set environment variables
ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=1 \
    POETRY_CACHE_DIR=/tmp/.poetry

# Install poetry
RUN pip install poetry==1.8.3

# Add demo user
RUN adduser -D demo && \
    mkdir -p /home/demo/app && \
    chown demo:demo /home/demo/app

WORKDIR /home/demo/app
USER demo

COPY pyproject.toml poetry.lock ./

# Install dependencies, pulling in the optional debug group only when the
# INSTALL_DEBUGPY build argument is set (POSIX [ ] test, since /bin/sh in
# the alpine image is not bash)
RUN if [ -z "${INSTALL_DEBUGPY}" ]; then \
        poetry install --no-root; \
    else \
        poetry install --no-root --with debug; \
    fi

FROM python:3.12-alpine AS runtime

# Expose fastapi port
EXPOSE 8000

# Add demo user
RUN adduser -D demo && \
    mkdir -p /home/demo/app && \
    chown demo:demo /home/demo/app

WORKDIR /home/demo/app
USER demo

# Set environment variables
ENV VIRTUAL_ENV=.venv \
    PATH=/home/demo/app/.venv/bin:$PATH

# Copy virtual environment
COPY --from=builder /home/demo/app/${VIRTUAL_ENV} ${VIRTUAL_ENV}

# Copy server.py
COPY server.py server.py

# Set entrypoint
ENTRYPOINT ["python", "server.py"]
In this Dockerfile, we use the builder pattern to install Poetry and all dependencies into a virtual environment. We install the optional dependency group debug only if the INSTALL_DEBUGPY Docker build argument is set. We then copy only the virtual environment from the build image into our runtime image to ensure the smallest possible final image. Finally, we set the Docker image to run our server.py ASGI app. As an added bonus, we configure the Docker image to drop privileges and run as a non-root user, demo. Without debugpy installed, the final image size is only 111MB, which is over 9X smaller than the full Python Docker image and still smaller than the base slim Docker image!
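If you want to verify the size difference yourself, build both variants and compare (demo and demo-debug are arbitrary tag names):

# production image, no debugger
$ docker build -t demo .

# local development image with debugpy baked in
$ docker build -t demo-debug --build-arg INSTALL_DEBUGPY=1 .

$ docker images | grep demo

The demo image should come in around the 111MB mentioned above, with demo-debug roughly 20MB larger.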
alpine vs slim

For the purposes of this blog (and our actual pipeline at Amplify) we use the alpine Python Docker image tag as our base image. The alpine Python image is only 57.1MB compared to the full-fat 1.02GB latest image and the still slimmed-down 130MB slim image. However, Alpine Linux is built against the musl C library (as opposed to glibc, the default in most other Linux distributions), which does not always play well with certain Python packages. For greenfield development, it doesn’t hurt to use alpine as a base until an irreplaceable dependency just won’t play nice. However, if you are migrating a legacy Python application with existing dependencies, you may run into unintended consequences, and slim might be a safer bet.
The final task in getting this Python stack ready for active development is running it locally with Docker Compose. Two things that will be useful for local testing are Uvicorn's hot reloading and exposing the debugger. For hot reloading, we will set the UVICORN_RELOAD environment variable, which server.py will pass to uvicorn. For exposing the debugger, we will set the Docker Compose build context to include the INSTALL_DEBUGPY build argument, expose port 5678 for the debugger, and run server.py with the debugpy module. This is what the docker-compose.yml file will look like:
# docker-compose.yml
---
version: "3.3"
services:
  demo:
    build:
      context: .
      args:
        INSTALL_DEBUGPY: "True"
    ports:
      - "8000:8000"
      - "5678:5678"
    volumes:
      - ./server.py:/home/demo/app/server.py # this will allow hot reload when the file changes
    environment:
      UVICORN_RELOAD: "True"
    entrypoint:
      - "python"
      - "-m"
      - "debugpy"
      - "--listen"
      - "0.0.0.0:5678"
      - "server.py"
Bring up the Docker Compose stack with docker-compose up. This command will build the image with debugging enabled and run the Docker Compose stack. You can then access the FastAPI endpoint at http://localhost:8000/ping and fiddle with changing the response in server.py without restarting the Docker Compose stack. Uvicorn will automatically reload the ASGI application and you can continue to develop without restarts.
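For example, with the stack running, you can edit the handler's response, save, and immediately see the change. This is a hypothetical session assuming you change ping() to return {"ping": "hello"}:

$ curl http://localhost:8000/ping
{"ping":"pong"}

# edit ping() in server.py, save, then:
$ curl http://localhost:8000/ping
{"ping":"hello"}

In the docker-compose logs you should see Uvicorn notice the file change and restart its worker process.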
To attach the debugger to our container running with Docker Compose, create a VS Code launch configuration by creating a .vscode directory in the root of your project and adding a launch.json file to it. The file should contain a configuration for a remote attach Python debugger:
// .vscode/launch.json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Remote Attach",
            "type": "debugpy",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "."
                }
            ]
        }
    ]
}
That’s all there is to it! You can now add breakpoints to your code within VS Code and attach to the running container using the Run and Debug menu.
If you made it this far, we created a Python stack which:

- manages dependencies reproducibly with Poetry lock files and optional dependency groups.
- uses the same Dockerfile for local development and production.
- produces a production image of only 111MB with dependencies.
- supports hot reloading and remote debugging in local development with debugpy.
This is the opinionated way we build modern Python applications, including our AI-driven data pipelines, at Amplify. Hopefully you found something interesting or useful to take away and incorporate into your own projects! As usual, feel free to drop by and follow us on LinkedIn or GitHub to keep in touch and hear about the latest developments at Amplify.