Persistent Storage in Kubernetes

Kubernetes is an open source container orchestration framework originally developed by Google and now run by the Cloud Native Computing Foundation (CNCF). In this post I’ll briefly explain how persistent storage works in Kubernetes.


Let’s Begin

Stateless! Everything needs to be stateless. Stateless. Stateless. Stateless. Ok, that should be enough to rank me on Google. Containers are designed for scaling: they can be torn down and deployed here or there without extra configuration, and they just work in their own little isolated area. However, there aren’t many useful applications that are completely stateless. Maybe a mini game that doesn’t let you save? Most useful applications require state, such as “what level is this user up to?” or “who is this person friends with?”

Persistence in Docker

Before moving towards persistent storage in Kubernetes, let’s start at the smallest of building blocks, a single Docker container.

In order to persist data from a Docker container you must “mount an external volume”. What does this even mean?

What is a Volume?

A volume is a single accessible storage area with a single filesystem.

Wikipedia

“With a single file system” suggests that volumes exist at the logical, operating-system level. This is in contrast to partitions, which divide up the physical storage device itself.

Normally a partition is formatted with a filesystem, turning it into a volume that the operating system can use. That is just for clarification and isn’t too relevant to the immediate topic of this post.

What is Mounting?

Mounting is a process by which the operating system makes files and directories on a storage device (such as hard drive, CD-ROM, or network share) available for users to access via the computer’s file system.

Wikipedia

Thus, a volume requires a filesystem so that the operating system can traverse the volume’s contents as if they were part of its own filesystem.

To mount a volume, you set a “mount point”, normally a folder on the host filesystem. The contents of the volume can then be explored via that folder, after being registered to the virtual file system (VFS) of the operating system.

Back to Docker…

So, now we have a better idea of the underlying workings of volumes and mounting. Mounting a volume to a Docker container looks something like this:

docker run -d \
  --name devtest \
  -v myvol2:/app \
  nginx:latest

Here we are mounting a local Docker volume (created with docker volume create myvol2) at the mount point /app inside the container. In other words, to access the contents of the external volume, use /app from inside the container.

How does this work for Kubernetes?

Because Kubernetes is designed for high scalability and automation, there is some extra complexity and implementation detail. However, the core concept of taking an external volume and mounting it to a container remains the same.

Creating the Storage

There are two ways of creating persistent volumes (PVs) in Kubernetes.

  1. Static provisioning. A cluster admin creates volumes of the required sizes and types up front for particular application use cases.
  2. Dynamic provisioning. A developer creates a persistent volume claim (PVC), defining a size and a storage class (defined by the cluster admin, e.g. cheap or fast). Upon creation of the PVC, a PV of the requested size and type is created and bound to it.
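Dynamic provisioning involves two objects: a StorageClass that a cluster admin might define, and a PVC that a developer would create against it. The sketch below shows both. The class name `fast`, the claim name `app-data` and the provisioner are illustrative assumptions. The provisioner shown is the AWS EBS CSI driver (`ebs.csi.aws.com`) with a `gp3` volume type, so substitute whichever provisioner your cluster actually runs.

```yaml
# Hypothetical "fast" StorageClass, defined by the cluster admin.
# The provisioner is an assumption: the AWS EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
---
# Developer's claim: 10 GiB from the "fast" class.
# Creating this PVC triggers dynamic provisioning of a matching PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi
```

After `kubectl apply -f`, the claim should move to the Bound state once the provisioner has created the backing PV. With some provisioners this only happens after a pod first uses the claim.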

Note that a persistent volume is not the physical storage itself. A PV is still just a Kubernetes abstraction; however, it contains all the details needed to access the physical storage, e.g. IP address, credentials, or whatever else is necessary for that particular storage type.
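A PV carries access details rather than being the storage itself. As an illustration, here is a sketch of a statically provisioned PV backed by an NFS share. The PV name, server address and export path are placeholders, not values from a real cluster.

```yaml
# Statically provisioned PV, created by a cluster admin.
# It only describes where the storage lives and how to reach it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-1              # placeholder name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10         # placeholder IP of the NFS server
    path: /exports/data       # placeholder export path
```

A PVC could then bind to this PV if it meets three conditions. It sets `storageClassName: ""`, so no dynamic provisioning is attempted. It requests at most 5Gi. It asks for the ReadWriteMany access mode.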

Mounting the Storage

A pod is the smallest functional application unit in Kubernetes. A pod is a collection of containers (but normally just one container). Volumes are mounted to the container within the pod, rather than to the pod itself. Like so:

apiVersion: v1
kind: Pod
metadata:
  name: test-ebs
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-ebs
      name: test-volume
  volumes:
  - name: test-volume
    # This AWS EBS volume must already exist.
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4

(taken from: https://kubernetes.io/docs/concepts/storage/volumes/)

Note that the volume is referenced and named (volumes section) in the context of the pod but it is mounted (volumeMounts section) to the container.

This is an in-line volume. Using volumes in this way makes the pod less portable, since not every Kubernetes cluster has AWS Elastic Block Store, let alone a volume with that particular ID. Referencing a PVC as the volume instead is more flexible, because the pod then only names a claim and leaves the details of the underlying storage to the PV.
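For comparison, here is a sketch of a pod like the one above that mounts a PVC instead of an in-line EBS volume. It assumes a PVC named `app-data` already exists in the same namespace. The pod and volume names are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pvc
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    # Points at a claim by name; no cloud-specific details in the pod.
    persistentVolumeClaim:
      claimName: app-data     # hypothetical, pre-existing PVC
```

The same manifest can be applied to any cluster where a claim called `app-data` exists, whatever storage backs it.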

What about Scaling?

Persistent Storage in Kubernetes with a StatefulSet

You shouldn’t rely on individual pods to hold state; they are designed to be ephemeral. However, Kubernetes provides a solution for stateful applications.

In a Deployment whose pod template references a PVC, all pods (replicas) share that same PVC. This means they share the same PV and therefore the same physical storage. This is a simpler setup but can result in pods overwriting each other’s data. For stateful applications, we should really use StatefulSets.
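Here is a minimal sketch of the shared-claim setup: a hypothetical three-replica Deployment whose pod template mounts one PVC, assumed here to be called `shared-data`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:latest
        volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: shared
      volumes:
      - name: shared
        persistentVolumeClaim:
          claimName: shared-data   # every replica mounts this same claim
```

Because every replica mounts the same claim, the claim’s access mode matters. A ReadWriteOnce volume can only be mounted read-write by a single node. Replicas scheduled onto different nodes would therefore need a storage backend that supports ReadWriteMany.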

In a StatefulSet, each pod gets its own PVC, created from a template, and a stable identity (e.g. mysql-slave-0, mysql-slave-1). This makes it easier to reason about master-slave replication, leader election, etc. A StatefulSet does many other things outside the scope of this post that help in building containerised, stateful applications.
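Below is a sketch of a StatefulSet that gets one PVC per pod through `volumeClaimTemplates`. Several details are assumptions: the `mysql-slave` name, the `mysql` headless Service (assumed to exist), the `fast` StorageClass and the `mysql:8.0` image. No replication configuration is shown; the point is only how storage is requested.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-slave
spec:
  serviceName: mysql            # headless Service, assumed to exist
  replicas: 2
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ALLOW_EMPTY_PASSWORD   # illustration only, not for production
          value: "yes"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  # One PVC is created from this template for each pod.
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast    # hypothetical StorageClass
      resources:
        requests:
          storage: 10Gi
```

With two replicas, this should produce pods mysql-slave-0 and mysql-slave-1 with claims named data-mysql-slave-0 and data-mysql-slave-1, each bound to its own PV. If mysql-slave-0 is rescheduled, the new pod with the same name reattaches to data-mysql-slave-0.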

How to increase volume sizes?

You can resize volumes, but only to increase them; you cannot shrink a volume. Resizing an in-use PVC became beta in v1.15 and generally available in v1.24. You just change the size of the PVC in the Kubernetes definition file and apply it again, provided the PVC’s StorageClass allows volume expansion.
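Here is a minimal sketch of the resize workflow. It assumes the hypothetical `fast` StorageClass and `app-data` claim, and a storage driver that supports expansion. The StorageClass opts in with `allowVolumeExpansion: true`. The only change to the PVC is a larger `storage` request, for example from 10Gi to 20Gi.

```yaml
# StorageClass opting in to volume expansion (provisioner is an assumption).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true
---
# Same PVC as before; only the requested size has been raised.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 20Gi           # was 10Gi; requests can only go up
```

Re-applying the file with `kubectl apply -f` starts the expansion. `kubectl get pvc app-data` should eventually report the new capacity.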

That’s a basic overview of Persistent storage in Kubernetes!

Some Further Reading

Container Storage Interface (CSI)