Alex Alex

Fedora CoreOS.

Abstract

This paper is an analysis of Fedora CoreOS, an operating system designed for container-based applications within the cloud native ecosystem. The objective of this paper is to analyze the architecture, functionality, and operation of Fedora CoreOS. This analysis will begin with an explanation of Linux containers as defined by the Open Container Initiative (OCI). Subsequently, we will take an in-depth look at Fedora CoreOS, exploring its architecture and key components.

Introduction

Motivation

Over the last decade, container-based packaging has emerged as the new standard for web-based applications [1]. To foster this development, the Cloud Native Computing Foundation (CNCF) has taken the lead in providing governance and guidance for containers and other related technologies [2].

Containers have become the most widespread method for deploying applications such as microservices, enabling developers to deploy applications quickly and consistently in an environment that runs independently of the underlying infrastructure.

The most common form of container is the Linux container, which, as the name implies, must run on a Linux-based operating system. Due to the growing importance of containerization, specialized Linux distributions have been developed that are optimized for deploying containers and meeting the needs of the cloud native ecosystem.

Fedora CoreOS is one such operating system with many unique features that make it an attractive choice for use cases in the cloud native ecosystem. The primary objective of this research is to analyze the architecture and functionality of OCI-compliant Linux containers and Fedora CoreOS. Through this analysis, I aim to provide a comprehensive understanding of these technologies and their use cases.

Objectives

The objectives of this paper are as follows:

  • To provide a comprehensive explanation of the container technology according to the OCI specifications.

  • To provide a detailed overview of Fedora CoreOS, including an explanation of its architecture and core components.

  • To show and contextualize the differences from conventional operating systems.

  • To show a use case of Fedora CoreOS.

Overview of the current state of technology and relevant work

The cloud native ecosystem is an area that has grown significantly over the past decade, with many tools and processes being open sourced. One of the most important is Docker’s Linux container technology, which is now governed by the Open Container Initiative (OCI) under The Linux Foundation.

  • The Linux Foundation, “Open Container Initiative Runtime Specification” [3]

  • The Linux Foundation, “Open Container Initiative Image Format Specification” [4]

To take full advantage of containerization, operating systems have been designed to be container-centric. These operating systems are highly automated and specialized for the workloads of the cloud-native ecosystem, and they differ from conventional operating systems in several key ways.

  • S. Böhm and G. Wirtz, “Immutable Operating Systems: A Survey,” ZEUS, vol. 2023, p. 52. [5]

Fundamentals

Definition of container technology and its advantages

Although often incorrectly described as a form of virtualization akin to virtual machines, container technology provides a way to deploy applications on a system in isolated environments. Containers are designed to provide a consistent runtime environment that can run on multiple platforms, making them a popular solution for building, packaging, and deploying applications.

The key benefit of container technology is that it enables developers to create applications that are portable, scalable and run on most infrastructure. Containers are lightweight and require fewer resources than virtual machines, making them ideal for deploying microservices and other cloud native applications.

In addition, containers can help increase development efficiency by allowing developers to build, test, and deploy applications quickly.

Introduction to the Open Container Initiative

The Open Container Initiative (OCI) is a lightweight, open governance structure, formed under the umbrella of the Linux Foundation, for the express purpose of creating open industry standards around container formats and runtimes. The OCI was launched in 2015 by Docker, CoreOS and other leaders in the container industry [6].

The main goal of the OCI is to provide a standard platform for running containerized applications that is vendor-neutral, portable, and interoperable. This allows developers and organizations to build and manage container-based applications with a high degree of flexibility and freedom. The OCI has created three main specifications: the Runtime Specification [3], which defines how to run a container image; the Image Specification [4], which defines the format of a container image; and the Distribution Specification, which defines the format and transport of container images between registries and clients [7].

These specifications are used by a wide range of container platforms and tools, including Docker, Kubernetes[^1], and CRI-O[^2]. [8]

Explanation of OCI Linux containers and how they work

As described in the OCI Runtime Specification [3], an OCI Linux container (hereafter simply container) is an environment for executing processes with configurable isolation and resource constraints. To isolate processes, container runtimes use features native to Linux-based operating systems.

Key Components

Containers rely on a number of Linux features to isolate and manage the system resources they require. These components include namespaces, cgroups, the container runtime, and the container image format. In this section, we take a closer look at these key components and explore how they work together to create a secure and efficient environment for containerized applications.

Namespaces

Namespaces provide a mechanism for creating an isolated environment for a container and its processes. Linux provides a variety of namespaces for different purposes, six of which are important for containers.

PID

First, it is important to understand what a PID is. When a process is created, it is assigned a numeric identifier called a process ID (PID). The PID is a unique identifier and can be used to distinguish between processes that have the same human-readable name. On UNIX systems, active processes are tracked in a special filesystem called procfs, which most processes expect to be mounted at /proc. Inside /proc is a directory for each active process, identified by its PID, containing various files with information about the process. The PID namespace allows isolation of the PID number space, which means that processes in different PID namespaces can have the same PID. Since /proc is just a filesystem, having multiple such filesystems in parallel can lead to processes having the same PID, which is why isolation is so important. Another important thing to note is that the first process, PID 1, is treated differently from other PIDs and is considered the init process. When the init process is killed, all other processes in the PID namespace are also killed by the operating system with a SIGKILL signal. [9]
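To make the procfs mechanics concrete, the following Python sketch (assuming a Linux host with procfs mounted at /proc) enumerates the PIDs visible in the current PID namespace:

```python
import os

def list_pids(proc_root="/proc"):
    """Return the PIDs visible in a procfs mount.

    Every active process in the calling PID namespace appears as a
    directory named after its PID; the non-numeric entries are other
    procfs files such as cpuinfo and meminfo.
    """
    return sorted(int(entry) for entry in os.listdir(proc_root) if entry.isdigit())

pids = list_pids()
# PID 1 is the init process of this PID namespace; if it dies, the
# kernel kills every other process in the namespace with SIGKILL.
print(f"{len(pids)} visible processes, lowest PID: {pids[0]}")
```

Inside a container with its own PID namespace and its own /proc mount, the same code would typically report the containerized application itself as PID 1.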

net

The net namespace isolates and creates virtual networks that are exposed through network interfaces, allowing processes in the namespace to have full control of their network stack without having to interact with the host’s network stack. [10]

uts

This namespace has changed so much over time that its name, UNIX Time-Sharing System, is now misleading. Today, the UTS namespace controls the hostname and Network Information Service (NIS) domains. For containers, its most important property is that the hostname is isolated: any changes to the hostname within the namespace do not propagate outside it. [11]

User

This namespace isolates user and group IDs within the container. Linux also provides what are called capabilities, which offer more fine-grained control over what a process is allowed to do. Each user on a Linux system can create a user namespace. Namespaces are organized hierarchically: the superuser can see every namespace, but within a namespace, users can only interact with that namespace and the namespaces created beneath it. Permissions can thus be isolated to the namespace, which makes it possible to create namespaces that have root privileges inside the container but not outside. [12]

mnt

The mount namespace behaves similarly to the chroot command at first glance, but where the chroot command is not suitable for secure isolation, the mount namespace is. When you first create a mount namespace, it appears as if nothing has changed, because the mount points of the parent namespace are copied; this also means that, unlike the other namespaces, changes to files and directories inside the new namespace are propagated outside the namespace. To create secure isolation, you must create a new mount point. [13]

IPC

Interprocess communication (IPC) is a form of communication used in high-performance applications (e.g., trading software) and is realized through memory shared between processes. While it enables high performance, it sacrifices isolation; this risk can be mitigated by isolating processes and shared memory with this namespace. [14]

Each namespace creates a unique instance of that system resource. This isolation allows the container to have its own view of system resources, separate from the host operating system. This allows containerized applications to run without interfering with other applications or the host operating system. Namespaces are implemented by the Linux kernel and provide a lightweight, efficient method of isolating system resources. [15]

Control Groups

Control groups (cgroups) provide a way to limit and manage system resources such as CPU, memory, and I/O bandwidth. Cgroups can be used to define and manage the resources allocated to a container, ensuring that a container does not consume more resources than it needs. This prevents one container from taking up too many system resources and slowing down or even crashing the host system. Cgroups are implemented as a kernel feature that allows fine-grained control of system resources. [16]
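As an illustration of the cgroup v2 file interface, the following Python sketch mimics how a runtime would cap a container's memory. The file names (memory.max, cgroup.procs) follow the kernel's cgroup v2 convention, but the example writes to a temporary stand-in directory instead of /sys/fs/cgroup, so it runs without root privileges:

```python
import os
import tempfile

def limit_memory(cgroup_dir, limit_bytes, pid):
    # On a real host, creating a directory under /sys/fs/cgroup creates
    # a new cgroup, and the kernel populates its controller files.
    os.makedirs(cgroup_dir, exist_ok=True)
    with open(os.path.join(cgroup_dir, "memory.max"), "w") as f:
        f.write(str(limit_bytes))            # cap the group's memory usage
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))                    # move the process into the group

# Stand-in for /sys/fs/cgroup/mycontainer ("mycontainer" is a hypothetical name).
cgroup = os.path.join(tempfile.mkdtemp(), "mycontainer")
limit_memory(cgroup, 256 * 1024 * 1024, os.getpid())

with open(os.path.join(cgroup, "memory.max")) as f:
    print("memory.max =", f.read())
```

On a real system these writes would require appropriate privileges, and the kernel would enforce the limit immediately.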

Together, namespaces and cgroups provide a powerful mechanism for creating and managing containers. By isolating and controlling system resources, containers can run securely and efficiently, without interfering with other applications or the host operating system. [17]

Container image

Container images are a fundamental part of modern container technology and a major contributor to the success of Docker[^3] containers. Container images enable reproducible and consistent environments for applications. To understand the inner workings of container images, it is important to examine the combination of union file systems and the Open Container Initiative (OCI) Image Format Specification.

OCI Image Format

The OCI Image Format Specification provides a standardized representation of container images. It defines the structure and layout of image artifacts, enabling interoperability and portability across container runtimes. The specification outlines key components such as the image manifest, image configuration, and layers that together define the contents and properties of a container image.

Image Layers

Container images leverage a layered file system approach to optimize disk space utilization and facilitate efficient image management. Union File Systems, including variants like AUFS, btrfs, vfs, and devicemapper, play a pivotal role in this process. These file systems allow the overlaying of files and directories from separate layers, resulting in a unified and coherent file system for the container. [18]

Each layer within a container image represents a set of file system changes. These layers can be shared among multiple images, reducing data duplication and overall image size. UnionFS allows containers to access files and directories from different layers as if they were part of a single file system, ensuring seamless application execution.

When a container is instantiated from an image, a writable layer is created on top of the image’s layered file system. This layer serves as a sandboxed environment where any changes made at runtime, such as creating or modifying files, take place. This isolation ensures that changes made within the container will not affect the underlying image or other containers using the same image. The writable layer allows containers to be ephemeral, disposable, and easily reset to their initial state.[^3]
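The layer-resolution behavior can be modeled in a few lines of Python with collections.ChainMap, which resolves lookups top-down much like a union mount (the paths and contents here are invented for illustration):

```python
from collections import ChainMap

# Each layer maps file paths to contents; image layers are read-only.
base_layer = {"/etc/os-release": "Fedora", "/usr/bin/app": "v1"}
update_layer = {"/usr/bin/app": "v2"}   # a later layer overriding a file
writable_layer = {}                     # created per container at runtime

# Lookups search the layers top-down, like a union file system.
rootfs = ChainMap(writable_layer, update_layer, base_layer)
assert rootfs["/usr/bin/app"] == "v2"          # the upper layer wins
assert rootfs["/etc/os-release"] == "Fedora"   # untouched files shine through

# Writes land in the top (writable) layer only; the image stays pristine,
# so discarding the writable layer resets the container to its initial state.
rootfs["/tmp/state"] = "runtime data"
assert "/tmp/state" in writable_layer and "/tmp/state" not in base_layer
```

Real union file systems also handle deletions (via "whiteout" entries) and metadata, but the lookup order is the same idea.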

Image Manifest

The OCI image manifest serves as the metadata descriptor for a container image. It includes essential details such as the image’s name, version, and the list of layers that constitute the image. The manifest also references the image configuration, which contains information about the container runtime requirements, environment variables, and execution parameters.[^3]

Together, the image manifest and configuration provide a comprehensive description of the container image, enabling proper interpretation and execution by container runtimes.
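A minimal manifest (with invented digests and sizes, but with field names and media types following the OCI image manifest schema) can be sketched in Python as:

```python
import json

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    # Points at the image configuration (runtime requirements,
    # environment variables, execution parameters).
    "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:<config-digest>",   # placeholder digest
        "size": 7023,
    },
    # Ordered list of file system layers that make up the image.
    "layers": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:<layer-digest>",  # placeholder digest
            "size": 32654,
        },
    ],
}
print(json.dumps(manifest, indent=2))
```

In a real image, each digest is the SHA-256 hash of the referenced blob, which is what makes images content-addressable and verifiable.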

Distribution

Container images are typically distributed and stored in image registries, which act as centralized repositories for sharing and retrieving images. These registries provide mechanisms for versioning, tagging, and organizing container images, making them easily accessible to container runtime environments. To ensure interoperability between different runtimes and registries, the Open Container Initiative defines the Distribution Specification [7].

To summarize, container images are structured and managed according to the OCI Image Format Specification. Using Union File Systems, container images efficiently overlay file systems from different layers, resulting in a unified and coherent file system for containers.

Container networking

Container networking is another important component, dependent in part on the net namespace described above, and enables containers to communicate with other containers and external networks. Container networking is responsible for the creation of virtual network interfaces within a container, providing the container with its own unique network stack that is isolated from the host operating system and other containers. This isolation ensures that containers can communicate securely and efficiently without interfering with other containers or the host system. [19]

There are several networking models that can be used with containerization, including bridge networking, host networking, and overlay networking. Bridge networking involves creating a virtual network bridge that connects containers to a physical or virtual network interface on the host system [20]. Host networking allows containers to use the host system’s network stack directly, providing fast and efficient network communication. Overlay networking creates a virtual network that spans multiple hosts, allowing containers to communicate across different hosts as if they were on the same network [21].

Comparison of Linux containers and virtual machines

Virtual machines and OCI Linux containers are two different technologies for running isolated environments on a host operating system. The primary difference between them is that virtual machines emulate an entire operating system, including the kernel, while containers share the kernel of the host operating system.

Because virtual machines (VMs) emulate a complete operating system, most of the resources that are supposed to be available to them at any moment have to be allocated to the VM at all times and are not available to other processes on the host operating system, including other VMs.

Containers, on the other hand, are isolated at the kernel level [22]. They do not need their maximum resources allocated at all times and can share all host resources with other processes, including other containers, while it remains possible to set resource limits.

Figure: Comparison of deployment types, by the Kubernetes authors [23].

Fedora CoreOS

Introduction to Fedora CoreOS

Fedora CoreOS (FCOS) is a minimal, container-centric operating system designed for running containerized workloads. It is built on top of the Fedora Linux[^4] distribution and uses a number of container-specific technologies to provide a lightweight, secure, and reliable platform for running containerized applications. FCOS is designed to be immutable, meaning that the operating system image cannot be modified after it is deployed, which helps ensure consistency and reliability across large-scale deployments.

History of CoreOS

The company CoreOS was founded in 2013 to build cloud native open source software. One of these projects was Container Linux, the lightweight, container-centric operating system that is the predecessor to Fedora CoreOS.

Red Hat acquired CoreOS in 2018, integrating CoreOS’s Container Linux with Red Hat’s Atomic Host[^5] to form Fedora CoreOS. [24]

Summary of the key components and how they work

This section provides an overview of the Fedora CoreOS components that are most relevant[^6] to cloud native use cases [25]. The aspects supported by these components are also representative of immutable operating systems and container operating systems in general [5].

Container Runtime

FCOS is designed as a container-centric operating system and therefore comes bundled with Docker[^7] and Podman[^8] by default. Other container runtimes can be added afterwards; the selection of the default container runtimes is based on feedback from the FCOS open source developer community.

Container runtimes are intended to be the primary means of deploying applications on FCOS.

Automated Provisioning

The initial deployment of FCOS machines is largely automated using Ignition[^9]. Ignition allows users to define a declarative configuration that specifies the desired state of the machine. The configuration file, called an Ignition configuration, is used to initialize the machine during the boot process.

Ignition configurations are transpiled with Butane[^10] from human-friendly YAML formatted files, providing a clear and concise way to specify the desired state of the machine. Ignition can be used to configure much of the target system, including networking, storage, system users, ssh keys, and more [26].
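For illustration, a minimal Butane file that authorizes an SSH key for the default core user could look like the following (assuming Butane spec version 1.4.0; the SSH key is a placeholder):

```yaml
variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA... user@example.com
```

Such a file is transpiled to an Ignition configuration with the Butane CLI, e.g. `butane --pretty --strict config.bu > config.ign`.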

The key advantage of Ignition is that it is reproducible and consistent across all deployed machines, making it ideal for deploying large numbers of machines.

During the boot process, the Ignition configuration is retrieved from a user-specified location (such as a web server or USB drive) and used to initialize the machine. The configuration is applied atomically, meaning that the machine either applies the entire configuration successfully or not at all. This ensures that the machine is always in a consistent state and reduces the risk of configuration drift [27].

Immutable File System

The “immutable file system” refers to a file system that cannot be modified or changed once it has been created. This feature ensures that the system remains secure, stable, and predictable. FCOS uses ostree for its file system implementation. [24]

Because FCOS is still based on Linux and uses common Linux applications it is not possible for the whole file system to be immutable. The /etc and /var directories are writable and /home and /srv are symbolic links[^11] to directories in /var. This is done so that applications following Linux conventions can still work with the immutable file system.

Ostree is a Git-inspired version control system for file systems that allows you to manage and track changes to a file system over time. In the context of FCOS, ostree is used to manage the base operating system image [28].

When a new version of the operating system is released, ostree applies the update by creating a new immutable file system based on the updated version. The new file system is then swapped atomically with the old one. If something goes wrong during the update process, ostree can quickly roll back to the previous file system version, ensuring system stability and predictability. Note, however, that this rollback only covers failures during an update; FCOS does not come with a formal rollback system.
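The atomicity guarantee rests on the same primitive as the classic symlink-flip deployment pattern, sketched below in Python (a simplification: ostree actually manages deployment roots and bootloader entries, not a single symlink):

```python
import os
import tempfile

root = tempfile.mkdtemp()
for tree in ("deploy-A", "deploy-B"):        # two complete file system trees
    os.mkdir(os.path.join(root, tree))

current = os.path.join(root, "current")
os.symlink("deploy-A", current)              # the currently active tree

# Stage the new tree under a temporary name, then rename it over the
# old link. rename(2) is atomic, so "current" always points at one
# complete tree: either the old one or the new one, never a mix.
staged = os.path.join(root, "current.staged")
os.symlink("deploy-B", staged)
os.replace(staged, current)

print("now active:", os.readlink(current))
```

If the process crashes before the rename, "current" still points at the old tree, which is exactly the failure behavior the paragraph above describes.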

Ostree also allows the creation of custom variations of the base operating system image, called "variants". Variants are created by branching from the base OS image and adding custom packages or configuration files. The resulting file system is then deployed to the system as a new variant, which can be updated and managed separately from the base image.

Automated Updates

FCOS is designed to be a highly autonomous, secure, and reliable container platform. To enable this, FCOS has an automated update system, which helps ensure that the operating system stays up to date with the latest security patches and feature updates. FCOS uses two main components to manage updates: rpm-ostree and Zincati.

rpm-ostree

is a hybrid image/package system. It combines the benefits of both libostree[^12] and the Red Hat Package Manager[^13] (RPM), using libostree as the base image format and accepting RPMs on both the client and server sides. rpm-ostree provides Fedora CoreOS with the ability to seamlessly manage and update packages while maintaining the integrity of the system image. [29]

Zincati

is an agent that provides continuous auto-updates, multiple update strategies, local maintenance windows, metrics, logging, and support for complex update graphs and cluster-wide reboot orchestration. It is also configurable via TOML dropins and overlaid directories. [30]
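As an example of such a TOML dropin, a configuration along the following lines (modeled on the periodic update strategy described in the Zincati documentation; file name and exact keys should be verified against the current docs) restricts update reboots to a weekend maintenance window:

```toml
# /etc/zincati/config.d/55-updates-strategy.toml (hypothetical file name)
[updates]
strategy = "periodic"

# Only reboot into updates on weekend nights, for one hour (UTC).
[[updates.periodic.window]]
days = [ "Sat", "Sun" ]
start_time = "22:30"
length_minutes = 60
```

Outside the configured window, Zincati stages updates but defers the finalizing reboot.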

Rollbacks

Most immutable operating systems with automated updates also provide the ability to roll back updates [5]. This is not the case with FCOS; instead, it is recommended that in large deployments of FCOS systems, a subset of those systems run the testing or next update streams to detect problems in advance [31].

Atomic Updates

Using the immutable nature of the file system, updates are designed to be atomic, meaning they either succeed and are applied, or they fail and no changes are committed to the tree.

Streams

FCOS receives its updates from the official Fedora CoreOS update streams. Update streams are different channels or release tracks in which updates are delivered for the operating system.

Fedora provides three official streams that provide different levels of stability and reliability: stable, testing and next.

By following a stream, a system is automatically updated when a new release is rolled out on that stream. While all streams of FCOS are automatically tested, users are strongly encouraged to dedicate a percentage of their FCOS deployment to running the testing and next streams. This ensures that any potentially breaking changes are caught early enough to reduce regressions in stable deployments. [30], [24]

Differences to other operating systems

The most important difference from other operating systems is the immutable file system; most other differences derive from it. This difference affects how an FCOS machine is operated: except for applications needed at the system level, everything is executed in a container. In addition, the initial provisioning system Ignition enforces the practice that a machine is not supposed to be changed after initial provisioning. The initial state of the machine defined with Ignition should be as close as possible to its final state. Larger changes to this state should be made in the Ignition file and applied by reprovisioning the machine.

Cloud native context

The practice of not just rebooting but reprovisioning machines for the sole purpose of changing their desired state may seem uncommon, but it fits perfectly into the operation of clusters for container orchestration systems like Kubernetes, a primary use case for FCOS as stated in [32]. This is possible because in orchestration systems like Kubernetes a single machine has very low significance: the execution and management of applications on the machine is orchestrated by a central process that can control thousands of machines. As a consequence, the desired state of a machine is in most cases just the installation and execution of the Kubernetes agent that connects it to a cluster.

Case study

Single server node for running containerized applications

Based on the second primary use case of Fedora CoreOS as defined in the official product requirements document [31], this case study covers the process of initially provisioning a machine with Fedora CoreOS and gives a brief summary of the long-term operation of the deployed machine.

The goal is to have a provisioned Fedora CoreOS machine that uses the stable update stream and runs nginx as a containerized application.

Preparations

The machine to be provisioned is a CX11 server from Hetzner [33]. This is the smallest available server, with a single vCPU, 2 GB RAM, and 20 GB disk space. The Ignition file will be copied onto the machine, so no web server is needed to supply it.

Butane

Before we can provision the machine, we need an Ignition file for the initial configuration. The Ignition file is transpiled with Butane from the following YAML file:

variant: fcos
version: 1.4.0
passwd:
  users:
  - name: core
    groups:
    - wheel
    ssh_authorized_keys:
    - [REDACTED]
systemd:
  units:
  - name: nginx.service
    enabled: true
    contents: |
      [Unit]
      Description=Nginx Container
      Documentation=https://nginx.org/
      Requires=podman.service
      After=podman.service

      [Service]
      ExecStartPre=-/usr/bin/podman stop nginx
      ExecStartPre=-/usr/bin/podman rm nginx
      ExecStart=/usr/bin/podman run --name nginx -p 80:80 -v /etc/nginx/conf.d/:/etc/nginx/conf.d/:Z docker.io/library/nginx
      ExecStop=/usr/bin/podman stop nginx

      [Install]
      WantedBy=multi-user.target
storage:
  directories:
  - path: /etc/nginx/conf.d
    mode: 0755

passwd

Users can be defined within the passwd block. It is possible to add users directly to groups and to define passwords and/or SSH keys for each user.

systemd

Butane enables the definition of systemd services. The nginx service uses Podman to run nginx inside a container and uses a volume mount to read the configuration files in the directory defined in the storage block.

storage

Describes the desired state of the system’s storage devices. To create a directory for our nginx configurations we define the path and the directory permissions under the directories key.

Provisioning

After transpiling the Butane file into an Ignition configuration file, the actual provisioning of the machine can begin. The Ignition file is copied to the target machine and mounted at /mnt/ignition. At the next boot, the Ignition configuration is read once and Fedora CoreOS attempts to reach the desired state. If the desired state cannot be reached, e.g. because of an error during provisioning, the whole process is aborted and the machine has to be reprovisioned.

Operations

Fedora CoreOS is designed for unattended operation, and best practice is to set up everything in the Ignition file. If there is any need for additional configuration on the machine, it is possible to connect with the user defined in the Ignition file. It is recommended to configure services so that they handle regular restarts, because system updates happen automatically and are followed by a restart.

Conclusion

In conclusion, this paper has delved into Fedora CoreOS (FCOS) and provided a comprehensive summary of its features and key components. Throughout the paper, we have explored the basics of Linux container technology, examined the differences from other operating systems, and demonstrated provisioning and operation in a short use case example.

By starting with the fundamentals of Linux container technology, we have gained a deeper understanding of its complexity and clarified misconceptions about how Linux containers compare to virtual machines. The research has shed new light on FCOS and contributed an updated summary of FCOS to the existing body of knowledge.

Although this term paper has made progress in addressing the research question, it is important to acknowledge its limitations. For example, most of the information comes from vendor documentation, with little to no independent sources. FCOS is an active open source project, and future changes to design goals or features could render parts of this paper obsolete, limiting its longevity. These limitations provide potential avenues for future research to build upon this study and fill existing gaps. This paper mostly covered fundamentals and explained parts of FCOS itself, so the actual answer to the research question remains superficial and could be expanded in future research.

In summary, this term paper has provided a comprehensive analysis of FCOS. It has deepened our understanding of its fundamentals and key components, and provided valuable insights into its operation. Moving forward, it is important for researchers and practitioners to continue to explore the cloud-native landscape and its implications to further develop and contribute to the advancement of container-centric operating systems.

Bibliography


[^1]: Open-source container orchestration system
[^2]: Lightweight container runtime for Kubernetes
[^3]: Docker Inc., https://www.docker.com/
[^4]: Fedora Project, fedoraproject.org
[^5]: Atomic Host was a container-centric operating system by Red Hat: https://projectatomic.io
[^6]: This selection is based on the experience of a junior software engineer for cloud applications; an explanation of the selection is not part of this paper
[^7]: docker.com/products/container-runtime/
[^8]: podman.io