Kubernetes straightforward — Architecture (part 1) [English]

Sebastiao Ferreira de Paula Neto
13 min read · May 16, 2023


Welcome to the series of articles on Kubernetes! In this series, we will explore what you need to know about this leading container orchestration platform. Let’s start with a brief historical introduction to Kubernetes and how it has evolved over the years to become one of the key tools for deploying and managing containerized applications.

Kubernetes was created at Google and released as an open-source project in 2014; in 2015, version 1.0 was launched and the project was donated to the Cloud Native Computing Foundation (CNCF). Since then, Kubernetes has been widely adopted by companies across industries, from startups to large corporations, thanks to its powerful features for scalability, resource management, monitoring, and more. It is considered a strategic choice for companies that want to harness the benefits of containers by unifying their deployments on a single platform.

This is the first in a series of articles where we will cover fundamental topics of Kubernetes, such as its architecture, essential concepts, monitoring, advanced services, and security. In this particular one, we will present the historical context of Kubernetes’ creation and how its architecture works.

1. Historical Context

In the early days of software development, it was common to build applications as a single block of code, known as a monolithic system. As these systems grew, this model made it difficult to apply fixes and add new functionality. With the increasing complexity of software construction and deployment methods, it became evident that a new approach was needed.

As a result, the architecture known as microservices emerged. This approach divides the application into small, independent services, each with its own responsibility. This scalable and easy-to-maintain architecture is becoming increasingly popular among companies, and Kubernetes is a platform that can help deliver it efficiently. Kubernetes allows for managing and orchestrating the deployment of containers on a large scale, ensuring that each microservice runs efficiently and securely in a distributed environment.

In this way, understanding the evolution of technology and architectural models is essential for us to understand how Kubernetes emerged as a solution for managing container environments on a large scale and how it has changed the way applications are developed and deployed today.

1.1 Monolithic systems

Monolithic systems adopt a software development approach where all functions and services are encompassed by a single codebase. This software development model was widely used throughout most of the last century, where programmers wrote code that was compiled and executed as a single program. This approach became popular because it was easy to understand and manage, and it allowed developers to work in large and complex teams.

However, as applications became more complex and required more computational resources, the monolithic model started to show its limitations. Monolithic systems tend to be bulky, difficult to scale, and slow to evolve, since every change and update must be made in a single codebase.

As software development matured, patterns emerged for splitting the system into blocks while still shipping the final product as a single unit. Even with this decomposition, small changes still rippled through all layers. The next step, therefore, was to break each of these blocks into independent systems, giving rise to the microservices approach.

1.2 Microservices

Microservices divide the application into small, independent services that can be deployed, managed, and updated separately. Each service is responsible for a specific function and communicates with other services through APIs (application programming interfaces), typically over the network.

This approach allows applications to be highly scalable and flexible, as services can be easily added, updated, or removed. Additionally, microservices enable developers to work in smaller and more agile teams, facilitating the development and maintenance of the software.

It is also important to understand how computational resources are allocated in each model. When the goal was an environment accessible to all developers, with shared resources, Virtual Machines (VMs) became the standard choice. More recently, the growth of container usage, driven primarily by Docker, made it possible to isolate applications more efficiently, which fits naturally with development methodologies that modularize software into independent units, known as microservices.

You may be wondering why not simply use VMs as these isolated units. To answer that question, we need to understand the differences between the two approaches.

1.3 Virtual Machines vs Containers

Virtual Machines (VMs) and Containers are two important technologies for server and application virtualization. Although they share some similarities, they have different approaches to creating isolated environments for running applications.

A VM is a virtual instance of a complete operating system, which can be created on top of a hypervisor on a physical server. Each VM is isolated from other VMs and the host operating system and has its own resources such as CPU, memory, and storage. Applications are installed within the VM, and the VM is managed as a separate entity.

Containers, on the other hand, share the host operating system’s kernel while remaining isolated from each other. Each container packages a specific application together with its library and operating-system dependencies; it is managed as a single unit and can be scaled independently of the others.

Therefore, VMs offer greater isolation and security but are heavier and less scalable than containers. Containers offer greater flexibility and scalability but with less isolation and security compared to VMs. Each technology has its own use cases and should be chosen based on the specific requirements of the application environment.

The use of containers has proven to be a valuable tool for application virtualization, but the complexity of setting up and managing these isolated environments has hindered their widespread adoption. That’s where Docker comes into play, introducing a simplified interface for container virtualization, allowing developers and IT operations teams to efficiently and consistently create, deploy, and manage these environments. Let’s learn a little more about how this tool works.

1.4 Docker

To understand the internal structure of Docker and how it functions, let’s start with a high-level overview of Docker and then delve into the details of its architecture.

Its internal structure is built on a kernel virtualization feature called “namespaces.” Namespaces allow Docker to isolate processes in containers, creating a secure and isolated environment in which to run applications. Additionally, Docker uses images to create containers; an image acts as a template containing all the information needed to run an application in a container. Images can be downloaded from public or private repositories and are updated and maintained by software developers.

Looking at its structure, the first point of attention is the Docker Daemon, as it is the central component of the system. It is responsible for managing all aspects of Docker containers and images, including creating, running, and removing containers, managing storage and networking, ensuring security, and controlling access. The Docker Daemon runs in the background as a system process and can be controlled using the Docker CLI or Docker API.

The Docker CLI is the command-line interface that developers use to interact with Docker. It allows users to execute commands for creating, deploying, managing, and monitoring containers and images. The Docker CLI is a powerful and flexible tool that can be used to perform virtually all Docker-related tasks.

The Docker API is an application programming interface that enables other applications to communicate with the Docker Daemon. The Docker API allows developers to create applications that integrate with Docker and perform operations such as creating, starting, stopping, and removing containers and images. The Docker API is extremely powerful and flexible and is used by many applications and services to manage Docker containers.
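
To make the relationship between the Daemon, the CLI, and the API concrete, here is a minimal sketch using the official Docker SDK for Python (the `docker` package), which talks to the Docker Daemon over the same API the CLI uses. The image and command are arbitrary examples, and a local Docker daemon is assumed to be running.

```python
# pip install docker  -- the official Docker SDK for Python
import docker

# Connect to the local Docker Daemon through its API
# (the same thing the docker CLI does under the hood).
client = docker.from_env()

# Pull an image from a registry; "alpine:3.17" is just an example.
client.images.pull("alpine", tag="3.17")

# Create and run a container, roughly equivalent to:
#   docker run --rm alpine:3.17 echo "hello from a container"
output = client.containers.run(
    "alpine:3.17",
    ["echo", "hello from a container"],
    remove=True,
)
print(output.decode())

# List running containers, the API equivalent of `docker ps`.
for container in client.containers.list():
    print(container.id[:12], container.image.tags)
```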

Looking under the Docker hood, we need to talk about its resource manager, cgroups, and its engine, containerd.

Cgroups (control groups) are a Linux kernel feature that limits the system resources, such as memory and CPU, that a group of processes can consume. Docker uses cgroups to ensure that containers only use the resources allotted to them and do not degrade the performance of the host system.
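
As a rough illustration of how these limits surface to users, the sketch below asks Docker to cap a container’s memory and CPU; under the hood, Docker translates these options into cgroup limits on the host kernel. The image and the specific values are illustrative assumptions, not recommendations.

```python
import docker

client = docker.from_env()

# Run a container with resource caps enforced via cgroups:
#   mem_limit="256m"       -> memory limit of 256 MiB
#   nano_cpus=500_000_000  -> 0.5 CPU (in units of 1e-9 CPUs)
container = client.containers.run(
    "alpine:3.17",
    ["sleep", "30"],
    mem_limit="256m",
    nano_cpus=500_000_000,
    detach=True,
)
print(container.name, container.status)

# Clean up the example container.
container.stop()
container.remove()
```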

Containerd is a component of Docker that is responsible for managing container creation, execution, and destruction. It is designed to be a lightweight and portable container platform that can run in any environment. Containerd is a fundamental component of Docker and is used by many other Docker-related tools and services.

Containerd consists of several elements, including runc, shim, and snapshotter.

  • Runc is responsible for starting and managing the process inside the container.
  • Shim is a small program that acts as an intermediary between Containerd and the container runtime.
  • Snapshotter is responsible for managing container file system snapshots, allowing containers to be quickly created and destroyed.

Now that we understand the architecture, let’s think about deploying our applications. When scaling containers on a large scale, simply deploying Docker is not enough. Therefore, it is essential to use a tool that monitors and orchestrates this set of applications. For our case, there is a Docker tool called Docker Swarm, and as an alternative, we have Kubernetes.

Swarm is designed to manage clusters of Docker containers, distributing the workload and ensuring that containers run efficiently and at scale. Swarm lets developers define the number of container replicas and handles scaling by creating or destroying replicas as needed. At this point, you may be wondering:

So, why is Kubernetes necessary in many cases if Docker has its own orchestrator, Swarm?

Well, Kubernetes offers many advanced features that Swarm lacks, such as a powerful and flexible object model for describing applications, advanced storage and networking management, support for canary deployments, automatic scalability, application updates without downtime, and much more. Kubernetes is a much more advanced container orchestrator than Swarm.

One of the main problems Kubernetes solves, compared to Docker alone, is managing containers at large scale. Another is portability: on its own, Docker is a runtime on a single host, tied to that host’s operating system and libraries, which makes coordinating deployments consistently across different environments challenging.

For these reasons, we see many applications running on Kubernetes today. Therefore, before understanding how it works, we need to understand the purpose of its creation.

1.5 The lineage of Kubernetes

To truly understand the history of Kubernetes, we must go back to the early 2000s when Google was facing an increasingly challenging task of managing its massive clusters of servers. It was in this context that Google’s internal project, known as Borg, began development. Borg aimed to automate the deployment, scaling, and management of applications at scale, allowing Google engineers to focus on software development rather than worrying about the underlying infrastructure.

With the internal success of Borg, and with Docker attracting strong interest from developers and businesses looking for efficient container management, Google recognized the demand and launched Kubernetes as an open-source project in 2014. Kubernetes builds on Google’s experience with Borg while being designed from the start for adoption beyond Google.

As Kubernetes became established as a reliable and powerful solution, various companies and cloud providers started offering official support for Kubernetes in their services. This support further increased the widespread adoption of Kubernetes, making it a popular choice for both startups and large corporations. With that said, let’s take the first steps in understanding this incredible tool by exploring its internal workings. To do that, let’s get to know our engine before actually “driving the car”.

2. Cluster Architecture

The architecture of Kubernetes is based on the interaction between two fundamental elements: the Master Node and the Worker Nodes. These components work in harmony to ensure efficient management, scalability, and stability of container clusters.

The Master Node, as the brain of the system, is responsible for coordinating and managing all operations within the cluster. It acts as the control center, making intelligent decisions and distributing tasks to the worker nodes. At the core of the Master Node, we find crucial components such as the API Server, which serves as the main interface for external interactions, and etcd, a distributed, reliable, and consistent key-value store that maintains the cluster’s state.

On the other hand, the Worker Nodes are the nodes where workloads are executed and containers are deployed. These nodes provide the computational resources needed for pods to function properly. Among the key components of the Worker Nodes, we have the Kube Proxy, which ensures connectivity and load balancing between services, the Kubelet, which monitors and manages pods on each node, and the container runtime, which securely executes and manages containers.

The interaction between the Master Node and the Worker Nodes is crucial for the proper functioning of Kubernetes. While the Master Node coordinates and distributes tasks, ensuring system stability and efficiency, the Worker Nodes execute workloads and provide the necessary resources for container deployment. This harmonious collaboration allows Kubernetes to be a highly scalable, flexible, and reliable solution for companies looking to adopt container computing on a large scale.

One of the main components of the Master Node is the API Server. It acts as the primary interface for external interactions with the cluster. The API Server receives requests and forwards them to the relevant components of the system. It also authenticates and authorizes the requests, ensuring the security of the cluster. Additionally, the API Server is responsible for storing the definitions of Kubernetes objects such as pods, services, and deployments, allowing developers to manage and control their applications declaratively.
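
As a small illustration, the sketch below uses the official Kubernetes Python client to list pods; every call is an authenticated, authorized request handled by the API Server, just as a `kubectl get pods` would be. It assumes a kubeconfig file with access to some cluster is available locally.

```python
# pip install kubernetes  -- the official Kubernetes Python client
from kubernetes import client, config

# Load credentials from the local kubeconfig; every call below is an
# HTTPS request served by the API Server.
config.load_kube_config()

v1 = client.CoreV1Api()

# List pods across all namespaces, the programmatic equivalent of
# `kubectl get pods --all-namespaces`.
pods = v1.list_pod_for_all_namespaces(watch=False)
for pod in pods.items:
    print(f"{pod.metadata.namespace}/{pod.metadata.name}  {pod.status.phase}")
```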

Another essential component of the Master Node is etcd. It is a distributed, highly reliable, and consistent key-value store used by Kubernetes to store the state of the cluster. Etcd is responsible for maintaining critical information such as cluster configuration data, node status, and details of running services. It ensures the consistency of this information and makes it accessible to all components of the system.

The Scheduler is another crucial component in the Master Node. It is responsible for assigning tasks or pods to available nodes within the cluster. The Scheduler analyzes resource requirements, load balancing policies, affinity or anti-affinity constraints, and other considerations to make intelligent decisions about where and how pods will be executed. The Scheduler ensures a balanced distribution of workloads and optimizes the utilization of available resources in the cluster.
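
The information the Scheduler reasons about is declared in the pod specification itself. The hedged sketch below builds a pod with resource requests and a node selector using the Kubernetes Python client; `disktype: ssd` is a hypothetical node label, and the Scheduler would only place this pod on a node carrying that label with enough free CPU and memory.

```python
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="scheduling-demo"),
    spec=client.V1PodSpec(
        # Hypothetical label: only nodes labeled disktype=ssd are eligible.
        node_selector={"disktype": "ssd"},
        containers=[
            client.V1Container(
                name="app",
                image="nginx:1.24",
                # The Scheduler uses the requests to find a node with enough
                # spare CPU and memory; the limits are enforced at runtime.
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "250m", "memory": "128Mi"},
                    limits={"cpu": "500m", "memory": "256Mi"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```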

In addition to these core components, there are also the kube-controller-manager and the cloud-controller-manager, which play crucial roles in cluster management.

The kube-controller-manager is responsible for running and supervising various controllers in Kubernetes. These controllers constantly monitor the state of the cluster and take appropriate actions to ensure that the cluster complies with the desired specifications. For example, the replicaSet controller ensures that the correct number of replicas of a particular pod are running, while the service controller maintains the correct network connectivity for services within the cluster.
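
One way to watch a control loop at work is to declare a Deployment with a replica count and let the ReplicaSet controller converge on it. Here is a minimal sketch under the same kubeconfig assumption as above; the names and image are arbitrary.

```python
from kubernetes import client, config

config.load_kube_config()

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="web", labels={"app": "web"}),
    spec=client.V1DeploymentSpec(
        replicas=3,  # the controller keeps exactly three pods running
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="web", image="nginx:1.24")]
            ),
        ),
    ),
)

# If a pod dies or a node fails, the ReplicaSet controller creates a
# replacement so the observed state matches the declared replicas: 3.
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```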

The cloud-controller-manager is a component that connects to specific services provided by each cloud provider. It extends the functionalities of Kubernetes to enable integration with the infrastructure resources provided by cloud providers. The cloud-controller-manager handles tasks related to cloud resources, such as provisioning virtual machines, external load balancing, and management of cloud-based storage volumes.

Together, these components of the Kubernetes Master Node work in harmony to ensure efficient orchestration, resilience, and scalability of the cluster. They form the backbone of the system, allowing users to deploy, manage, and scale their applications reliably and consistently.

One of the essential components in a Worker Node is the Kube Proxy. It acts as a network proxy and an internal load balancer for the services running on the pods. The Kube Proxy forwards incoming requests to the appropriate pods, considering the service rules and load balancing policies defined in the cluster. It plays a crucial role in ensuring proper connectivity between different components and services within the Kubernetes cluster.
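
What the Kube Proxy actually balances are Services. The sketch below declares a ClusterIP Service that selects pods labeled `app: web` (the hypothetical label from the Deployment sketch above); kube-proxy on each node then forwards traffic sent to the Service’s virtual IP to one of the matching pods.

```python
from kubernetes import client, config

config.load_kube_config()

service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1ServiceSpec(
        selector={"app": "web"},  # pods backing this Service
        ports=[client.V1ServicePort(port=80, target_port=80)],
        type="ClusterIP",         # internal virtual IP, balanced by kube-proxy
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```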

Another fundamental component is the Kubelet, responsible for supervising and managing each individual node in the cluster. The Kubelet runs on each Worker Node and receives instructions from the Master Node to start, stop, and monitor the running pods on the node. It ensures that the pods are in a healthy state by responding to scheduling changes, monitoring resource consumption, and taking necessary actions to maintain node integrity.

Additionally, the Kubelet is responsible for interacting with the container runtime (for example, containerd), which is the component that actually executes and manages the containers on the Worker Nodes. The container runtime handles container operations such as creation, initialization, pausing, restarting, and termination, as well as isolation, security, and monitoring concerns related to container execution.

Together, these Kubernetes Worker Node components work in sync to ensure the correct functioning of pods and efficient utilization of available resources. They play a crucial role in managing container infrastructure and executing workloads. The Kube Proxy ensures proper connectivity and load balancing between services, the Kubelet supervises and manages pods on each node, and the container runtime executes and manages containers securely.

3. Conclusion

Therefore, by combining Master Nodes and Worker Nodes, Kubernetes provides a complete platform for deploying, orchestrating, and managing container-based applications. This flexible and scalable architecture allows organizations to leverage the benefits of cloud computing while ensuring a reliable and efficient experience in the development and execution of their applications. The seamless interaction between the components of the Worker Node plays a crucial role in this process, enabling Kubernetes to be a powerful choice for large-scale deployments.

Next steps

To better understand how applications are built on Kubernetes, it is important to know the different types of controllers available in the kube-controller-manager, as well as the types of services exposed through the proxy and the types of volumes that can be used.

4. References

  1. HIGHTOWER, Kelsey; BURNS, Brendan; BEDA, Joe. Kubernetes: Up and Running. 2nd ed. O’Reilly Media, 2019.
  2. Kubernetes Documentation. Available at: https://kubernetes.io/docs/home/. Accessed: May 14, 2023.
  3. LUKSA, Marko. Kubernetes in Action. 2nd ed. Manning Publications, 2020.
  4. SAYFAN, Gigi. Mastering Kubernetes. Packt Publishing, 2019.
  5. Video: “Introduction to Kubernetes” by Kelsey Hightower. Available at: https://youtu.be/BE77h7dmoQU. Published October 10, 2017. Accessed: May 14, 2023.
  6. Video: “Kubernetes Explained” by TechWorld with Nana. Available at: https://youtu.be/318elIq37PE. Published March 15, 2019. Accessed: May 14, 2023.


Sebastiao Ferreira de Paula Neto

Data engineer with a passion for data science, I write efficient code and optimize pipelines for successful analytics projects.