Kubernetes
History
Use cases
Kubernetes provides the following:
Service discovery
Horizontal scaling
Load balancing
Self-healing
Leader election
Components
Control plane
The Kubernetes API Server exposes the RESTful Kubernetes API. Engineers using the cluster and other Kubernetes components create objects via this API.
The etcd distributed datastore persists the objects you create through the API, since the API Server itself is stateless. The Server is the only component that talks to etcd.
The Scheduler decides on which worker node each application instance should run.
Controllers bring to life the objects you create through the API. Most of them simply create other objects, but some also communicate with external systems (for example, the cloud provider via its API).
Control loop
Types of controllers
Workload plane
The Kubelet, an agent that talks to the API server and manages the applications running on its node. It reports the status of these applications and the node via the API.
The Container Runtime, which can be Docker or any other runtime compatible with Kubernetes. It runs your applications in containers as instructed by the Kubelet.
The Kubernetes Service Proxy (Kube Proxy) load-balances network traffic between applications. Its name suggests that traffic flows through it, but that’s no longer the case.
Deployment controller - Horizontal scaling and rolling update
ReplicaSet: Consists of a replica count and a pod template.
The Deployment controller operates on a ReplicaSet instead of directly on pods.
For a Deployment:
To support horizontal scaling, it modifies the replica count.
To support rolling upgrades, it tracks an UP-TO-DATE count (the number of pods already running the latest pod template). A sketch of both behaviors follows.
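A minimal sketch of a Deployment that exercises both behaviors; the name, image, and counts are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3                   # horizontal scaling: change this number to scale out or in
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate         # replace pods gradually instead of all at once
    rollingUpdate:
      maxSurge: 1               # at most 1 extra pod above the desired count
      maxUnavailable: 1         # at most 1 pod may be unavailable during the update
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25       # bumping this tag triggers a rolling update
```

`kubectl scale deployment nginx-deployment --replicas=5` changes only the replica count; `kubectl get deployments` shows the UP-TO-DATE column described above.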
StatefulSet controller
Motivation: Limitations of the Deployment controller - a Deployment assumes that all pods are stateless, but distributed applications usually carry state.
StatefulSet abstracts an application's state from two perspectives:
Topology state. For example:
Application A must start before application B.
When pods are recreated, they must keep the same network identifiers as before.
Storage state. For example: a recreated pod must be reattached to the same persistent data it used before.
Internals
What a StatefulSet manages directly is pods.
Kubernetes numbers these pods through a headless service and generates a DNS record for each of them in the cluster DNS servers. As long as the pod numbering stays unchanged, the DNS records don't need to change.
A StatefulSet allocates an independent PVC for each pod, and Kubernetes binds a PV to each PVC. This way, even if a pod is recreated on another node, it is reattached to the same PV and keeps its state. A sketch follows.
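A minimal sketch of a StatefulSet with per-pod storage, assuming the headless service nginx described in the next section; names and sizes are illustrative:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                      # pods will be named web-0, web-1, ...
spec:
  serviceName: nginx             # headless service providing the stable DNS records
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:          # one independent PVC per pod: www-web-0, www-web-1, ...
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```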
Headless service
Big picture
Service: A Service is the mechanism by which applications expose a group of pods to clients.
Two ways to access a service:
VIP: a virtual IP address that maps to the backing pods.
DNS: a domain name that maps to an address. This comes in two flavors:
Headless service (the DNS name resolves directly to the pod IPs)
Normal service (the DNS name resolves to the service's virtual IP)
Example definition
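A minimal headless Service matching the description below; the name and port are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None        # headless: no virtual IP is allocated
  selector:
    app: nginx           # the pods this service represents
  ports:
  - name: web
    port: 80
```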
The clusterIP is set to None, which means the service gets no virtual IP address after creation; all it has is a DNS name.
All pods represented by this headless service are selected by the label app: nginx.
How does a StatefulSet use these DNS records to preserve pod topology state?
When the StatefulSet creates its pods, it numbers them as "<statefulset name>-<ordinal index>", and each pod gets a stable DNS record of the form <pod name>.<service name>.<namespace>.svc.cluster.local.
As long as the StatefulSet is not deleted, visiting <statefulset name>-0 always lands on pod 0 and visiting <statefulset name>-1 always lands on pod 1, even if the underlying pods have been recreated with new IP addresses.
API objects
PersistentVolume / PersistentVolumeClaim
Limitations of using volumes directly
They require detailed knowledge of the storage systems themselves.
For example, a volume definition for Ceph (sketched below) exposes all of this information:
the Ceph storage user name, the storage server locations, and the authorization file location.
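A sketch of a pod using the in-tree rbd volume type; the monitor address, pool, image, and paths are illustrative, and all of them leak storage-system details into the pod definition:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rbd-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: rbd-data
      mountPath: /mnt/rbd
  volumes:
  - name: rbd-data
    rbd:
      monitors:                    # storage server locations
      - 10.16.154.78:6789
      pool: kube
      image: foo
      user: admin                  # Ceph storage user name
      keyring: /etc/ceph/keyring   # authorization file location
      fsType: ext4
```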
PersistentVolume / PersistentVolumeClaim to the rescue.
Process
Write a PVC that declares the volume attributes the application needs.
Reference the PVC inside the pod.
Define the PV separately; it carries the actual storage details (see the sketch below).
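A minimal sketch of the three pieces together; names, sizes, and the hostPath backend are illustrative (real clusters use NFS, Ceph, cloud disks, and so on):

```yaml
# 1. The PVC declares what the application needs, not where it comes from.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
# 2. The pod references the PVC by name only.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim
---
# 3. The PV, created by an administrator or a provisioner, carries the
#    storage-system details that the pod and the PVC never see.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteOnce"]
  hostPath:
    path: /tmp/data
```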
Internals
A PVC is like an interface; a PV is its implementation.
Deploy to Kubernetes
Pods
Objects
Container
Attributes
ImagePullPolicy
Defaults to Always when the image tag is :latest or omitted, and to IfNotPresent otherwise. With Always, the image is pulled every time a pod is created.
Lifecycle
For example (see the sketch below):
PostStart: runs immediately after a container is started; it runs asynchronously with, not necessarily before, the container's entrypoint.
PreStop: runs before a container is stopped; termination is blocked until the hook completes.
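A sketch of both hooks on an nginx container; the message path and commands are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    lifecycle:
      postStart:                 # runs right after the container starts
        exec:
          command: ["/bin/sh", "-c", "echo started > /usr/share/message"]
      preStop:                   # runs before the container is stopped
        exec:
          command: ["/usr/sbin/nginx", "-s", "quit"]   # graceful nginx shutdown
```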
Projected volume
Secret: used to store credentials such as database passwords.
ConfigMap: used to store configuration that does not need encryption.
Downward API: used to make the pod's own metadata accessible to the containers inside it.
ServiceAccountToken: a special type of secret used to store access-control tokens. All four sources can be combined in one projected volume, as sketched below.
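A sketch combining all four sources in one projected volume; the secret name db-credentials and the ConfigMap name app-config are hypothetical and would have to exist in the namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: projected-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: all-in-one
      mountPath: /projected
      readOnly: true
  volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:                   # database credentials
          name: db-credentials
      - configMap:                # unencrypted configuration
          name: app-config
      - downwardAPI:              # expose the pod's own metadata
          items:
          - path: labels
            fieldRef:
              fieldPath: metadata.labels
      - serviceAccountToken:      # access-control token
          path: token
          expirationSeconds: 3600
```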
Pod
Motivation
Orchestrating a group of related containers raises the gang scheduling problem.
Mesos tries to solve it with resource hoarding; Google Omega tries optimistic locking.
Kubernetes sidesteps the problem by making the pod, not the container, the smallest schedulable unit.
A container is designed to run a single process.
Within a container, PID 1 is the application process itself, and all other processes are children of PID 1.
Containers can be related in many ways: exchanging files, communicating over localhost or a socket file, making frequent remote procedure calls, or sharing some Linux namespaces.
Def
A pod is only a logical concept: a group of containers that share resources. All containers in a pod share the same network namespace and can share the same volumes.
Why can't a pod be realized with plain docker run commands?
Because joining another container's namespaces (e.g. docker run --net=container:B) forces a start-order dependency between containers that should be equal peers.
Kubernetes instead uses an intermediate container, the infra container: the other containers associate with each other by joining the infra container's namespaces.
Infra container: written in C and extremely lightweight. It uses a special image, k8s.gcr.io/pause, which stays permanently paused and is only 100-200 KB after decompression.
Use case
Container design pattern: when users want to run multiple applications in one container, they should first consider whether those applications could be designed as multiple containers in one pod.
All containers inside a pod share the same network namespace, so network-related configuration and management can be done once at the pod level.
The same holds for anything at the machine level (network, storage, security, orchestration) or the Linux-namespace level.
Sample: WAR file and web app
Problem: a Java web application is packaged as a WAR file that must be placed under Tomcat's webapps directory.
Attempts to solve it with Docker alone:
Bake the WAR into Tomcat's webapps directory. Cons: the container image must be rebuilt whenever the WAR is upgraded.
Reference the WAR from a volume. Cons: making the WAR in the volume accessible to containers on multiple hosts requires a distributed file system.
Solution with a pod: the sidecar pattern. Build the WAR and Tomcat into separate container images and combine them inside one pod, as sketched below.
Init containers start, and must run to completion, before the regular containers.
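A sketch of the sidecar solution; the image sample/war:v1 is hypothetical and assumed to contain only the application's sample.war:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: javaweb
spec:
  initContainers:
  - name: war                          # runs to completion before tomcat starts
    image: sample/war:v1               # hypothetical image holding only sample.war
    command: ["cp", "/sample.war", "/app"]
    volumeMounts:
    - name: app-volume
      mountPath: /app
  containers:
  - name: tomcat
    image: tomcat:9
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: app-volume
      mountPath: /usr/local/tomcat/webapps   # tomcat serves whatever lands here
  volumes:
  - name: app-volume
    emptyDir: {}                       # shared by both containers, lives as long as the pod
```

Upgrading the WAR now means rebuilding only the war image; the Tomcat image never changes.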
Attributes
NodeSelector
Use case: constrain a pod to run only on nodes carrying matching labels.
NodeName
Use case: the name of the node the pod runs on. Once this field is set, Kubernetes treats the pod as already scheduled, so it is normally filled in by the scheduler itself.
HostAlias
Use case: add entries to the pod's /etc/hosts file.
Namespace related
Use case: share the host's network, IPC, and PID namespaces (hostNetwork, hostIPC, hostPID). A combined sketch of these attributes follows.
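A combined sketch of these attributes; the label, IP, and hostname values are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: attr-demo
spec:
  nodeSelector:            # schedule only onto nodes carrying this label
    disktype: ssd
  hostAliases:             # written into the pod's /etc/hosts
  - ip: "10.1.2.3"
    hostnames:
    - "foo.remote"
  hostNetwork: true        # share the host's network namespace
  hostIPC: true            # share the host's IPC namespace
  hostPID: true            # share the host's PID namespace
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
```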
Network
Limitation
Problem: in a multi-host environment, containers on different hosts may be assigned the same IP address, producing duplicate entries in a service registry.
Workaround:
Don't use the containers' IP addresses; use the physical machines' IP addresses instead. However, this requires containers to know their hosts' IP addresses, which is a poor abstraction from an architecture perspective.
CNI network model
Kubernetes uses a model similar to VXLAN-based overlay networks, but replaces the docker0 bridge with cni0. The reasons:
Kubernetes does not use Docker's CNM network model.
The first step in creating a pod is to create an infra container that holds the pod's network namespace.
Within the CNI model:
All containers can communicate with other containers using their own IP addresses, without NAT.
All hosts can communicate with all containers (and vice versa) using those real IP addresses, without NAT.
A container sees the same IP address for itself that other containers and hosts see for it.
Calico