Introduction and Application of Kubernetes
What is Kubernetes?
- Open-source container orchestration tool
- Developed by Google
- Helps you manage containerized applications in different deployment environments (virtual or physical)
What problems does it solve?
- The trend from monoliths to microservices
Emerging from the agile practitioner communities, the microservice-oriented architecture emphasizes implementing and employing multiple small-scale and independently deployable microservices, rather than encapsulating all function capabilities into one monolithic application.
- Increased usage of containers
Containers are gradually replacing virtual machines. Containers virtualize at the operating-system level, whereas virtual machines virtualize the hardware.
Container runtimes include Docker, containerd, Podman, etc.
- Demand for a proper way of managing these hundreds of containers
Tasks of an orchestration tool?
- High availability, or no downtime
- Scalability, or high performance
- Disaster recovery: backup and restore
Kubernetes Architecture
Worker Processes
Worker machine in K8s cluster
- Each Node has multiple Pods on it
- 3 processes must be installed on every Node
- Worker nodes do the actual work
3 processes
- Container runtime (e.g., Docker)
- Kubelet (interacts with both the container and the node)
The kubelet is the primary agent on a worker node. It periodically receives new or modified Pod specifications from the kube-apiserver (a master process) and ensures that Pods and their containers run in the desired state. It also acts as the node's monitoring component, reporting the host's health back to the kube-apiserver. (Reference: 一文看懂 Kubelet)
- Kube-proxy (forwards requests from Services to Pods)
Kube-proxy manages the access entry points of Services, covering both Pod-to-Service access within the cluster and access to Services from outside the cluster. (Reference: kubernetes核心组件kube-proxy)
Master Processes
4 processes
- API Server
Cluster gateway (used by the dashboard and the command line)
Acts as a gatekeeper for authentication
The K8s API Server provides HTTP REST interfaces for creating, deleting, updating, querying, and watching all kinds of K8s resource objects (Pods, Services, etc.). It is the data bus and data hub of the entire system.
- Scheduler
Based on the resource requirements of the components and the resources available on each node, the scheduler decides which node a Pod should be placed on.
More details are given in a later section.
- Controller manager
Detects cluster state changes (e.g., crashing Pods) and works to restore the desired state
- etcd
- A key-value store of the cluster state
- The cluster brain
- Cluster changes get stored in this key-value store.
- All other master processes rely on etcd to answer questions such as:
- Is the cluster healthy?
- What resources are available?
- Did the cluster state change?
Note that application data is not stored in etcd; it lives in the application's own database.
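To see these master processes running on a real cluster, you can list the Pods in the kube-system namespace (the exact names vary by cluster setup):

```bash
# Control-plane components run as Pods in the kube-system namespace
kubectl get pods -n kube-system
```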
Load Balancing
Production clusters usually run multiple master nodes, with a load balancer in front of the API server.
Comparing resources: worker nodes typically need more resources than master nodes, since they run the actual workloads.
Brief Introduction
Kubernetes Features
- Service discovery and load balancing
Kubernetes can expose a container using a DNS name or its own IP address. If traffic to a container is high, Kubernetes can load balance and distribute the network traffic so that the deployment stays stable.
- Storage orchestration
Kubernetes allows you to automatically mount a storage system of your choice, such as local storage or a public cloud provider.
- Automated rollouts and rollbacks
You can use Kubernetes to describe the desired state of your deployed containers, and it changes the actual state to the desired state at a controlled rate. For example, you can automate Kubernetes to create new containers for your deployment, remove existing containers, and adopt all of their resources into the new containers.
- Automatic bin packing
Kubernetes allows you to specify how much CPU and memory (RAM) each container needs. When containers have resource requests specified, Kubernetes can make better decisions about managing container resources.
- Self-healing
Kubernetes restarts containers that fail, replaces containers, kills containers that do not respond to user-defined health checks, and does not advertise them to clients until they are ready to serve.
- Secret and configuration management
Kubernetes allows you to store and manage sensitive information such as passwords, OAuth tokens, and SSH keys. You can deploy and update secrets and application configuration without rebuilding container images and without exposing secrets in your stack configuration.
Kubernetes Components
Node
A simple server, physical or virtual machine
Pod
- Smallest unit of K8s
- Abstraction over a container
- Usually one app container per Pod (though it can contain more)
- Each Pod gets its own IP address
- A new IP address is assigned on re-creation (when a Pod dies)
What does a Pod do? It creates a running environment, a layer on top of the container.
Why use Pods? Because K8s wants to abstract away the container runtime and container technologies (so we do not operate Docker directly).
Service
- Permanent IP address
- The lifecycles of Pod and Service are not connected (when a Pod dies, the Service and its address stay)
- External service vs. internal service
Deployment
We replicate everything to avoid downtime. Does each replica need its own IP? No.
For this we resort to a Service, because a Service provides a permanent IP address and load balancing.
So what is a Deployment?
A blueprint for Pods: you can scale the number of replicas up or down. A Deployment is an abstraction on top of Pods. Mostly you work with Deployments, not with Pods directly.
However, a database cannot be replicated via a Deployment, because a database has state, namely its data. If we had clones or replicas, they would all access the same shared data storage, and concurrent reads and writes from multiple Pods would make the database inconsistent. A StatefulSet solves this.
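As a minimal sketch of such a blueprint (the names, image, and replica count are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-depl
spec:
  replicas: 2                 # scale up or down by changing this number
  selector:
    matchLabels:
      app: nginx
  template:                   # blueprint for the Pods
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          ports:
            - containerPort: 80
```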
Ingress
When we have many services, it is impractical to expose a NodePort for each one, so we use an Ingress to map requests to these services.
ConfigMap
A ConfigMap is an API object used to store non-confidential data in key-value pairs. The data can be consumed as environment variables, command-line arguments, or configuration files in a volume.
A ConfigMap decouples environment-specific configuration from container images, which makes application configuration easy to change. For confidential data, use a Secret object instead.
- Set ConfigMap data as container environment variables (these cannot change after startup)
- Set ConfigMap data as command-line arguments
- Mount the ConfigMap as a file or directory using a volume (the Pod's configuration can then change along with the file)
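The original examples survive only in their first line; a sketch of a ConfigMap and a Pod consuming it as an environment variable (all names are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DB_HOST: mongodb-service      # key-value pairs of non-confidential config
---
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo $DB_HOST && sleep 3600"]
      env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              name: app-config  # the ConfigMap defined above
              key: DB_HOST
```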
Secrets
Secrets have a similar function to ConfigMaps but are used to store sensitive data (base64-encoded).
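Only a fragment of the original example remains; a sketch of a Secret with base64-encoded values (names and values are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-secret
type: Opaque
data:
  mongo-root-username: dXNlcm5hbWU=   # base64 of "username"
  mongo-root-password: cGFzc3dvcmQ=   # base64 of "password"
```

A Deployment's container spec can then reference these keys via env.valueFrom.secretKeyRef.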
Volumes
Data storage
Storage on a local machine or remote, outside of the K8s cluster
K8s does not manage data persistence; you are responsible for it.
StatefulSet
For stateful apps, such as MongoDB or MySQL
It takes care of replicating the Pods and scaling them up or down, while making sure the database reads and writes stay synchronized.
How to use it on a personal device?
The answer is minikube!
What is minikube?
- Creates a virtual box on your laptop
- A Node runs in that virtual box
- A one-node K8s cluster
Ways to communicate with the API server
- UI
- API
- CLI (kubectl)
Basic instructions
Pre-install minikube (to deploy K8s locally), then run minikube start.
minikube virtualizes the local machine into a single-node cluster (Docker must be pre-installed).

```bash
minikube version
kubectl version
```
Kubernetes Pod
Create Pod
Create a Pod from a configuration file.
Use Docker commands to pull images from the remote repository to the local machine.
(Reference: Linux(CentOS7)安装Docker,镜像拉取、使用及常用操作)
For example: Simple configuration file (hello-k8s-pod.yaml)
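Only the first line of the manifest survives; judging from the commands later in this post, hello-k8s-pod.yaml defined a Pod and container named k8s-httpd, roughly like this (the image is an assumption):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: k8s-httpd
spec:
  containers:
    - name: k8s-httpd     # the container name referenced later in this post
      image: httpd        # assumption: the Apache httpd image
      ports:
        - containerPort: 80
```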
```bash
kubectl create -f hello-k8s-pod.yaml
```
As experienced K8s developers point out, we usually do not work with Pods directly; we use Deployments.
```bash
kubectl apply -f [file name]
```

```yaml
apiVersion: apps/v1   # Deployments live in the apps/v1 API group
```
- ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.
- NodePort: Exposes the Service on each Node's IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You'll be able to contact the NodePort Service from outside the cluster by requesting <NodeIP>:<NodePort>. (The nodePort is the externally exposed port.)
```bash
kubectl get deployments   # view deployment information
```
Viewing Pods and worker nodes
What is a pod?
Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
A Pod (as in a pod of whales or pea pod) is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers.
Note:
Restarting a container in a Pod should not be confused with restarting a Pod. A Pod is not a process, but an environment for running container(s). A Pod persists until it is deleted.
- kubectl get pods — list resources
- kubectl describe pods — show detailed information about a resource
Mainly describes information about the Pod's containers, including IP, image, ports, and events that occurred on the Pod
- kubectl logs — print the logs of a Pod and its containers
- kubectl exec — execute a command on a container in a Pod
Check the logs of a specified Pod:
```bash
kubectl logs $POD_NAME
kubectl exec $POD_NAME -- env
kubectl exec -ti $POD_NAME -- bash
```
Step into a pod
```bash
kubectl exec k8s-httpd -- ls
```
Let us review how to step into a container!
```bash
# When a Pod (my-pod) has more than one container, we can step into a specific container (main-app)
kubectl exec -it my-pod --container main-app -- /bin/bash
```
```bash
docker exec -ti minikube ls   # step into the minikube container and execute ls
```
Let us find the httpd container (which we created before).
```bash
docker exec -ti minikube docker exec -ti k8s-httpd ls
docker exec -ti minikube docker exec -ti k8s_k8s-httpd_k8s-httpd_default_418d0482-e835-4e81-823e-efbe0d58ddf5_0 ps -ef | grep httpd
```
Pod update and replacement
How to edit a deployment?
```bash
kubectl edit deployment nginx-depl
```
When the Pod template for a workload resource is changed, the controller creates new Pods based on the updated template instead of updating or patching the existing Pods.
Pod updates may not change fields other than spec.containers[].image, spec.initContainers[].image, spec.activeDeadlineSeconds, or spec.tolerations. For spec.tolerations, you can only add new entries.
Let us have a look at initContainers! (Only when the init containers complete successfully does the Pod start its regular containers.)
myapp.yaml
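The manifest is truncated above; the matching example from the Kubernetes documentation looks like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox:1.28
      command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
    - name: init-myservice
      image: busybox:1.28
      command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done']
    - name: init-mydb
      image: busybox:1.28
      command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done']
```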
This example defines a simple Pod that has two init containers. The first waits for myservice, and the second waits for mydb. Once both init containers complete, the Pod runs the app container from its spec section.
We can see that this Pod stays in the Init status until both init containers complete:

```bash
# NAME        READY   STATUS     RESTARTS   AGE
# myapp-pod   0/1     Init:0/2   0          6m
```
Delete a deployment:

```bash
kubectl delete deployment mongo-db
```
Summary
About the delete operation, we can say more.
We can combine two commands to delete many objects at once.

```bash
kubectl get deployment nginx-depl -o yaml > nginx-dep-result.yaml
```
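One way to combine get and delete (a sketch; the label and resource types are illustrative) is to pipe resource names into kubectl delete:

```bash
# Delete every Deployment and Service that carries a given label
kubectl get deployments,services -l app=nginx -o name | xargs kubectl delete
```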
Kubernetes Configuration File
3 parts
Metadata
Deployment
```yaml
apiVersion: apps/v1
```
Service
```yaml
apiVersion: v1
```
Specification
It is specific to the kind!
Status
It is maintained by K8s itself; we do not write it. K8s uses it to judge whether the actual state of an object matches the desired state. This data is stored in etcd.
And how do we check it?

```bash
kubectl get deployment nginx-depl -o yaml > nginx-dep-result.yaml
```
apiVersion
- Alpha:
- The version names contain alpha (for example, v1alpha1).
- The software may contain bugs. Enabling a feature may expose bugs. A feature may be disabled by default.
- The support for a feature may be dropped at any time without notice.
- The API may change in incompatible ways in a later software release without notice.
- The software is recommended for use only in short-lived testing clusters, due to increased risk of bugs and lack of long-term support.
- Beta:
- The version names contain beta (for example, v2beta3).
- The software is well tested. Enabling a feature is considered safe. Features are enabled by default.
- The support for a feature will not be dropped, though the details may change.
- The schema and/or semantics of objects may change in incompatible ways in a subsequent beta or stable release. When this happens, migration instructions are provided. Schema changes may require deleting, editing, and re-creating API objects. The editing process may not be straightforward. The migration may require downtime for applications that rely on the feature.
- The software is not recommended for production uses. Subsequent releases may introduce incompatible changes. If you have multiple clusters which can be upgraded independently, you may be able to relax this restriction.
- Stable:
- The version name is vX where X is an integer.
- The stable versions of features appear in released software for many subsequent versions.
For example, suppose there are two API versions, v1 and v1beta1, for the same resource. If you originally created an object using the v1beta1 version of its API, you can later read, update, or delete that object using either the v1beta1 or the v1 API version.
There are several API groups in Kubernetes:
-
The core (also called legacy) group is found at REST path /api/v1. The core group is not specified as part of the apiVersion field, for example, apiVersion: v1.
-
The named groups are at REST path /apis/$GROUP_NAME/$VERSION and use apiVersion: $GROUP_NAME/$VERSION (for example, apiVersion: batch/v1). You can find the full list of supported API groups in Kubernetes API reference.
From the link above, we can find the different API versions and component kinds. Most of what we need is in the core group.
```yaml
apiVersion: v1
```

```bash
kubectl api-versions
```
Template
It has its own metadata and spec, because the template is the configuration applied to the Pod, and the Pod has its own configuration.
```yaml
apiVersion: apps/v1
```
Connection
Labels & selectors
Port
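The original side-by-side manifests are truncated; a sketch of how labels, selectors, and ports connect a Deployment and a Service (all names and ports are illustrative): the Deployment's template labels its Pods, the Deployment's selector matches those labels, and the Service selects the same labels, forwarding its port to the Pods' containerPort.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-depl
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx          # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: nginx        # the Pods get this label
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          ports:
            - containerPort: 8080   # port the container listens on
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx            # the Service finds the Pods by the same label
  ports:
    - port: 80            # port of the Service itself
      targetPort: 8080    # must match the containerPort above
```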
Kubernetes Deployment
Exposing your app with a Service
- A Service in Kubernetes is an abstraction that defines a logical set of Pods and a policy for accessing them.
- A Service routes traffic across a set of Pods. Services are the abstraction that allows Pods to die and be replicated in Kubernetes without impacting your application.

```bash
kubectl get services   # view services
```
Types of services
- ClusterIP: has only a cluster-internal IP
- NodePort: has a cluster IP and additionally opens a static port on each node
- LoadBalancer: has a cluster IP and an external IP provisioned by a cloud load balancer
Simple Demo
Kubernetes Services
What
Each Pod has its own address
Pods are ephemeral - are destroyed frequently!
Service:
- Stable IP address
- Load balancing
- Loose coupling
- Within & outside the cluster
Service types
ClusterIP
Default type
Two containers in it
Microservice app deployed
Side-car container: collects microservice logs
Port vs targetPort
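The original snippet is truncated; a sketch of the distinction (names and ports are illustrative): port is where the Service itself listens, while targetPort is the container port the traffic is forwarded to, so it must match the Pod's containerPort.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: microservice-svc
spec:
  selector:
    app: microservice
  ports:
    - protocol: TCP
      port: 3200          # port of the Service itself
      targetPort: 3000    # containerPort of the Pods behind the Service
```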
Headless
A client wants to communicate with one specific Pod directly (a ClusterIP Service would select a Pod at random).
Use case: stateful applications, like databases.
For example, we may want to talk to the master Pod directly, because in a stateful setup only the master can write.
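Only the first line survives; a headless Service is declared by setting clusterIP to None (a sketch; names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service-headless
spec:
  clusterIP: None          # headless: DNS resolves to the individual Pod IPs
  selector:
    app: mongodb
  ports:
    - port: 27017
      targetPort: 27017
```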
NodePort
NodePort service is accessible on a static port on each worker node.
Versus ClusterIP:
- A ClusterIP Service is reachable only on a cluster-internal port.
- A NodePort Service is additionally reachable on a fixed port of each worker node.
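Only the first line survives; a sketch of a NodePort Service (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ms-service-nodeport
spec:
  type: NodePort
  selector:
    app: microservice
  ports:
    - port: 3200          # cluster-internal Service port
      targetPort: 3000    # container port
      nodePort: 30008     # static port opened on every node (range 30000-32767)
```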
NodePort is not secure because it exposes the physical port of the node.
LoadBalancer
(Reference: Kubernetes 私有集群 LoadBalancer 解决方案)
Most cloud providers offer their own load balancer.
Before traffic reaches the node's port, it first goes through the load balancer.
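Only the first line survives; a sketch of a LoadBalancer Service (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ms-service-loadbalancer
spec:
  type: LoadBalancer      # asks the cloud provider to provision its load balancer
  selector:
    app: microservice
  ports:
    - port: 3200
      targetPort: 3000
      nodePort: 30010     # still allocated, but traffic normally enters via the LB
```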
LoadBalancer Service is an extension of NodePort Service
NodePort Service is an extension of ClusterIP Service
Kubernetes Namespaces
What is a Namespace?
- Organise resources in namespaces
- Virtual cluster inside a cluster
4 namespaces by default
- kubernetes-dashboard (shipped only with minikube)
- kube-system
- Do not create or modify anything in kube-system
- System processes
- Master and kubectl processes
- kube-public
- Publicly accessible data
- A ConfigMap that contains cluster information
- kube-node-lease
- Heartbeats of nodes
- Each node has an associated lease object in this namespace
- Determines the availability of a node
- default
- Resources you create are located here
Create a namespace with command line
```bash
kubectl create namespace my-namespace
```
What are the use cases?
Structure
Too many Pods, Deployments, Services, and ConfigMaps? Then you want to group them,
for example into database, monitoring, elastic stack, and nginx-ingress namespaces.
Conflicts
Many teams, same application
Resource Sharing
Staging and Development
Blue/Green Deployment: two versions
Access and Resource Limits on Namespaces
How do Namespaces work, and how do we use them?
Check whether a component lives in a namespace or not:

```bash
kubectl api-resources --namespaced=false   # components not in a namespace
```
Create components in your specified namespace
- Command
```bash
kubectl apply -f mysql-configmap.yaml --namespace=my-namespace
```
- Configuration
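Only the first line of the original snippet survives; a sketch of declaring the namespace inside the manifest itself (names are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-configmap
  namespace: my-namespace          # pin the component to a namespace in the file itself
data:
  db_url: mysql-service.database   # illustrative key-value pair
```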
Check the components in a specified namespace:

```bash
kubectl get configmap -n my-namespace
```
If you want to change the active namespace, you can use kubens (it must be installed separately).

```bash
kubens   # lists namespaces; kubens <namespace> switches the active one
```
Kubernetes Ingress
What is Ingress
For security and simplicity, we do not want users to reach a Service directly via an IP address and port, so we introduce Ingress.
Configuration
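The original manifests are truncated; a sketch of an Ingress routing a host to an internal Service (the host and service names are illustrative; networking.k8s.io/v1beta1 matches the API version used throughout this post, while newer clusters use networking.k8s.io/v1 with a slightly different backend syntax):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
spec:
  rules:
    - host: myapp.com                             # requests for this host...
      http:
        paths:
          - path: /
            backend:
              serviceName: myapp-internal-service # ...go to this Service
              servicePort: 8080
```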
Ingress Controller (pod)
- Evaluate all the rules
- Manage redirections
- Entrypoint to cluster
- Many third-party implementations
K8s Nginx Ingress Controller
Multiple paths for same host
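The manifest is truncated; a sketch of a fanout rule (hosts, paths, service names, and ports are illustrative):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: simple-fanout-ingress
spec:
  rules:
    - host: myapp.com
      http:
        paths:
          - path: /analytics        # myapp.com/analytics
            backend:
              serviceName: analytics-service
              servicePort: 3000
          - path: /shopping         # myapp.com/shopping
            backend:
              serviceName: shopping-service
              servicePort: 8080
```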
Multiple sub-domains or domains
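Again only the first line survives; a sketch with one rule per sub-domain (all names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: name-based-ingress
spec:
  rules:
    - host: analytics.myapp.com     # one rule per sub-domain
      http:
        paths:
          - backend:
              serviceName: analytics-service
              servicePort: 3000
    - host: shopping.myapp.com
      http:
        paths:
          - backend:
              serviceName: shopping-service
              servicePort: 8080
```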
Set https
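Both manifests are truncated; a sketch of the TLS configuration plus the referenced Secret (the host, names, and base64 data are placeholders):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: tls-ingress
spec:
  tls:
    - hosts:
        - myapp.com
      secretName: myapp-secret-tls    # must live in the same namespace
  rules:
    - host: myapp.com
      http:
        paths:
          - backend:
              serviceName: myapp-internal-service
              servicePort: 8080
---
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret-tls
type: kubernetes.io/tls
data:
  tls.crt: base64-encoded-cert        # placeholder, not valid base64
  tls.key: base64-encoded-key         # placeholder, not valid base64
```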
Kubernetes Volumes
Storage Requirements
- Storage that does not depend on the Pod lifecycle
- Storage must be available on all nodes
- Storage needs to survive even if the cluster crashes
3 concepts
- PV (PersistentVolume)
An abstraction over the underlying shared storage that defines storage as a "resource", just as a Node is a resource that container applications can consume. PVs are created and configured by administrators and are tied directly to the concrete implementation of the shared storage.
- PVC (PersistentVolumeClaim)
A user's "request" for storage resources. Just as Pods consume Node resources, PVCs consume PV resources. A PVC can request a specific storage size and access mode.
- StorageClass
Marks the characteristics and performance of storage resources. Administrators can define storage resources as classes, much like the configuration profiles of the storage devices themselves. From a StorageClass description, users can see the characteristics of each kind of storage and request storage according to their application's needs. Volumes can then be created on demand.
Persistent Volume (PV)
- A cluster resource
- Needs actual physical storage, like a local disk, an NFS server, or cloud storage
- An external plugin to your cluster
- Created via a YAML file
- kind: PersistentVolume
- spec: e.g., how much storage?
As a storage resource, a PV is defined mainly by its storage capacity, access modes, storage type, reclaim policy, and backend storage parameters.
NFS server
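The NFS example is truncated; a sketch of an NFS-backed PV (the server, path, and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 5Gi                    # how much storage this PV offers
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany                 # NFS supports mounting from many nodes
  nfs:
    path: /exported/path            # illustrative NFS export
    server: nfs-server.example.com
```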
The following PV types currently support block devices:
AWSElasticBlockStore, AzureDisk, FC, GCEPersistentDisk, iSCSI, Local volume, RBD (Ceph Block Device), VsphereVolume (alpha)
- A volume with volumeMode set to Filesystem is mounted into a directory in the Pod. If the volume is backed by a block device that is currently empty, Kubernetes creates a filesystem on the device before mounting it for the first time.
- You can set volumeMode to Block to use a volume as a raw block device. Such a volume is handed to the Pod as a block device, without any filesystem on it. This mode gives the Pod the fastest possible way to access the volume, with no filesystem layer between the Pod and the volume. The application running in the Pod must know how to handle a raw block device. See Raw Block Volume Support for how to use a volume with volumeMode: Block in a Pod.
Access modes describe the access rights of an application to the storage resource:
- ReadWriteOnce (RWO): read-write, mountable by a single Node only.
- ReadOnlyMany (ROX): read-only, mountable by multiple Nodes.
- ReadWriteMany (RWX): read-write, mountable by multiple Nodes.
Local storage
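The local-storage example is truncated; a sketch of a local PV, which must pin the node it lives on via nodeAffinity (the path and node name are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  local:
    path: /mnt/disks/ssd1           # directory on the node's disk
  nodeAffinity:                     # a local PV must pin the node it lives on
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1     # illustrative node name
```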
Kubernetes supports the following PV types:
- AWSElasticBlockStore: Elastic Block Store on the AWS public cloud.
- AzureFile: the File service on the Azure public cloud.
- AzureDisk: the Disk service on the Azure public cloud.
- CephFS: an open-source shared storage system.
- FC (Fibre Channel): fibre-channel storage devices.
- FlexVolume: a plugin-style storage mechanism.
- Flocker: an open-source shared storage system.
- GCEPersistentDisk: PersistentDisk on the GCE public cloud.
- Glusterfs: an open-source shared storage system.
- HostPath: a directory on the host machine, for single-node testing only.
- iSCSI: iSCSI storage devices.
- Local: local storage devices; a Local PV can currently be provided by specifying a block device, or its lifecycle can be managed through the community-developed sig-storage-local-static-provisioner plugin.
- NFS: network file system.
- Portworx Volumes: storage service provided by Portworx.
- Quobyte Volumes: storage service provided by Quobyte.
- RBD (Ceph Block Device): Ceph block storage.
- ScaleIO Volumes: storage devices from DellEMC.
- StorageOS: storage service provided by StorageOS.
- VsphereVolume: storage system provided by VMware.
PVs are outside of namespaces
A PV is accessible to the whole cluster
Persistent Volume Claim (PVC)
A Pod reaches a PV through a PVC, which forwards the request to a matching volume.
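Only the kind line survives; a sketch of a PVC (the name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi          # bound to any PV that satisfies this request
  storageClassName: ""      # empty string forbids dynamic provisioning
```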
Use PVC in Pods configuration
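Only the first line survives; a sketch of a Pod mounting the claim (names and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: app-storage
          mountPath: /usr/share/nginx/html   # path inside the container
  volumes:
    - name: app-storage
      persistentVolumeClaim:
        claimName: pvc-name                  # references the claim above
```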
Claims must be in the same namespace as the Pod.
When the claim finds a matching PersistentVolume, the volume is mounted into the Pod.
Life Cycle of PV and PVC
Think of a PV as an available storage resource and a PVC as a request for that resource.
- Provisioning
K8s supports two provisioning modes: static and dynamic. The result of provisioning is a created PV.
Static mode: the cluster administrator manually creates a number of PVs, setting the properties of the backend storage when defining each PV.
Dynamic mode: the administrator does not create PVs by hand; instead, a StorageClass describes the backend storage and marks it as a certain type. The PVC then declares the type of storage it needs, and the system automatically creates a PV and binds it to the PVC. A PVC can declare its class as "", which forbids dynamic mode for that claim.
- Binding
Once a PVC is defined, the system selects an existing PV that satisfies the PVC's requirements (storage size and access mode). When one is found, the PV is bound to the PVC and the application can use the PVC. If no such PV exists, the PVC stays in the Pending state until a matching PV appears. Once a PV is bound to a PVC, it is owned exclusively by that PVC and cannot be bound to another. If the PVC requests less space than the PV provides, the whole PV still becomes usable by that PVC, which may waste resources. With dynamic provisioning, the system finds a suitable StorageClass for the PVC, automatically creates a PV, and binds it.
- Using
A Pod uses a volume definition of type persistentVolumeClaim to mount the PVC at some path inside the container. Once mounted, the PVC is used continuously and exclusively, although multiple Pods can mount the same PVC.
- Releasing
When the storage is no longer needed, the PVC can be deleted. The PV bound to it is then marked as "released", but it cannot immediately be bound to another PVC: data written through the previous PVC may still remain on the storage device, and only after it is cleaned up can the PV be used again.
- Reclaiming
For a PV, the administrator can set a reclaim policy that determines how leftover data is handled after the bound PVC releases the resource. Only after the PV's storage space has been reclaimed can it be bound and used by a new PVC.
Two diagrams (omitted here) illustrate, for static and dynamic provisioning respectively, how PVs, PVCs, StorageClasses, and Pods using PVCs fit together.
In static provisioning, a PV and a PVC are bound to each other, and the Pod then uses the PVC.
In dynamic provisioning, a StorageClass and a PVC complete the binding dynamically (the system generates the PV automatically), and the Pod then uses the PVC.
Why so many abstractions?
- Developers do not care about the concrete storage.
- They do not want to set up the actual storage themselves.
Others
ConfigMap and Secret
- Local volumes
- Not created via PV and PVC
- Managed by K8s itself
- Create a ConfigMap or Secret component
- Mount it into your Pod
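Only the first line survives; a sketch of mounting a ConfigMap as a volume (names, keys, and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "cat /etc/config/db_url && sleep 3600"]
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config     # each key becomes a file in this directory
  volumes:
    - name: config-volume
      configMap:
        name: app-config             # the ConfigMap to mount
```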
StorageClass
A StorageClass is an abstract definition of storage resources. It shields a user's PVC request from the details of the backend storage, which both reduces the storage details the user must care about and saves the administrator from managing PVs by hand: the system creates and binds PVs automatically, achieving dynamic provisioning.
Configuration
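Only the first line survives; a sketch of a StorageClass using the AWS EBS provisioner (the parameters follow the common example from the Kubernetes docs):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storage-class-name
provisioner: kubernetes.io/aws-ebs   # which backend creates the volumes
parameters:
  type: io1
  iopsPerGB: "10"
  fsType: ext4
```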
Another abstraction level underneath the PVC
Usage
Requested by a PVC:
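Only the first line survives; a sketch of a PVC requesting that StorageClass:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
  storageClassName: storage-class-name   # requests the StorageClass above
```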
Pipeline
- a. Pod claims storage via PVC
- b. PVC requests storage from SC
- c. SC creates PV that meets the needs of the claim
Kubernetes StatefulSet
StatefulSet for stateful applications
Stateful applications
- Examples of stateful applications: databases, applications that store data
- Deployed using StatefulSet
Stateless applications
- Do not keep record of state
- Each request is completely new
- Deployed using Deployment
StatefulSet and Deployment both manage pods based on container specification!
Deployment vs StatefulSet
- Deployment
- Replica Pods are identical and interchangeable
- Created in random order with random hashes
- One Service load balances to any Pod
- Data is lost when all Pods die
- StatefulSet
- Pods cannot be created/deleted at the same time (think about a DB)
- Pods cannot be randomly addressed
- Replica Pods are not identical
- Pod identity:
- Sticky identity for each Pod
- Created from the same specification, but not interchangeable!
- Persistent identifier across any re-scheduling (when a Pod dies, its replacement inherits the identity)
- Data can survive even when all Pods die
The Pods are divided into one master and several workers: only the master may write to the data, while the workers may only read it.
2 characteristics of StatefulSet
- Predictable Pod names: mysql-0, mysql-1, …
- Fixed individual DNS name: mysql-0.svc2…
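A minimal sketch of a StatefulSet with its headless Service (names are illustrative; volumeClaimTemplates, which a real database would need for per-Pod storage, are omitted for brevity):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None                 # headless Service gives each Pod its DNS name
  selector:
    app: mysql
  ports:
    - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless     # ties the Pods' DNS names to this Service
  replicas: 3                     # Pods are named mysql-0, mysql-1, mysql-2
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "example"    # illustrative only; use a Secret in practice
          ports:
            - containerPort: 3306
```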
Kubernetes Role
User
(Reference: kubernetes集群权限之Cluster、User和Context)
Role&ClusterRole
A Role is a set of permissions; for example, a Role can include the permission to list Pods and the permission to list Deployments. A Role grants access to resources within a single namespace.
Create a Role through a YAML resource definition:
Role
Scoped to a single namespace
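Only the first line survives; the canonical pod-reader Role from the Kubernetes docs:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
  - apiGroups: [""]          # "" indicates the core API group
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
```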
ClusterRole
Can grant access across all namespaces, to cluster-level resources, and to non-resource endpoints
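Only the first line survives; the canonical secret-reader ClusterRole from the Kubernetes docs:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: secret-reader        # cluster-scoped, so no namespace field
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "watch", "list"]
```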
Related parameters
Configurable apiGroups values for Role and ClusterRole:

```yaml
"", "apps", "autoscaling", "batch"
```

These values name different API groups; the empty string "" means the core API group.
Configurable resources values for Role and ClusterRole:

```yaml
"services", "endpoints", "pods", "secrets", "configmaps", "crontabs", "deployments", "jobs", "nodes", "rolebindings", "clusterroles", "daemonsets", "replicasets", "statefulsets", "horizontalpodautoscalers", "replicationcontrollers", "cronjobs"
```

Configurable verbs (rules) for Role and ClusterRole:

```yaml
"get", "list", "watch", "create", "update", "patch", "delete", "exec"
```
RoleBinding&ClusterRoleBinding
A role binding grants the permissions defined in a role to a user or a set of users; the bound user or group then holds the permissions defined in the referenced Role or ClusterRole.
A role binding contains a list of subjects (users, groups, or service accounts) and a reference to the role being granted. Permissions within a namespace are granted with a RoleBinding object, while cluster-wide permissions are granted with a ClusterRoleBinding object.
Use a RoleBinding to grant a user permissions in the default namespace:
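Only the kind line survives; a sketch based on the canonical docs example (the user name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
  - kind: User
    name: jane               # illustrative user name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader           # the Role defined above
  apiGroup: rbac.authorization.k8s.io
```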
Use a ClusterRoleBinding to grant a group permissions across the cluster:
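Again only the kind line survives; a sketch (the group name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-secrets-global
subjects:
  - kind: Group
    name: manager            # illustrative group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
```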
Use a RoleBinding to reference a ClusterRole, but apply it only within one namespace:
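A sketch based on the canonical docs example (the user and namespace are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-secrets
  namespace: development     # permissions apply only within this namespace
subjects:
  - kind: User
    name: dave               # illustrative user name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole          # a ClusterRole reused inside a single namespace
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
```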
Instructions

```bash
kubectl get role -n kube-system                      # list all Roles in kube-system
kubectl get role kube-proxy -n kube-system -o yaml   # view the details of the kube-proxy Role
kubectl get rolebinding -n kube-system               # list role bindings in kube-system
kubectl get clusterrole                              # list ClusterRoles in the cluster
kubectl get clusterrolebinding                       # list ClusterRoleBindings in the cluster
```
Other functions
Running multiple instances of an application (load balancing)
ReplicaSet for balancing
Avoids downtime when some Pods are killed
- NAME lists the names of the Deployments in the cluster.
- READY shows the ratio of CURRENT/DESIRED replicas
- UP-TO-DATE displays the number of replicas that have been updated to achieve the desired state.
- AVAILABLE displays how many replicas of the application are available to your users.
- AGE displays the amount of time that the application has been running.
```bash
kubectl get rs   # see the ReplicaSet created by the Deployment
```
Load balancing

```bash
kubectl scale deployments/kubernetes-bootcamp --replicas=4   # create 4 replicas
```
You do not need to manage the replicas one by one; you just set the count on the Deployment.
Zone balancing
Balance the number of Pods across different zones.
Pod Topology Spread Constraints
```yaml
kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
    - maxSkew: 1                       # max allowed difference in Pod counts across topology domains (bucket principle: the shortest plank decides the capacity)
      topologyKey: zone                # marks the topology domain (it can also be node, region, etc.)
      whenUnsatisfiable: DoNotSchedule # if the constraint cannot be satisfied, the Pod is not scheduled
      labelSelector:
        matchLabels:
          foo: bar
  containers:
    - name: pause
      image: k8s.gcr.io/pause:3.1
```
Some nodes are exposed to the outside world, and we do not want to schedule Pods on them.
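The original manifest is truncated; the Kubernetes documentation pairs a spread constraint with nodeAffinity to keep Pods off certain nodes or zones, which is likely what was shown here (a sketch; the zone name is illustrative):

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          foo: bar
  affinity:
    nodeAffinity:                    # keeps the Pod away from the excluded zone
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: zone
                operator: NotIn
                values:
                  - zoneC            # illustrative zone to exclude
  containers:
    - name: pause
      image: k8s.gcr.io/pause:3.1
```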
The strategy above is configured per Pod; next we introduce a cluster-level strategy.
It involves the K8s scheduler, which is configured like this:

```bash
kube-scheduler --config <filename>
```
A simple configuration file can look like this:
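Only the first line survives; a minimal KubeSchedulerConfiguration, following the example in the Kubernetes docs (the kubeconfig path is illustrative):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/srv/kubernetes/kube-scheduler/kubeconfig   # illustrative path
```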
We will discuss the scheduler in more detail later; in this part we continue with the balancing problem.
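The truncated manifest most likely set cluster-level default topology spread constraints; a sketch following the Kubernetes docs:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway   # spread is best-effort by default
```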
With cluster-level default constraints, we no longer need to set the constraint in every Pod's configuration file.
Performing a rolling update

```bash
kubectl set image deployments/kubernetes-bootcamp kubernetes-bootcamp=jocatalin/kubernetes-bootcamp:v2   # update the old version to the new one
```
Scheduling
Scheduling Policies
A scheduling Policy can be used to specify the predicates and priorities that the kube-scheduler runs to filter and score nodes, respectively.
```bash
kube-scheduler --policy-config-file <filename>
```
Predicates
In short: predicates check whether a Node satisfies a Pod's requirements.
The following predicates implement filtering:
- PodFitsHostPorts: Checks if a Node has free ports (the network protocol kind) for the Pod ports the Pod is requesting.
- PodFitsHost: Checks if a Pod specifies a specific Node by its hostname.
- PodFitsResources: Checks if the Node has free resources (eg, CPU and Memory) to meet the requirement of the Pod.
- MatchNodeSelector: Checks if a Pod’s Node Selector matches the Node’s label(s).
- NoVolumeZoneConflict: Evaluate if the Volumes that a Pod requests are available on the Node, given the failure zone restrictions for that storage.
- (Volumes: directories that store data, accessible to containers)
- NoDiskConflict: Evaluates if a Pod can fit on a Node due to the volumes it requests, and those that are already mounted.
- MaxCSIVolumeCount: Decides how many CSI volumes should be attached, and whether that's over a configured limit. (CSI: Container Storage Interface)
- PodToleratesNodeTaints: checks if a Pod’s tolerations can tolerate the Node’s taints.
- (Tolerations: a Pod's tolerance levels, matched against a Node's taints)
- (Taints prevent Pods from being scheduled onto certain nodes.)
- CheckVolumeBinding: Evaluates if a Pod can fit due to the volumes it requests. This applies for both bound and unbound PVCs.
Priorities
In short: priorities rank the nodes according to their resources and other requirements.
The following priorities implement scoring:
- SelectorSpreadPriority: Spreads Pods across hosts, considering Pods that belong to the same Service, StatefulSet or ReplicaSet.
- InterPodAffinityPriority: Implements preferred inter-pod affinity and anti-affinity.
- LeastRequestedPriority: Favors nodes with fewer requested resources. In other words, the more Pods that are placed on a Node, and the more resources those Pods use, the lower the ranking this policy will give.
- MostRequestedPriority: Favors nodes with most requested resources. This policy will fit the scheduled Pods onto the smallest number of Nodes needed to run your overall set of workloads.
- RequestedToCapacityRatioPriority: Creates a requestedToCapacity based ResourceAllocationPriority using default resource scoring function shape.
- BalancedResourceAllocation: Favors nodes with balanced resource usage.
- NodePreferAvoidPodsPriority: Prioritizes nodes according to the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods. You can use this to hint that two different Pods shouldn’t run on the same Node.
- NodeAffinityPriority: Prioritizes nodes according to node affinity scheduling preferences indicated in PreferredDuringSchedulingIgnoredDuringExecution. You can read more about this in Assigning Pods to Nodes.
- TaintTolerationPriority: Prepares the priority list for all the nodes, based on the number of intolerable taints on the node. This policy adjusts a node’s rank taking that list into account.
- ImageLocalityPriority: Favors nodes that already have the container images for that Pod cached locally.
- ServiceSpreadingPriority: For a given Service, this policy aims to make sure that the Pods for the Service run on different nodes. It favours scheduling onto nodes that don’t have Pods for the service already assigned there. The overall outcome is that the Service becomes more resilient to a single Node failure.
- EqualPriority: Gives an equal weight of one to all nodes.
- EvenPodsSpreadPriority: Implements preferred pod topology spread constraints.
Scheduler Configuration
A scheduling Profile allows you to configure the different stages of scheduling in the kube-scheduler.
Each stage is exposed in an extension point.
Extension points
Scheduling happens in a series of stages that are exposed through the following extension points:
- queueSort: These plugins provide an ordering function that is used to sort pending Pods in the scheduling queue. Exactly one queue sort plugin may be enabled at a time.
- preFilter: These plugins are used to pre-process or check information about a Pod or the cluster before filtering. They can mark a pod as unschedulable.
- filter: These plugins are the equivalent of Predicates in a scheduling Policy and are used to filter out nodes that can not run the Pod. Filters are called in the configured order. A pod is marked as unschedulable if no nodes pass all the filters.
- postFilter: These plugins are called in their configured order when no feasible nodes were found for the pod. If any postFilter plugin marks the Pod schedulable, the remaining plugins are not called.
- preScore: This is an informational extension point that can be used for doing pre-scoring work.
- score: These plugins provide a score to each node that has passed the filtering phase. The scheduler will then select the node with the highest weighted scores sum.
- reserve: This is an informational extension point that notifies plugins when resources have been reserved for a given Pod. Plugins also implement an Unreserve call that gets called in the case of failure during or after Reserve.
- permit: These plugins can prevent or delay the binding of a Pod.
- preBind: These plugins perform any work required before a Pod is bound.
- bind: The plugins bind a Pod to a Node. bind plugins are called in order and once one has done the binding, the remaining plugins are skipped. At least one bind plugin is required.
- postBind: This is an informational extension point that is called after a Pod has been bound.
Helm
What is Helm
A package manager for Kubernetes, used to package YAML files.
When you have too many components, it is tedious to manage them and search for them.
Other people can then reuse the packaged YAML files from a repository.
Templating Engine
If you want to create many microservices that differ only in a few values, you do not need to write a separate YAML file for each.
You can:
- define a common blueprint
- Dynamic values are replaced by placeholders
Template YAML config
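Only the first line of the template survives; a sketch of what a parameterized template might look like (the field names under .Values are illustrative):

```yaml
# templates/pod.yaml -- placeholders are filled in from values.yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ .Values.name }}
spec:
  containers:
    - name: {{ .Values.container.name }}
      image: {{ .Values.container.image }}
      ports:
        - containerPort: {{ .Values.container.port }}
```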
values.yaml
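And a matching values.yaml (again a sketch):

```yaml
# values.yaml -- the concrete values for the placeholders above
name: my-app
container:
  name: my-app-container
  image: my-app-image
  port: 9001
```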
How to use them
When you install a chart, the values from values.yaml are injected into the template files.
If you want to override some values in values.yaml with your own values file, you can run:

```bash
helm install --values=my-values.yaml <chartname>
```
You can also set some values by command line.
```bash
helm install --set version=2.0.0 <chartname>
```
If you like this blog or find it useful for you, you are welcome to comment on it. You are also welcome to share this blog, so that more people can participate in it. If the images used in the blog infringe your copyright, please contact the author to delete them. Thank you !