| all-in-one | AIO | A deployment mode in which Kubernetes and openFuyao components such as fuyao-system are deployed on the same node. |
| API Gateway | APIG | A single entry point between clients and APIs. It acts as a reverse proxy that routes client requests to a group of backend APIs. |
| application programming interface | API | A set of predefined functions that lets applications and developers access a group of software-based or hardware-based routines without accessing the source code or understanding the internal details of the mechanism. |
| AscendHub | - | A Docker image repository made publicly available by Ascend. |
| blackbox_exporter | - | One of the official Prometheus exporters. It enables network probing over HTTP, HTTPS, DNS, TCP, and ICMP. |
| cAdvisor | - | A container monitoring tool developed by Google. It is embedded in Kubernetes as a monitoring component. |
| cloud-native colocation | - | Online and offline services are deployed in the same cluster in cloud-native mode. Resources that online services leave idle during off-peak hours are allocated to offline services to improve the overall resource utilization of the cluster. |
| Cloud Native Computing Foundation | CNCF | An open-source software foundation. |
| ConfigMap | - | An API object used to store non-confidential data in key-value pairs. |
| console | - | The frontend web-based control interface. |
| container | - | A runtime instance created based on an image. Containers can be started, stopped, and removed. Each container is an isolated and secure platform. |
| Coordinated Universal Time | UTC | A time standard used to unify the time globally. |
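| CronJob | - | A workload object that creates Jobs on a repeating schedule. |
| custom resource | CR | An extension of the Kubernetes API that represents an object type not available in a default Kubernetes installation. |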
| custom resource definition | CRD | A Kubernetes extension mechanism that allows users to define custom resources. |
| DaemonSet | - | A controller that ensures that all or some nodes run a copy of a Pod. |
| dashboard | - | A dashboard consists of multiple monitoring components customized by users. Users can monitor various indicators based on their requirements. |
| data parallelism | DP | A full copy of a model is stored on each device. Each device independently processes a part of the dataset, and the resulting gradients are then aggregated across devices. |
| decode | - | Process from the generation of the first token to the end of inference. |
| Deployment | - | A Deployment provides declarative updates for Pods and ReplicaSets. |
| domain name system | DNS | A service that maps domain names to IP addresses for easier network access. |
| Dubbo | - | An open-source, high-performance service framework from Alibaba that delivers transparent RPC remote invocation and service governance. |
| embedding | EMB | An operation that converts data into vector representations. |
| fully qualified domain name | FQDN | A name that contains both the hostname and the domain name, that is, FQDN = hostname + domain name. |
| gigabyte | GB | A decimal unit of information measurement, which is commonly used to denote the storage capacity of disks or memory. |
| Helm | - | A package manager for Kubernetes that simplifies the deployment and management of applications in a Kubernetes cluster. |
| Helm chart | - | A core concept in Helm, which is a pre-configured package of application resources. |
| high availability | HA | The ability of a system or service to run reliably and remain continuously available, even during hardware failures or unexpected issues. |
| High Performance RSA Engine | HPRE | The high-performance RSA acceleration engine of KAE. |
| High Performance ZIP Engine | ZIP | The high-performance zlib/gzip compression engine of KAE. |
| horizontal Pod autoscaler | HPA | A Kubernetes API resource that automatically updates workload resources (such as Deployments or StatefulSets) to scale them up or down based on demand. |
| Hypertext Transfer Protocol Secure | HTTPS | A secure version of HTTP that ensures security during transmission through encryption and authentication. |
| indicator | - | A metric supported by the data collection system (such as Prometheus) that users can monitor. An indicator can contain multiple monitoring instances. |
| ingress | - | An API object that manages external access to the services in a cluster. It can provide load balancing, SSL termination, and name-based virtual hosting. |
| instance | - | The smallest monitorable object in Kubernetes. Each monitoring instance is uniquely identified by a set of key-value pair labels. |
| Job | - | A one-off task executed in a cluster. It focuses on running a task once rather than maintaining a specified number of running instances. A Job creates one or more Pods to run a specified task. The Pods are removed by the Job after the task is complete. |
| key-value cache | KV cache | A common technique for accelerating foundation model inference. It caches the key (K) and value (V) matrices of the self-attention mechanism during inference to avoid redundant computations and improve inference speed. |
| kube-apiserver | - | The Kubernetes API server. It validates and configures data for API objects, including Pods, services, and ReplicationControllers. It serves REST operations and provides the frontend to the shared state of the cluster, through which all other components interact. |
| kubectl | - | A command-line tool for communicating with the control plane of a Kubernetes cluster, using the Kubernetes API. |
| kubelet | - | A critical Kubernetes component running on each node in a cluster, responsible for managing containers on that node. It is the node agent in the Kubernetes system and communicates with the control plane to ensure that containers run on the node as expected. |
| kube-rbac-proxy | - | A lightweight HTTP proxy service designed for Kubernetes, utilizing the SubjectAccessReview feature of Kubernetes to enforce RBAC authorization. Its goal is to restrict communication between Pods, allowing only Pods with valid and RBAC-authorized tokens to access other Pods. |
| Kubernetes | K8s | A portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. |
| kube-state-metrics | - | A service that listens to the Kubernetes API server and generates status indicators for resource objects such as Deployments, nodes, and Pods. |
| Kunpeng Accelerator Engine | KAE | A hardware acceleration solution based on Kunpeng 920 series processors. |
| Large Language Model | LLM | A deep learning model trained on a large amount of text data. |
| metrics-server | - | A core component of the Kubernetes monitoring system. It collects resource indicators from kubelet, aggregates the indicators (relying on kube-aggregator), and exposes them through the metrics API (/apis/metrics.k8s.io/) in the Kubernetes API server. However, metrics-server stores only the latest indicator data (CPU and memory). |
| Mind Inference Service | MIS | A containerized foundation model inference API service provided by Ascend. |
| Mind Inference Service Operator | MIS operator | A component that manages the lifecycle of inference microservice instances. |
| Mooncake | - | A distributed KV cache storage engine designed for foundation model inference to improve inference efficiency. |
| multi-core | - | A large number of processing cores are integrated on a single chip. A multi-core scenario refers to nodes with more than 256 CPUs in a cluster. |
| Mutual TLS | mTLS | A TLS mode in which the server and the client authenticate each other before establishing an encrypted channel. |
| namespace | - | A Kubernetes namespace is an isolated resource space within a project on the platform, serving as the user's workspace for production. A project can create multiple namespaces, with the sum of their allocated resource quotas not exceeding the project quota. Namespaces provide finer-grained resource quota division and also limit container sizes (CPUs and memory) within the namespace, effectively improving resource utilization. |
| Nginx | - | A high-performance HTTP and reverse proxy web server. It also provides IMAP, POP3, and SMTP services. |
| node | - | A worker machine in Kubernetes that runs Pods. A node can be a virtual machine (VM) or a physical machine (PM), depending on the cluster configuration. |
| node feature discovery | NFD | A Kubernetes add-on that detects the hardware features available on each node in a cluster and advertises them using node labels, annotations, and taints. |
| node_exporter | - | It collects and exposes host system indicators, such as CPU usage, disk usage, memory usage, and network activities. It can be used with Prometheus or other monitoring tools and supports various collectors and custom indicators. |
| non-uniform memory access | NUMA | A memory architecture in modern multi-core and multi-processor systems. It groups processors and memory into multiple nodes so that each processor accesses its local memory faster than memory on other nodes. |
| OAuth2-Server | - | A server providing the OAuth 2.0 protocol implementation in openFuyao. |
| oauth-proxy | - | It provides OAuth2-based authentication and authorization functionality. It helps protect web applications or APIs, ensuring that only authenticated users can access protected content. |
| offline workload | - | Services that have relatively low requirements on service quality and are insensitive to response delay, such as big data analysis, transcoding, and AI training. |
| online workload | - | Services that have high requirements on service quality and are sensitive to response delay, such as web services and e-commerce services. |
| Open Authorization 2.0 | OAuth2.0 | An industry-standard protocol for authorization. OAuth 2.0 focuses on client developer simplicity while providing specific authorization flows for web applications, desktop applications, mobile phones, and living room devices. |
| operating system | OS | A built-in program that coordinates various computer hardware components and interacts with the user. Commonly used OSs include Windows, macOS, and open-source Linux. |
| persistent volume | PV | A piece of storage in a cluster, which is prepared and managed by an administrator. |
| persistent volume claim | PVC | A request for storage by a user. A PVC consumes PV resources and can request a specific size and access mode. |
| pipeline parallelism | PP | A technology that splits a model into multiple parts by layer and places the parts on different devices, which process the input data stage by stage. |
| Pod | - | The smallest deployable unit of computing that users can create and manage in Kubernetes. |
| prefill | - | Process from the time when a user enters a prompt to the time when the first token is generated. |
| prefill-decode disaggregation | PD disaggregation | An architecture that schedules the prefill and decode phases of foundation model inference to different hardware clusters to optimize resource configuration and improve system performance. |
| Prometheus | - | An open-source system monitoring and alerting toolkit used to collect and process real-time indicators. It periodically pulls monitoring data from the target service or agent through HTTP and stores the data in a highly available time series database. Users can query, aggregate, and visualize the data using the PromQL query language and trigger alerts based on predefined rules. |
| prompt | - | Information input by a user to the model, from which the model generates the expected output. |
| quality of service | QoS | Kubernetes classifies Pods into three QoS levels (Guaranteed, Burstable, and BestEffort) based on their resource requests and limits. These QoS levels are used to determine resource allocation and eviction priority. In addition, workloads can define custom QoS policies to differentiate priorities and resource guarantees across services or tasks. QoS helps improve the stability of critical services and optimize resource utilization. |
| RayCluster | - | A basic Ray cluster consists of one head node and zero or more worker nodes. |
| RayJob | - | It submits and executes a single job. Each submitted job independently creates a Ray cluster, executes the task once the cluster is ready, and automatically destroys the cluster upon task completion, achieving cluster-level isolation. |
| RayService | - | It deploys Ray Serve. During deployment, it creates an independent Ray cluster and supports features like hot updates and high availability for the service. |
| resource | - | Built-in resources and custom resources in Kubernetes. |
| resource overselling | - | In hybrid deployment, a large amount of resources often remains unused while online services are idle. Resource overselling dynamically allocates these resources to offline jobs. |
| role-based access control | RBAC | An access control method that manages access to system resources based on user roles. In RBAC, permissions are associated with roles, and users acquire permissions through their assigned roles. |
| scheduler | - | A control plane component that matches Pods to nodes so that kubelet can run them. |
| secret | - | An object that contains a small amount of sensitive data such as a password, a token, or a key. |
| security engine | SEC | The hardware security acceleration engine module of KAE. |
| service | - | A method that is used to expose network applications running on one Pod or a group of Pods as network services in Kubernetes. |
| service level objective | SLO | A measurable target for service quality, used to ensure that the provided services meet the customer's expectations. |
| ServiceMonitor | - | A custom resource of the Prometheus Operator that declaratively specifies which services Prometheus scrapes for indicators. |
| Silence | - | A basic capability provided by alerting components. It matches alerts based on configured silence rules. If a matching alert is silenced, it is not pushed for notification. |
| Spring Cloud | - | A complete microservices solution suite based on the Spring Boot framework. |
| StatefulSet | - | A workload API object used to manage stateful applications. |
| tensor parallelism | TP | A technology that splits the model weights into multiple parts and distributes them across devices, which compute on the input data in parallel. |
| universally unique identifier | UUID | A standard identifier used in software construction, composed of a timestamp, a clock sequence, and a globally unique node identifier (such as a hash of the hostname). |
| vCPU | - | A processor resource used in virtual environments. It represents a portion of a physical CPU and can be used independently by a VM. Unlike physical CPUs, vCPUs utilize hyper-threading technology to divide a physical processor into multiple virtual processor cores, enabling resource sharing and dynamic allocation. |
| vision-language model | VLM | A deep learning model trained on a large amount of visual-text data. |
| vLLM | - | An efficient inference engine and framework designed for large language models. It can optimize the inference performance of foundation models. |
| volume | - | An abstraction in Kubernetes that provides persistent storage for containers within a Pod. |
| widget | - | A monitoring component that contains a name and a data chart and is displayed as a card. |
| xPyD | - | An architecture that contains x prefill (P) nodes and y decode (D) nodes in the PD disaggregation architecture. |
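
The quality of service (QoS) entry in the table above names three Pod QoS levels that follow from resource requests and limits. The following is a minimal sketch of that classification, not the Kubernetes implementation. The function name `qos_class` and the dictionary layout are hypothetical and chosen only for illustration. The sketch considers only CPU and memory, compares quantity strings literally (so `"500m"` and `"0.5"` are treated as different), and ignores init containers.

```python
from typing import Dict, List

Resources = Dict[str, str]          # for example {"cpu": "500m", "memory": "128Mi"}
Container = Dict[str, Resources]    # keys: "requests" and/or "limits" (hypothetical layout)


def qos_class(containers: List[Container]) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort (simplified)."""
    any_set = False      # becomes True once any request or limit is found
    guaranteed = True    # stays True only if every container qualifies

    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for name in ("cpu", "memory"):
            req = requests.get(name)
            lim = limits.get(name)
            if req is not None or lim is not None:
                any_set = True
            # Guaranteed requires a limit for each resource. A missing
            # request defaults to the limit; a differing one disqualifies.
            if lim is None or (req is not None and req != lim):
                guaranteed = False

    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"cpu": "100m"}, "limits": {"cpu": "200m"}}]
    print(qos_class(pod))
```

A Pod whose containers set equal CPU and memory requests and limits is classified as Guaranteed, a Pod with no requests or limits at all as BestEffort, and every other Pod as Burstable.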