Terminology

Term	English Full Name	Abbreviation	Description
Extension	addon	-	Components that must be installed when creating a cluster with the openFuyao community BKE installation tool, such as Calico.
-	All-In-One	AIO	A deployment mode in which Kubernetes, fuyao-system, and other components are deployed on the same node.
API Gateway	API Gateway	APIG	A single entry point between clients and APIs that acts as a reverse proxy, routing client requests to the group of APIs behind it.
Application Programming Interface	Application Programming Interface	API	A set of predefined functions that gives applications and developers access to the routines of a piece of software or hardware without needing the source code or an understanding of its internal workings.
Ascend Computing Language	Ascend Computing Language	ACL	Provides APIs for runtime management, single-operator calls, model inference, media data processing, and more. These APIs use the underlying hardware computing resources on the CANN platform to perform deep learning inference, graphics and image preprocessing, single-operator accelerated computing, and similar tasks.
Ascend Image Repository	AscendHub	-	Ascend's open Docker image repository.
API Server	API Server	apiserver	Kubernetes API server, which exposes the cluster's REST API.
-	blackbox_exporter	-	One of the exporters officially provided by Prometheus; it probes network endpoints over HTTP, HTTPS, DNS, TCP, and ICMP.
Bootstrap Node	Bootstrap Node	-	The first node created during cluster initialization, used to drive the entire cluster creation process.
-	BatchTransfer	-	Encapsulates operation requests; specifically, it synchronizes Read/Write data between a non-contiguous group of data spaces in one Segment and the corresponding spaces in another group of Segments.
-	Mooncake CacheTier	CacheTier	The cache layers at different tiers under the Tier Backend architecture in Mooncake, used for tiered storage of KVCache data.
-	cAdvisor	-	A container monitoring tool developed by Google and embedded into Kubernetes as a monitoring component.
Cloud Native Colocation	Cloud Native Colocation	-	A deployment method that uses a cloud-native approach to run online and offline workloads in the same cluster, improving overall cluster resource utilization by adjusting resource usage according to the troughs and peaks of online workloads.
Cloud Native Computing Foundation	Cloud Native Computing Foundation	CNCF	Cloud Native Computing Foundation, an open-source software foundation.
-	ConfigMap	-	An API object in Kubernetes used to store non-sensitive data as key-value pairs.
Console	Console	-	Frontend web console.
Container	Container	-	A running instance created from an image that can be created, started, stopped, and deleted. Containers are isolated from one another, providing a secure platform.
Container memory sharing	Container memory sharing	-	Based on the UB memory pooling mechanism: when node or NUMA memory usage in a bare-metal container scenario reaches a threshold, memory borrowing is triggered, seamlessly offloading part of the memory pressure to a remote memory pool.
Container memory borrowing	Container memory borrowing	-	The memory pooling component of UBS-Core. It supports importing and exporting memory blocks in a UBS Server cluster through memory pooling, enabling cross-node and multi-process shared memory use on bare metal.
Coordinated Universal Time	Coordinated Universal Time	UTC	A time standard used to unify time globally.
-	CronJob	-	Creates Jobs on a repeating, time-based schedule.
-	Custom Resource	CR	A custom resource in Kubernetes.
Custom Resource Definition	Custom Resource Definition	CRD	A Kubernetes resource extension mechanism that allows custom resources to be defined.
Certificate Authority	Certificate Authority	CA	An authority responsible for issuing and managing digital certificates.
Certificate Signing Request	Certificate Signing Request	CSR	A file containing the certificate applicant's information, used to apply to a CA for a certificate.
Common Name	Common Name	CN	A field in an X.509 certificate, usually used to identify the name of the certificate holder.
Certificate File Extension	Certificate	CRT	A file extension usually used for files that store X.509 certificates.
Configuration Map	Configuration Map	ConfigMap	A configuration object in Kubernetes used to store non-sensitive configuration data.
Certificate Revocation List	Certificate Revocation List	CRL	A list of revoked certificates.
Controller Manager	Controller Manager	controller-manager	Kubernetes controller manager, which runs the various controllers that maintain cluster state.
-	Customstatlogger	-	A custom stat logger: vLLM exposes the StatLoggerBase abstract class, which allows customizing metrics and how they are reported.
-	DaemonSet	-	Ensures that a replica of a Pod runs on all (or some) nodes.
Monitoring Dashboard	Dashboard	-	A monitoring dashboard consists of multiple user-defined monitoring widgets, allowing users to monitor the metrics they need.
Data Parallelism	Data Parallelism	DP	Each device holds a complete copy of the model and independently processes part of the dataset; the devices then aggregate their gradients.
Decode Stage	Decode	-	The process from generating the first token until inference stops.
-	Deployment	-	Provides declarative updates for Pods and ReplicaSets.
Device Class	DeviceClass	-	A cluster-scoped API object in the Kubernetes DRA framework that defines an abstract category of device resources and its selection semantics, which ResourceClaims reference when requesting resources.
Device-to-Device Transmission	Device-to-Device Transmission	D2D Transmission	A transmission method in which data is transferred directly within one accelerator card or between accelerator cards without passing through the CPU or host memory.
Domain Name System	Domain Name System	DNS	A service that maps domain names and IP addresses to each other, making network access easier.
-	Dubbo	-	A high-performance service framework open-sourced by Alibaba, providing high-performance, transparent RPC remote service calls and service governance.
Dynamic Resource Allocation	Dynamic Resource Allocation	DRA	A device resource management mechanism in Kubernetes that handles device resource requests, scheduling, and allocation through structured API objects, and lets device drivers participate in resource selection and allocation decisions.
-	DCGM Exporter	-	Collects GPU runtime and health metrics, including GPU utilization, PCIe transfer rate, temperature, and power usage.
Embedding	Embedding	EMB	An operation that embeds data as vectors.
Extended Key Usage	Extended Key Usage	extKeyUsage	A certificate extension field that specifies the purposes of the certificate (such as server authentication and client authentication).
-	etcd	-	A distributed key-value store that Kubernetes uses to store cluster state and configuration data.
EndpointPicker	EndpointPicker	EPP	The component in the Kubernetes Gateway API Inference Extension that selects an appropriate backend instance, supporting endpoint selection based on different routing policies.
-	Felix	-	Calico's node agent component; it runs on each node and configures networking, routes, and policy rules.
Fully Qualified Domain Name	Fully Qualified Domain Name	FQDN	A name containing both the hostname and the domain name: FQDN = Hostname + DomainName.
Gigabyte	Gigabyte	GB	A decimal unit of information, often used to express the capacity of large storage media such as computer hard drives and memory.
Gateway API Inference Extension	Gateway API Inference Extension	GIE	Extends the Kubernetes Gateway API with inference-related capabilities, used to define and manage routing and traffic policies for inference services.
-	Helm	-	A package manager for Kubernetes that simplifies deploying and managing applications in a Kubernetes cluster.
-	Helm Chart	-	A core concept of Helm: a pre-configured package of application resources.
High Availability	High Availability	HA	The ability of a system or service to run reliably and remain continuously available, maintaining normal operation even when facing hardware failures or other anomalies.
High Performance RSA Engine	High Performance RSA Engine	HPRE	The high-performance RSA acceleration engine module of KAE.
High Performance ZIP Engine	High Performance ZIP Engine	ZIP	The high-performance zlib/gzip compression engine module of KAE.
Horizontal Pod Autoscaler	Horizontal Pod Autoscaler	HPA	Automatically updates a workload resource (such as a Deployment or StatefulSet) so that the workload scales automatically to match demand.
Host-to-Device Transmission	Host-to-Device Transmission	H2D Transmission	The process of copying data from CPU/host memory to the memory of an accelerator device such as an NPU or GPU.
Hypertext Transfer Protocol Secure	Hypertext Transfer Protocol Secure	HTTPS	A security-oriented HTTP channel that, building on HTTP, secures the transmission process through encryption and identity authentication.
Monitoring Indicator	Indicator	-	The metrics that users can monitor in a data collection system (such as Prometheus). One monitoring indicator can contain multiple monitoring instances.
-	Ingress	-	An API object that manages external access to services in a cluster; it can provide load balancing, SSL termination, and name-based virtual hosting.
Monitoring Instance	Instance	-	The smallest-granularity object that can be monitored on Kubernetes. Each monitoring instance is uniquely identified by a set of key-value labels.
-	Job	-	Runs one-off tasks in the cluster, focusing on completing the tasks rather than keeping a specified number of instances running. The Job controller creates one or more Pods to run the specified tasks; deleting the Job cleans up the Pods it created.
Key-Value Cache	Key-Value Cache	KVCache	A common technique for accelerating large-model inference: it caches the Key (K) and Value (V) matrices produced by the self-attention mechanism during inference, avoiding recomputation and improving inference speed.
-	kube-apiserver	-	Validates and configures data for API objects, including Pods, Services, and ReplicationControllers. The API server serves REST operations and provides the frontend to the cluster's shared state, through which all other components interact.
-	kubectl	-	A command-line tool for communicating with a Kubernetes cluster's control plane through the Kubernetes API.
-	kubelet	-	An important component of a Kubernetes cluster that runs on each node and manages the containers on that node. As the node agent in Kubernetes, it communicates with the control plane to ensure that containers run on the node as expected.
-	Kube-rbac-proxy	-	A lightweight HTTP proxy designed specifically for Kubernetes that uses the Kubernetes SubjectAccessReview API to perform RBAC (role-based access control) authorization. The project aims to restrict communication between Pods so that only Pods holding valid, RBAC-authorized tokens can access other Pods.
-	Kubernetes	K8s	Kubernetes is a portable, scalable open-source platform for managing containerized workloads and services, facilitating declarative configuration and automation.
-	kube-state-metrics	-	Generates state metrics about resource objects obtained from the API server, such as Deployments, Nodes, and Pods.
Kunpeng Accelerator Engine	Kunpeng Accelerator Engine	KAE	A hardware acceleration solution based on the Kunpeng 920 processor.
Key File Extension	Key	KEY	A file extension used for files that store private keys.
Kubernetes Configuration	Kubernetes Configuration	kubeconfig	A file containing cluster access information, authentication information, and context configuration.
Key Usage	Key Usage	keyUsage	A certificate extension field that specifies the purposes of the certificate's key (such as digital signature and key encipherment).
-	kubelet	-	Kubernetes node agent that runs on each node and manages Pods and containers.
-	kube-proxy	-	Kubernetes network proxy, responsible for maintaining network rules and load balancing on nodes.
Large Language Model	Large Language Model	LLM	A deep learning model trained on a large amount of text data.
Load Balancer	Load Balancer	-	A device or service that distributes network traffic across multiple servers.
-	metrics-server	-	One of the core components of the Kubernetes monitoring system. It collects resource metrics from the kubelet, aggregates them (relying on kube-aggregator), and exposes them through the Metrics API (/apis/metrics.k8s.io/) of the Kubernetes API server. metrics-server stores only the latest metrics data (CPU/memory).
Mind Inference Service	Mind Inference Service	MIS	A containerized large-model inference API service provided by Ascend.
Mind Inference Service Operator	Mind Inference Service Operator	MIS-Operator	A component that manages the lifecycle of inference microservice instances.
-	Mooncake	-	A distributed KVCache storage engine designed specifically for large-model inference, improving inference efficiency.
Multi-core	Multi-core	-	The integration of a large number of processing cores on a single chip. A multi-core scenario here refers specifically to cluster nodes with more than 256 CPUs.
Mutual TLS	Mutual TLS	mTLS	A TLS mode in which both the server and the client present certificates and authenticate each other over an encrypted channel.
-	Mooncake	-	An open-source community that proposes a KVCache-centric disaggregated architecture for LLM serving.
-	Mooncake Store	-	A high-performance distributed key-value KVCache storage engine designed specifically for LLM inference scenarios.
-	Mooncake Store Master Service	Master Service	The Mooncake Store service that manages the logical storage space pool of the entire cluster and handles node join and leave events.
-	Mooncake Store Client	Mooncake Client	The Mooncake Store client, which issues get/put requests on behalf of upper-layer applications and provides the actual KVCache storage.
-	Mooncake Transfer Engine	TE	A high-performance, zero-copy data transfer library designed around two core abstractions: Segment and BatchTransfer.
Namespace	Namespace	-	Kubernetes namespace. On the platform, a namespace is a smaller, isolated resource space within a project and serves as the users' production workspace. A project can contain multiple namespaces whose total resource quota cannot exceed the project quota. Namespaces divide resource quotas at a finer granularity and also limit the size (CPU, memory) of containers in the namespace, effectively improving resource utilization.
-	Nginx	-	A high-performance HTTP and reverse proxy web server that also provides IMAP/POP3/SMTP proxy services.
Node	Node	-	Depending on the cluster configuration, a node can be a virtual machine or a physical machine.
Node Feature Discovery	Node Feature Discovery	NFD	Kubernetes node feature discovery. It detects the hardware features available on each node in a Kubernetes cluster and marks them using node labels, annotations, and taints.
-	node_exporter	-	Collects and exposes host system metrics such as CPU, disk, memory, and network. It can be used with Prometheus or other monitoring tools and supports various collectors and custom metrics.
Non-Uniform Memory Access	Non-Uniform Memory Access	NUMA	A memory architecture used in modern multi-core and multi-processor systems that optimizes memory access speed by dividing processors and memory into multiple nodes.
-	NATS	-	An open-source, lightweight, high-performance distributed messaging system that provides publish/subscribe, request/reply, and queue subscription communication models.
-	NATS Prometheus Exporter	-	Collects metrics from NATS server monitoring endpoints (such as varz, connz, subsz, and routez), including connection count, subscription count, message throughput, transfer rate, and client latency, to monitor the performance and health of the messaging system.
Non-Uniform Memory Access	Non-Uniform Memory Access	NUMA	A computer memory architecture designed for multiprocessor systems.
-	OAuth2-Server	-	The server-side implementation of the OAuth 2.0 protocol in openFuyao.
-	Oauth-proxy	-	Provides OAuth2-based authentication and authorization. It helps protect web applications or APIs by ensuring that only authenticated users can access protected content.
Offline Workload	Offline Workload	-	Workloads with relatively low quality-of-service requirements that are not sensitive to response latency, such as big data analysis, transcoding, and AI training.
Online Workload	Online Workload	-	Workloads with relatively high quality-of-service requirements that are sensitive to response latency, such as web services and e-commerce.
Open Authorization 2.0	Open Authorization 2.0	OAuth2.0	The industry-standard protocol for authorization. OAuth 2.0 focuses on simplicity for client developers while providing specific authorization flows for web applications, desktop applications, mobile phones, and living room devices.
Operating System	Operating System	OS	Software that coordinates a computer's hardware and interacts with users. Common examples include Windows, macOS, and the open-source Linux.
Organization	Organization	O	A field in an X.509 certificate that identifies the organization to which the certificate holder belongs.
Persistent Volume	Persistent Volume	PV	A piece of storage in the cluster that is provisioned and managed by an administrator.
Persistent Volume Claim	Persistent Volume Claim	PVC	A user's request for storage in Kubernetes; a PVC is bound to and consumes a PV.
Pipeline Parallelism	Pipeline Parallelism	PP	A technique that splits a model by layers into multiple parts and runs them on different devices.
-	Pod	-	The smallest deployable unit of computing that can be created and managed in Kubernetes.
Prefill Stage	Prefill	-	The process from the user completing the prompt input to the generation of the first token.
Prefill-Decode Disaggregation	Prefill-Decode Disaggregation	PD	An architecture that schedules the Prefill and Decode stages of large-model inference onto different hardware clusters to optimize resource allocation and improve system performance.
Prometheus	Prometheus	-	An open-source systems monitoring and alerting toolkit for collecting and processing real-time metrics. It periodically pulls monitoring data from target services or agents over HTTP and stores it in a time-series database. Users can query, aggregate, and visualize the data with the PromQL query language and trigger alerts based on preset rules.
Prompt	Prompt	-	The information a user inputs to a model so that the model generates output that meets expectations.
Public Key Infrastructure	Public Key Infrastructure	PKI	A system for managing digital certificates and public/private key pairs.
Public-Key Cryptography Standards #1	Public-Key Cryptography Standards #1	PKCS#1	The RSA cryptography standard, which defines the storage format of RSA private keys.
Privacy-Enhanced Mail	Privacy-Enhanced Mail	PEM	A Base64-encoded certificate and key storage format.
Profile	Profile	Profile	A configuration item in the signing policy configuration that defines issuance parameters for a specific type of certificate.
-	PodGroup	-	A job developer declares a PodGroup, and the scheduler evaluates and reserves resources per group, ensuring that all Pods of the job can start simultaneously (Gang Scheduling).
Quality of Service	Quality of Service	QoS	Kubernetes divides Pods into three QoS classes (Guaranteed, Burstable, and BestEffort) based on their resource requests and limits, which determine resource allocation and eviction priority. Businesses can also define custom QoS policies to distinguish the priority and resource guarantees of different services or tasks. QoS helps improve both the stability of critical workloads and resource utilization.
-	RayCluster	-	A basic Ray cluster consisting of one head node and zero or more worker nodes.
-	RayJob	-	Submits and executes a single job. Each submitted job creates its own Ray cluster and runs its tasks once the cluster is ready; the cluster is automatically destroyed after the tasks complete, providing cluster-level isolation.
-	RayService	-	Deploys Ray Serve by creating an independent Ray cluster, and supports capabilities such as hot updates of the service and high availability.
Resource Acquisition Is Initialization	Resource Acquisition Is Initialization	RAII	A C++ resource management idiom that binds the lifecycle of resources such as memory, file handles, and locks to local objects: resources are acquired in the constructor and released automatically in the destructor, reducing resource leaks and improving exception safety.
Remote Direct Memory Access	Remote Direct Memory Access	RDMA	A technology for directly accessing the memory of one computer from another computer. It lets the network card access application memory directly, enabling zero-copy network communication.
Remote Procedure Call	Remote Procedure Call	RPC	A communication protocol that allows a program running on one computer to call a subroutine in another address space (usually on another computer on a network) as if it were a local call, without the programmer explicitly coding the details of the remote interaction.
Resource	Resource	-	Built-in and custom resources in Kubernetes.
Resource Overselling	Resource Overselling	-	When online workloads in a colocation scenario are in a trough, a relatively large share of the resources they requested often goes unused. Resource overselling dynamically allocates these resources to colocated offline jobs.
ResourceClaim	ResourceClaim	-	A namespaced API object in the Kubernetes DRA framework that represents a specific request for a type of device resource; during scheduling and allocation it is bound to an actual resource instance on a specific node.
ResourceClaimTemplate	ResourceClaimTemplate	-	A namespaced API object in the Kubernetes DRA framework that defines the template specification used to generate ResourceClaims.
ResourceSlice	ResourceSlice	-	A cluster-scoped API object in the Kubernetes DRA framework that represents a set of structured resource instances that a DRA driver makes available for allocation on a specific node.
Role-Based Access Control	Role-Based Access Control	RBAC	An access control method that manages access to system resources through user roles. In RBAC, permissions are associated with roles, and users obtain permissions through the roles they belong to.
-	Secret	-	An object that contains a small amount of sensitive data such as a password, token, or key.
Security Engine	Security Engine	SEC	The hardware security acceleration engine module of KAE.
Service	Service	-	A Kubernetes method for exposing a network application running as one or more Pods as a network service.
Service Level Objective	Service Level Objective	SLO	A target that ensures the provided service meets customer expectations.
-	ServiceMonitor	-	One of the core abstractions that the Prometheus Operator provides for the monitoring system; a ServiceMonitor makes it easy to configure metric monitoring.
Silence	Silence	-	A basic capability of the alerting component: alerts are matched against configured silence rules, and any matching alert is silenced, that is, not pushed.
-	Spring Cloud	-	A complete microservice solution based on Spring Boot framework.
-	StatefulSet	-	Workload API object used to manage stateful applications.
Subject Alternative Name	Subject Alternative Name	SAN	A certificate extension field used to specify multiple domain names or IP addresses.
Scheduler	scheduler	-	Kubernetes scheduler, responsible for assigning Pods to appropriate nodes.
-	Segment	-	Represents a contiguous address space that can be read and written remotely.
Tensor Parallelism	Tensor Parallelism	TP	A technique that splits a model's weight matrices into multiple parts and performs the computation on different devices.
-	Tier Backend	-	The backend system that supports tiered storage in Mooncake, responsible for uniformly managing KVCache data across different storage tiers.
Transport Layer Security	Transport Layer Security	TLS	A protocol that provides encrypted communication over a network.
Time To First Token	Time To First Token	TTFT	The latency from input to the output of the first token in large-model inference.
Universally Unique Identifier	Universally Unique Identifier	UUID	A standard used in software construction. A time-based UUID is composed of a timestamp, a clock sequence, and a globally unique node identifier (such as a hash of the hostname).
Video Random Access Memory	Video Random Access Memory	VRAM	High-speed memory dedicated to a graphics card, used to temporarily store graphics data needed for rendering, such as textures and frame buffers.
Virtual Central Processing Unit	Virtual Central Processing Unit	vCPU	The processor resource used in a virtual environment. A vCPU is a share of a physical CPU that a virtual machine can use independently. Unlike a physical CPU, vCPUs are obtained by dividing a physical processor into multiple virtual processor cores, for example through hyper-threading, enabling resource sharing and dynamic allocation.
VictoriaMetrics	VictoriaMetrics	VM	VictoriaMetrics is a high-performance, low-cost, horizontally scalable open-source time-series database and monitoring solution, specifically designed for large-scale metric data storage and query optimization, and compatible with the Prometheus ecosystem.
Visual Language Model	Visual Language Model	VLM	A deep learning model trained on a large amount of visual-text data.
-	vLLM	-	An efficient inference engine and framework for large language models that optimizes large-model inference performance.
Volume	Volume	-	An abstraction in Kubernetes that provides storage, including persistent storage, for containers in Pods.
Widget	Widget	-	A monitoring widget is a component containing a name and data charts, displayed as a card.
-	-	xPyD	An architecture in PD disaggregation that contains x Prefill (P) nodes and y Decode (D) nodes.
X.509 standard	-	X.509	A public key certificate standard defined by the International Telecommunication Union (ITU) that specifies the format and structure of digital certificates.
Equivalence Class Scheduling	-	-	Treats Pods with the same resource requests, affinity, and other conditions as an "equivalence class", so that one scheduling decision can be applied to the entire class, greatly reducing the scheduling overhead of large-scale jobs.
Topology Awareness	-	-	Combines node network topology (such as NVLink and RDMA) with hardware information to preferentially schedule Pods that require high-speed communication onto the same node or adjacent nodes.
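
The KVCache entry describes caching the K and V matrices produced by self-attention so that they are not recomputed. The toy sketch below illustrates that idea in pure Python for a single attention head. All names (`KVCache`, `decode_step`, `step_without_cache`) and the weight matrices are hypothetical illustrations, not the API of Mooncake, vLLM, or any other component in this glossary: each decoding step projects K and V only for the new token and reuses the cached ones, and the result matches recomputing everything from scratch.

```python
import math


def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention(q, keys, values):
    # Scaled dot-product attention of a single query over all keys/values
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [sum(e / total * v[j] for e, v in zip(exps, values))
            for j in range(len(values[0]))]


class KVCache:
    """Toy single-head KV cache holding the K and V vectors of all previous tokens."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []
        self.kv_projections = 0  # number of K/V projections actually computed

    def decode_step(self, x):
        # Project K and V only for the new token; earlier tokens are read from the cache
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        self.kv_projections += 1
        return attention(matvec(self.Wq, x), self.keys, self.values)


def step_without_cache(Wq, Wk, Wv, tokens):
    # Baseline: recompute K and V for every token in the sequence at every step
    keys = [matvec(Wk, t) for t in tokens]
    values = [matvec(Wv, t) for t in tokens]
    return attention(matvec(Wq, tokens[-1]), keys, values)


if __name__ == "__main__":
    Wq = [[1.0, 0.0], [0.0, 1.0]]
    Wk = [[0.5, 0.1], [0.2, 0.3]]
    Wv = [[1.0, 2.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    cache = KVCache(Wq, Wk, Wv)
    for i, t in enumerate(tokens):
        cached = cache.decode_step(t)
        baseline = step_without_cache(Wq, Wk, Wv, tokens[: i + 1])
        assert all(abs(a - b) < 1e-9 for a, b in zip(cached, baseline))
    print(cache.kv_projections)  # 3, versus 1 + 2 + 3 = 6 projections without the cache
```

The saving grows with sequence length: over n decoding steps the cache performs n K/V projections, whereas the baseline performs 1 + 2 + ... + n. Production engines store these tensors per layer and per head on accelerator memory, which is what storage systems such as Mooncake manage at scale.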

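The Quality of Service entry states that Kubernetes assigns a Pod to Guaranteed, Burstable, or BestEffort based on its resource requests and limits. The following is a simplified sketch of those classification rules, not Kubernetes' own implementation: the `Container` type and `qos_class` function are hypothetical, only regular containers and CPU/memory are considered, and quantities are assumed to be pre-normalized strings (real Kubernetes parses quantities such as `500m` and also takes init containers into account).

```python
from dataclasses import dataclass, field

# Only CPU and memory are considered when computing the QoS class
RESOURCES = ("cpu", "memory")


@dataclass
class Container:
    # Hypothetical simplified container spec: resource name -> quantity string.
    # Quantities are assumed to be already normalized (e.g. always "500m", never "0.5").
    requests: dict = field(default_factory=dict)
    limits: dict = field(default_factory=dict)


def is_guaranteed(c):
    for r in RESOURCES:
        limit = c.limits.get(r)
        if limit is None:
            return False
        # An omitted request defaults to the limit, so it counts as equal
        if c.requests.get(r, limit) != limit:
            return False
    return True


def qos_class(containers):
    # BestEffort: no container sets any CPU or memory request or limit
    if not any(r in c.requests or r in c.limits for c in containers for r in RESOURCES):
        return "BestEffort"
    # Guaranteed: every container has CPU and memory limits equal to its requests
    if all(is_guaranteed(c) for c in containers):
        return "Guaranteed"
    # Anything in between is Burstable
    return "Burstable"


if __name__ == "__main__":
    print(qos_class([Container()]))                                      # BestEffort
    print(qos_class([Container(limits={"cpu": "1", "memory": "1Gi"})]))  # Guaranteed
    print(qos_class([Container(requests={"cpu": "500m"})]))              # Burstable
```

Because the class is derived from the whole Pod, a single container without matching CPU and memory limits is enough to drop an otherwise Guaranteed Pod to Burstable.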