Version: v26.03

Terminology

Chinese Name | English Full Name | Abbreviation | Description
Extension | addon | - | Components that must be installed when creating a cluster with the openFuyao community BKE installation tool, such as calico.
- | All-In-One | AIO | Deploys Kubernetes, fuyao-system, and other components on the same node.
API Gateway | API Gateway | APIG | A single entry point located between clients and APIs, acting as a reverse proxy that routes client requests to the group of APIs behind it.
Application Programming Interface | Application Programming Interface | API | A set of predefined functions that gives applications and developers access to a set of routines based on certain software or hardware, without needing to access the source code or understand the internal working mechanism.
Ascend Computing Language | Ascend Computing Language | ACL | Provides APIs for runtime management, single-operator calls, model inference, media data processing, and so on, using the underlying hardware computing resources to perform deep learning inference, graphics and image preprocessing, and single-operator accelerated computing on the CANN platform.
Ascend Image Repository | AscendHub | - | Ascend's open Docker image repository.
API Server | API Server | apiserver | The Kubernetes API server, which provides the cluster's REST API.
- | blackbox_exporter | - | One of the official Prometheus exporters; it can probe the network over HTTP, HTTPS, DNS, TCP, and ICMP.
Bootstrap Node | Bootstrap Node | - | The first node created during cluster initialization, used to guide the entire cluster creation process.
- | BatchTransfer | - | Encapsulates operation requests; it is responsible for synchronizing data (Read/Write) between a group of non-contiguous data spaces in one Segment and the corresponding spaces in another group of Segments.
- | Mooncake CacheTier | CacheTier | The cache layers at different tiers under the Tier Backend architecture in Mooncake, used for tiered storage of KVCache data.
- | cAdvisor | - | A container monitoring tool developed by Google and embedded into Kubernetes as a monitoring component.
Cloud Native Colocation | Cloud Native Colocation | - | A deployment method that uses a cloud-native approach to run online and offline businesses in the same cluster, improving overall cluster resource utilization by adjusting the resources used by online businesses during business trough and peak periods.
Cloud Native Computing Foundation | Cloud Native Computing Foundation | CNCF | An open-source software foundation.
- | ConfigMap | - | An API object in Kubernetes, used to store non-sensitive data as key-value pairs.
Console | Console | - | The frontend web console.
Container | Container | - | A running instance created from an image; it can be started, stopped, and deleted. Each container is an isolated, secure platform.
Container memory sharing | Container memory sharing | - | Based on the UB memory pooling mechanism: when node or NUMA memory usage in a bare metal container scenario reaches a threshold, memory borrowing is triggered, seamlessly offloading part of the memory pressure to a remote memory pool.
Container memory borrowing | Container memory borrowing | - | The memory pooling component of UBS-Core; it supports importing and exporting memory blocks in a UBS Server cluster through its memory pooling capability, enabling cross-node, multi-process shared memory usage on bare metal.
Coordinated Universal Time | Coordinated Universal Time | UTC | A time standard used to unify time globally.
- | CronJob | - | Creates Jobs on a repeating, time-based schedule.
- | Custom Resource | CR | A custom resource in Kubernetes.
Custom Resource Definition | Custom Resource Definition | CRD | The Kubernetes resource extension mechanism, which allows custom resources to be defined.
Certificate Authority | Certificate Authority | CA | The authority responsible for issuing and managing digital certificates.
Certificate Signing Request | Certificate Signing Request | CSR | A file containing the certificate applicant's information, used to apply for a certificate from a CA.
Common Name | Common Name | CN | A field in an X.509 certificate, usually used to identify the name of the certificate holder.
Certificate File Extension | Certificate | CRT | Usually used to store an X.509 certificate.
Configuration Map | Configuration Map | ConfigMap | A configuration object in Kubernetes, used to store non-sensitive configuration data.
Certificate Revocation List | Certificate Revocation List | CRL | A list of revoked certificates.
Controller Manager | Controller Manager | controller-manager | The Kubernetes controller manager, which runs various controllers to maintain cluster state.
- | Customstatlogger | - | vLLM exposes the StatLoggerBase abstract class, which allows metrics and metric reporting methods to be customized.
- | DaemonSet | - | Ensures that one Pod replica runs on all (or some) nodes.
Monitoring Dashboard | Dashboard | - | A monitoring dashboard consists of multiple user-defined monitoring components, letting users monitor various metrics according to their own needs.
Data Parallelism | Data Parallelism | DP | Each device holds a complete copy of the model and independently processes part of the dataset; the devices' gradients are then aggregated.
Decode Stage | Decode | - | The process from generating the first token until inference stops.
- | Deployment | - | Provides declarative updates for Pods and ReplicaSets.
Device Class | DeviceClass | - | A cluster-level API object in the Kubernetes DRA framework, used to define an abstract category of a certain type of device resource and its selection semantics, for a ResourceClaim to reference when requesting resources.
Device-to-Device Transmission | Device-to-Device Transmission | D2D Transmission | A transmission method in which data does not pass through the CPU or host memory and is transferred directly within the same accelerator card or between different accelerator cards.
Domain Name System | Domain Name System | DNS | A service that maps domain names and IP addresses to each other, making network access easier.
- | Dubbo | - | A high-performance service framework open-sourced by Alibaba, providing high-performance, transparent RPC remote service calls and a service governance solution.
Dynamic Resource Allocation | Dynamic Resource Allocation | DRA | A device resource management mechanism provided by Kubernetes, used to request, schedule, and allocate device resources through structured API objects, and to allow device drivers to participate in resource selection and allocation decisions.
- | DCGM Exporter | - | Collects GPU runtime and health metrics, including GPU utilization, PCIe transfer rate, temperature, and power usage.
Embedding | Embedding | EMB | The data vector embedding operation.
Extended Key Usage | Extended Key Usage | extKeyUsage | A certificate extension field that specifies the specific usage of the certificate (such as server authentication or client authentication).
- | etcd | - | A distributed key-value storage system that Kubernetes uses to store cluster state and configuration data.
EndpointPicker | EndpointPicker | EPP | In the Kubernetes Gateway API Inference Extension, the component responsible for selecting an appropriate backend instance; it supports endpoint selection based on different routing policies.
- | Felix | - | Calico's node agent component, which runs on each node and configures network, routing, and policy rules.
Fully qualified domain name | Fully qualified domain name | FQDN | A name containing both the hostname and the domain name: FQDN = Hostname + DomainName.
Gigabyte | Gigabyte | GB | A decimal unit of information, often used to express the capacity of larger storage media such as computer hard drives and memory.
Gateway API Inference Extension | Gateway API Inference Extension | GIE | Inference-related capabilities that extend the Kubernetes Gateway API, used to define and manage routing and traffic policies for inference services.
- | Helm | - | A package manager for Kubernetes, used to simplify deploying and managing applications in a Kubernetes cluster.
- | Helm Chart | - | A core concept of Helm: a pre-configured package of application resources.
High Availability | High Availability | HA | The ability of a system or service to run with high reliability and continuous availability, maintaining normal operation even when facing hardware failures or other anomalies.
High Performance RSA Engine | High Performance RSA Engine | HPRE | The KAE high-performance RSA acceleration engine module.
High Performance ZIP Engine | High Performance ZIP Engine | ZIP | The KAE high-performance zlib/gzip compression engine module.
Horizontal Pod Autoscaler | Horizontal Pod Autoscaler | HPA | Automatically updates a workload resource (such as a Deployment or StatefulSet) to scale the workload to meet demand.
Host-to-Device Transmission | Host-to-Device Transmission | H2D Transmission | The process of copying data from CPU/host memory to the device memory of an accelerator such as an NPU or GPU.
Hypertext Transfer Protocol Secure | Hypertext Transfer Protocol Secure | HTTPS | An HTTP channel designed for security; on top of HTTP, it secures the transmission process through encryption and identity authentication.
Monitoring Indicator | Indicator | - | Metrics in a data collection system (such as Prometheus) that users can monitor. A monitoring indicator can contain multiple monitoring instances.
- | Ingress | - | An API object that manages external access to services in a cluster; it can provide load balancing, SSL termination, and name-based virtual hosting.
Monitoring Instance | Instance | - | The smallest object that can be monitored on Kubernetes. Each monitoring instance is uniquely identified by a set of key-value labels.
- | Job | - | Runs one-time tasks in a cluster, focusing on executing a task to completion rather than keeping a specified number of instances running. The Job controller creates one or more Pods to run the specified tasks and deletes the Pods when the tasks complete.
Key-Value Cache | Key-Value Cache | KVCache | A common strategy for accelerating large model inference: the Key (K) and Value (V) matrices generated by the self-attention mechanism during inference are cached to avoid repeated calculation and improve inference speed.
- | kube-apiserver | - | Validates and configures data for API objects, including Pods, Services, and ReplicationControllers. The API server provides REST operations and the frontend to the cluster's shared state, through which all other components interact.
- | kubectl | - | The command-line tool that uses the Kubernetes API to communicate with the control plane of a Kubernetes cluster.
- | kubelet | - | An important component of a Kubernetes cluster that runs on each node and manages the containers on that node. It is the node agent in the Kubernetes system and communicates with the controllers in the control plane to ensure containers run on nodes as expected.
- | Kube-rbac-proxy | - | A lightweight HTTP proxy service designed specifically for Kubernetes; it uses the Kubernetes SubjectAccessReview function to perform RBAC (role-based access control) authorization. The project's goal is to limit communication between Pods, allowing only Pods that hold valid, RBAC-authorized tokens to access other Pods.
- | Kubernetes | K8s | A portable, scalable open-source platform for managing containerized workloads and services, facilitating declarative configuration and automation.
- | kube-state-metrics | - | Generates state metrics about resource objects, such as Deployments, Nodes, and Pods, from the API Server.
Kunpeng Accelerator Engine | Kunpeng Accelerator Engine | KAE | A hardware acceleration solution based on the Kunpeng 920 processor.
Key File Extension | Key | KEY | Used to store a private key.
Kubernetes Configuration | Kubernetes Configuration | kubeconfig | Contains cluster access information, authentication information, and context configuration.
Key Usage | Key Usage | keyUsage | A certificate extension field that specifies the usage of the certificate key (such as digital signature or key encryption).
- | kubelet | - | The Kubernetes node agent, which runs on each node and manages Pods and containers.
- | kube-proxy | - | The Kubernetes network proxy, responsible for maintaining network rules and load balancing on nodes.
Large Language Model | Large Language Model | LLM | A deep learning model trained on a large amount of text data.
Load Balancer | Load Balancer | - | A device or service used to distribute network traffic among multiple servers.
- | metrics-server | - | One of the core components of the Kubernetes monitoring system. It collects resource metrics from the kubelet, aggregates them (relying on kube-aggregator), and exposes them through the Metrics API (/apis/metrics.k8s.io/) in the Kubernetes API server. metrics-server stores only the latest metrics data (CPU/Memory).
Mind Inference Service | Mind Inference Service | MIS | A large model inference API service provided by Ascend, based on containerized deployment.
Mind Inference Service Operator | Mind Inference Service Operator | MIS-Operator | The component that implements lifecycle management for inference microservice instances.
- | Mooncake | - | A distributed KVCache storage engine designed specifically for large model inference, improving inference efficiency.
Multi-core | Multi-core | - | Integrating a large number of processing cores on a single chip. The multi-core scenario specifically refers to nodes in a cluster with a CPU count greater than 256.
Mutual TLS | Mutual TLS | mTLS | A TLS mode in which the server and the client authenticate each other, establishing a mutually authenticated encrypted channel.
- | Mooncake | - | An open-source community that proposes a disaggregated LLM serving architecture centered on KVCache.
- | Mooncake Store | - | A high-performance distributed key-value KVCache storage engine designed specifically for LLM inference scenarios.
- | Mooncake Store Master Service | Master Service | In Mooncake Store, the service responsible for managing the logical storage space pool of the entire cluster and handling node join and exit events.
- | Mooncake Store Client | Mooncake Client | The Mooncake Store client, responsible for initiating the get/put requests called by upper-layer applications and for providing the actual KVCache storage.
- | Mooncake Transfer Engine | TE | A high-performance, zero-copy data transmission library designed around two core abstractions: Segment and BatchTransfer.
Namespace | Namespace | - | A Kubernetes namespace. On the platform, it is a smaller, mutually isolated resource space within a project, and also the workspace in which users run production workloads. A project can contain multiple namespaces, and their total resource quota cannot exceed the project quota. Namespaces divide resource quota at a finer granularity while also limiting the size (CPU, memory) of the containers under them, effectively improving resource utilization.
- | Nginx | - | A high-performance HTTP and reverse proxy web server that also provides IMAP/POP3/SMTP services.
Node | Node | - | Depending on the cluster configuration, a node can be a virtual machine or a physical machine.
Node Feature Discovery | Node Feature Discovery | NFD | The Kubernetes node feature discovery function. It detects the hardware features available on each node in a Kubernetes cluster and marks them using node labels, annotations, and node taints.
- | node_exporter | - | Collects and exposes host system metrics such as CPU, disk, memory, and network. It can be used with Prometheus or other monitoring tools and supports various collectors and custom metrics.
Non-Uniform Memory Access | Non-Uniform Memory Access | NUMA | A memory architecture in modern multi-core and multi-processor systems that optimizes memory access speed by dividing processors and memory into multiple nodes.
- | NATS | - | An open-source, lightweight, high-performance distributed messaging system that provides publish/subscribe, request/reply, and queue subscription communication models.
- | NATS Prometheus Exporter | - | Collects metrics from the NATS server monitoring endpoints (such as varz, connz, subsz, and routez), including connection count, subscription count, message throughput, transmission rate, and client latency, used to monitor the performance and health of the messaging system.
Non-Uniform Memory Access | NUMA | - | A computer memory architecture designed for multiple processors.
- | OAuth2-Server | - | The server side that provides the OAuth2.0 protocol implementation in openFuyao.
- | Oauth-proxy | - | Provides OAuth2-based identity verification and authorization. It helps protect web applications or APIs, ensuring that only verified users can access protected content.
Offline Workload | Offline Workload | - | Business with relatively low quality of service requirements that is not sensitive to response latency, such as big data analysis, transcoding, and AI training.
Online Workload | Online Workload | - | Business with relatively high quality of service requirements that is sensitive to response latency, such as web services and e-commerce.
Open Authorization 2.0 | Open Authorization 2.0 | OAuth2.0 | An industry-standard authorization protocol. OAuth 2.0 focuses on simplifying work for client developers while providing specific authorization flows for web applications, desktop applications, mobile phones, and living room devices.
Operating System | Operating System | OS | A built-in program that coordinates a computer's hardware and interacts with users. Common ones include Windows, macOS, and the open-source Linux.
Organization | Organization | O | A field in an X.509 certificate, used to identify the organization to which the certificate holder belongs.
Persistent Volume | Persistent Volume | PV | A piece of storage in the cluster, provisioned and managed by an administrator.
Persistent Volume Claim | Persistent Volume Claim | PVC | A user's request for storage in Kubernetes, which is bound to a Persistent Volume.
Pipeline Parallel | Pipeline Parallel | PP | A technique that splits a model by layers into multiple parts and performs the computation on different devices.
- | Pod | - | The smallest deployable computing unit that can be created and managed in Kubernetes.
Prefill Stage | Prefill | - | The process from the user completing the prompt input to the generation of the first token.
Prefill-Decode Disaggregation | Prefill-Decode Disaggregation | PD | An architecture that schedules the Prefill and Decode stages of large model inference onto different hardware clusters to optimize resource allocation and improve system performance.
Prometheus | Prometheus | - | An open-source system monitoring and alerting toolkit used to collect and process real-time metrics. It periodically pulls monitoring data from target services or proxies over HTTP and stores it in a highly available time-series database. Users can query, aggregate, and visualize this data with the PromQL query language and trigger alerts based on preset rules.
Prompt | Prompt | - | The information a user inputs to the model to guide it toward generating the expected output.
Public Key Infrastructure | Public Key Infrastructure | PKI | A system used to manage digital certificates and public/private key pairs.
Public-Key Cryptography Standards #1 | Public-Key Cryptography Standards #1 | PKCS#1 | The RSA encryption standard, which defines the storage format of RSA private keys.
Privacy-Enhanced Mail | Privacy-Enhanced Mail | PEM | A Base64-encoded storage format for certificates and keys.
Profile | Profile | Profile | A configuration item in the signing policy configuration that defines the issuance parameters for a specific type of certificate.
- | PodGroup | - | The job developer declares a PodGroup, and the scheduler evaluates and reserves resources in units of groups, ensuring that all Pods of a job can start simultaneously (Gang Scheduling).
Quality of Service | Quality of Service | QoS | Kubernetes divides Pods into three quality of service levels, Guaranteed, Burstable, and BestEffort, based on Pod resource requests and limits; the level determines resource allocation and eviction priority. Businesses can also customize QoS policies to distinguish the priority and resource guarantees of different services or tasks. QoS helps improve the stability of key businesses and resource utilization.
- | RayCluster | - | A basic Ray cluster: an application cluster consisting of one head node and zero or more worker nodes.
- | RayJob | - | Used to submit and execute a single job. Each submitted job creates its own Ray cluster, runs its tasks once the cluster is ready, and destroys the cluster automatically after the tasks complete, achieving cluster-level isolation.
- | RayService | - | Deploys Ray Serve, creating an independent Ray cluster during deployment, and supports capabilities such as hot service updates and high availability.
Resource Acquisition Is Initialization | Resource Acquisition Is Initialization | RAII | A C++ resource management paradigm that binds the lifecycle of resources such as memory, file handles, and locks to local objects: resources are acquired on construction and released automatically on destruction, reducing resource leaks and improving exception safety.
Remote Direct Memory Access | Remote Direct Memory Access | RDMA | A technology for directly accessing the memory of one computer from the memory of another. It lets the network card access application memory directly, supporting zero-copy network communication.
Remote Procedure Call | Remote Procedure Call | RPC | A computer communication protocol that allows a program running on one computer to call a subroutine in another address space (usually on a computer on an open network). The programmer can make the call as if it were local, without extra programming for the interaction (that is, without attending to its details).
Resource | Resource | - | Built-in resources and custom resources in Kubernetes.
Resource Overselling | Resource Overselling | - | When online businesses in a colocation deployment are in a business trough, a relatively large share of their requested resources is often unused; resource overselling dynamically allocates these resources to colocated offline jobs.
ResourceClaim | ResourceClaim | - | A namespace-level API object in the Kubernetes DRA framework, used to represent a specific request for a certain type of device resource; it is bound to an actual resource instance on a specific node during scheduling and allocation.
ResourceClaimTemplate | ResourceClaimTemplate | - | A namespace-level API object in the Kubernetes DRA framework, used to define the template specification for generating ResourceClaims.
ResourceSlice | ResourceSlice | - | A cluster-level API object in the Kubernetes DRA framework, used to represent a set of structured resource instances that a certain DRA driver makes available for allocation on a specific node.
Role-based access control | Role-based access control | RBAC | An access control method that manages access permissions to system resources through user roles. In RBAC, permissions are associated with roles, and users obtain permissions through the roles they belong to.
- | Secret | - | An object containing a small amount of sensitive information such as passwords, tokens, or keys.
Security Engine | Security Engine | SEC | The KAE hardware security acceleration engine module.
Service | Service | - | The Kubernetes method for exposing a network application running on one Pod or a group of Pods as a network service.
Service Level Objective | Service Level Objective | SLO | A target that ensures the provided service meets customer expectations.
- | ServiceMonitor | - | One of the core abstractions that Prometheus Operator provides for the monitoring system; a ServiceMonitor makes it convenient to monitor metrics.
Silence | Silence | - | A basic capability of the alerting component. Alerts are matched against the configured silence rules; once an alert matches, it is silenced, that is, not pushed.
- | Spring Cloud | - | A complete microservice solution based on the Spring Boot framework.
- | StatefulSet | - | The workload API object used to manage stateful applications.
Subject Alternative Name | Subject Alternative Name | SAN | A certificate extension field used to specify multiple domain names or IP addresses.
scheduler | scheduler | - | The Kubernetes scheduler, responsible for scheduling Pods onto appropriate nodes.
- | Segment | - | Represents a contiguous address space that can be read and written remotely.
Tensor Parallel | Tensor Parallel | TP | A technique that splits a model's weight matrices into multiple parts and performs the computation on different devices.
- | Tier Backend | - | The backend system that supports tiered storage in Mooncake, responsible for uniformly managing KV cache data across storage tiers.
Transport Layer Security | Transport Layer Security | TLS | Used to provide encrypted communication over a network.
Time To First Token | Time To First Token | TTFT | The latency from input to the output of the first token in large model inference.
Universally Unique Identifier | Universally Unique Identifier | UUID | A standard for software construction; a UUID is composed of a timestamp, a clock sequence, and a globally unique node identifier (such as a hash of the hostname).
Video Random Access Memory | Video Random Access Memory | VRAM | High-speed memory dedicated to the graphics card, used to temporarily store graphics data required for rendering, such as textures and frame buffers.
Virtual Central Processing Unit | vCPU | - | A virtual central processor: the processor resource used in a virtual environment. It is part of a physical CPU and can be used independently by a virtual machine. Unlike a physical CPU, a vCPU is obtained by dividing one physical processor into multiple virtual processor cores through hyperthreading technology, achieving resource sharing and dynamic allocation.
VictoriaMetrics | VictoriaMetrics | VM | A high-performance, low-cost, horizontally scalable open-source time-series database and monitoring solution, designed specifically for storing and querying large-scale metric data and compatible with the Prometheus ecosystem.
Visual Language Model | Visual Language Model | VLM | A deep learning model trained on a large amount of visual-text data.
- | vLLM | - | An efficient inference engine and framework designed for large language models, which can optimize large model inference performance.
Volume | Volume | - | An abstraction in Kubernetes, used to provide persistent storage for containers in Pods.
Widget | Widget | - | A monitoring component consisting of a name and data charts, displayed as a card.
- | - | xPyD | An architecture form within the PD architecture that contains x P (Prefill) nodes and y D (Decode) nodes.
X.509 standard | - | X.509 | The public key certificate standard formulated by the International Telecommunication Union (ITU), which defines the format and structure of digital certificates.
Equivalence Class Scheduling | - | - | Treats Pods with the same resource requests, affinity, and other conditions as an "equivalence class", so that one scheduling decision can be applied to the entire class, greatly reducing the scheduling computation overhead of large-scale jobs.
Topology Awareness | - | - | Combines node network topology (NVLink, RDMA) with hardware information and prioritizes scheduling Pods that require high-speed communication onto the same node or adjacent nodes.
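
Example: the Quality of Service entry in this table says that Kubernetes derives the Guaranteed, Burstable, and BestEffort levels from Pod resource requests and limits. The following Python sketch illustrates that derivation in simplified form. The function name `qos_class` and the dictionary layout are hypothetical and used only for illustration; the sketch compares quantity strings literally (so "1" and "1000m" are treated as different) and is not the Kubernetes implementation.

```python
from typing import Dict, List

# Hypothetical container layout for illustration:
# {"requests": {"cpu": "500m"}, "limits": {"cpu": "1", "memory": "1Gi"}}
Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Return the QoS level for a Pod described by its containers (simplified)."""
    any_set = False      # does any container set a CPU or memory request/limit?
    guaranteed = True    # do all containers have request == limit for both resources?
    for container in containers:
        limits = container.get("limits", {})
        # Assumption mirrored from Kubernetes defaulting: a missing request
        # takes the value of the limit when a limit is set.
        requests = {**limits, **container.get("requests", {})}
        for resource in ("cpu", "memory"):
            req = requests.get(resource)
            lim = limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            if req is None or lim is None or req != lim:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"cpu": "500m"}, "limits": {"cpu": "1", "memory": "1Gi"}}]
    print(qos_class(pod))
```

In this sketch, a Pod whose containers all set equal CPU and memory requests and limits is Guaranteed, a Pod that sets nothing is BestEffort, and everything in between is Burstable.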