| Extension | addon | - | Components that must be installed when creating a cluster with the openFuyao community BKE installation tool, such as Calico. |
| - | All-In-One | AIO | A deployment mode in which Kubernetes, fuyao-system, and other components run on the same node. |
| API Gateway | API Gateway | APIG | A single entry point located between client and API, acting as reverse proxy to route client requests to a group of APIs behind it. |
| Application Programming Interface | Application Programming Interface | API | A set of predefined functions that gives applications and developers access to the routines of a piece of software or hardware without requiring access to its source code or an understanding of its internal workings. |
| Ascend Computing Language | Ascend Computing Language | ACL | Provides APIs for runtime management, single-operator calls, model inference, and media data processing. These APIs use the underlying hardware computing resources to perform deep learning inference, graphics and image preprocessing, and single-operator accelerated computing on the CANN platform. |
| Ascend Image Repository | AscendHub | - | Ascend open Docker image repository. |
| API Server | API Server | apiserver | The Kubernetes API server, which provides the cluster's REST API. |
| - | blackbox_exporter | - | An official Prometheus exporter that probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. |
| Bootstrap Node | Bootstrap Node | - | The first node created during cluster initialization, used to bootstrap the rest of the cluster creation process. |
| - | BatchTransfer | - | Encapsulates operation requests and handles Read/Write data synchronization between a non-contiguous group of data spaces in one Segment and the corresponding spaces in another group of Segments. |
| - | Mooncake CacheTier | CacheTier | The cache layers at the different tiers of Mooncake's Tier Backend architecture, used for tiered storage of KVCache data. |
| - | cAdvisor | - | A container monitoring tool developed by Google and embedded in Kubernetes as a monitoring component. |
| Cloud Native Colocation | Cloud Native Colocation | - | A deployment method that uses a cloud-native approach to run online and offline workloads in the same cluster, improving overall cluster resource utilization by adjusting the resources used by online workloads during their trough and peak periods. |
| Cloud Native Computing Foundation | Cloud Native Computing Foundation | CNCF | Cloud Native Computing Foundation, an open-source software foundation. |
| - | ConfigMap | - | A Kubernetes API object used to store non-sensitive data as key-value pairs. |
| Console | Console | - | Frontend web page console. |
| Container | Container | - | A running instance created from an image. It can be started, stopped, and deleted. Containers are isolated from one another, and each serves as a secure platform. |
| Container memory sharing | Container memory sharing | - | A capability built on the UB memory pooling mechanism. In bare-metal container scenarios, when node or NUMA memory usage reaches a threshold, memory borrowing is triggered and part of the memory pressure is transparently offloaded to a remote memory pool. |
| Container memory borrowing | Container memory borrowing | - | The memory pooling component of UBS-Core. It imports and exports memory blocks within a UBS Server cluster so that memory can be shared across nodes and across processes on bare metal. |
| Coordinated Universal Time | Coordinated Universal Time | UTC | The time standard used to coordinate clocks worldwide. |
| - | CronJob | - | Creates Jobs on a repeating schedule. |
| - | Custom Resource | CR | Custom resource in Kubernetes. |
| Custom Resource Definition | Custom Resource Definition | CRD | Kubernetes resource extension mechanism, allowing definition of custom resources. |
| Certificate Authority | Certificate Authority | CA | Authority responsible for issuing and managing digital certificates. |
| Certificate Signing Request | Certificate Signing Request | CSR | File containing certificate applicant information, used to apply for certificate from CA. |
| Common Name | Common Name | CN | Field in X.509 certificate, usually used to identify name of certificate holder. |
| Certificate File Extension | Certificate | CRT | File extension typically used for files that store an X.509 certificate. |
| Configuration Map | Configuration Map | ConfigMap | Configuration object in Kubernetes, used to store non-sensitive configuration data. |
| Certificate Revocation List | Certificate Revocation List | CRL | List containing revoked certificates. |
| Controller Manager | Controller Manager | controller-manager | Kubernetes controller manager, running various controllers to maintain cluster state. |
| - | Customstatlogger | - | A custom stat logger built on the StatLoggerBase abstract class that vLLM exposes, used to customize metrics and how they are reported. |
| - | DaemonSet | - | Ensures that all (or some) nodes run one replica of a Pod. |
| Monitoring Dashboard | Dashboard | - | A dashboard made up of multiple user-defined monitoring components, letting users monitor the metrics that matter to them. |
| Data Parallelism | Data Parallelism | DP | A parallelism strategy in which each device holds a complete copy of the model and independently processes part of the dataset, after which the devices aggregate their gradients. |
| Decode Stage | Decode | - | The phase from generation of the first token until inference stops. |
| - | Deployment | - | Provides declarative update capability for Pod and ReplicaSet. |
| Device Class | DeviceClass | - | A cluster-level API object in Kubernetes DRA framework, used to define abstract category of certain type of device resources and its selection semantics, for ResourceClaim to reference when requesting resources. |
| Device-to-Device Transmission | Device-to-Device Transmission | D2D Transmission | Transmission method where data does not pass through CPU or host memory, directly transferred within same accelerator card or between different accelerator cards. |
| Domain Name System | Domain Name System | DNS | A service that maps domain names and IP addresses to each other, making network resources easier to access. |
| - | Dubbo | - | A high-performance service framework open-sourced by Alibaba that provides transparent, high-performance RPC calls and service governance. |
| Dynamic Resource Allocation | Dynamic Resource Allocation | DRA | A device resource management mechanism provided by Kubernetes. It handles device resource requests, scheduling, and allocation through structured API objects, and allows device drivers to participate in resource selection and allocation decisions. |
| - | DCGM Exporter | - | Collects GPU runtime and health metrics, including GPU utilization, PCIe transfer rate, temperature, and power usage. |
| Embedding | Embedding | EMB | Data vector embedding operation. |
| Extended Key Usage | Extended Key Usage | extKeyUsage | Certificate extension field, specifies specific usage of certificate (such as server authentication, client authentication, etc.). |
| - | etcd | - | A distributed key-value store that Kubernetes uses to hold cluster state and configuration data. |
| EndpointPicker | EndpointPicker | EPP | The component in the Kubernetes Gateway API Inference Extension that selects an appropriate backend instance, supporting endpoint selection based on different routing policies. |
| - | Felix | - | Calico's node agent component, running on each node, responsible for configuring network, routing and policy rules. |
| Fully qualified domain name | Fully qualified domain name | FQDN | A name that includes both the hostname and the domain name (FQDN = hostname + domain name). |
| Gigabyte | Gigabyte | GB | A decimal unit of information, commonly used to express the capacity of storage media such as hard drives and memory. |
| Gateway API Inference Extension | Gateway API Inference Extension | GIE | Extended inference-related capabilities based on Kubernetes Gateway API, used to define and manage routing and traffic policies of inference services. |
| - | Helm | - | A package manager for Kubernetes, used to simplify deploying and managing applications in a Kubernetes cluster. |
| - | Helm Chart | - | A core Helm concept: a pre-configured package of application resources. |
| High Availability | High Availability | HA | System or service can run with high reliability and continuous availability, maintaining normal operation even when facing hardware failures or other anomalies. |
| High Performance RSA Engine | High Performance RSA Engine | HPRE | KAE high-performance RSA acceleration engine module. |
| High Performance ZIP Engine | High Performance ZIP Engine | ZIP | KAE high-performance zlib/gzip compression engine module. |
| Horizontal Pod Autoscaler | Horizontal Pod Autoscaler | HPA | Automatically updates a workload resource (such as a Deployment or StatefulSet) to scale the workload to match demand. |
| Host-to-Device Transmission | Host-to-Device Transmission | H2D Transmission | The process of copying data from CPU/host memory to the device memory of an accelerator such as an NPU or GPU. |
| Hypertext Transfer Protocol Secure | Hypertext Transfer Protocol Secure | HTTPS | A secure HTTP channel that builds on HTTP and protects data in transit through encryption and identity authentication. |
| Monitoring Indicator | Indicator | - | A metric that users can monitor in a data collection system (such as Prometheus). One monitoring indicator can contain multiple monitoring instances. |
| - | Ingress | - | API object that manages external access to services in cluster, can provide load balancing, SSL termination and name-based virtual hosting. |
| Monitoring Instance | Instance | - | The finest-grained object that can be monitored on Kubernetes. Each monitoring instance is uniquely identified by a set of key-value labels. |
| - | Job | - | Runs one-off tasks in a cluster, focusing on executing tasks to completion rather than keeping a specified number of instances running. The Job controller creates one or more Pods to run the specified tasks. When the tasks complete, no more Pods are created, and the completed Pods are usually kept until the Job is deleted. |
| Key-Value Cache | Key-Value Cache | KVCache | A common strategy for accelerating large model inference. It caches the Key (K) and Value (V) matrices generated by the self-attention mechanism during inference to avoid repeated computation and improve inference speed. |
| - | kube-apiserver | - | Validates and configures data for API objects, including Pods, Services, and ReplicationControllers. The API server provides REST operations and serves as the frontend to the cluster's shared state, through which all other components interact. |
| - | kubectl | - | A command-line tool that uses the Kubernetes API to communicate with a Kubernetes cluster's control plane. |
| - | kubelet | - | A key component of a Kubernetes cluster that runs on every node and manages the containers on that node. As the node agent, it communicates with the control plane to ensure that containers run on the node as expected. |
| - | Kube-rbac-proxy | - | A lightweight HTTP proxy designed for Kubernetes that uses the Kubernetes SubjectAccessReview API to perform RBAC (role-based access control) authorization. Its goal is to restrict communication between Pods so that only Pods presenting a valid token with RBAC authorization can access other Pods. |
| - | Kubernetes | K8s | Kubernetes is a portable, scalable open-source platform for managing containerized workloads and services, facilitating declarative configuration and automation. |
| - | kube-state-metrics | - | Collects state metrics about resource objects exposed by the API Server, such as Deployments, Nodes, and Pods. |
| Kunpeng Accelerator Engine | Kunpeng Accelerator Engine | KAE | A hardware acceleration solution based on the Kunpeng 920 processor. |
| Key File Extension | Key | KEY | File extension typically used for files that store a private key. |
| Kubernetes Configuration | Kubernetes Configuration | kubeconfig | Contains cluster access information, authentication information and context configuration. |
| Key Usage | Key Usage | keyUsage | Certificate extension field, specifies usage of certificate key (such as digital signature, key encryption, etc.). |
| - | kubelet | - | Kubernetes node agent, running on each node, responsible for managing Pods and containers. |
| - | kube-proxy | - | Kubernetes network proxy, responsible for maintaining network rules and load balancing on nodes. |
| Large Language Model | Large Language Model | LLM | Deep learning model trained based on large amount of text data. |
| Load Balancer | Load Balancer | - | Device or service used to distribute network traffic among multiple servers. |
| - | metrics-server | - | A core component of the Kubernetes monitoring system. It collects resource metrics from the kubelet, aggregates them (relying on kube-aggregator), and exposes them through the Metrics API (/apis/metrics.k8s.io/) on the Kubernetes API server. metrics-server stores only the latest metrics data (CPU and memory). |
| Mind Inference Service | Mind Inference Service | MIS | Large model inference API service based on containerized deployment provided by Ascend. |
| Mind Inference Service Operator | Mind Inference Service Operator | MIS-Operator | Component implementing inference microservice instance lifecycle management. |
| - | Mooncake | - | Distributed KVCache storage engine specifically designed for large model inference, improving inference efficiency. |
| Multi-core | Multi-core | - | The integration of a large number of processing cores on a single chip. The multi-core scenario refers specifically to cluster nodes with more than 256 CPUs. |
| Mutual TLS | Mutual TLS | mTLS | TLS in which the server and the client authenticate each other, establishing an encrypted channel that is verified in both directions. |
| - | Mooncake | - | An open-source community project that proposes a KVCache-centric disaggregated architecture for LLM serving. |
| - | Mooncake Store | - | High-performance distributed key-value KVCache storage engine specifically designed for LLM inference scenarios. |
| - | Mooncake Store Master Service | Master Service | In Mooncake Store, responsible for managing logical storage space pool of entire cluster, and handling node join and exit events. |
| - | Mooncake Store Client | Mooncake Client | The Mooncake Store client, which issues the get/put requests invoked by upper-layer applications and provides the actual KVCache storage. |
| - | Mooncake Transfer Engine | TE | A high-performance, zero-copy data transmission library built around two core abstractions, Segment and BatchTransfer. |
| Namespace | Namespace | - | A Kubernetes namespace. On the platform, it is a smaller, mutually isolated resource space within a project and the workspace in which users carry out production work. A project can contain multiple namespaces, whose combined resource quota cannot exceed the project quota. Namespaces divide resource quota at a finer granularity and also limit the size (CPU, memory) of containers in the namespace, which improves resource utilization. |
| - | Nginx | - | A high-performance HTTP server and reverse proxy that also provides IMAP/POP3/SMTP proxy services. |
| Node | Node | - | A machine in the cluster. Depending on the cluster configuration, a node can be a virtual machine or a physical machine. |
| Node Feature Discovery | Node Feature Discovery | NFD | Kubernetes node feature discovery function. It detects available hardware features on each node in Kubernetes cluster, and uses node labels, annotations and node taints to mark these features. |
| - | node_exporter | - | Collects and exposes host system metrics such as CPU, disk, memory, and network. It can be used with Prometheus or other monitoring tools, and supports a range of collectors and custom metrics. |
| Non-Uniform Memory Access | Non-Uniform Memory Access | NUMA | A memory architecture used in modern multi-core and multi-processor systems. It improves memory access speed by dividing processors and memory into multiple nodes. |
| - | NATS | - | An open-source, lightweight, high-performance distributed messaging system that provides publish/subscribe, request/reply, and queue subscription communication models. |
| - | NATS Prometheus Exporter | - | Collects metrics from NATS server monitoring endpoints (such as varz, connz, subsz, and routez), including connection count, subscription count, message throughput, transmission rate, and client latency, to monitor the performance and health of the messaging system. |
| Non-Uniform Memory Access | NUMA | - | A computer memory architecture designed for multiple processors. |
| - | OAuth2-Server | - | The server-side component in openFuyao that implements the OAuth 2.0 protocol. |
| - | Oauth-proxy | - | Provides OAuth2-based identity verification and authorization functionality. It can help protect web applications or APIs, ensuring only verified users can access protected content. |
| Offline Workload | Offline Workload | - | Business with relatively low quality of service requirements and not sensitive to response latency, such as big data analysis, transcoding, AI training, etc. |
| Online Workload | Online Workload | - | Business with relatively high quality of service requirements and sensitive to response latency, such as web services, e-commerce, etc. |
| Open Authorization 2.0 | Open Authorization 2.0 | OAuth2.0 | The industry-standard authorization protocol. OAuth 2.0 focuses on simplicity for client developers while providing specific authorization flows for web applications, desktop applications, mobile phones, and living room devices. |
| Operating System | Operating System | OS | System software that coordinates a computer's hardware and interacts with users. Common examples include Windows, macOS, and the open-source Linux. |
| Organization | Organization | O | Field in X.509 certificate, used to identify organization to which certificate holder belongs. |
| Persistent Volume | Persistent Volume | PV | A piece of storage in the cluster that is provisioned and managed by an administrator. |
| Persistent Volume Claim | Persistent Volume Claim | PVC | A user's request for storage in Kubernetes, which is bound to a Persistent Volume that satisfies it. |
| Pipeline Parallelism | Pipeline Parallelism | PP | A technique that splits a model by layers into multiple parts and runs the parts on different devices. |
| - | Pod | - | Smallest deployable computing unit created and managed in Kubernetes. |
| Prefill Stage | Prefill | - | Process from user completing prompt input to generating first token. |
| Prefill-Decode Disaggregation | Prefill-Decode Disaggregation | PD | An architecture that runs the two stages of large model inference, Prefill and Decode, on different hardware clusters to optimize resource allocation and improve system performance. |
| Prometheus | Prometheus | - | An open-source system monitoring and alerting toolkit used to collect and process real-time metrics. It periodically pulls monitoring data from target services or proxies over HTTP and stores it in a highly available time-series database. Users can query, aggregate, and visualize this data with the PromQL query language, and trigger alerts based on preset rules. |
| Prompt | Prompt | - | The information a user inputs to a model to guide it toward generating the expected output. |
| Public Key Infrastructure | Public Key Infrastructure | PKI | System used to manage digital certificates and public key-private key pairs. |
| Public-Key Cryptography Standards #1 | Public-Key Cryptography Standards #1 | PKCS#1 | RSA encryption standard, defines storage format of RSA private key. |
| Privacy-Enhanced Mail | Privacy-Enhanced Mail | PEM | A Base64-encoded certificate and key storage format. |
| Profile | Profile | Profile | Configuration item in signing policy configuration, defines issuance parameters for specific type of certificates. |
| - | PodGroup | - | A group of Pods declared by the job developer. The scheduler evaluates and reserves resources per group, ensuring that all Pods of a job can start at the same time (gang scheduling). |
| Quality of Service | Quality of Service | QoS | Kubernetes assigns each Pod one of three quality of service classes, Guaranteed, Burstable, or BestEffort, based on the Pod's resource requests and limits. The class determines resource allocation and eviction priority. Workloads can also define custom QoS policies to distinguish the priority and resource guarantees of different services or tasks. QoS helps improve the stability of critical workloads and overall resource utilization. |
| - | RayCluster | - | A basic Ray cluster made up of one head node and zero or more worker nodes. |
| - | RayJob | - | Used to submit and execute a single job. Each submitted job creates its own Ray cluster, runs its tasks once the cluster is ready, and destroys the cluster automatically after the tasks complete, achieving cluster-level isolation. |
| - | RayService | - | Deploys Ray Serve. It creates an independent Ray cluster during deployment and supports capabilities such as hot updates of the service and high availability. |
| Resource Acquisition Is Initialization | Resource Acquisition Is Initialization | RAII | A C++ resource management paradigm that binds lifecycle of resources such as memory, file handles, locks to local objects, acquires resources through construction, automatically releases through destruction, reducing resource leaks and improving exception safety. |
| Remote Direct Memory Access | Remote Direct Memory Access | RDMA | A technology for accessing the memory of one computer directly from the memory of another. It lets the network adapter access application memory directly, enabling zero-copy network communication. |
| Remote Procedure Call | Remote Procedure Call | RPC | A computer communication protocol that allows a program running on one computer to call a subroutine in another address space (usually on another computer on a network). The programmer makes the call as if it were local, without writing extra code for the remote interaction. |
| Resource | Resource | - | Built-in resources and custom resources in Kubernetes. |
| Resource Overselling | Resource Overselling | - | When online workloads in a colocated cluster are in a trough, a large share of the resources they requested is often left unused. Resource overselling dynamically allocates these idle resources to colocated offline jobs. |
| ResourceClaim | ResourceClaim | - | A namespace-level API object in Kubernetes DRA framework, used to represent specific request for certain type of device resources, and binds to actual resource instance on specific node during scheduling and allocation process. |
| ResourceClaimTemplate | ResourceClaimTemplate | - | A namespace-level API object in Kubernetes DRA framework, used to define template specification for generating ResourceClaim. |
| ResourceSlice | ResourceSlice | - | A cluster-level API object in Kubernetes DRA framework, used to represent a set of structured resource instances available for allocation by certain DRA Driver on specific node. |
| Role-based access control | Role-based access control | RBAC | An access control method that manages permissions to system resources through user roles. In RBAC, permissions are associated with roles, and users obtain permissions through the roles they belong to. |
| - | Secret | - | An object containing small amount of sensitive information such as passwords, tokens or keys. |
| Security Engine | Security Engine | SEC | KAE hardware security acceleration engine module. |
| Service | Service | - | Method in Kubernetes to expose network application running on one or a group of Pods as network service. |
| Service Level Objective | Service Level Objective | SLO | A target that a provided service must meet to satisfy customer expectations. |
| - | ServiceMonitor | - | One of the core abstractions that Prometheus Operator provides for the monitoring system. A ServiceMonitor makes it convenient to set up metric monitoring. |
| Silence | Silence | - | A basic capability of the alerting component. Alerts are matched against configured silence rules, and any alert that matches is silenced, that is, not pushed. |
| - | Spring Cloud | - | A complete microservice solution based on Spring Boot framework. |
| - | StatefulSet | - | Workload API object used to manage stateful applications. |
| Subject Alternative Name | Subject Alternative Name | SAN | Extension field in certificate used to specify multiple domain names or IP addresses. |
| scheduler | scheduler | - | Kubernetes scheduler, responsible for scheduling Pods to appropriate nodes to run. |
| - | Segment | - | Represents a contiguous address space that can be read and written remotely. |
| Tensor Parallelism | Tensor Parallelism | TP | A technique that splits a model's weight matrices into multiple parts and runs the parts on different devices. |
| - | Tier Backend | - | Backend system supporting tiered storage in Mooncake, responsible for uniformly managing KV cache data on different storage tiers. |
| Transport Layer Security | Transport Layer Security | TLS | Used to provide encrypted communication on network. |
| Time To First Token | Time To First Token | TTFT | Latency from input to outputting first token in large model inference. |
| Universally Unique Identifier | Universally Unique Identifier | UUID | A standard identifier format used in software construction, composed of a timestamp, a clock sequence, and a globally unique node identifier (such as a hash of the hostname). |
| Video Random Access Memory | Video Random Access Memory | VRAM | High-speed memory dedicated to graphics card, used to temporarily store graphic data such as textures, frame buffers required for rendering. |
| Virtual Central Processing Unit | vCPU | - | A virtual processor, the processor resource used in a virtualized environment. It is a share of a physical CPU that a virtual machine can use independently. Unlike a physical CPU, vCPUs are created by dividing one physical processor into multiple virtual processor cores through technologies such as hyper-threading, which enables resource sharing and dynamic allocation. |
| VictoriaMetrics | VictoriaMetrics | VM | VictoriaMetrics is a high-performance, low-cost, horizontally scalable open-source time-series database and monitoring solution, specifically designed for large-scale metric data storage and query optimization, compatible with Prometheus ecosystem. |
| Visual Language Model | Visual Language Model | VLM | Deep learning model trained based on large amount of visual-text data. |
| - | vLLM | - | An efficient inference engine and framework designed for large language models, can optimize large model inference performance. |
| Volume | Volume | - | An abstract concept in Kubernetes, used to provide persistent storage for containers in Pods. |
| Widget | Widget | - | A monitoring component that contains a name and data charts and is displayed as a card. |
| - | - | xPyD | A form of the PD architecture that contains x Prefill (P) nodes and y Decode (D) nodes. |
| X.509 standard | - | X.509 | Public key certificate standard formulated by International Telecommunication Union (ITU), defines format and structure of digital certificates. |
| Equivalence Class Scheduling | - | - | Treats Pods with the same resource requests, affinity, and other conditions as one equivalence class, so that a single scheduling decision can be applied to the entire class. This greatly reduces the scheduling overhead of large-scale jobs. |
| Topology Awareness | - | - | Combines node network topology (NVLink, RDMA) with hardware information to preferentially schedule Pods that require high-speed communication onto the same node or adjacent nodes. |