AI Inference Elastic Scaling Plugin Development Guide
Feature Introduction
Elastic Scaler is a core component of the AI Inference PD-Orchestrator, which adopts a plugin-based architecture to support user-defined scaling decision algorithms and custom resource management logic.
Elastic Scaler provides two types of plugin interfaces:
- Scaling Decision Algorithm Plugin (ScalingAlgorithm): Calculates the target replica count based on metric data and supports various custom algorithms.
- Resource Management Plugin (ResourceHandler): Supports integration of any custom resource type into the scaling system, handling resource replica count retrieval and updates.
For details, see Elastic Scaler User Guide.
Constraints and Limitations
This section outlines the capability boundaries and usage limitations of the current version of Elastic Scaler plugins, clarifying the applicable scope of plugins.
Functional Limitations
Scaling Decision Algorithm Plugin
- In the current version, the invocation path for custom algorithms has not yet been implemented.
- Custom algorithms can be registered to the `DefaultAlgorithmManager` successfully, but they are not invoked during actual scaling decisions.
- Temporary workaround: Wait for a subsequent version to complete the algorithm invocation path, or directly modify `AlgorithmManager.CalculateDesiredReplicas()` to implement the algorithm dispatch logic.
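The dispatch logic mentioned in the workaround could take roughly the following shape. This is a minimal, self-contained sketch and not the project's code: `Algorithm`, `Manager`, `Register`, and `Calculate` are hypothetical, simplified stand-ins for `ScalingAlgorithm`, `AlgorithmManager`, `RegisterAlgorithm`, and `CalculateDesiredReplicas`, and the real interface receives a context and a `ScalingAlgorithmContext` instead of a plain replica count.

```go
package main

import (
	"errors"
	"fmt"
)

// Algorithm is a hypothetical, simplified stand-in for ScalingAlgorithm.
type Algorithm interface {
	Calculate(current int32) (int32, error)
}

// Manager is a hypothetical stand-in for AlgorithmManager.
type Manager struct {
	algorithms map[string]Algorithm
}

// NewManager returns an empty manager.
func NewManager() *Manager {
	return &Manager{algorithms: map[string]Algorithm{}}
}

// Register stores an algorithm under a unique name.
func (m *Manager) Register(name string, a Algorithm) error {
	if name == "" || a == nil {
		return errors.New("name and algorithm must be set")
	}
	if _, ok := m.algorithms[name]; ok {
		return fmt.Errorf("algorithm %q already registered", name)
	}
	m.algorithms[name] = a
	return nil
}

// Calculate looks up the algorithm by the name given in the CR and delegates to it.
func (m *Manager) Calculate(name string, current int32) (int32, error) {
	a, ok := m.algorithms[name]
	if !ok {
		return 0, fmt.Errorf("algorithm %q not registered", name)
	}
	return a.Calculate(current)
}

// addOne is a toy algorithm that always asks for one more replica.
type addOne struct{}

func (addOne) Calculate(current int32) (int32, error) { return current + 1, nil }

func main() {
	m := NewManager()
	if err := m.Register("custom", addOne{}); err != nil {
		panic(err)
	}
	desired, err := m.Calculate("custom", 3)
	fmt.Println(desired, err) // 4 <nil>
}
```

Returning an error for an unknown name, rather than silently falling back to a default algorithm, keeps a misspelled `scalingAlgorithm` value visible in the controller logs.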
Operational Limitations
Registration Restrictions
- When attempting to register an algorithm with the same name, `RegisterAlgorithm()` will return an error.
- When attempting to register a resource handler with the same apiVersion/kind, `RegisterResourceHandler()` will panic.
- Avoid registering the same plugin multiple times in different `init()` functions.
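One way to avoid duplicate registration when several code paths might trigger it is to guard the call with `sync.Once`. The sketch below is illustrative only: `registry`, `register`, and `ensureRegistered` are hypothetical names, and `register` merely imitates the documented panic-on-duplicate behavior of `RegisterResourceHandler()`.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for the handler registry.
var registry = map[string]func() string{}

// register imitates the documented behavior: a duplicate apiVersion/kind panics.
func register(apiVersion, kind string, factory func() string) {
	key := apiVersion + "/" + kind
	if _, ok := registry[key]; ok {
		panic("resource handler already registered: " + key)
	}
	registry[key] = factory
}

var registerOnce sync.Once

// ensureRegistered may be called from several places; the registration
// body runs only on the first call.
func ensureRegistered() {
	registerOnce.Do(func() {
		register("custom.example.com/v1alpha1", "CustomResource", func() string {
			return "custom handler"
		})
	})
}

func main() {
	ensureRegistered()
	ensureRegistered() // no panic: the second call is a no-op
	fmt.Println(len(registry)) // 1
}
```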
Performance Considerations
- Computation logic in plugins should be as efficient as possible to avoid blocking the main controller loop.
- Complex algorithm computations should be processed asynchronously or use caching.
- Frequent API calls may impact K8s API Server performance.
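The caching advice above can be sketched as a small time-to-live (TTL) cache placed in front of an expensive computation. All names here (`cachedValue`, `newCachedValue`, `Get`) are hypothetical and not part of Elastic Scaler. Note that this simple version holds the lock while recomputing, which is acceptable only when a recomputation is reasonably short.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cachedValue memoizes the result of an expensive computation for a fixed TTL.
type cachedValue struct {
	mu      sync.Mutex
	ttl     time.Duration
	expires time.Time
	value   float64
	compute func() float64
}

func newCachedValue(ttl time.Duration, compute func() float64) *cachedValue {
	return &cachedValue{ttl: ttl, compute: compute}
}

// Get returns the cached value and recomputes it only once the TTL has expired.
func (c *cachedValue) Get() float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if now := time.Now(); !now.Before(c.expires) {
		c.value = c.compute()
		c.expires = now.Add(c.ttl)
	}
	return c.value
}

func main() {
	calls := 0
	c := newCachedValue(time.Minute, func() float64 {
		calls++ // stands in for an expensive query or model evaluation
		return 0.75
	})
	c.Get()
	c.Get()
	fmt.Println(calls) // 1
}
```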
Potential Risks
- Incorrect scaling algorithms may lead to excessive scaling up or down, affecting business stability.
- Incorrect resource handler implementations may lead to inconsistent target resource states.
- It is recommended to fully validate in a test environment before deploying to production.
Environment Preparation
This section describes the environment and tool preparation work that needs to be completed before starting Elastic Scaler plugin development.
Environment Requirements
This subsection describes the hardware and software environment configuration required for Elastic Scaler plugin development and debugging.
Hardware Requirements
The Elastic Scaler plugin development environment has no special hardware requirements. The following configuration is recommended:
- CPU: 4 cores or more
- Memory: 8GB or more
- Disk: 20GB or more available space
Software Requirements
- Operating System: Linux
- Go Environment: Go 1.21 or higher
- Docker: Docker 20.10+ or compatible container runtime (such as nerdctl)
- Kubernetes Cluster: For deployment and testing
- kubectl: For interacting with the K8s cluster
Setting Up the Environment
Clone the code repository.
```bash
git clone https://gitcode.com/openFuyao/elastic-scaler.git
cd elastic-scaler
```
Configure the Go development environment.
```bash
# Check Go version
go version
# Set Go proxy (optional, to accelerate dependency downloads)
go env -w GOPROXY=https://goproxy.cn,direct
```
Install dependencies.
```bash
# Download dependencies
go mod download
# Verify dependencies
go mod verify
```
Verifying Successful Environment Setup
Verify Go environment.
```bash
go version
# Should output something like: go version go1.21.x linux/amd64
```
Verify dependency integrity.
```bash
go mod verify
# Should output: all modules verified
```
Verify compilation capability.
```bash
make build
# or
go build -o bin/elastic-scaler ./cmd/elastic-scaler
# Successful compilation indicates successful environment setup
```
Verify K8s connection (optional, for testing).
```bash
kubectl cluster-info
kubectl get nodes
# Should display cluster information normally
```
Developing Scaling Decision Algorithm Plugins
This section describes how to implement and integrate custom scaling decision algorithm plugins based on the ScalingAlgorithm interface.
Use Case Overview
Scaling decision algorithm plugins are suitable for scenarios where scaling strategies need to be determined based on specific business logic or metric patterns, for example:
- Predictive scaling: Predicting future demand based on historical load data.
- Multi-metric joint decision: Considering multiple metrics such as CPU, memory, QPS simultaneously.
- Cost-optimized scaling: Minimizing resource costs while meeting SLAs.
- Business-aware scaling: Adjusting strategies based on business characteristics such as peak periods and promotional activities.
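As an illustration of a multi-metric joint decision, the sketch below applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler, `ceil(replicas * current / target)`, to each metric and keeps the largest result. This rule is borrowed for illustration and is not confirmed to be what Elastic Scaler uses; the `metric` type and `desiredReplicas` function are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// metric pairs an observed value with its target; the names are illustrative.
type metric struct {
	name    string
	current float64
	target  float64
}

// desiredReplicas applies ceil(replicas * current / target) per metric and
// returns the largest result, so the most heavily loaded metric decides.
func desiredReplicas(replicas int32, metrics []metric) (int32, error) {
	if replicas <= 0 || len(metrics) == 0 {
		return 0, fmt.Errorf("need positive replicas and at least one metric")
	}
	var desired int32
	for _, m := range metrics {
		if m.target <= 0 {
			return 0, fmt.Errorf("metric %q has a non-positive target", m.name)
		}
		d := int32(math.Ceil(float64(replicas) * m.current / m.target))
		if d > desired {
			desired = d
		}
	}
	return desired, nil
}

func main() {
	got, err := desiredReplicas(4, []metric{
		{name: "cpu", current: 75, target: 50},   // 4 * 1.5 = 6
		{name: "qps", current: 120, target: 100}, // 4 * 1.2 = 4.8, rounded up to 5
	})
	fmt.Println(got, err) // 6 <nil>
}
```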
Component and Interface Description
Table 1 System Component Responsibilities
| System Component Name | Description |
|---|---|
| ElasticScaler CR | Defines scaling strategy, including algorithm name, configuration parameters, and metric sources. |
| AlgorithmManager | Manages all registered algorithm plugins and is responsible for algorithm dispatch. |
| ScalingAlgorithm | Algorithm plugin interface implemented by developers for specific scaling logic. |
Table 2 Decision Algorithm Plugin Interface Description
| Interface Name | Description |
|---|---|
| CalculateDesiredReplicas | Calculates desired replica count based on metric data and context. |
Development Steps
Create algorithm implementation file.
Create a new algorithm implementation file in the `pkg/elasticscaler/scaling/` directory, for example `custom_algorithm.go`.
```go
package scaling

import (
	"context"
	"fmt"

	escontext "gitcode.com/openFuyao/elastic-scaler/pkg/elasticscaler/context"
)

// CustomAlgorithm implements ScalingAlgorithm interface.
type CustomAlgorithm struct {
	// Algorithm configuration parameters
	threshold       float64
	cooldownSeconds int32
}

var _ ScalingAlgorithm = &CustomAlgorithm{}

// NewCustomAlgorithm returns a new CustomAlgorithm instance.
func NewCustomAlgorithm() ScalingAlgorithm {
	return &CustomAlgorithm{
		threshold:       0.8,
		cooldownSeconds: 60,
	}
}

// CalculateDesiredReplicas calculates desired replicas based on metrics.
func (a *CustomAlgorithm) CalculateDesiredReplicas(
	ctx context.Context,
	processingCtx *escontext.ScalingAlgorithmContext,
) (int32, error) {
	if processingCtx == nil || processingCtx.ElasticScaler == nil {
		return 0, fmt.Errorf("processing context or ElasticScaler is nil")
	}

	// Get current metrics from context
	// metrics := processingCtx.Metrics
	// currentReplicas := processingCtx.CurrentReplicas

	// TODO: Implement custom scaling algorithm logic
	// Example: Calculate desired replica count based on threshold
	// if metricValue > a.threshold {
	//     return currentReplicas + 1, nil
	// }

	return 0, nil
}
```
Notice:
- Add sufficient error handling and boundary checks in the `CalculateDesiredReplicas` method.
- Use klog to log key computation steps and intermediate results for debugging.
- Ensure the returned replica count is within the [minReplicas, maxReplicas] range.
- Consider implementing a cooldown mechanism to avoid frequent scaling.
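The range check and cooldown recommended in the notice can be sketched as two small helpers. This is an assumption-laden illustration: `clamp`, `cooldownGate`, and `Apply` are hypothetical names, and where the last scaling time is stored in a real plugin (for example in the CR status) is left open.

```go
package main

import (
	"fmt"
	"time"
)

// clamp keeps desired within [minReplicas, maxReplicas].
func clamp(desired, minReplicas, maxReplicas int32) int32 {
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

// cooldownGate suppresses replica changes that follow the previous change too closely.
type cooldownGate struct {
	cooldown  time.Duration
	lastScale time.Time
}

// Apply returns the replica count to use at time now. Inside the cooldown
// window it keeps current; otherwise it accepts desired and records the change.
func (g *cooldownGate) Apply(now time.Time, current, desired int32) int32 {
	if desired == current {
		return current
	}
	if !g.lastScale.IsZero() && now.Sub(g.lastScale) < g.cooldown {
		return current
	}
	g.lastScale = now
	return desired
}

func main() {
	gate := &cooldownGate{cooldown: 60 * time.Second}
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

	// Accepted, and clamped from 15 down to maxReplicas.
	first := gate.Apply(start, 2, clamp(15, 2, 10))
	// Rejected: only 30s have passed since the last change.
	second := gate.Apply(start.Add(30*time.Second), first, clamp(4, 2, 10))
	// Accepted: the 60s cooldown has elapsed.
	third := gate.Apply(start.Add(90*time.Second), second, clamp(4, 2, 10))

	fmt.Println(first, second, third) // 10 10 4
}
```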
Register the algorithm.
Register the algorithm to `DefaultAlgorithmManager` in the algorithm implementation's `init()` function.
```go
func init() {
	if err := DefaultAlgorithmManager.RegisterAlgorithm("custom", NewCustomAlgorithm()); err != nil {
		// panic(err)
		fmt.Println("register custom algorithm failed", err)
	}
}
```
Table 3 Registration Method Description
| Parameter | Type | Description |
|---|---|---|
| name | string | Algorithm name used to reference the algorithm in the ElasticScaler CR. |
| algorithm | ScalingAlgorithm | ScalingAlgorithm interface implementation. |

Return Value: error. An error is returned if an algorithm with the same name is already registered.
Notice:
- Algorithm names must be unique. It is recommended to use names with business significance.
- When registration fails, you can choose to panic or log it, depending on your needs.
- Avoid registering the same algorithm repeatedly in multiple init() functions.
Configure ElasticScaler to use the custom algorithm.
Create an ElasticScaler custom resource (CR) that specifies the custom algorithm.
```yaml
apiVersion: elasticscaler.io/v1alpha1
kind: ElasticScaler
metadata:
  name: custom-scaling-example
  namespace: ai-inference
spec:
  # Scaling target
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vllm-inference
  # Minimum/maximum replica count
  minReplicas: 2
  maxReplicas: 10
  # Metric-driven trigger configuration
  trigger:
    type: MetricsTrigger
    metricsTrigger:
      # Use custom algorithm
      scalingAlgorithm: custom
      # Algorithm configuration
      algorithmConfig:
        threshold: 0.8
        cooldownSeconds: 60
      # Metric configuration
      metrics:
        - type: External
          external:
            metric:
              name: qps
            target:
              type: AverageValue
              averageValue: "100"
```
Debugging and Verification
This subsection describes the verification steps and troubleshooting approaches for custom scaling algorithm plugins in a cluster.
Verify Algorithm Registration
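Execute the following command to stream the Elastic Scaler controller logs.
```bash
kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f
```
Check the logs to confirm successful algorithm registration. The registration code in the previous step prints a message only when `RegisterAlgorithm()` returns an error, so a successful registration produces no such line.
```bash
# A line similar to the following indicates that registration failed:
# register custom algorithm failed <error>
```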
Verify Algorithm Invocation
Execute the following command to create a test ElasticScaler resource.
```bash
kubectl apply -f test-elastic-scaler.yaml
```
Execute the following command to check ElasticScaler status.
```bash
kubectl get elasticscalers -n ai-inference -o yaml
```
Execute the following command to check the controller logs to confirm the algorithm is being invoked.
```bash
kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f | grep CustomAlgorithm
```
Notice:
- Since the algorithm invocation path is not yet implemented in the current version, you may not see actual algorithm invocation logs.
- It is recommended to add klog logs in the algorithm implementation for future debugging.
- You can verify the correctness of the algorithm logic through unit tests.
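Because the invocation path is not wired in yet, checking the algorithm logic in isolation is the most practical option. The sketch below shows the table-driven style on a toy function; `thresholdStep` is a hypothetical algorithm mirroring the threshold example in the development steps, and in a real project the same table would live in a `_test.go` file and report failures through `testing.T`.

```go
package main

import "fmt"

// thresholdStep is an illustrative algorithm: add one replica when the metric
// exceeds the threshold, otherwise keep the current count.
func thresholdStep(current int32, metric, threshold float64, maxReplicas int32) int32 {
	if metric > threshold && current < maxReplicas {
		return current + 1
	}
	return current
}

func main() {
	cases := []struct {
		name    string
		current int32
		metric  float64
		want    int32
	}{
		{"below threshold keeps replicas", 3, 0.5, 3},
		{"at threshold keeps replicas", 3, 0.8, 3},
		{"above threshold adds one", 3, 0.9, 4},
		{"never exceeds max", 10, 0.9, 10},
	}

	failed := 0
	for _, c := range cases {
		if got := thresholdStep(c.current, c.metric, 0.8, 10); got != c.want {
			failed++
			fmt.Printf("FAIL %s: got %d, want %d\n", c.name, got, c.want)
		}
	}
	fmt.Printf("%d cases, %d failed\n", len(cases), failed)
}
```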
Developing Resource Management Plugins
This section describes how to implement custom resource management plugins and integrate any Kubernetes resource into the Elastic Scaler scaling system.
Use Case Overview
Resource management plugins are suitable for scenarios where custom K8s resource types need to be integrated into the Elastic Scaler scaling system, for example:
- Custom workload resources: Such as custom inference service CRDs.
- Resources requiring special replica management: Such as complex resources that need to update multiple fields simultaneously.
- Cross-resource replica coordination: Such as managing replica counts for multiple related resources simultaneously.
Component and Interface Description
Table 4 System Component Responsibilities
| System Component Name | Description |
|---|---|
| ElasticScaler CR | Defines target resource reference (apiVersion, kind, name). |
| HandlerFactory | Manages all registered resource handlers and creates corresponding handler instances based on target resource type. |
| ResourceHandler | Resource handler interface implemented by developers for specific resource management logic. |
Table 5 Resource Management Plugin Interface Description
| Interface Name | Description |
|---|---|
| Supports | Checks if the handler supports the given resource type. |
| GetCurrentReplicas | Gets the current replica count of the target resource. |
| UpdateReplicas | Updates the replica count of the target resource. |
| GetPodList | Gets the Pod list managed by the target resource. |
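To make the contract of these methods concrete, the following sketch implements three of them against an in-memory map, which can also serve as a fake in unit tests. It is a simplification under stated assumptions: `objectRef` is a hypothetical stand-in for `corev1.ObjectReference`, the `context.Context` parameters and `GetPodList` are omitted, and `fakeHandler` is not part of the project.

```go
package main

import (
	"errors"
	"fmt"
)

// objectRef is a hypothetical, trimmed-down stand-in for corev1.ObjectReference.
type objectRef struct {
	APIVersion, Kind, Namespace, Name string
}

// fakeHandler keeps replica counts in memory instead of calling the API server.
type fakeHandler struct {
	replicas map[string]int32 // keyed by namespace/name
}

// Supports matches apiVersion and kind exactly.
func (h *fakeHandler) Supports(ref objectRef) bool {
	return ref.APIVersion == "custom.example.com/v1alpha1" && ref.Kind == "CustomResource"
}

// GetCurrentReplicas returns the stored replica count of the target.
func (h *fakeHandler) GetCurrentReplicas(ref objectRef) (int32, error) {
	n, ok := h.replicas[ref.Namespace+"/"+ref.Name]
	if !ok {
		return 0, errors.New("target resource not found")
	}
	return n, nil
}

// UpdateReplicas overwrites the stored replica count of an existing target.
func (h *fakeHandler) UpdateReplicas(ref objectRef, desired int32) error {
	if desired < 0 {
		return errors.New("desired replicas must not be negative")
	}
	key := ref.Namespace + "/" + ref.Name
	if _, ok := h.replicas[key]; !ok {
		return errors.New("target resource not found")
	}
	h.replicas[key] = desired
	return nil
}

func main() {
	h := &fakeHandler{replicas: map[string]int32{"ai-inference/my-custom-resource": 2}}
	ref := objectRef{
		APIVersion: "custom.example.com/v1alpha1",
		Kind:       "CustomResource",
		Namespace:  "ai-inference",
		Name:       "my-custom-resource",
	}
	if h.Supports(ref) {
		if err := h.UpdateReplicas(ref, 5); err != nil {
			panic(err)
		}
	}
	n, err := h.GetCurrentReplicas(ref)
	fmt.Println(n, err) // 5 <nil>
}
```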
Development Steps
Create resource handler implementation file.
Create a new resource handler implementation file in the `pkg/elasticscaler/resource/` directory, for example `custom_resource_handler.go`.
```go
package resource

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// CustomResourceHandler implements ResourceHandler for custom resources.
type CustomResourceHandler struct {
	Client client.Client
}

var _ ResourceHandler = &CustomResourceHandler{}

// NewCustomResourceHandler returns a new CustomResourceHandler instance.
func NewCustomResourceHandler(c client.Client) ResourceHandler {
	return &CustomResourceHandler{Client: c}
}

// Supports checks if the handler supports the given resource type.
func (h *CustomResourceHandler) Supports(targetRef corev1.ObjectReference) bool {
	// TODO: Check if this resource type is supported
	// Example: Only support specific apiVersion and kind
	return targetRef.APIVersion == "custom.example.com/v1alpha1" &&
		targetRef.Kind == "CustomResource"
}

// GetCurrentReplicas gets the current number of replicas.
func (h *CustomResourceHandler) GetCurrentReplicas(
	ctx context.Context,
	targetRef corev1.ObjectReference,
) (int32, error) {
	// TODO: Read current replicas from your CRD structure (spec or status)
	// Example: Read from spec.replicas or status.replicas field
	return 0, fmt.Errorf("not implemented")
}

// UpdateReplicas updates the number of replicas.
func (h *CustomResourceHandler) UpdateReplicas(
	ctx context.Context,
	targetRef corev1.ObjectReference,
	desired int32,
) error {
	// TODO: Update spec.replicas or similar field in your CRD
	// Example: Update resource through client.Update() or client.Patch()
	return fmt.Errorf("not implemented")
}

// GetPodList gets the list of pods managed by the target resource.
func (h *CustomResourceHandler) GetPodList(
	ctx context.Context,
	targetRef corev1.ObjectReference,
) (*corev1.PodList, error) {
	// TODO: List Pods based on selector defined in CRD.
	// labels.Everything() matches every Pod in the namespace and is only a placeholder.
	selector := labels.Everything()
	podList := &corev1.PodList{}
	if err := h.Client.List(ctx, podList, &client.ListOptions{
		Namespace:     targetRef.Namespace,
		LabelSelector: selector,
	}); err != nil {
		return nil, err
	}
	return podList, nil
}
```
Notice:
- In the `Supports` method, match apiVersion and kind precisely to avoid mismatches.
- `UpdateReplicas` should use Patch instead of Update to reduce conflicts.
- `GetPodList` should use selector labels defined in the resource to avoid listing unrelated Pods.
- All methods should have sufficient error handling and logging.
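The recommendation to patch rather than update can be illustrated by building a JSON merge patch that touches only the replica field. The sketch uses the standard library only and assumes the replica count lives at `spec.replicas`, which may not hold for every CRD; `replicasMergePatch` is a hypothetical helper. With controller-runtime, bytes like these could be wrapped with `client.RawPatch(types.MergePatchType, data)` and passed to `Client.Patch`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// replicasMergePatch builds a JSON merge patch that changes only spec.replicas,
// so concurrent edits to other fields of the object are left untouched.
func replicasMergePatch(desired int32) ([]byte, error) {
	if desired < 0 {
		return nil, fmt.Errorf("desired replicas must not be negative, got %d", desired)
	}
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"replicas": desired, // assumption: the CRD stores replicas here
		},
	}
	return json.Marshal(patch)
}

func main() {
	data, err := replicasMergePatch(5)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // {"spec":{"replicas":5}}
}
```

Unlike a full update, a patch of this kind does not carry the object's `resourceVersion`, which is why it is less likely to be rejected with a conflict when other controllers modify the same object.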
Register resource handler.
Register to HandlerFactory in the resource handler implementation's `init()` function.
```go
func init() {
	// The apiVersion and kind must match the values checked in Supports().
	RegisterResourceHandler("custom.example.com/v1alpha1", "CustomResource",
		func(c client.Client) ResourceHandler {
			return NewCustomResourceHandler(c)
		})
}
```
Table 6 Registration Method Description
| Parameter | Type | Description |
|---|---|---|
| apiVersion | string | API version of the custom resource. |
| kind | string | Kind of the custom resource. |
| factory | HandlerFactory | HandlerFactory function used to create ResourceHandler instances. |

Notice:
- The combination of apiVersion and kind must be unique.
- It is recommended to use the full apiVersion (including group/version).
- Avoid registering the same resource type repeatedly in multiple init() functions.
- The factory function should be lightweight and avoid heavy operations during registration.
Configure ElasticScaler to use custom resources.
Create an ElasticScaler custom resource (CR) that specifies the custom resource as the scaling target.
```yaml
apiVersion: elasticscaler.io/v1alpha1
kind: ElasticScaler
metadata:
  name: custom-resource-scaling
  namespace: ai-inference
spec:
  # Scaling target (custom resource)
  targetRef:
    apiVersion: custom.example.com/v1alpha1
    kind: CustomResource
    name: my-custom-resource
    namespace: ai-inference
  # Minimum/maximum replica count
  minReplicas: 2
  maxReplicas: 10
  # Metric-driven trigger configuration
  trigger:
    type: MetricsTrigger
    metricsTrigger:
      scalingAlgorithm: HPA
      # Metric configuration
      metrics:
        - type: Resource
          resource:
            metricsName: cpu
            target:
              type: Utilization
              averageUtilization: 70
```
Debugging and Verification
This subsection describes verification methods and key checkpoints for custom resource management plugins in a real environment.
Verify Handler Registration
Execute the following command to stream the Elastic Scaler controller logs.
```bash
kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f
```
Check the logs to confirm successful handler registration (no panic indicates successful registration).
Verify Resource Management
Execute the following command to create a test custom resource.
```bash
kubectl apply -f test-custom-resource.yaml
```
Execute the following command to create an ElasticScaler resource.
```bash
kubectl apply -f test-elastic-scaler.yaml
```
Execute the following command to check ElasticScaler status.
```bash
kubectl get elasticscalers -n ai-inference -o yaml
```
Execute the following command to check if the target resource replica count is correctly updated.
```bash
kubectl get customresources -n ai-inference -o yaml
```
Notice:
- Ensure the custom resource CRD is properly installed.
- Ensure ElasticScaler has permissions to read and write the target resource (RBAC configuration).
- It is recommended to fully validate in a test environment before deploying to production.
FAQ
This section summarizes common issues and reference solutions during Elastic Scaler plugin development and usage.
Algorithm Registration Failure
Symptom Description
Algorithm registration fails, and the controller log displays an error message.
Possible Causes
- Attempting to register an algorithm with the same name.
- Algorithm name is empty.
- Algorithm instance is nil.
Solution
- Use a unique algorithm name. A name with business significance is recommended.
- Check if an algorithm with the same name has already been registered elsewhere.
- Check the detailed error information in the controller logs.
- Ensure the algorithm instance is properly initialized.
Resource Handler Registration Failure
Symptom Description
Resource handler registration fails, and the controller panics.
Possible Causes
- Attempting to register a resource handler with the same `apiVersion`/`kind`.
- `apiVersion` or `kind` is empty.
- Factory function is nil.
Solution
- Ensure the `apiVersion` and `kind` combination is unique.
- Check if the same resource type has already been registered elsewhere.
- Avoid registering repeatedly in multiple `init()` functions.
- Ensure the factory function is properly implemented and not nil.
Algorithm Not Being Invoked
Symptom Description
The algorithm is registered successfully but is not invoked during actual scaling decisions.
Possible Causes
- The invocation path for custom algorithms has not yet been implemented in the current version.
- The core logic of the `AlgorithmManager.CalculateDesiredReplicas()` method has not yet been implemented.
Solution
- Wait for a subsequent version to complete the algorithm invocation path.
- Or directly modify `AlgorithmManager.CalculateDesiredReplicas()` to implement the algorithm dispatch logic.
- Reference code location: `pkg/elasticscaler/scaling/algorithm.go` (lines 55-68).
Custom Resource Cannot Scale
Symptom Description
The custom resource handler is registered, but ElasticScaler cannot properly manage the resource replica count.
Possible Causes
- `ResourceHandler` interface implementation is incomplete.
- `Supports()` method logic is incorrect.
- Insufficient RBAC permissions.
- Custom resource CRD is not properly installed.
Solution
- Ensure complete implementation of the `ResourceHandler` interface (all 4 methods).
- Properly handle `GetCurrentReplicas` and `UpdateReplicas`.
- The `Supports()` method should precisely match `apiVersion` and `kind`.
- Ensure ElasticScaler has permissions to read and write the target resource (check RBAC configuration).
- Ensure the custom resource CRD is properly installed in the cluster.
How to Debug Algorithm Computation Logic
Symptom Description
Need to debug algorithm computation logic but unsure how to obtain intermediate results.
Possible Causes
The algorithm implementation is complex, and intermediate computation results need to be inspected.
Solution
- Add klog logs in the algorithm implementation.
```go
import "k8s.io/klog/v2"

func (a *CustomAlgorithm) CalculateDesiredReplicas(
	ctx context.Context,
	processingCtx *escontext.ScalingAlgorithmContext,
) (int32, error) {
	klog.Info("CustomAlgorithm: CalculateDesiredReplicas called")
	klog.Infof("Processing context: %+v", processingCtx)

	// Algorithm logic...
	var desiredReplicas int32 // assigned by the algorithm logic

	klog.Infof("Calculated desired replicas: %d", desiredReplicas)
	return desiredReplicas, nil
}
```
- Check the `status` field of the ElasticScaler CR.
- Check the metric data in the controller logs.
- Use unit tests to verify algorithm logic.
How to Support Scale Subresource
Symptom Description
Need to support the K8s Scale subresource (such as `/scale`).
Possible Causes
Need to implement a more standardized resource replica management approach.
Solution
Refer to the implementation of `DefaultCustomResourceHandler`:
- Prefer using the Scale subresource (such as `/scale`).
- If the Scale subresource is not available, fall back to `spec.replicas` or `status.replicas`.
- Use `unstructured.Unstructured` to handle general resources.
- Reference code location: `pkg/elasticscaler/resource/default_custom_resource_handler.go`.
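The fallback step can be sketched on a plain decoded object, which is the same shape that `unstructured.Unstructured` wraps. This is not the code of `DefaultCustomResourceHandler`: the Scale subresource lookup is omitted because it needs a cluster, and `nestedInt64` and `currentReplicas` are hypothetical helpers (apimachinery offers a comparable `unstructured.NestedInt64`).

```go
package main

import "fmt"

// nestedInt64 walks a decoded object and returns the integer found at path.
func nestedInt64(obj map[string]interface{}, path ...string) (int64, bool) {
	var cur interface{} = obj
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return 0, false
		}
		cur, ok = m[key]
		if !ok {
			return 0, false
		}
	}
	switch v := cur.(type) {
	case int64:
		return v, true
	case float64: // encoding/json decodes every number as float64
		return int64(v), true
	}
	return 0, false
}

// currentReplicas prefers spec.replicas and falls back to status.replicas.
func currentReplicas(obj map[string]interface{}) (int32, error) {
	for _, path := range [][]string{{"spec", "replicas"}, {"status", "replicas"}} {
		if n, ok := nestedInt64(obj, path...); ok {
			return int32(n), nil
		}
	}
	return 0, fmt.Errorf("no replicas field found in spec or status")
}

func main() {
	// This object has no spec.replicas, so the status value is used.
	obj := map[string]interface{}{
		"spec":   map[string]interface{}{"model": "demo"},
		"status": map[string]interface{}{"replicas": int64(3)},
	}
	n, err := currentReplicas(obj)
	fmt.Println(n, err) // 3 <nil>
}
```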
Appendix
This section provides reference materials and extended reading links related to Elastic Scaler plugin development.
Reference Resources
- elastic-scaler code repository: https://gitcode.com/openFuyao/elastic-scaler
  - `pkg/elasticscaler/scaling/`: Scaling algorithm implementation examples.
  - `pkg/elasticscaler/resource/`: Resource handler implementation examples.
  - `pkg/elasticscaler/context/`: Context interface definitions.
- OFEP 0030 Proposal: https://gitcode.com/openFuyao/ofep/blob/main/ofeps/sig-ai-inference/0030-ofep-通用扩缩容决策框架设计提案.md
  - Detailed design documentation and architecture design.
  - Interface definitions and use cases.