AI Inference Elastic Scaling Plugin Development Guide

Feature Introduction

Elastic Scaler is a core component of the AI Inference PD-Orchestrator. It uses a plugin-based architecture that supports user-defined scaling decision algorithms and custom resource management logic.

Elastic Scaler provides two types of plugin interfaces:

Scaling Decision Algorithm Plugin (ScalingAlgorithm): Calculates the target replica count based on metric data and supports various custom algorithms.
Resource Management Plugin (ResourceHandler): Supports integration of any custom resource type into the scaling system, handling resource replica count retrieval and updates.

For details, see the Elastic Scaler User Guide.

Constraints and Limitations

This section describes the capability boundaries and usage limitations of Elastic Scaler plugins in the current version.

Functional Limitations

Scaling Decision Algorithm Plugin

In the current version, the invocation path for custom algorithms has not yet been implemented.
Custom algorithms can be registered with DefaultAlgorithmManager, but they are not invoked during actual scaling decisions.
Workaround: wait for a later version that completes the invocation path, or modify AlgorithmManager.CalculateDesiredReplicas() directly to add algorithm dispatch logic.

Operational Limitations

Registration Restrictions

When attempting to register an algorithm with the same name, RegisterAlgorithm() will return an error.
When attempting to register a resource handler with the same apiVersion/kind, RegisterResourceHandler() will panic.
Avoid registering the same plugin multiple times in different init() functions.

Performance Considerations

Computation logic in plugins should be as efficient as possible to avoid blocking the main controller loop.
Complex algorithm computations should be processed asynchronously or use caching.
Frequent API calls may impact K8s API Server performance.

Potential Risks

Incorrect scaling algorithms may lead to excessive scaling up or down, affecting business stability.
Incorrect resource handler implementations may lead to inconsistent target resource states.
It is recommended to fully validate in a test environment before deploying to production.

Environment Preparation

This section describes the environment and tool preparation work that needs to be completed before starting Elastic Scaler plugin development.

Environment Requirements

This subsection describes the hardware and software environment configuration required for Elastic Scaler plugin development and debugging.

Hardware Requirements

The Elastic Scaler plugin development environment has no special hardware requirements. The following configuration is recommended:

CPU: 4 cores or more
Memory: 8GB or more
Disk: 20GB or more available space

Software Requirements

Operating System: Linux
Go Environment: Go 1.21 or higher
Docker: Docker 20.10+ or a compatible container tool (such as nerdctl)
Kubernetes Cluster: For deployment and testing
kubectl: For interacting with the K8s cluster

Setting Up the Environment

Clone the code repository.

bash

git clone https://gitcode.com/openFuyao/elastic-scaler.git
cd elastic-scaler

Configure the Go development environment.

bash

# Check Go version
go version

# Set Go proxy (optional, to accelerate dependency downloads)
go env -w GOPROXY=https://goproxy.cn,direct

Install dependencies.

bash

# Download dependencies
go mod download

# Verify dependencies
go mod verify

Verifying Successful Environment Setup

Verify Go environment.

bash

go version
# Should output something like: go version go1.21.x linux/amd64

Verify dependency integrity.

bash

go mod verify
# Should output: all modules verified

Verify compilation capability.

bash

make build
# or
go build -o bin/elastic-scaler ./cmd/elastic-scaler
# Successful compilation indicates successful environment setup

Verify K8s connection (optional, for testing).

bash

kubectl cluster-info
kubectl get nodes
# Should display cluster information normally

Developing Scaling Decision Algorithm Plugins

This section describes how to implement and integrate custom scaling decision algorithm plugins based on the ScalingAlgorithm interface.

Use Case Overview

Scaling decision algorithm plugins are suitable for scenarios where scaling strategies need to be determined based on specific business logic or metric patterns, for example:

Predictive scaling: Predicting future demand based on historical load data.
Multi-metric joint decision: Considering multiple metrics such as CPU, memory, QPS simultaneously.
Cost-optimized scaling: Minimizing resource costs while meeting SLAs.
Business-aware scaling: Adjusting strategies based on business characteristics such as peak periods and promotional activities.

Component and Interface Description

Table 1 System Component Responsibilities

System Component Name	Description
ElasticScaler CR	Defines scaling strategy, including algorithm name, configuration parameters, and metric sources.
AlgorithmManager	Manages all registered algorithm plugins and is responsible for algorithm dispatch.
ScalingAlgorithm	Algorithm plugin interface implemented by developers for specific scaling logic.

Table 2 Decision Algorithm Plugin Interface Description

Interface Name	Description
CalculateDesiredReplicas	Calculates desired replica count based on metric data and context.

Development Steps

Create algorithm implementation file.

Create a new algorithm implementation file in the pkg/elasticscaler/scaling/ directory, for example custom_algorithm.go.

package scaling

import (
   "context"
   "fmt"

   escontext "gitcode.com/openFuyao/elastic-scaler/pkg/elasticscaler/context"
)

// CustomAlgorithm implements ScalingAlgorithm interface.
type CustomAlgorithm struct {
   // Algorithm configuration parameters
   threshold float64
   cooldownSeconds int32
}

var _ ScalingAlgorithm = &CustomAlgorithm{}

// NewCustomAlgorithm returns a new CustomAlgorithm instance.
func NewCustomAlgorithm() ScalingAlgorithm {
   return &CustomAlgorithm{
      threshold:      0.8,
      cooldownSeconds: 60,
   }
}

// CalculateDesiredReplicas calculates desired replicas based on metrics.
func (a *CustomAlgorithm) CalculateDesiredReplicas(
   ctx context.Context,
   processingCtx *escontext.ScalingAlgorithmContext,
) (int32, error) {
   if processingCtx == nil || processingCtx.ElasticScaler == nil {
      return 0, fmt.Errorf("processing context or ElasticScaler is nil")
   }

   // Get current metrics from context
   // metrics := processingCtx.Metrics
   // currentReplicas := processingCtx.CurrentReplicas

   // TODO: Implement custom scaling algorithm logic
   // Example: Calculate desired replica count based on threshold
   // if metricValue > a.threshold {
   //     return currentReplicas + 1, nil
   // }

   // Placeholder: return an error until the logic is implemented, so that an
   // unfinished algorithm can never drive the target down to 0 replicas.
   return 0, fmt.Errorf("custom algorithm not implemented")
}
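The TODO in the template leaves the calculation open. As one concrete possibility, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), and skips changes within a tolerance band (0.1 here, mirroring the HPA default). The function name and parameters are hypothetical; the fields actually available on escontext.ScalingAlgorithmContext are not shown in this guide, so mapping them to these arguments is left to your implementation.

```go
package main

import (
	"fmt"
	"math"
)

// proportionalReplicas applies the HPA-style rule
//   desired = ceil(current * currentValue / targetValue)
// and keeps the current count when the ratio is within tolerance.
// Hypothetical helper; not part of the Elastic Scaler code base.
func proportionalReplicas(current int32, currentValue, targetValue, tolerance float64) (int32, error) {
	if targetValue <= 0 {
		return 0, fmt.Errorf("target value must be positive, got %v", targetValue)
	}
	if current <= 0 {
		return 0, fmt.Errorf("current replicas must be positive, got %d", current)
	}
	ratio := currentValue / targetValue
	// Ignore small deviations to avoid flapping between replica counts.
	if math.Abs(ratio-1.0) <= tolerance {
		return current, nil
	}
	return int32(math.Ceil(float64(current) * ratio)), nil
}

func main() {
	// 4 replicas observing 150 QPS per replica against a target of 100.
	desired, err := proportionalReplicas(4, 150, 100, 0.1)
	if err != nil {
		panic(err)
	}
	fmt.Println(desired) // 6
}
```

The result is not bounded here; clamping to [minReplicas, maxReplicas] is a separate step.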

Notice:
Add sufficient error handling and boundary checks in the CalculateDesiredReplicas method.
Use klog to log key computation steps and intermediate results for debugging.
Ensure the returned replica count is within the [minReplicas, maxReplicas] range.
Consider implementing a cooldown mechanism to avoid frequent scaling.

Register the algorithm.
Register the algorithm with DefaultAlgorithmManager in the init() function of the algorithm implementation file.
go
```
func init() {
   if err := DefaultAlgorithmManager.RegisterAlgorithm("custom", NewCustomAlgorithm()); err != nil {
      // panic(err)
      fmt.Println("register custom algorithm failed", err)
   }
}
```
Table 3 Registration Method Description

Parameter	Type	Description
name	string	Algorithm name, used to reference the algorithm in the ElasticScaler CR.
algorithm	ScalingAlgorithm	ScalingAlgorithm interface implementation.

Return value: error. A non-nil error is returned if an algorithm with the same name is already registered.
Notice:
- Algorithm names must be unique; names with business significance are recommended.
- When registration fails, you can choose to panic or log it, depending on your needs.
- Avoid registering the same algorithm repeatedly in multiple init() functions.
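The registration rules above (unique, non-empty names; non-nil implementations; an error rather than a panic on duplicates) can be illustrated with a minimal name-keyed registry. This is a hypothetical sketch of the described semantics, not the actual DefaultAlgorithmManager; the `Algorithm` interface is a simplified stand-in that does not use escontext.ScalingAlgorithmContext.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Algorithm is a simplified, hypothetical stand-in for ScalingAlgorithm.
type Algorithm interface {
	Calculate(current int32) (int32, error)
}

// fixedAlgorithm always proposes the same replica count.
type fixedAlgorithm struct{ n int32 }

func (f fixedAlgorithm) Calculate(int32) (int32, error) { return f.n, nil }

// registry maps algorithm names to implementations.
type registry struct {
	mu    sync.Mutex
	algos map[string]Algorithm
}

func newRegistry() *registry { return &registry{algos: map[string]Algorithm{}} }

// Register rejects empty names, nil implementations, and duplicate names
// by returning an error instead of panicking.
func (r *registry) Register(name string, a Algorithm) error {
	if name == "" {
		return errors.New("algorithm name must not be empty")
	}
	if a == nil {
		return fmt.Errorf("algorithm %q is nil", name)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.algos[name]; exists {
		return fmt.Errorf("algorithm %q is already registered", name)
	}
	r.algos[name] = a
	return nil
}

func main() {
	r := newRegistry()
	fmt.Println(r.Register("custom", fixedAlgorithm{n: 3})) // <nil>
	fmt.Println(r.Register("custom", fixedAlgorithm{n: 3})) // algorithm "custom" is already registered
}
```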

Configure ElasticScaler to use the custom algorithm.

Create an ElasticScaler custom resource (CR) that specifies the custom algorithm.

yaml

apiVersion: elasticscaler.io/v1alpha1
kind: ElasticScaler
metadata:
  name: custom-scaling-example
  namespace: ai-inference
spec:
  # Scaling target
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vllm-inference

  # Minimum/maximum replica count
  minReplicas: 2
  maxReplicas: 10

  # Metric-driven trigger configuration
  trigger:
    type: MetricsTrigger
    metricsTrigger:
      # Use custom algorithm
      scalingAlgorithm: custom

      # Algorithm configuration
      algorithmConfig:
        threshold: 0.8
        cooldownSeconds: 60

      # Metric configuration
      metrics:
        - type: External
          external:
            metric:
              name: qps
            target:
              type: AverageValue
              averageValue: "100"
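NewCustomAlgorithm() in the template hardcodes threshold and cooldownSeconds, while the CR above carries the same values under algorithmConfig. How algorithmConfig reaches the algorithm is not described in this guide. Assuming it arrives as a map of string values (a hypothetical representation), a constructor could parse it with the template defaults as a fallback; `parseCustomAlgorithmConfig` is an illustrative name.

```go
package main

import (
	"fmt"
	"strconv"
)

// customAlgorithmConfig mirrors the CustomAlgorithm fields in the template.
type customAlgorithmConfig struct {
	threshold       float64
	cooldownSeconds int32
}

// parseCustomAlgorithmConfig reads threshold and cooldownSeconds from a
// string map, falling back to the template defaults (0.8 and 60).
// The map[string]string input format is an assumption for illustration.
func parseCustomAlgorithmConfig(cfg map[string]string) (customAlgorithmConfig, error) {
	out := customAlgorithmConfig{threshold: 0.8, cooldownSeconds: 60}
	if v, ok := cfg["threshold"]; ok {
		t, err := strconv.ParseFloat(v, 64)
		if err != nil {
			return out, fmt.Errorf("invalid threshold %q: %w", v, err)
		}
		if t <= 0 {
			return out, fmt.Errorf("threshold must be positive, got %v", t)
		}
		out.threshold = t
	}
	if v, ok := cfg["cooldownSeconds"]; ok {
		c, err := strconv.ParseInt(v, 10, 32)
		if err != nil {
			return out, fmt.Errorf("invalid cooldownSeconds %q: %w", v, err)
		}
		if c < 0 {
			return out, fmt.Errorf("cooldownSeconds must not be negative, got %d", c)
		}
		out.cooldownSeconds = int32(c)
	}
	return out, nil
}

func main() {
	cfg, err := parseCustomAlgorithmConfig(map[string]string{"threshold": "0.75", "cooldownSeconds": "120"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg) // {threshold:0.75 cooldownSeconds:120}
}
```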

Debugging and Verification

This subsection describes the verification steps and troubleshooting approaches for custom scaling algorithm plugins in a cluster.

Verify Algorithm Registration

Execute the following command to follow the Elastic Scaler controller logs.

bash

kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f

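Check the logs to confirm that the algorithm was registered. The sample init() prints a message only when registration fails, so empty output from the following command indicates successful registration.
bash
```
kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager --tail=-1 | grep "register custom algorithm failed"
```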

Verify Algorithm Invocation

Execute the following command to create a test ElasticScaler resource.
bash
```
kubectl apply -f test-elastic-scaler.yaml
```
Execute the following command to check ElasticScaler status.
bash
```
kubectl get elasticscalers -n ai-inference -o yaml
```

Execute the following command to check the controller logs to confirm the algorithm is being invoked.

bash

kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f | grep CustomAlgorithm

Notice:
Custom algorithms are not yet invoked in the current version, so you will not see invocation logs for them.
It is recommended to add klog logs in the algorithm implementation for future debugging.
You can verify the correctness of the algorithm logic through unit tests.

Developing Resource Management Plugins

This section describes how to implement custom resource management plugins and integrate any Kubernetes resource into the Elastic Scaler scaling system.

Use Case Overview

Resource management plugins are suitable for scenarios where custom K8s resource types need to be integrated into the Elastic Scaler scaling system, for example:

Custom workload resources: Such as custom inference service CRDs.
Resources requiring special replica management: Such as complex resources that need to update multiple fields simultaneously.
Cross-resource replica coordination: Such as managing replica counts for multiple related resources simultaneously.

Component and Interface Description

Table 4 System Component Responsibilities

System Component Name	Description
ElasticScaler CR	Defines target resource reference (apiVersion, kind, name).
HandlerFactory	Manages all registered resource handlers and creates corresponding handler instances based on target resource type.
ResourceHandler	Resource handler interface implemented by developers for specific resource management logic.

Table 5 Resource Management Plugin Interface Description

Interface Name	Description
Supports	Checks if the handler supports the given resource type.
GetCurrentReplicas	Gets the current replica count of the target resource.
UpdateReplicas	Updates the replica count of the target resource.
GetPodList	Gets the Pod list managed by the target resource.

Development Steps

Create resource handler implementation file.

Create a new resource handler implementation file in the pkg/elasticscaler/resource/ directory, for example custom_resource_handler.go.

package resource

import (
   "context"
   "fmt"

   corev1 "k8s.io/api/core/v1"
   "k8s.io/apimachinery/pkg/labels"
   "sigs.k8s.io/controller-runtime/pkg/client"
)

// CustomResourceHandler implements ResourceHandler for custom resources.
type CustomResourceHandler struct {
   Client client.Client
}

var _ ResourceHandler = &CustomResourceHandler{}

// NewCustomResourceHandler returns a new CustomResourceHandler instance.
func NewCustomResourceHandler(c client.Client) ResourceHandler {
   return &CustomResourceHandler{Client: c}
}

// Supports checks if the handler supports the given resource type.
func (h *CustomResourceHandler) Supports(targetRef corev1.ObjectReference) bool {
   // TODO: Check if this resource type is supported
   // Example: Only support specific apiVersion and kind
   return targetRef.APIVersion == "custom.example.com/v1alpha1" &&
      targetRef.Kind == "CustomResource"
}

// GetCurrentReplicas gets the current number of replicas.
func (h *CustomResourceHandler) GetCurrentReplicas(
   ctx context.Context,
   targetRef corev1.ObjectReference,
) (int32, error) {
   // TODO: Read current replicas from your CRD structure (spec or status)
   // Example: Read from spec.replicas or status.replicas field
   return 0, fmt.Errorf("not implemented")
}

// UpdateReplicas updates the number of replicas.
func (h *CustomResourceHandler) UpdateReplicas(
   ctx context.Context,
   targetRef corev1.ObjectReference,
   desired int32,
) error {
   // TODO: Update spec.replicas or similar field in your CRD
   // Example: Update resource through client.Update() or client.Patch()
   return fmt.Errorf("not implemented")
}

// GetPodList gets the list of pods managed by the target resource.
func (h *CustomResourceHandler) GetPodList(
   ctx context.Context,
   targetRef corev1.ObjectReference,
) (*corev1.PodList, error) {
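   // TODO: List Pods based on the selector defined in your CRD.
   // labels.Everything() matches every Pod in the namespace; replace it
   // with the resource's own selector before using this handler.
   selector := labels.Everything()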
   podList := &corev1.PodList{}
   if err := h.Client.List(ctx, podList, &client.ListOptions{
      Namespace:     targetRef.Namespace,
      LabelSelector: selector,
   }); err != nil {
      return nil, err
   }
   return podList, nil
}

Notice:
In the Supports method, match apiVersion and kind precisely to avoid mismatches.
UpdateReplicas should use Patch instead of Update to reduce conflicts.
GetPodList should use selector labels defined in the resource to avoid listing unrelated Pods.
All methods should have sufficient error handling and logging.
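For the Patch recommendation, a JSON merge patch (RFC 7386) that touches only spec.replicas avoids overwriting fields that other controllers have changed. The stdlib-only sketch below builds that patch body. In a controller-runtime handler, the bytes could be sent with `h.Client.Patch(ctx, obj, client.RawPatch(types.MergePatchType, data))`, where `obj` identifies the target (for example, an `unstructured.Unstructured` with its apiVersion, kind, name, and namespace set). Whether spec.replicas is the right field depends on your CRD, and `replicasMergePatch` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// replicasMergePatch returns a JSON merge patch that sets spec.replicas
// and leaves every other field untouched. Hypothetical helper.
func replicasMergePatch(desired int32) ([]byte, error) {
	if desired < 0 {
		return nil, fmt.Errorf("desired replicas must not be negative, got %d", desired)
	}
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"replicas": desired,
		},
	}
	return json.Marshal(patch)
}

func main() {
	data, err := replicasMergePatch(5)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // {"spec":{"replicas":5}}
}
```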

Register resource handler.
Register the handler with the HandlerFactory in the init() function of the resource handler implementation file.
go
```
func init() {
   RegisterResourceHandler("custom.example.com/v1alpha1", "CustomResource", func(c client.Client) ResourceHandler {
      return NewCustomResourceHandler(c)
   })
}
```
Table 6 Registration Method Description

Parameter	Type	Description
apiVersion	string	API version of the custom resource.
kind	string	Kind of the custom resource.
factory	HandlerFactory	HandlerFactory function used to create ResourceHandler instances.
Notice:
- The combination of apiVersion and kind must be unique.
- It is recommended to use the full apiVersion (including group/version).
- Avoid registering the same resource type repeatedly in multiple init() functions.
- The factory function should be lightweight and avoid heavy operations during registration.
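The uniqueness rule can be pictured as a registry keyed by the (apiVersion, kind) pair that panics on a duplicate, which matches the behavior this guide describes for RegisterResourceHandler(). The sketch is hypothetical: it uses simplified `handler` and `handlerFactory` types with no Kubernetes client and is not the actual HandlerFactory implementation.

```go
package main

import "fmt"

// gvkKey identifies a resource type by apiVersion and kind.
type gvkKey struct {
	APIVersion string
	Kind       string
}

// handler and handlerFactory are simplified stand-ins for ResourceHandler
// and HandlerFactory (the real factory receives a client.Client).
type handler interface{ Describe() string }
type handlerFactory func() handler

type customHandler struct{}

func (customHandler) Describe() string { return "custom handler" }

var handlerFactories = map[gvkKey]handlerFactory{}

// registerHandler panics on empty keys, nil factories, and duplicates,
// so misconfiguration surfaces at controller startup.
func registerHandler(apiVersion, kind string, f handlerFactory) {
	if apiVersion == "" || kind == "" {
		panic("apiVersion and kind must not be empty")
	}
	if f == nil {
		panic(fmt.Sprintf("nil factory for %s/%s", apiVersion, kind))
	}
	key := gvkKey{APIVersion: apiVersion, Kind: kind}
	if _, exists := handlerFactories[key]; exists {
		panic(fmt.Sprintf("resource handler for %s/%s already registered", apiVersion, kind))
	}
	handlerFactories[key] = f
}

// handlerFor creates a handler for the given type, if one is registered.
func handlerFor(apiVersion, kind string) (handler, bool) {
	f, ok := handlerFactories[gvkKey{APIVersion: apiVersion, Kind: kind}]
	if !ok {
		return nil, false
	}
	return f(), true
}

func main() {
	registerHandler("custom.example.com/v1alpha1", "CustomResource", func() handler { return customHandler{} })
	if h, ok := handlerFor("custom.example.com/v1alpha1", "CustomResource"); ok {
		fmt.Println(h.Describe()) // custom handler
	}
}
```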

Configure ElasticScaler to use custom resources.

Create an ElasticScaler custom resource (CR) that specifies the custom resource as the scaling target.

yaml

apiVersion: elasticscaler.io/v1alpha1
kind: ElasticScaler
metadata:
  name: custom-resource-scaling
  namespace: ai-inference
spec:
  # Scaling target (custom resource)
  targetRef:
    apiVersion: custom.example.com/v1alpha1
    kind: CustomResource
    name: my-custom-resource
    namespace: ai-inference

  # Minimum/maximum replica count
  minReplicas: 2
  maxReplicas: 10

  # Metric-driven trigger configuration
  trigger:
    type: MetricsTrigger
    metricsTrigger:
      scalingAlgorithm: HPA
      # Metric configuration
      metrics:
        - type: Resource
          resource:
            metricsName: cpu
            target:
              type: Utilization
              averageUtilization: 70

Debugging and Verification

This subsection describes verification methods and key checkpoints for custom resource management plugins in a real environment.

Verify Handler Registration

Execute the following command to follow the Elastic Scaler controller logs.

bash

kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f

Check the logs to confirm that the handler was registered. RegisterResourceHandler() panics on a duplicate apiVersion/kind, so a controller that starts without a panic indicates successful registration.

Verify Resource Management

Execute the following command to create a test custom resource.
bash
```
kubectl apply -f test-custom-resource.yaml
```
Execute the following command to create an ElasticScaler resource.
bash
```
kubectl apply -f test-elastic-scaler.yaml
```
Execute the following command to check ElasticScaler status.
bash
```
kubectl get elasticscalers -n ai-inference -o yaml
```
Execute the following command to check if the target resource replica count is correctly updated.
bash
```
kubectl get customresources -n ai-inference -o yaml
```

Notice:
Ensure the custom resource CRD is properly installed.
Ensure ElasticScaler has permissions to read and write the target resource (RBAC configuration).
It is recommended to fully validate in a test environment before deploying to production.

FAQ

This section summarizes common issues and reference solutions during Elastic Scaler plugin development and usage.

Algorithm Registration Failure
Symptom Description
Algorithm registration fails, and the controller log displays an error message.
Possible Causes
- Attempting to register an algorithm with the same name.
- Algorithm name is empty.
- Algorithm instance is nil.
Solution
1. Use a unique algorithm name; a name with business significance is recommended.
2. Check if an algorithm with the same name has already been registered elsewhere.
3. Check the detailed error information in the controller logs.
4. Ensure the algorithm instance is properly initialized.
Resource Handler Registration Failure
Symptom Description
Resource handler registration fails, and the controller panics.
Possible Causes
- Attempting to register a resource handler with the same apiVersion/kind.
- apiVersion or kind is empty.
- Factory function is nil.
Solution
1. Ensure the apiVersion and kind combination is unique.
2. Check if the same resource type has already been registered elsewhere.
3. Avoid registering repeatedly in multiple init() functions.
4. Ensure the factory function is properly implemented and not nil.
Algorithm Not Being Invoked
Symptom Description
Algorithm is registered successfully, but is not invoked during actual scaling decisions.
Possible Causes
- The algorithm invocation implementation path has not yet been implemented in the current version.
- The core logic of AlgorithmManager.CalculateDesiredReplicas() method has not yet been implemented.
Solution
1. Wait for subsequent versions to improve the algorithm invocation path.
2. Or directly modify the AlgorithmManager.CalculateDesiredReplicas() to implement algorithm dispatch logic.
3. Reference code location: pkg/elasticscaler/scaling/algorithm.go (lines 55-68).
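If you add dispatch logic yourself, one possible shape is to look up the algorithm named in the ElasticScaler CR (spec.trigger.metricsTrigger.scalingAlgorithm in the examples above) and return an explicit error for unknown names. The sketch is hypothetical: the `manager` type, its map, and the simplified `Algorithm` interface are assumptions and do not reflect the code in algorithm.go.

```go
package main

import "fmt"

// Algorithm is a simplified, hypothetical stand-in for ScalingAlgorithm.
type Algorithm interface {
	Calculate(current int32) (int32, error)
}

// doubleAlgorithm proposes twice the current replica count.
type doubleAlgorithm struct{}

func (doubleAlgorithm) Calculate(current int32) (int32, error) { return current * 2, nil }

// manager dispatches to a registered algorithm by name.
type manager struct {
	algorithms map[string]Algorithm
}

// CalculateDesiredReplicas looks up the named algorithm and calls it,
// wrapping any error with the algorithm name for easier debugging.
func (m *manager) CalculateDesiredReplicas(name string, current int32) (int32, error) {
	algo, ok := m.algorithms[name]
	if !ok {
		return 0, fmt.Errorf("scaling algorithm %q is not registered", name)
	}
	desired, err := algo.Calculate(current)
	if err != nil {
		return 0, fmt.Errorf("algorithm %q failed: %w", name, err)
	}
	return desired, nil
}

func main() {
	m := &manager{algorithms: map[string]Algorithm{"custom": doubleAlgorithm{}}}
	fmt.Println(m.CalculateDesiredReplicas("custom", 3))  // 6 <nil>
	fmt.Println(m.CalculateDesiredReplicas("missing", 3)) // 0 scaling algorithm "missing" is not registered
}
```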
Custom Resource Cannot Scale
Symptom Description
Custom resource handler is registered, but ElasticScaler cannot properly manage resource replica count.
Possible Causes
- ResourceHandler interface implementation is incomplete.
- Supports() method logic is incorrect.
- Insufficient RBAC permissions.
- Custom resource CRD is not properly installed.
Solution
1. Ensure complete implementation of the ResourceHandler interface (all 4 methods).
2. Implement GetCurrentReplicas and UpdateReplicas against the field that actually holds the replica count in your CRD (for example, spec.replicas).
3. Make the Supports() method match apiVersion and kind precisely.
4. Ensure ElasticScaler has permissions to read and write the target resource (check RBAC configuration).
5. Ensure the custom resource CRD is properly installed to the cluster.
How to Debug Algorithm Computation Logic
Symptom Description
Need to debug algorithm computation logic but unsure how to obtain intermediate results.
Possible Causes
Algorithm implementation is complex and needs to view intermediate computation processes.
Solution
1. Add klog logs in the algorithm implementation.
  go
```
import "k8s.io/klog/v2"

func (a *CustomAlgorithm) CalculateDesiredReplicas(
   ctx context.Context,
   processingCtx *escontext.ScalingAlgorithmContext,
) (int32, error) {
   klog.Info("CustomAlgorithm: CalculateDesiredReplicas called")
   klog.Infof("CustomAlgorithm: processing context: %+v", processingCtx)

   var desiredReplicas int32
   // Algorithm logic that computes desiredReplicas...

   klog.Infof("CustomAlgorithm: calculated desired replicas: %d", desiredReplicas)
   return desiredReplicas, nil
}
```
2. Check the status field of the ElasticScaler CR.
3. Check the metric data in the controller logs.
4. Use unit tests to verify algorithm logic.
How to Support Scale Subresource
Symptom Description
Need to support K8s Scale subresource (such as /scale).
Possible Causes
Need to implement a more standardized resource replica management approach.
Solution
Refer to the implementation of DefaultCustomResourceHandler:
1. Prefer using the Scale subresource (such as /scale).
2. If the Scale subresource is not available, fall back to spec.replicas or status.replicas.
3. Use unstructured.Unstructured to handle general resources.
4. Reference code location: pkg/elasticscaler/resource/default_custom_resource_handler.go.
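The fallback in step 2 can be sketched over the nested map that `unstructured.Unstructured` wraps (its `Object` field is a `map[string]interface{}`). The helper below is hypothetical and stdlib-only; in real code, `unstructured.NestedInt64` from k8s.io/apimachinery performs a similar nested lookup for int64 values. It checks spec.replicas first, then status.replicas, and accepts the integer and whole-number float types that decoded JSON may contain.

```go
package main

import "fmt"

// nestedNumber walks obj along fields and returns the value as int64 if it
// is an int, int32, int64, or a whole-number float64.
func nestedNumber(obj map[string]interface{}, fields ...string) (int64, bool) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return 0, false
		}
		next, found := m[f]
		if !found {
			return 0, false
		}
		cur = next
	}
	switch v := cur.(type) {
	case int64:
		return v, true
	case int32:
		return int64(v), true
	case int:
		return int64(v), true
	case float64:
		if v == float64(int64(v)) {
			return int64(v), true
		}
	}
	return 0, false
}

// currentReplicas prefers spec.replicas and falls back to status.replicas.
func currentReplicas(obj map[string]interface{}) (int32, error) {
	if n, ok := nestedNumber(obj, "spec", "replicas"); ok {
		return int32(n), nil
	}
	if n, ok := nestedNumber(obj, "status", "replicas"); ok {
		return int32(n), nil
	}
	return 0, fmt.Errorf("neither spec.replicas nor status.replicas is set")
}

func main() {
	obj := map[string]interface{}{
		"spec":   map[string]interface{}{"image": "example"},
		"status": map[string]interface{}{"replicas": int64(3)},
	}
	n, err := currentReplicas(obj)
	fmt.Println(n, err) // 3 <nil>
}
```

This covers only the fallback; trying the Scale subresource first (step 1) requires a Kubernetes client and is not shown.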

Appendix

This section provides reference materials and extended reading links related to Elastic Scaler plugin development.

Reference Resources

elastic-scaler code repository: https://gitcode.com/openFuyao/elastic-scaler
- pkg/elasticscaler/scaling/: Scaling algorithm implementation examples.
- pkg/elasticscaler/resource/: Resource handler implementation examples.
- pkg/elasticscaler/context/: Context interface definitions.
OFEP 0030 Proposal: https://gitcode.com/openFuyao/ofep/blob/main/ofeps/sig-ai-inference/0030-ofep-通用扩缩容决策框架设计提案.md
- Detailed design documentation and architecture design.
- Interface definitions and use cases.
