Version: v26.03

AI Inference Elastic Scaling Plugin Development Guide

Feature Introduction

Elastic Scaler is a core component of the AI Inference PD-Orchestrator, which adopts a plugin-based architecture to support user-defined scaling decision algorithms and custom resource management logic.

Elastic Scaler provides two types of plugin interfaces:

  • Scaling Decision Algorithm Plugin (ScalingAlgorithm): Calculates the target replica count based on metric data and supports various custom algorithms.
  • Resource Management Plugin (ResourceHandler): Supports integration of any custom resource type into the scaling system, handling resource replica count retrieval and updates.

For details, see Elastic Scaler User Guide.

Constraints and Limitations

This section outlines the capability boundaries and usage limitations of the current version of Elastic Scaler plugins, clarifying the applicable scope of plugins.

Functional Limitations

Scaling Decision Algorithm Plugin

  • In the current version, the invocation path for custom algorithms is not yet implemented.
  • Custom algorithms can be registered with DefaultAlgorithmManager, but they are not invoked during actual scaling decisions.
  • Temporary workaround: Wait for a later version that completes the algorithm invocation path, or modify AlgorithmManager.CalculateDesiredReplicas() directly to implement the algorithm dispatch logic.
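
The dispatch logic mentioned in the workaround could look roughly like the following. This is a minimal, self-contained sketch and not the project's implementation: the Algorithm interface, Input struct, Manager type, and plusOne algorithm are hypothetical stand-ins for ScalingAlgorithm, ScalingAlgorithmContext, and AlgorithmManager, reduced so that the example compiles without the repository.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Algorithm is a simplified, hypothetical stand-in for the ScalingAlgorithm interface.
type Algorithm interface {
	CalculateDesiredReplicas(ctx context.Context, in Input) (int32, error)
}

// Input is a hypothetical, reduced processing context.
type Input struct {
	AlgorithmName   string // name referenced by the ElasticScaler CR
	CurrentReplicas int32
}

// Manager is a hypothetical manager that keeps registered algorithms by name.
type Manager struct {
	mu         sync.RWMutex
	algorithms map[string]Algorithm
}

// NewManager returns an empty Manager.
func NewManager() *Manager {
	return &Manager{algorithms: make(map[string]Algorithm)}
}

// Register stores an algorithm under a name (duplicate checks are omitted here).
func (m *Manager) Register(name string, a Algorithm) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.algorithms[name] = a
}

// CalculateDesiredReplicas looks up the algorithm named in the input and delegates to it.
func (m *Manager) CalculateDesiredReplicas(ctx context.Context, in Input) (int32, error) {
	m.mu.RLock()
	algo, ok := m.algorithms[in.AlgorithmName]
	m.mu.RUnlock()
	if !ok {
		return 0, fmt.Errorf("scaling algorithm %q is not registered", in.AlgorithmName)
	}
	return algo.CalculateDesiredReplicas(ctx, in)
}

// plusOne is a toy algorithm used only to exercise the dispatch path.
type plusOne struct{}

func (plusOne) CalculateDesiredReplicas(_ context.Context, in Input) (int32, error) {
	return in.CurrentReplicas + 1, nil
}
```

Returning an error for an unknown name, rather than silently falling back to a default algorithm, keeps a misspelled algorithm name in the CR visible in the controller logs.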

Operational Limitations

Registration Restrictions

  • When attempting to register an algorithm with the same name, RegisterAlgorithm() will return an error.
  • When attempting to register a resource handler with the same apiVersion/kind, RegisterResourceHandler() will panic.
  • Avoid registering the same plugin multiple times in different init() functions.
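
The two registration behaviors above (an error for algorithms, a panic for resource handlers) can be illustrated with a small sketch. All types and functions below are hypothetical stand-ins written for this example; they only mirror the documented behavior and are not the project's registries. The tryRegisterHandler helper is an assumption that shows how a panic can be turned into an error in tests.

```go
package main

import (
	"errors"
	"fmt"
)

// algorithmRegistry is a hypothetical registry that rejects duplicates with an error.
type algorithmRegistry struct {
	byName map[string]any
}

func newAlgorithmRegistry() *algorithmRegistry {
	return &algorithmRegistry{byName: make(map[string]any)}
}

// RegisterAlgorithm returns an error for an empty name, a nil algorithm, or a duplicate name.
func (r *algorithmRegistry) RegisterAlgorithm(name string, algo any) error {
	if name == "" {
		return errors.New("algorithm name is empty")
	}
	if algo == nil {
		return errors.New("algorithm is nil")
	}
	if _, exists := r.byName[name]; exists {
		return fmt.Errorf("algorithm %q is already registered", name)
	}
	r.byName[name] = algo
	return nil
}

// resourceKey identifies a resource type by apiVersion and kind.
type resourceKey struct {
	apiVersion string
	kind       string
}

// handlerRegistry is a hypothetical registry that panics on duplicates.
type handlerRegistry struct {
	factories map[resourceKey]func() any
}

func newHandlerRegistry() *handlerRegistry {
	return &handlerRegistry{factories: make(map[resourceKey]func() any)}
}

// RegisterResourceHandler panics when the same apiVersion/kind is registered twice.
func (r *handlerRegistry) RegisterResourceHandler(apiVersion, kind string, factory func() any) {
	key := resourceKey{apiVersion: apiVersion, kind: kind}
	if _, exists := r.factories[key]; exists {
		panic(fmt.Sprintf("resource handler for %s/%s is already registered", apiVersion, kind))
	}
	r.factories[key] = factory
}

// tryRegisterHandler converts the panic into an error, which is convenient in tests.
func tryRegisterHandler(r *handlerRegistry, apiVersion, kind string, factory func() any) (err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("%v", rec)
		}
	}()
	r.RegisterResourceHandler(apiVersion, kind, factory)
	return nil
}
```

Because a panic in init() stops the controller at startup, keeping each registration in exactly one init() function is the simplest way to avoid it.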

Performance Considerations

  • Computation logic in plugins should be as efficient as possible to avoid blocking the main controller loop.
  • Complex algorithm computations should be processed asynchronously or use caching.
  • Frequent API calls may impact K8s API Server performance.
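
One way to keep an expensive computation out of the hot path is to cache its result for a short time. The sketch below assumes a hypothetical replicaCache helper that is not part of Elastic Scaler; it recomputes the value only after a time-to-live has elapsed.

```go
package main

import (
	"sync"
	"time"
)

// replicaCache is a hypothetical helper that memoizes one computed value for a fixed TTL.
type replicaCache struct {
	mu       sync.Mutex
	ttl      time.Duration
	now      func() time.Time // injectable clock, useful in tests
	value    int32
	computed time.Time
	valid    bool
}

// newReplicaCache returns a cache whose entries stay fresh for ttl.
func newReplicaCache(ttl time.Duration) *replicaCache {
	return &replicaCache{ttl: ttl, now: time.Now}
}

// Get returns the cached value while it is fresh; otherwise it calls compute
// and stores the result. Errors are returned to the caller and are not cached.
func (c *replicaCache) Get(compute func() (int32, error)) (int32, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.valid && c.now().Sub(c.computed) < c.ttl {
		return c.value, nil
	}
	v, err := compute()
	if err != nil {
		return 0, err
	}
	c.value, c.computed, c.valid = v, c.now(), true
	return v, nil
}
```

The lock is held while compute runs, so concurrent callers wait for a single computation instead of starting several. If even one slow computation per TTL is too much for the reconcile loop, the value could instead be refreshed by a background goroutine and only read here.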

Potential Risks

  • Incorrect scaling algorithms may lead to excessive scaling up or down, affecting business stability.
  • Incorrect resource handler implementations may lead to inconsistent target resource states.
  • It is recommended to fully validate in a test environment before deploying to production.

Environment Preparation

This section describes the environment and tool preparation work that needs to be completed before starting Elastic Scaler plugin development.

Environment Requirements

This subsection describes the hardware and software environment configuration required for Elastic Scaler plugin development and debugging.

Hardware Requirements

The Elastic Scaler plugin development environment has no special hardware requirements. The following configuration is recommended:

  • CPU: 4 cores or more
  • Memory: 8GB or more
  • Disk: 20GB or more available space

Software Requirements

  • Operating System: Linux
  • Go Environment: Go 1.21 or higher
  • Docker: Docker 20.10+ or compatible container runtime (such as nerdctl)
  • Kubernetes Cluster: For deployment and testing
  • kubectl: For interacting with the K8s cluster

Setting Up the Environment

  1. Clone the code repository.

    bash
    git clone https://gitcode.com/openFuyao/elastic-scaler.git
    cd elastic-scaler
  2. Configure the Go development environment.

    bash
    # Check Go version
    go version
    
    # Set Go proxy (optional, to accelerate dependency downloads)
    go env -w GOPROXY=https://goproxy.cn,direct
  3. Install dependencies.

    bash
    # Download dependencies
    go mod download
    
    # Verify dependencies
    go mod verify

Verifying Successful Environment Setup

  1. Verify Go environment.

    bash
    go version
    # Should output something like: go version go1.21.x linux/amd64
  2. Verify dependency integrity.

    bash
    go mod verify
    # Should output: all modules verified
  3. Verify compilation capability.

    bash
    make build
    # or
    go build -o bin/elastic-scaler ./cmd/elastic-scaler
    # Successful compilation indicates successful environment setup
  4. Verify K8s connection (optional, for testing).

    bash
    kubectl cluster-info
    kubectl get nodes
    # Should display cluster information normally

Developing Scaling Decision Algorithm Plugins

This section describes how to implement and integrate custom scaling decision algorithm plugins based on the ScalingAlgorithm interface.

Use Case Overview

Scaling decision algorithm plugins are suitable for scenarios where scaling strategies need to be determined based on specific business logic or metric patterns, for example:

  • Predictive scaling: Predicting future demand based on historical load data.
  • Multi-metric joint decision: Considering multiple metrics such as CPU, memory, QPS simultaneously.
  • Cost-optimized scaling: Minimizing resource costs while meeting SLAs.
  • Business-aware scaling: Adjusting strategies based on business characteristics such as peak periods and promotional activities.
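
As an illustration of a multi-metric joint decision, the sketch below computes one proposal per metric with the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, ceil(currentReplicas × currentValue / targetValue), and then takes the largest proposal. The metricSample type and both functions are hypothetical names for this example, and the HPA's tolerance band is deliberately omitted.

```go
package main

import "math"

// metricSample is a hypothetical per-metric observation.
type metricSample struct {
	Name    string
	Current float64 // observed average value per replica
	Target  float64 // desired average value per replica
}

// desiredForMetric applies the proportional rule
// ceil(currentReplicas * current / target) to a single metric.
// Invalid inputs leave the replica count unchanged.
func desiredForMetric(currentReplicas int32, m metricSample) int32 {
	if m.Target <= 0 || currentReplicas <= 0 {
		return currentReplicas
	}
	return int32(math.Ceil(float64(currentReplicas) * m.Current / m.Target))
}

// jointDesired returns the largest per-metric proposal, so that the most
// loaded metric decides. With no metrics it returns currentReplicas.
func jointDesired(currentReplicas int32, metrics []metricSample) int32 {
	desired := currentReplicas
	for i, m := range metrics {
		d := desiredForMetric(currentReplicas, m)
		if i == 0 || d > desired {
			desired = d
		}
	}
	return desired
}
```

Taking the maximum is a conservative choice: the workload scales down only when every metric agrees that fewer replicas are enough.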

Component and Interface Description

Table 1 System Component Responsibilities

System Component Name | Description
ElasticScaler CR | Defines the scaling strategy, including the algorithm name, configuration parameters, and metric sources.
AlgorithmManager | Manages all registered algorithm plugins and is responsible for algorithm dispatch.
ScalingAlgorithm | Algorithm plugin interface that developers implement to provide specific scaling logic.

Table 2 Decision Algorithm Plugin Interface Description

Interface Name | Description
CalculateDesiredReplicas | Calculates the desired replica count based on metric data and context.

Development Steps

  1. Create algorithm implementation file.

    Create a new algorithm implementation file in the pkg/elasticscaler/scaling/ directory, for example custom_algorithm.go.

    go
    package scaling
    
    import (
       "context"
       "fmt"
    
       escontext "gitcode.com/openFuyao/elastic-scaler/pkg/elasticscaler/context"
    )
    
    // CustomAlgorithm implements ScalingAlgorithm interface.
    type CustomAlgorithm struct {
       // Algorithm configuration parameters
       threshold float64
       cooldownSeconds int32
    }
    
    var _ ScalingAlgorithm = &CustomAlgorithm{}
    
    // NewCustomAlgorithm returns a new CustomAlgorithm instance.
    func NewCustomAlgorithm() ScalingAlgorithm {
       return &CustomAlgorithm{
          threshold:      0.8,
          cooldownSeconds: 60,
       }
    }
    
    // CalculateDesiredReplicas calculates desired replicas based on metrics.
    func (a *CustomAlgorithm) CalculateDesiredReplicas(
       ctx context.Context,
       processingCtx *escontext.ScalingAlgorithmContext,
    ) (int32, error) {
       if processingCtx == nil || processingCtx.ElasticScaler == nil {
          return 0, fmt.Errorf("processing context or ElasticScaler is nil")
       }
    
       // Get current metrics from context
       // metrics := processingCtx.Metrics
       // currentReplicas := processingCtx.CurrentReplicas
    
       // TODO: Implement custom scaling algorithm logic
       // Example: Add one replica when the metric exceeds the threshold
       // if metricValue > a.threshold {
       //     return currentReplicas + 1, nil
       // }
    
       // Placeholder: replace with the computed replica count before use.
       return 0, nil
    }

    Notice:

    • Add sufficient error handling and boundary checks in the CalculateDesiredReplicas method.
    • Use klog to log key computation steps and intermediate results for debugging.
    • Ensure the returned replica count is within the [minReplicas, maxReplicas] range.
    • Consider implementing a cooldown mechanism to avoid frequent scaling.
  2. Register the algorithm.

    Register the algorithm to DefaultAlgorithmManager in the algorithm implementation's init() function.

    go
    func init() {
       if err := DefaultAlgorithmManager.RegisterAlgorithm("custom", NewCustomAlgorithm()); err != nil {
          // panic(err)
          fmt.Println("register custom algorithm failed", err)
       }
    }

    Table 3 Registration Method Description

    Parameter | Type | Description
    name | string | Algorithm name used to reference the algorithm in the ElasticScaler CR.
    algorithm | ScalingAlgorithm | ScalingAlgorithm interface implementation.

    Return Value: error, returns an error if an algorithm with the same name is already registered.

    Notice:

    • Algorithm names must be unique. It is recommended to use names with business significance.
    • When registration fails, you can choose to panic or log it, depending on your needs.
    • Avoid registering the same algorithm repeatedly in multiple init() functions.
  3. Configure ElasticScaler to use the custom algorithm.

    Create an ElasticScaler CR that specifies the custom algorithm.

    yaml
    apiVersion: elasticscaler.io/v1alpha1
    kind: ElasticScaler
    metadata:
      name: custom-scaling-example
      namespace: ai-inference
    spec:
      # Scaling target
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: vllm-inference
    
      # Minimum/maximum replica count
      minReplicas: 2
      maxReplicas: 10
    
      # Metric-driven trigger configuration
      trigger:
        type: MetricsTrigger
        metricsTrigger:
          # Use custom algorithm
          scalingAlgorithm: custom
    
          # Algorithm configuration
          algorithmConfig:
            threshold: 0.8
            cooldownSeconds: 60
    
          # Metric configuration
          metrics:
            - type: External
              external:
                metric:
                  name: qps
                target:
                  type: AverageValue
                  averageValue: "100"
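
The configuration above sets minReplicas, maxReplicas, and cooldownSeconds, and the notice in step 1 asks the algorithm to respect the replica range and to avoid frequent scaling. The following sketch shows one possible way to apply both rules. The clampReplicas and decide functions and the cooldownGate type are hypothetical helpers for this example, and timestamps are plain Unix seconds to keep it short.

```go
package main

// clampReplicas limits desired to the range [minReplicas, maxReplicas].
func clampReplicas(desired, minReplicas, maxReplicas int32) int32 {
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

// cooldownGate is a hypothetical helper that allows at most one scaling
// action per cooldown window.
type cooldownGate struct {
	cooldownSeconds int64
	lastScaleUnix   int64
	scaled          bool // false until the first scaling action
}

// Allow reports whether scaling may happen at nowUnix and records the action if so.
func (g *cooldownGate) Allow(nowUnix int64) bool {
	if g.scaled && nowUnix-g.lastScaleUnix < g.cooldownSeconds {
		return false
	}
	g.lastScaleUnix = nowUnix
	g.scaled = true
	return true
}

// decide clamps the proposal and keeps the current count during the cooldown window.
func decide(g *cooldownGate, nowUnix int64, current, proposed, minReplicas, maxReplicas int32) int32 {
	target := clampReplicas(proposed, minReplicas, maxReplicas)
	if target == current || !g.Allow(nowUnix) {
		return current
	}
	return target
}
```

With minReplicas 2, maxReplicas 10, and a 60-second cooldown, a proposal of 15 replicas is reduced to 10, and a second change requested 30 seconds later is ignored until the window has passed.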

Debugging and Verification

This subsection describes the verification steps and troubleshooting approaches for custom scaling algorithm plugins in a cluster.

Verify Algorithm Registration

  1. Execute the following command to view the Elastic Scaler controller logs.

    bash
    kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f
  2. Check the logs to confirm successful algorithm registration.

    bash
    # The init() function shown earlier prints a message only when registration fails.
    # Registration succeeded if no line such as the following appears:
    # register custom algorithm failed <error details>

Verify Algorithm Invocation

  1. Execute the following command to create a test ElasticScaler resource.

    bash
    kubectl apply -f test-elastic-scaler.yaml
  2. Execute the following command to check ElasticScaler status.

    bash
    kubectl get elasticscalers -n ai-inference -o yaml
  3. Execute the following command to check the controller logs to confirm the algorithm is being invoked.

    bash
    kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f | grep CustomAlgorithm

Notice:

  • Since the algorithm invocation path is not yet implemented in the current version, you may not see actual algorithm invocation logs.
  • It is recommended to add klog logs in the algorithm implementation for future debugging.
  • You can verify the correctness of the algorithm logic through unit tests.
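
Because the algorithm is not yet invoked by the controller, a unit test is the most direct way to check its logic. The sketch below is a table-driven test for a hypothetical stepByThreshold function that mirrors the commented example in the development steps (add one replica when the metric exceeds the threshold). It would normally live in a file ending in _test.go and run with go test.

```go
package main

import "testing"

// stepByThreshold is a hypothetical pure function extracted from the algorithm:
// it adds one replica when the metric value exceeds the threshold.
func stepByThreshold(current int32, metricValue, threshold float64) int32 {
	if metricValue > threshold {
		return current + 1
	}
	return current
}

// TestStepByThreshold checks values below, at, and above the threshold.
func TestStepByThreshold(t *testing.T) {
	cases := []struct {
		name        string
		current     int32
		metricValue float64
		want        int32
	}{
		{"below threshold keeps replicas", 3, 0.5, 3},
		{"at threshold keeps replicas", 3, 0.8, 3},
		{"above threshold adds one", 3, 0.9, 4},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := stepByThreshold(tc.current, tc.metricValue, 0.8); got != tc.want {
				t.Errorf("got %d, want %d", got, tc.want)
			}
		})
	}
}
```

Keeping the computation in a pure function like this, separate from the code that reads the processing context, makes the boundary cases easy to cover without a cluster.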

Developing Resource Management Plugins

This section describes how to implement custom resource management plugins and integrate any Kubernetes resource into the Elastic Scaler scaling system.

Use Case Overview

Resource management plugins are suitable for scenarios where custom K8s resource types need to be integrated into the Elastic Scaler scaling system, for example:

  • Custom workload resources: Such as custom inference service CRDs.
  • Resources requiring special replica management: Such as complex resources that need to update multiple fields simultaneously.
  • Cross-resource replica coordination: Such as managing replica counts for multiple related resources simultaneously.

Component and Interface Description

Table 4 System Component Responsibilities

System Component Name | Description
ElasticScaler CR | Defines the target resource reference (apiVersion, kind, name).
HandlerFactory | Manages all registered resource handlers and creates the corresponding handler instance based on the target resource type.
ResourceHandler | Resource handler interface that developers implement to provide specific resource management logic.

Table 5 Resource Management Plugin Interface Description

Interface Name | Description
Supports | Checks if the handler supports the given resource type.
GetCurrentReplicas | Gets the current replica count of the target resource.
UpdateReplicas | Updates the replica count of the target resource.
GetPodList | Gets the Pod list managed by the target resource.

Development Steps

  1. Create resource handler implementation file.

    Create a new resource handler implementation file in the pkg/elasticscaler/resource/ directory, for example custom_resource_handler.go.

    go
    package resource
    
    import (
       "context"
       "fmt"
    
       corev1 "k8s.io/api/core/v1"
       "k8s.io/apimachinery/pkg/labels"
       "sigs.k8s.io/controller-runtime/pkg/client"
    )
    
    // CustomResourceHandler implements ResourceHandler for custom resources.
    type CustomResourceHandler struct {
       Client client.Client
    }
    
    var _ ResourceHandler = &CustomResourceHandler{}
    
    // NewCustomResourceHandler returns a new CustomResourceHandler instance.
    func NewCustomResourceHandler(c client.Client) ResourceHandler {
       return &CustomResourceHandler{Client: c}
    }
    
    // Supports checks if the handler supports the given resource type.
    func (h *CustomResourceHandler) Supports(targetRef corev1.ObjectReference) bool {
       // TODO: Check if this resource type is supported
       // Example: Only support specific apiVersion and kind
       return targetRef.APIVersion == "custom.example.com/v1alpha1" &&
          targetRef.Kind == "CustomResource"
    }
    
    // GetCurrentReplicas gets the current number of replicas.
    func (h *CustomResourceHandler) GetCurrentReplicas(
       ctx context.Context,
       targetRef corev1.ObjectReference,
    ) (int32, error) {
       // TODO: Read current replicas from your CRD structure (spec or status)
       // Example: Read from spec.replicas or status.replicas field
       return 0, fmt.Errorf("not implemented")
    }
    
    // UpdateReplicas updates the number of replicas.
    func (h *CustomResourceHandler) UpdateReplicas(
       ctx context.Context,
       targetRef corev1.ObjectReference,
       desired int32,
    ) error {
       // TODO: Update spec.replicas or similar field in your CRD
       // Example: Update resource through client.Update() or client.Patch()
       return fmt.Errorf("not implemented")
    }
    
    // GetPodList gets the list of pods managed by the target resource.
    func (h *CustomResourceHandler) GetPodList(
       ctx context.Context,
       targetRef corev1.ObjectReference,
    ) (*corev1.PodList, error) {
       // TODO: Replace labels.Everything() with the selector defined in the CRD.
       // As written, this lists every Pod in the target namespace.
       selector := labels.Everything()
       podList := &corev1.PodList{}
       if err := h.Client.List(ctx, podList, &client.ListOptions{
          Namespace:     targetRef.Namespace,
          LabelSelector: selector,
       }); err != nil {
          return nil, err
       }
       return podList, nil
    }

    Notice:

    • In the Supports method, match apiVersion and kind precisely to avoid mismatches.
    • UpdateReplicas should use Patch instead of Update to reduce conflicts.
    • GetPodList should use selector labels defined in the resource to avoid listing unrelated Pods.
    • All methods should have sufficient error handling and logging.
  2. Register resource handler.

    Register to HandlerFactory in the resource handler implementation's init() function.

    go
    func init() {
       // apiVersion and kind must match the values checked in Supports().
       RegisterResourceHandler("custom.example.com/v1alpha1", "CustomResource", func(c client.Client) ResourceHandler {
          return NewCustomResourceHandler(c)
       })
    }

    Table 6 Registration Method Description

    Parameter | Type | Description
    apiVersion | string | API version of the custom resource.
    kind | string | Kind of the custom resource.
    factory | HandlerFactory | Factory function used to create ResourceHandler instances.

    Notice:

    • The combination of apiVersion and kind must be unique.
    • It is recommended to use the full apiVersion (including group/version).
    • Avoid registering the same resource type repeatedly in multiple init() functions.
    • The factory function should be lightweight and avoid heavy operations during registration.
  3. Configure ElasticScaler to use custom resources.

    Create an ElasticScaler CR that specifies the custom resource.

    yaml
    apiVersion: elasticscaler.io/v1alpha1
    kind: ElasticScaler
    metadata:
      name: custom-resource-scaling
      namespace: ai-inference
    spec:
      # Scaling target (custom resource)
      targetRef:
        apiVersion: custom.example.com/v1alpha1
        kind: CustomResource
        name: my-custom-resource
        namespace: ai-inference
    
      # Minimum/maximum replica count
      minReplicas: 2
      maxReplicas: 10
    
      # Metric-driven trigger configuration
      trigger:
        type: MetricsTrigger
        metricsTrigger:
          scalingAlgorithm: HPA
          # Metric configuration
          metrics:
            - type: Resource
              resource:
                metricsName: cpu
                target:
                  type: Utilization
                  averageUtilization: 70
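
The notice in step 1 recommends Patch over Update in UpdateReplicas. A patch that carries only the replica field avoids overwriting unrelated changes made by other controllers. The sketch below builds such a JSON merge patch body with the standard library only. The replicasMergePatch function is a hypothetical helper, and the field path spec.replicas is an assumption about the custom resource's schema.

```go
package main

import "encoding/json"

// replicasMergePatch builds a JSON merge patch body that sets only spec.replicas.
// Assumption: the custom resource stores its replica count in spec.replicas.
//
// With controller-runtime, the returned bytes could be sent as:
//   h.Client.Patch(ctx, obj, client.RawPatch(types.MergePatchType, data))
// where types is k8s.io/apimachinery/pkg/types.
func replicasMergePatch(desired int32) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"replicas": desired,
		},
	}
	return json.Marshal(patch)
}
```

For a desired count of 5, the body is {"spec":{"replicas":5}}. Unlike a full Update, it does not depend on the object's resourceVersion, which is why it tends to produce fewer conflicts.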

Debugging and Verification

This subsection describes verification methods and key checkpoints for custom resource management plugins in a real environment.

Verify Handler Registration

  1. Execute the following command to view the Elastic Scaler controller logs.

    bash
    kubectl logs -n ai-inference -l control-plane=elastic-scaler-controller-manager -f
  2. Check the logs to confirm successful handler registration (no panic indicates successful registration).

Verify Resource Management

  1. Execute the following command to create a test custom resource.

    bash
    kubectl apply -f test-custom-resource.yaml
  2. Execute the following command to create an ElasticScaler resource.

    bash
    kubectl apply -f test-elastic-scaler.yaml
  3. Execute the following command to check ElasticScaler status.

    bash
    kubectl get elasticscalers -n ai-inference -o yaml
  4. Execute the following command to check if the target resource replica count is correctly updated.

    bash
    kubectl get customresources -n ai-inference -o yaml

Notice:

  • Ensure the custom resource CRD is properly installed.
  • Ensure ElasticScaler has permissions to read and write the target resource (RBAC configuration).
  • It is recommended to fully validate in a test environment before deploying to production.

FAQ

This section summarizes common issues and reference solutions during Elastic Scaler plugin development and usage.

  1. Algorithm Registration Failure

    Symptom Description
    Algorithm registration fails, and the controller log displays an error message.

    Possible Causes

    • Attempting to register an algorithm with the same name.
    • Algorithm name is empty.
    • Algorithm instance is nil.

    Solution

    1. Use a unique algorithm name. It is recommended to use a name with business significance.
    2. Check if an algorithm with the same name has already been registered elsewhere.
    3. Check the detailed error information in the controller logs.
    4. Ensure the algorithm instance is properly initialized.
  2. Resource Handler Registration Failure

    Symptom Description
    Resource handler registration fails, and the controller panics.

    Possible Causes

    • Attempting to register a resource handler with the same apiVersion/kind.
    • apiVersion or kind is empty.
    • Factory function is nil.

    Solution

    1. Ensure the apiVersion and kind combination is unique.
    2. Check if the same resource type has already been registered elsewhere.
    3. Avoid registering repeatedly in multiple init() functions.
    4. Ensure the factory function is properly implemented and not nil.
  3. Algorithm Not Being Invoked

    Symptom Description
    Algorithm is registered successfully, but is not invoked during actual scaling decisions.

    Possible Causes

    • The algorithm invocation path is not yet implemented in the current version.
    • The core logic of the AlgorithmManager.CalculateDesiredReplicas() method is not yet implemented.

    Solution

    1. Wait for a later version that completes the algorithm invocation path.
    2. Alternatively, modify AlgorithmManager.CalculateDesiredReplicas() directly to implement the algorithm dispatch logic.
    3. Reference code location: pkg/elasticscaler/scaling/algorithm.go (lines 55-68).
  4. Custom Resource Cannot Scale

    Symptom Description
    Custom resource handler is registered, but ElasticScaler cannot properly manage resource replica count.

    Possible Causes

    • ResourceHandler interface implementation is incomplete.
    • Supports() method logic is incorrect.
    • Insufficient RBAC permissions.
    • Custom resource CRD is not properly installed.

    Solution

    1. Ensure complete implementation of the ResourceHandler interface (all 4 methods).
    2. Properly handle GetCurrentReplicas and UpdateReplicas.
    3. Ensure the Supports() method precisely matches apiVersion and kind.
    4. Ensure ElasticScaler has permissions to read and write the target resource (check RBAC configuration).
    5. Ensure the custom resource CRD is properly installed to the cluster.
  5. How to Debug Algorithm Computation Logic

    Symptom Description
    Need to debug algorithm computation logic but unsure how to obtain intermediate results.

    Possible Causes
    The algorithm implementation is complex, and intermediate computation results need to be inspected.

    Solution

    1. Add klog logs in the algorithm implementation.
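      go
      import (
         "context"
      
         escontext "gitcode.com/openFuyao/elastic-scaler/pkg/elasticscaler/context"
         "k8s.io/klog/v2"
      )
      
      func (a *CustomAlgorithm) CalculateDesiredReplicas(
         ctx context.Context,
         processingCtx *escontext.ScalingAlgorithmContext,
      ) (int32, error) {
         klog.Info("CustomAlgorithm: CalculateDesiredReplicas called")
         klog.Infof("Processing context: %+v", processingCtx)
         // Algorithm logic goes here; it should set desiredReplicas.
         var desiredReplicas int32
         klog.Infof("Calculated desired replicas: %d", desiredReplicas)
         return desiredReplicas, nil
      }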
    2. Check the status field of the ElasticScaler CR.
    3. Check the metric data in the controller logs.
    4. Use unit tests to verify algorithm logic.
  6. How to Support Scale Subresource

    Symptom Description
    Need to support K8s Scale subresource (such as /scale).

    Possible Causes
    Need to implement a more standardized resource replica management approach.

    Solution
    Refer to the implementation of DefaultCustomResourceHandler:

    1. Prefer using the Scale subresource (such as /scale).
    2. If the Scale subresource is not available, fall back to spec.replicas or status.replicas.
    3. Use unstructured.Unstructured to handle general resources.
    4. Reference code location: pkg/elasticscaler/resource/default_custom_resource_handler.go.
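
The fallback in item 6 (spec.replicas first, then status.replicas) can be sketched on a plain decoded object. This is not the code of DefaultCustomResourceHandler: nestedInt64 and currentReplicas are hypothetical helpers, the Scale subresource lookup that should come first is left out, and the map stands in for the Object field of unstructured.Unstructured. In real code, the unstructured package's NestedInt64 helper covers the field lookup.

```go
package main

import "errors"

// nestedInt64 walks a decoded object along the given field path and returns
// the integer found there. The second result is false when the path is
// missing or the value is not a number.
func nestedInt64(obj map[string]any, fields ...string) (int64, bool) {
	var cur any = obj
	for _, f := range fields {
		m, ok := cur.(map[string]any)
		if !ok {
			return 0, false
		}
		cur, ok = m[f]
		if !ok {
			return 0, false
		}
	}
	switch v := cur.(type) {
	case int64:
		return v, true
	case int:
		return int64(v), true
	case float64: // encoding/json decodes numbers as float64
		return int64(v), true
	}
	return 0, false
}

// currentReplicas prefers spec.replicas and falls back to status.replicas.
func currentReplicas(obj map[string]any) (int32, error) {
	if v, ok := nestedInt64(obj, "spec", "replicas"); ok {
		return int32(v), nil
	}
	if v, ok := nestedInt64(obj, "status", "replicas"); ok {
		return int32(v), nil
	}
	return 0, errors.New("no replicas field found in spec or status")
}
```

Returning an error when neither field exists, instead of reporting zero replicas, prevents the scaler from treating an unreadable resource as scaled to zero.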

Appendix

This section provides reference materials and extended reading links related to Elastic Scaler plugin development.

Reference Resources