Version: v25.09

Mind Inference Service Management Plug-in

Feature Overview

Mind Inference Service (MIS) management plug-in (mis-management) is a unified management plug-in for Ascend AI inference scenarios. It provides visualized, declarative, full-lifecycle management of MISs through a WebUI where you can register an MIS, download model weight files, deploy instances, and monitor status, significantly lowering the barrier to using MISs.

NOTE
Currently, mis-management supports text-based large language model (LLM) and vision-language model (VLM) MISs. Embedding (EMB) MISs for vectorization are not supported.

Application Scenarios

  • AI engineers need to quickly deploy and manage MISs.
  • Users need to centrally manage multiple inference models and their running instances.

Supported Capabilities

  • Microservice management: You can create, view, modify, and delete MISs on the WebUI.
  • Configuration management: Multiple MIS configurations are supported to adapt to different hardware environments and inference requirements.
  • Automatic pulling: Model weight files can be downloaded with one click, simplifying the deployment process.
  • Instance management: You can view, start, and delete MIS instances on the WebUI.
  • Status monitoring: You can view the operating status of model weight files and instances in real time.

Highlights

  • Visualized WebUI: You can perform all operations on the WebUI without writing YAML configuration files.
  • Full-lifecycle management: Covers the entire process from microservice registration and weight download to instance deployment and monitoring.
  • Deep integration with the MIS operator: Standardized management of MISs is implemented based on the MIS operator.
  • Unified storage management: Model weight files are managed in a unified manner using PVCs, and shared storage is supported.

Basic Concepts

  • Microservices: Containerized service units that package an inference model together with its runtime environment.
  • Weight files: Parameter files required for model inference. Weight files are generally large and are managed separately.
  • Instances: Running copies of a microservice. The number of replicas and resource specifications can be configured.
  • Inference configuration: A set of inference parameters tuned for a specific hardware environment.

Implementation Principles

The following figure shows the architecture of the plug-in and its components.

Figure 1 MIS management plug-in architecture


The plug-in uses a frontend-backend separation architecture.

  • Frontend: A JavaScript-based WebUI that provides user interaction.
  • Backend: An API service written in Go that processes service logic and calls Kubernetes APIs.
  • Storage: Mounted PVCs that persist configuration information and model weight files.

Working process:

  1. A user registers microservice information on the WebUI.
  2. The model weight files are automatically downloaded based on the configuration.
  3. The user deploys microservice instances. The system calls the MIS operator to create the corresponding Kubernetes resources.
  4. The instance status is monitored in real time and displayed on the WebUI.

This plug-in depends on the MIS operator to manage the lifecycle of MIS instances. The MIS operator must be deployed in a Kubernetes cluster in advance.
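A quick way to confirm that the operator is present is to look for its custom resource definitions and controller pods. The CRD name pattern and namespace below are assumptions; adjust them to match your operator deployment:

# Look for the MIS operator's CRDs (the grep pattern is an assumption)
kubectl get crd | grep -i mis

# Check that the operator controller is running (the namespace is an assumption)
kubectl get pods -n mis-operator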

Installation

Prerequisites

  1. A Kubernetes cluster has been deployed.
  2. The MIS operator has been installed.
  3. PVC storage has been configured for model weight storage. (A sample manifest sketch follows this list.)
  4. The Ascend model repository (AscendHub) can be accessed.
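For prerequisite 3, the following is a minimal sketch of a shared-storage PVC for model weights. The PVC name, namespace, StorageClass, and size are all assumptions; replace them with values that match your cluster:

# Minimal PVC sketch for model weights (names, StorageClass, and size are assumptions)
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mis-model-weights
  namespace: mis-management
spec:
  accessModes:
    - ReadWriteMany              # shared storage so multiple instances can mount the weights
  storageClassName: nfs-client   # assumption: any RWX-capable StorageClass works
  resources:
    requests:
      storage: 200Gi             # depends on the models you plan to download
EOF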

Starting the Installation

Before installing the plug-in, ensure that the Kubernetes environment is ready.

  1. Create a namespace.

    kubectl create ns mis-management
  2. Build images.

    nerdctl build -t mis-management:latest -f build/Dockerfile .
  3. Use Helm for service installation.

    helm install -n mis-management mis-management charts/mis-management
  4. Run the following command to apply the service configuration:

    kubectl apply -f charts/mis-management/templates/service.yaml
  5. After the installation is complete, check whether the service is running properly.

    kubectl get pods -n mis-management
    kubectl get svc -n mis-management
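If the chart does not expose the WebUI outside the cluster by default, a port-forward is a quick way to reach it for a smoke test. The Service name and port below are assumptions derived from the chart name; check the output of kubectl get svc for the actual values:

# Forward the WebUI Service to localhost (Service name and port are assumptions)
kubectl port-forward -n mis-management svc/mis-management 8080:80
# Then open http://localhost:8080 in a browser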

Performing Microservice Management

Prerequisites

You have performed the operations in Installation.

Context

The microservice repository is used to manage metadata information of MISs.

Constraints

Only the MISs provided by AscendHub are supported.

Adding a Microservice

  1. Log in to the openFuyao platform and choose Mind Inference Service from the navigation pane.
  2. Click Microservice Repository. The microservice list page is displayed.
  3. In the upper right corner of the page, click Add Microservice.
  4. In the displayed Add Microservice dialog box, enter the following information:
    • Microservice name (for example, qwen2.5-0.5b-instruct)
    • Image name (for example, swr.cn-south-1.myhuaweicloud.com/ascendhub/qwen2.5-0.5b-instruct:7.1.T9-aarch64)
    • Inference configuration (for example, atlas800ia2-1x32gb-bf16-vllm-default, atlas800ia2-1x32gb-bf16-vllm-latency, and atlas800ia2-1x32gb-bf16-vllm-throughput, entered one per line). For details, see the optimization configuration details in the inference image description in the MIS repository.
  5. If the value of Weight status is Not downloaded, click Download.
  6. Click OK to add the microservice.
  • Viewing an MIS: Click the name of a created microservice to view its basic information.

  • Editing an MIS: On the basic information page of the microservice, click the edit icon on the right of Image name or Inference configuration to modify the corresponding content.

  • Deleting an MIS: In the Operation column, click the operation icon and choose Delete. After you confirm the risk warning, the MIS is deleted.

Using the Weight Repository

Prerequisites

You have performed the operations in Installation.

Context

The weight repository is used to manage the status and storage of model weight files.

Constraints

Ensure that the storage space is sufficient for model weight files.

Checking the Weight File Status

  1. Log in to the openFuyao platform and choose Mind Inference Service from the navigation pane.

  2. Click Weight Repository to go to the Weight Repository page. The list page displays information such as Microservice name, Inference configuration, Weight path, Image size, and Weight status.

  3. Check the weight status.

    Entry 1: On the Microservice Repository page, click the operation icon in the Operation column and choose Deploy Instance. Weight status is displayed in the dialog box.

    Entry 2: On the Instance Management page, click Deploy Microservice Instance in the upper right corner and view Weight status in the dialog box that is displayed.

    Entry 3: Click Weight Repository on the left and view the Weight status column.

Downloading the Weight File Again

  1. Download the weight file again in either of the following ways:

    • Method 1 (recommended): On the Weight Repository page, click the delete icon in the Operation column and choose Delete. Then set the same parameters as in the instance deployment scenario and click Download.

    • Method 2: If the value of Weight status is Failed, click the operation icon in the Operation column on the Weight Repository page and choose Re-download.

  2. After the download is complete, the value of Weight status changes to Normal.

Deleting a weight file: Click the operation icon in the Operation column and choose Delete. After you confirm the risk warning, the weight file is deleted.

NOTE
If an inference instance is deployed before the model weight files are downloaded, the deployment may fail.

Instance Management

Prerequisites

You have performed the operations in Installation.

Context

You can click Instance Management to manage the deployment and operating status of inference service instances.

Constraints

Microservice instances must be deployed using the MIS operator.

Deploying Instances

  1. Log in to the openFuyao platform and choose Mind Inference Service from the navigation pane.

  2. You can deploy instances from either of the following entry points:

    • Click Microservice Repository on the left. On the microservice list page that is displayed, click the operation icon in the Operation column and choose Deploy Instance.

    • Click Instance Management on the left. On the Instance Management page that is displayed, click Deploy Microservice Instance in the upper right corner.

  3. In the Deploy Instance dialog box that is displayed, perform the following operations:

    • Set Microservice instance name (for example, qwen2.5-0.5b-instruct-instance-1).
    • Select a value (for example, atlas800ia2-1x32gb-bf16-vllm-default) for Inference configuration.
    • Select a value (ClusterIP or NodePort) for Service type.
    • If you select NodePort, you need to specify Port, for example, 30001. The port number ranges from 30000 to 32767.
    • Set NPUs in a single instance (for example, 1). Multiple NPUs can be selected only for instances with parallel inference configurations.
    • Set Instances, for example, 1.
  4. If the value of Weight status is Not downloaded, click Download.

  5. Click OK to create an instance.

  6. View the status in the instance list and wait until it changes to Ready. You can also inspect the underlying Kubernetes resources, as sketched below.
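Because the plug-in delegates instance creation to the MIS operator, each instance ultimately materializes as ordinary Kubernetes resources. The following is a sketch for inspecting them; the namespace and label selector are assumptions, so adjust them to your deployment:

# Pods backing the instance (namespace and label selector are assumptions)
kubectl get pods -n mis-management -l app=qwen2.5-0.5b-instruct-instance-1

# The Service created for the instance; for NodePort, the assigned port appears here
kubectl get svc -n mis-management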

Follow-up Operations

After the instance is deployed, you can access the MIS API using the URL in the Access path column of the instance list on the Instance Management page. For example, if the access path is localhost:30001/openai/v1/models, the access addresses are as follows:

# Address for accessing basic model information
http://{Host address}:30001/openai/v1/models

# Address for accessing the chat interface
http://{Host address}:30001/openai/v1/chat/completions

For details about the request parameter format, see the image description in the MIS repository.
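As a concrete sketch, the following requests exercise both addresses with an OpenAI-compatible payload. The model name and request fields are assumptions based on the examples above; the authoritative format is in the image description:

# List the models served by the instance
curl http://{Host address}:30001/openai/v1/models

# Send a chat request (model name and fields are assumptions; see the image description)
curl http://{Host address}:30001/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-0.5b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'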

  • Viewing the instance status: Click the name of a created instance to view its basic information.

  • Deleting an instance: Click the delete icon in the Operation column. After you confirm the risk warning, the instance is deleted.

FAQ

  • Microservice deployment failure

    Solutions:

    1. Ensure that the MIS operator has been correctly deployed.
    2. Ensure that the model weight files have been downloaded.
    3. Check whether the Kubernetes cluster resources are sufficient.
    4. Check the instance status details to locate the fault.
  • Model weight file download failure

    Possible causes:

    • The network connection is faulty, so the model repository cannot be accessed.
    • The storage space is insufficient.
    • The image configuration is incorrect.

    Solutions:

    1. Check the network connection and firewall settings.
    2. Ensure that the PVC storage space is sufficient.
    3. Verify that the image name and configuration are correct.
    4. Delete the corresponding weight file and download it again. (Helper commands for these checks are sketched below.)
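The following commands can help with the checks in both FAQ entries. The namespaces and resource names are assumptions; substitute the values used in your deployment:

# Confirm that the MIS operator is running (the namespace is an assumption)
kubectl get pods -n mis-operator

# Check that the weights PVC is bound and has sufficient capacity
kubectl get pvc -n mis-management

# Read recent events on a failing pod to locate the fault
kubectl describe pod <pod-name> -n mis-management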

Appendixes

Supported Inference Configurations

Common inference configurations are as follows:

  • atlas800ia2-1x32gb-bf16-vllm-default: default configuration
  • atlas800ia2-1x32gb-bf16-vllm-latency: low-latency optimization configuration
  • atlas800ia2-1x32gb-bf16-vllm-throughput: high-throughput optimization configuration