Memory Borrowing

Feature Introduction

Memory borrowing builds on the UB memory pooling mechanism. In bare-metal container scenarios, when node-level or NUMA-level memory usage reaches a preset threshold, memory borrowing is triggered to offload part of the memory pressure to borrowed (remote) memory.

Application Scenarios

High-memory-density computing: Suited to deployments with a large number of Pods or containers on a single node. Memory oversubscription and borrowing improve node memory utilization and reduce hardware costs.

Capability Scope

Platform Support: openEuler 24.03 LTS or later, Kubernetes v1.31.1 or later, aarch64 architecture.
Memory Management:
- Supports node-level and NUMA-level memory waterline monitoring and alerting.
- Supports transparent container memory borrowing, with a maximum remote memory oversubscription ratio of 25%.
Specification Limitations:
- Transparent memory borrowing supports only 4K standard page scenarios.
- Memory borrowing applies only to bare-metal container scenarios, not to VM container scenarios.
- A single kube-matrix-agent instance can manage up to 150 Pods, 300 containers, and 300 processes.

Highlight Features

Transparent Memory Borrowing: Applications benefit from memory pooling, such as automatic cold-hot data exchange, without code modifications and with low performance overhead (single-container performance degradation below 5%).
Flexible Scheduling Strategy: Supports both NUMA-bound and non-NUMA-bound scenarios, providing NUMA-level and node-level waterline alerts respectively to suit different business needs.
Plugin Deployment: Deploys quickly through Helm Charts and integrates with the existing K8s ecosystem.

Implementation Principle

Transparent Memory Borrowing

Figure 1 Container Transparent Memory Borrowing

Monitoring and Triggering: kube-matrix-agent monitors node/NUMA memory usage in real time. When usage exceeds the preset waterline (for example, 92%), the borrowing process is triggered.
Interface Invocation: The agent calls the UBSE interface to request remote memory.
Cold-Hot Exchange: UBSE enables cold-hot page swapping for the container processes, migrating cold data to remote memory and keeping hot data local.
Escape Strategy: The watermark-escape-strategy label selects the NUMA-level or node-level alert strategy, and the remote-mem-allocation-ratio label controls the oversubscription ratio.
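
The trigger condition in the monitoring step can be pictured with a small shell sketch. It is an illustration only: how kube-matrix-agent actually samples memory is not described here, so the formula (MemTotal minus MemAvailable from /proc/meminfo), the function names, and the default threshold of 92 are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they do not reproduce kube-matrix-agent internals.

# mem_used_percent TOTAL_KB AVAILABLE_KB
# Prints used memory as an integer percentage (rounded down).
mem_used_percent() {
  local total="$1" available="$2"
  echo $(( (total - available) * 100 / total ))
}

# should_borrow USED_PERCENT [WATERLINE]
# Succeeds when usage has reached the waterline (default 92, the example value).
should_borrow() {
  local used="$1" waterline="${2:-92}"
  [ "$used" -ge "$waterline" ]
}

# node_mem_used_percent: node-level usage computed from /proc/meminfo.
node_mem_used_percent() {
  local total available
  total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  available=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  mem_used_percent "$total" "$available"
}
```

On a node, `should_borrow "$(node_mem_used_percent)" && echo "waterline reached"` reports whether the example threshold has been crossed.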

Relationship with Related Features

Depends on UBS Engine: This solution depends on the underlying ubs-engine and its memory pooling components, which must be installed in advance.
Hardware Constraints: Depends on UB chip interconnect technology for high-bandwidth remote memory access. Only 4K page tables are supported, and numa_remote=nofallback must be configured in GRUB to prevent applications from directly requesting remote memory.

Using Container Memory Borrowing

Prerequisites

Operating System: openEuler 24.03 LTS SP3 or higher
CPU Architecture: aarch64
Memory: Greater than or equal to 64GB
Disk: SSD, 500 MB/s throughput
Chip Interconnect: UB
User Permissions: Installation and management require root permissions
Software Requirements:
1. Kubernetes v1.31.1 and above
2. ubs-engine and its dependent components, installed as described in the ubs-engine documentation
3. Helm, installed as described in the Helm Installation Documentation

Background Information

In high-performance computing or high-density deployment scenarios, local memory may become a bottleneck. By deploying the UBS K8S Enable components, you can use UB memory pooling to borrow memory across nodes, overcoming single-machine memory limits while preserving the cloud-native characteristics of containerized applications.

Usage Limitations

Scenario Limitations: Applies only to bare-metal container scenarios, not to VM container scenarios.
Page Size Limitation: Supports only 4K standard page scenarios.
Configuration Requirements: Add the numa_remote=nofallback parameter to /etc/default/grub. This prevents applications from directly requesting remote memory, which would cause exceptions.
Oversubscription Ratio: The maximum node memory oversubscription ratio is 25%.
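
The GRUB change can be scripted along the following lines. This is a minimal sketch, not an official procedure: the helper name add_kernel_param is hypothetical, it assumes kernel parameters live on the GRUB_CMDLINE_LINUX line of /etc/default/grub, and the grub.cfg output path in the usage comment is an assumption that depends on how the host boots.

```shell
#!/usr/bin/env bash
# add_kernel_param FILE PARAM
# Appends PARAM to the GRUB_CMDLINE_LINUX="..." line of FILE unless it is
# already present. Hypothetical helper; requires GNU grep and GNU sed.
add_kernel_param() {
  local file="$1" param="$2"
  # Leave the file untouched when the parameter is already configured.
  if grep -Eq "^GRUB_CMDLINE_LINUX=.*\b${param}\b" "$file"; then
    return 0
  fi
  # Insert the parameter just before the closing quote of the value.
  sed -i -E "s|^(GRUB_CMDLINE_LINUX=\"[^\"]*)\"|\1 ${param}\"|" "$file"
}

# Usage as root (the grub.cfg path is an assumption for a UEFI host):
#   add_kernel_param /etc/default/grub numa_remote=nofallback
#   grub2-mkconfig -o /boot/efi/EFI/openEuler/grub.cfg
#   reboot
```

After the reboot, `cat /proc/cmdline` shows whether the parameter is active.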

Operation Steps

1. Build Instructions.

1.1 Pull source code.

shell

git clone https://gitcode.com/openFuyao/ubs-k8s-enable.git

1.2 Install dependencies.
Before building, ensure the host has the following tools installed:

shell

docker
helm

The Dockerfiles use BuildKit features. Ensure BuildKit is enabled before executing docker build.
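
A quick pre-build check for the required tools might look like the sketch below. The helper name missing_tools is hypothetical, and it only checks that the commands are on PATH, not their versions.

```shell
#!/usr/bin/env bash
# missing_tools TOOL...
# Prints each tool that is not found on PATH and fails if any is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage before building:
#   missing_tools docker helm || echo "install the tools listed above first"
```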

1.3 Build the images.

shell

# Example version number; adjust it to match the actual release version
export VERSION=1.0.0
export DOCKER_BUILDKIT=1

# Build matrixagent image
docker build -f build/matrixagent.dockerfile -t cr.openfuyao.cn/openfuyao/matrixagent:${VERSION} .

# Build matrixcontroller image
docker build -f build/matrixcontroller.dockerfile -t cr.openfuyao.cn/openfuyao/matrixcontroller:${VERSION} .

1.4 Export image packages.

shell

mkdir -p output

docker save cr.openfuyao.cn/openfuyao/matrixagent:${VERSION} | gzip -c > output/ubs-k8s.matrixagent.image.${VERSION}.aarch64.tgz
docker save cr.openfuyao.cn/openfuyao/matrixcontroller:${VERSION} | gzip -c > output/ubs-k8s.matrixcontroller.image.${VERSION}.aarch64.tgz

1.5 Package Helm Chart.

shell

helm package charts/matrixagent --destination output
helm package charts/matrixcontroller --destination output
mv output/matrixagent-*.tgz output/ubs-k8s.matrixagent.chart.${VERSION}.aarch64.tgz
mv output/matrixcontroller-*.tgz output/ubs-k8s.matrixcontroller.chart.${VERSION}.aarch64.tgz

Build artifacts are as follows:

    output
    ├── ubs-k8s.matrixagent.image.${VERSION}.aarch64.tgz
    ├── ubs-k8s.matrixagent.chart.${VERSION}.aarch64.tgz
    ├── ubs-k8s.matrixcontroller.image.${VERSION}.aarch64.tgz
    └── ubs-k8s.matrixcontroller.chart.${VERSION}.aarch64.tgz

2. Deployment Steps.
Execute the following command to set version variables:

bash

export VERSION=1.0.0
export OCI_VERSION=0.0.0-latest

2.1 Obtain deployment files.
Choose one of the following methods to obtain the required images and Helm Charts, depending on your scenario.

Method 1: Use offline release packages.

Prepare the following files:

ubs-k8s.matrixagent.image.${VERSION}.aarch64.tgz
ubs-k8s.matrixagent.chart.${VERSION}.aarch64.tgz
ubs-k8s.matrixcontroller.image.${VERSION}.aarch64.tgz
ubs-k8s.matrixcontroller.chart.${VERSION}.aarch64.tgz

Method 2: Obtain from image repository and OCI repository.

Pull images:

bash

docker pull cr.openfuyao.cn/openfuyao/matrixcontroller:latest
docker pull cr.openfuyao.cn/openfuyao/matrixagent:latest

Pull Helm Charts:

bash

helm pull oci://cr.openfuyao.cn/charts/matrixagent --version ${OCI_VERSION}
helm pull oci://cr.openfuyao.cn/charts/matrixcontroller --version ${OCI_VERSION}

2.2 Import offline images.

bash

gunzip -c ubs-k8s.matrixagent.image.${VERSION}.aarch64.tgz | ctr -n k8s.io images import -
gunzip -c ubs-k8s.matrixcontroller.image.${VERSION}.aarch64.tgz | ctr -n k8s.io images import -

Note:
If you pulled the images directly from the image repository using Method 2, skip this step.

2.3 Deploy services.
Choose one of the following methods to deploy the services, depending on your scenario.

Deploy using offline Chart.

bash

helm install matrixagent ubs-k8s.matrixagent.chart.${VERSION}.aarch64.tgz -n kube-system \
  --set images.matrixagent.tag=${VERSION}
helm install matrixcontroller ubs-k8s.matrixcontroller.chart.${VERSION}.aarch64.tgz -n kube-system \
  --set images.matrixcontroller.tag=${VERSION}

Deploy using OCI Chart.

bash

helm install matrixagent oci://cr.openfuyao.cn/charts/matrixagent --version ${OCI_VERSION} -n kube-system \
  --set images.matrixagent.tag=latest
helm install matrixcontroller oci://cr.openfuyao.cn/charts/matrixcontroller --version ${OCI_VERSION} -n kube-system \
  --set images.matrixcontroller.tag=latest

2.4 Verify results.
Execute the following command to view Pod status.

bash

kubectl get pods -A

Expected results are as follows:

Each node should have a matrixagent Pod in the Running state.
The control node should have a matrixcontroller Pod in the Running state.
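
To spot problems quickly, the Pod list can be filtered for components that are not yet Running. The sketch below is an illustration under assumptions: the helper name not_running is hypothetical, the Pod names are assumed to contain matrixagent or matrixcontroller, and the input is assumed to use the default single-namespace column order (NAME READY STATUS RESTARTS AGE).

```shell
#!/usr/bin/env bash
# not_running
# Reads `kubectl get pods -n kube-system` output on stdin and prints the names
# of matrixagent/matrixcontroller Pods whose STATUS column is not Running.
not_running() {
  awk 'NR > 1 && $1 ~ /matrix(agent|controller)/ && $3 != "Running" { print $1 }'
}

# Usage on a cluster (empty output means every matching Pod is Running):
#   kubectl get pods -n kube-system | not_running
```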

3. Enable container memory borrowing demo.
Prerequisites: matrix-agent and matrix-controller are installed.
3.1 Configure node labels.
On the K8s master node, label each worker node from the command line to indicate whether it supports NUMA binding. The escape strategy uses this label to choose between NUMA-level and node-level waterline alerts. If the label is not configured, waterline-triggered container memory borrowing is not supported.

shell

kubectl label nodes <node-name> watermark-escape-strategy=<strategy>
# Replace <strategy> with NUMA or node, and <node-name> with the name of the node that needs borrowing

3.2 Enable oversubscription on the node and set the memory oversubscription ratio.
On the K8s master node, configure the node label from the command line. The maximum node memory oversubscription ratio is 25%.

shell

kubectl label nodes <node-name> remote-mem-allocation-ratio=25
# Constraint: the value is an integer in the range (0,25]. Replace <node-name> with the name of the node that needs borrowing

Configure memory oversubscription for containers in the Pod YAML file.

yaml

apiVersion: v1
kind: Pod
metadata:
  name: annotation-demo
  labels:
    # Maximum remote memory ratio for the container processes. The label also marks this Pod's processes as users of remote memory.
    # If this label is not configured or has an invalid value, the Pod does not support memory borrowing. Recommended value: 25.
    # Constraint: the value is an integer in the range (0,200]. Label values must be strings, so keep the quotes.
    remote-mem-allocation-ratio: "25"
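
Because an invalid label value disables borrowing, it can help to check a value before applying it. The following sketch encodes the two stated ranges, (0,25] for the node label and (0,200] for the Pod label. The helper name valid_ratio is hypothetical, and rejecting leading zeros is an assumption made here to keep the check simple.

```shell
#!/usr/bin/env bash
# valid_ratio VALUE MAX
# Succeeds when VALUE is a decimal integer in the range (0, MAX].
valid_ratio() {
  local value="$1" max="$2"
  case "$value" in
    ''|0*|*[!0-9]*) return 1 ;;  # empty, zero or leading zero, or non-digit
  esac
  [ "$value" -le "$max" ]
}

# Usage: validate first, then label the node (run on the K8s master node):
#   valid_ratio 25 25 && kubectl label nodes <node-name> remote-mem-allocation-ratio=25
```

Pass 25 as MAX for the node label and 200 for the Pod label.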

3.3 Configure the waterline ConfigMap.
Create a YAML file for the waterline ConfigMap.

yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: watermark-config
  namespace: kube-system
data:
  returnLine: "80" # Memory return waterline
  firstLine: "85" # First waterline, used for container live migration escape (container live migration is not yet available)
  secondLine: "92" # Second waterline, used for container memory borrowing escape
# Constraint: each waterline value is an integer in the range (0,100], and the difference between two waterlines cannot be less than 5
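
The waterline constraints can be checked before the ConfigMap is applied. This is a sketch under stated assumptions: the helper name valid_waterlines is hypothetical, and it assumes the ordering returnLine < firstLine < secondLine, with the minimum gap of 5 applied to each adjacent pair.

```shell
#!/usr/bin/env bash
# valid_waterlines RETURN_LINE FIRST_LINE SECOND_LINE
# Succeeds when every value is an integer in (0,100] and each adjacent pair
# (return -> first, first -> second) differs by at least 5.
valid_waterlines() {
  local ret="$1" first="$2" second="$3" v
  for v in "$ret" "$first" "$second"; do
    case "$v" in
      ''|0*|*[!0-9]*) return 1 ;;  # empty, zero or leading zero, or non-digit
    esac
    [ "$v" -le 100 ] || return 1
  done
  [ $(( first - ret )) -ge 5 ] && [ $(( second - first )) -ge 5 ]
}

# Usage with the example values from the ConfigMap:
#   valid_waterlines 80 85 92 && echo "waterlines look consistent"
```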

Apply the ConfigMap from the command line on the K8s master node.

shell

kubectl apply -f <yaml-name>
# Replace <yaml-name> with filename of yaml created in previous step

3.4 Deploy test Pod and container.
Create the file numa-stress.yaml, for example with vi numa-stress.yaml.
Modify the file content as needed. The following example is for demonstration only; the labels and resources fields are what matter, and any Pod that meets these two configurations can use container memory borrowing.

yaml

apiVersion: v1
kind: Pod
metadata:
  ......
  labels:
    remote-mem-allocation-ratio: "25"
    # Maximum remote memory ratio for the container processes. The label also marks this Pod's processes as users of remote memory. If the ratio is not set, the node oversubscription ratio is used by default; if this label is not configured or has an invalid value, the Pod does not support remote memory.
spec:
  containers:
    - name: xxxx
      ......
      resources:
        requests:
          cpu: "2"
          memory: "8Gi"
          # Adjust the memory size as needed. In non-NUMA-bound scenarios, the Pod QoS class must be Burstable (that is, resource requests < limits).
        limits:
          cpu: "2"
          memory: "10Gi"
          # Adjust the memory size as needed. In non-NUMA-bound scenarios, the Pod QoS class must be Burstable (that is, resource requests < limits).
      ......

3.5 Create Pod.

shell

kubectl apply -f numa-stress.yaml

View the Pod status. When the Pod is created successfully, its status changes to Running.

shell

kubectl get pod -A

3.6 Enter Pod to apply pressure.
Run a memory stress command inside the Pod. Any workload is suitable as long as it raises the container's memory usage until the monitored node or NUMA memory usage reaches the secondLine value set in watermark-config.

Observe memory borrowing results.

SSH to the node where the Pod runs and execute the following command to view the per-node memory statistics and observe cold-hot swapping.

shell

numastat -cvm

In the output below, the additional Node 4 is the 2048 MB of borrowed remote memory. Its MemUsed value of 1743 indicates that 1743 MB of it is in use.

$ numastat -cvm
Per-node system memory usage (in MBs):
                 Node 0 Node 1 Node 4  Total
                 ------ ------ ------ ------
MemTotal          64254  63967   2048 130269
MemFree            2605  60793    305  63704
MemUsed           61649   3174   1743  66565
......
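
To read a single figure out of this table from a script, the header row can be used to locate the node's column. The sketch below is illustrative only: the helper name node_mem_value is hypothetical, and it assumes the compact table layout shown above, with a header row of "Node N" labels followed by one value per node.

```shell
#!/usr/bin/env bash
# node_mem_value NODE FIELD
# Reads `numastat -cvm` output on stdin and prints FIELD (for example MemUsed)
# for NUMA node NODE, in MB.
node_mem_value() {
  awk -v node="$1" -v field="$2" '
    # Header row: find which data column belongs to the requested node.
    $1 == "Node" && col == 0 {
      for (i = 1; i < NF; i += 2)
        if ($i == "Node" && $(i + 1) == node) col = (i + 1) / 2 + 1
    }
    # Data row: print the value in that column and stop.
    $1 == field && col > 0 { print $col; exit }
  '
}

# Usage on the node that hosts the Pod:
#   numastat -cvm | node_mem_value 4 MemUsed
```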

Trigger memory return.
Delete the Pod or remove the memory pressure to trigger memory return. For example, execute the following command to delete the Pod.

shell

kubectl delete pod -n kube-system xxxx
