Mooncake P2P Architecture Client Remote Invocation
Feature Introduction
This feature implements cross-machine data read/write between Clients for the Mooncake Store V3 architecture (ISSUE 1209). It combines RPC for metadata exchange with TransferEngine for the actual data transfer. Core components include:
- ClientRpcService: the RPC server, which handles remote data read/write requests.
- DataManager: the data management layer, which coordinates local and remote data operations.
- PeerClient: the RPC client, which issues asynchronous cross-machine data read/write requests.
Application Scenarios
Large-model inference services built on the Mooncake V3 architecture that need to read or write data held on remote machines.
Capability Scope
- Cross-machine data read/write RPC between Clients: reads and writes are performed safely across machines, and RPC timeouts (10 s) are handled correctly. Key-level read/write locks (lock striping) make concurrent access safe, and the RAII pattern guarantees operation atomicity: if an operation fails, the resources it allocated are released automatically.
- Support multiple storage media: Supports DRAM and non-DRAM (such as SSD) Tiers, with non-DRAM data transferred through temporary buffers.
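Lock striping maps many keys onto a fixed pool of read/write locks instead of allocating one lock per key. The C++17 sketch below illustrates the idea under stated assumptions: the class name `StripedKeyLocks`, hashing keys with `std::hash`, and using `std::shared_mutex` for each stripe are illustrative choices, not Mooncake's actual implementation. The default of 1024 stripes mirrors the documented default of MOONCAKE_DM_LOCK_SHARD_COUNT.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical lock-striping helper (illustrative, not Mooncake's class).
// A fixed pool of reader/writer locks is shared by all keys.
class StripedKeyLocks {
public:
    explicit StripedKeyLocks(std::size_t shard_count = 1024)
        : shards_(shard_count) {
        assert(shard_count > 0 && "need at least one stripe");
    }

    // Shared (read) lock on the stripe that owns `key`.
    std::shared_lock<std::shared_mutex> LockForRead(const std::string& key) {
        return std::shared_lock<std::shared_mutex>(shards_[ShardIndex(key)]);
    }

    // Exclusive (write) lock; also blocks other keys on the same stripe.
    std::unique_lock<std::shared_mutex> LockForWrite(const std::string& key) {
        return std::unique_lock<std::shared_mutex>(shards_[ShardIndex(key)]);
    }

    // The same key always hashes to the same stripe.
    std::size_t ShardIndex(const std::string& key) const {
        return std::hash<std::string>{}(key) % shards_.size();
    }

private:
    std::vector<std::shared_mutex> shards_;
};
```

Memory spent on locks stays constant no matter how many keys exist. The trade-off is that two unrelated keys hashed to the same stripe contend with each other, which a larger shard count makes less likely.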
Key Features
- Control plane/data plane decoupling: RPC carries only metadata and scheduling information, while TransferEngine moves the data itself across machines, reducing bandwidth pressure on the RPC links.
- Unified entry point and consistent concurrency control: DataManager handles both local and remote operations, using key-level read/write locks (lock striping) to ensure concurrency safety.
- Multi-media transparency and robust failure handling: both DRAM and non-DRAM tiers are supported, with non-DRAM data staged through temporary DRAM buffers; RPC timeouts and nonexistent objects are correctly reported as errors.
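To illustrate how an RPC timeout can surface as an error instead of blocking the caller indefinitely, here is a minimal C++ sketch. It assumes the asynchronous RPC hands back a `std::future` carrying the peer's status code; the `ErrorCode` enum and the `AwaitRemoteResult` function are hypothetical names, and the 10 s default mirrors the timeout stated above.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical status codes for this sketch; Mooncake defines its own.
enum class ErrorCode { OK, RPC_TIMEOUT, OBJECT_NOT_FOUND };

// Waits for the peer's reply to an asynchronous RPC. If no reply arrives
// within `timeout` (10 s by default), report RPC_TIMEOUT instead of blocking.
ErrorCode AwaitRemoteResult(std::future<ErrorCode>& pending,
                            std::chrono::milliseconds timeout = std::chrono::seconds(10)) {
    assert(pending.valid() && "future must be attached to an in-flight request");
    if (pending.wait_for(timeout) != std::future_status::ready) {
        return ErrorCode::RPC_TIMEOUT;
    }
    // The peer answered: pass through its status, e.g. OK or OBJECT_NOT_FOUND.
    return pending.get();
}
```

A caller can treat RPC_TIMEOUT differently from OBJECT_NOT_FOUND: a timeout says nothing about whether the key exists, so a retry may be reasonable, whereas OBJECT_NOT_FOUND is an answer from the peer itself.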
Implementation Principle
Component View
Figure 1 Component View
Cross-machine Read Operation Flow
Figure 2 Cross-machine Read Operation Flow Diagram
Flow Description:
- Client A issues an asynchronous ReadRemoteData RPC request through PeerClient.
- Client B's ClientRpcService receives the request and forwards it to DataManager.
- DataManager obtains the data handle from TieredBackend.
- If the data is not in DRAM, it is first copied to a temporary DRAM buffer.
- TransferEngine writes the data into Client A's target buffer using RDMA WRITE.
- Client B returns a success result.
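The read flow can be sketched in a single process with in-memory stand-ins for each component. Everything here is hypothetical: a `std::map` plays the role of TieredBackend, `std::memcpy` stands in for TransferEngine's RDMA WRITE into Client A's buffer, and `ReadRemoteDataRequest` is an assumed request shape. What the sketch shows is that the request carries only a key plus a destination address and length, never the payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the components in the read flow.
enum class Tier { DRAM, SSD };
enum class ReadResult { OK, OBJECT_NOT_FOUND, BUFFER_TOO_SMALL };

struct StoredObject {
    Tier tier;
    std::vector<char> bytes;
};

// Metadata-only request: where to put the data, never the data itself.
struct ReadRemoteDataRequest {
    std::string key;
    char* dest_addr;       // Client A's target buffer
    std::size_t dest_len;  // capacity of that buffer
};

// std::memcpy stands in for TransferEngine's RDMA WRITE to Client A.
void TransferWrite(const char* src, char* dest, std::size_t len) {
    if (len > 0) std::memcpy(dest, src, len);
}

// Runs on Client B after ClientRpcService forwards the request.
ReadResult ServeReadRemoteData(const std::map<std::string, StoredObject>& backend,
                               const ReadRemoteDataRequest& req) {
    assert(req.dest_addr != nullptr);
    auto it = backend.find(req.key);  // obtain the data handle
    if (it == backend.end()) return ReadResult::OBJECT_NOT_FOUND;
    const StoredObject& obj = it->second;
    if (obj.bytes.size() > req.dest_len) return ReadResult::BUFFER_TOO_SMALL;

    const char* src = obj.bytes.data();
    std::vector<char> staging;  // temporary DRAM buffer for non-DRAM tiers
    if (obj.tier != Tier::DRAM) {
        staging = obj.bytes;    // models copying SSD data into DRAM first
        src = staging.data();
    }
    TransferWrite(src, req.dest_addr, obj.bytes.size());
    return ReadResult::OK;
}
```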
Cross-machine Write Operation Flow
Figure 3 Cross-machine Write Operation Flow Diagram
Flow Description:
- Client A issues an asynchronous WriteRemoteData RPC request through PeerClient.
- Client B's ClientRpcService receives the request and forwards it to DataManager.
- DataManager acquires the key's exclusive (write) lock, so concurrent writes to the same key are serialized.
- DataManager allocates storage space from TieredBackend and obtains a handle.
- If the target tier is not DRAM, a temporary DRAM buffer is used.
- TransferEngine reads the data from Client A using RDMA READ.
- If the target tier is not DRAM, the data is copied from the temporary buffer to the target storage.
- DataManager calls TieredBackend's Commit to submit the metadata update.
- Client B returns a success result.
RAII Guarantee: If transfer or commit fails, the handle will automatically release allocated resources during destruction.
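The write flow and its RAII guarantee can be sketched the same way. In this hypothetical C++ model, `AllocationHandle` reserves space in its constructor and gives it back in its destructor unless `Commit()` has run, so returning early after a failed transfer leaks nothing. A single `std::mutex` stands in for the key's write lock, `std::memcpy` stands in for TransferEngine's RDMA READ, and the non-DRAM staging steps are omitted; none of these names are Mooncake's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backend: committed objects plus bytes currently allocated.
struct Backend {
    std::map<std::string, std::vector<char>> committed;
    std::size_t used_bytes = 0;
    std::mutex key_lock;  // stands in for the key's write lock (one stripe)
};

// RAII allocation: rolled back in the destructor unless Commit() succeeded.
class AllocationHandle {
public:
    AllocationHandle(Backend& backend, std::size_t size)
        : backend_(backend), size_(size), buffer_(size) {
        backend_.used_bytes += size_;
    }
    ~AllocationHandle() {
        if (!committed_) backend_.used_bytes -= size_;  // release on failure paths
    }
    AllocationHandle(const AllocationHandle&) = delete;
    AllocationHandle& operator=(const AllocationHandle&) = delete;

    char* data() { return buffer_.data(); }

    // Publishes the object; ownership of the space passes to the backend.
    void Commit(const std::string& key) {
        assert(!committed_ && "Commit() may only be called once");
        backend_.committed[key] = std::move(buffer_);
        committed_ = true;
    }

private:
    Backend& backend_;
    std::size_t size_;
    std::vector<char> buffer_;
    bool committed_ = false;
};

// std::memcpy stands in for TransferEngine's RDMA READ from Client A;
// a null source models a failed transfer.
bool TransferRead(const char* remote_src, char* dest, std::size_t len) {
    if (remote_src == nullptr) return false;
    if (len > 0) std::memcpy(dest, remote_src, len);
    return true;
}

// Runs on Client B after ClientRpcService forwards a WriteRemoteData request.
bool ServeWriteRemoteData(Backend& backend, const std::string& key,
                          const char* remote_src, std::size_t len) {
    std::lock_guard<std::mutex> guard(backend.key_lock);  // write mutual exclusion
    AllocationHandle handle(backend, len);                // allocate, get handle
    if (!TransferRead(remote_src, handle.data(), len)) {
        return false;  // ~AllocationHandle releases the space
    }
    handle.Commit(key);  // metadata update; the key becomes visible
    return true;
}
```

Because cleanup lives in the destructor, any further failure path (for example, an exception thrown while committing) gets the same rollback without extra code.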
Relationship with Related Features
- Depends on TieredBackend feature to manage local storage.
- Depends on TransferEngine feature to implement cross-node data transmission.
Use P2P Architecture Client Remote Invocation
Prerequisites
- Hardware Requirements: a network card that supports RDMA or TCP.
- Environment Requirements: the mooncake-master service must be started before the Clients.
Background Information
- Usage Scenario: In P2P deployment mode, inference processes need to read or write remote KVCache or intermediate results across machines. Callers want to read or write remote data by key as if it were local.
- Basic Principle: Cross-machine read/write is split into two paths. Metadata and control information are exchanged asynchronously over RPC between PeerClient and ClientRpcService, while TransferEngine carries the actual data across machines (for example, via RDMA WRITE/READ), so large payloads never pass directly through the RPC channel.
Usage Limitations
- Deployment and Connectivity Limitations: Only applicable to P2P deployment mode; cluster network must meet bidirectional connectivity and port opening requirements for TCP or RDMA channels.
Operation Steps
Configure the DataManager lock shard count through the environment variable MOONCAKE_DM_LOCK_SHARD_COUNT (default: 1024 shards). Configure the RPC transport protocol, "rdma" or "tcp", through the environment variable MC_RPC_PROTOCOL (default: "tcp").
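The following C++ sketch shows one way a process could read these two variables with the documented defaults. The struct and function names are hypothetical, and the validation policy (the shard count must be a positive decimal integer; the protocol must be exactly "rdma" or "tcp"; anything else keeps the default) is an assumption, not documented Mooncake behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical holder for the two documented settings and their defaults.
struct DataManagerEnvConfig {
    std::size_t lock_shard_count = 1024;  // MOONCAKE_DM_LOCK_SHARD_COUNT
    std::string rpc_protocol = "tcp";     // MC_RPC_PROTOCOL
};

// Reads the environment; invalid values fall back to the defaults (assumed policy).
DataManagerEnvConfig LoadEnvConfig() {
    DataManagerEnvConfig cfg;
    if (const char* shards = std::getenv("MOONCAKE_DM_LOCK_SHARD_COUNT")) {
        char* end = nullptr;
        unsigned long long n = std::strtoull(shards, &end, 10);
        bool starts_with_digit = shards[0] >= '0' && shards[0] <= '9';
        if (starts_with_digit && *end == '\0' && n > 0) {
            cfg.lock_shard_count = static_cast<std::size_t>(n);
        }
    }
    if (const char* proto = std::getenv("MC_RPC_PROTOCOL")) {
        const std::string value(proto);
        if (value == "rdma" || value == "tcp") cfg.rpc_protocol = value;
    }
    assert(cfg.lock_shard_count > 0);
    return cfg;
}
```

For example, launching a Client with `MOONCAKE_DM_LOCK_SHARD_COUNT=4096 MC_RPC_PROTOCOL=rdma` set in its environment would, under this sketch's assumptions, yield 4096 shards and the RDMA protocol.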
When deploying the Master, configure the following startup parameter:
- deployment_mode: set to P2P.
When deploying a Client, configure the following startup parameters:
- deployment_mode: set to P2P.
- client_rpc_port: the RPC server port number (default: 12345).
- rpc_thread_num: the number of RPC threads (default: 2).
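Startup example:
```bash
# Start the master first.
./mooncake_master --deployment_mode P2P --rpc_port 50051

# Then start each Client (for example, in another terminal), pointing it at the master.
# Here the Client uses a single 512 MiB (536870912-byte) DRAM tier.
./mooncake_client \
  --master_server_address 127.0.0.1:50051 \
  --port 50052 --client_rpc_port 12345 \
  --deployment_mode P2P \
  --tiered_backend_config '{"tiers":[{"type":"DRAM","capacity":536870912,"priority":100,"allocator_type":"OFFSET"}]}'
```
Subsequent Operations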
For subsequent operations, refer to the Mooncake Store documentation.


