gRPC API Reference¶

4ward exposes two gRPC services on the same port (default: 9559).

Server¶

bazel run //grpc:fourward_server -- [flags]

Flag	Default	Description
`--port`	9559	gRPC listen port (use `0` to let the kernel assign an ephemeral port; pair with `--port-file` to discover it)
`--device-id`	1	P4Runtime device ID
`--drop-port`	`2^N - 1`	Override drop port (e.g., 511 for 9-bit ports)
`--cpu-port`	`2^N - 2`	Override CPU port (e.g., 510 for 9-bit ports; auto-enabled when `@controller_header` is present)
`--port-file`	none	After binding, atomically write the listening port to this path. The file's existence signals that the server is ready to serve. Intended for embedders; see Embedding in C++.
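A launcher can use `--port 0` together with `--port-file` by polling for the file and reading the port once it appears. The sketch below is one way to do that. It assumes the file holds the port as decimal text, which this page does not specify, and `write_port_atomically` is a hypothetical stand-in for the server side used only to make the example runnable.

```python
import os
import tempfile
import time


def wait_for_port(port_file: str, timeout_s: float = 10.0, poll_s: float = 0.05) -> int:
    """Block until port_file exists, then return the port it contains."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(port_file):
            # The write is atomic, so an existing file is taken to be complete.
            # Assumption: the file content is the port number as decimal text.
            with open(port_file, encoding="utf-8") as f:
                return int(f.read().strip())
        time.sleep(poll_s)
    raise TimeoutError(f"{port_file} did not appear within {timeout_s}s")


def write_port_atomically(port_file: str, port: int) -> None:
    """Hypothetical stand-in for the server: write a temp file, then rename it."""
    tmp = port_file + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(str(port))
    os.replace(tmp, port_file)  # rename is atomic, so readers never see a partial file


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "fourward.port")
    write_port_atomically(path, 40123)
    print(wait_for_port(path))
```

Use a fresh path per server launch, since a stale file from an earlier run would look like a ready server.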

P4Runtime service¶

Standard P4Runtime gRPC API (reports version 1.5.0). All six RPCs are implemented:

RPC	Description
`SetForwardingPipelineConfig`	Load or replace a P4 pipeline
`GetForwardingPipelineConfig`	Retrieve the loaded pipeline
`Write`	Insert, modify, or delete entities (table entries, action profiles, clone sessions, multicast groups)
`Read`	Query entities (streaming response)
`StreamChannel`	Bidirectional stream for master arbitration, PacketOut, and PacketIn
`Capabilities`	Report P4Runtime protocol version

Arbitration¶

Multi-controller arbitration with role-based access control is fully supported. To become primary for a role, open a StreamChannel and send a MasterArbitrationUpdate. Among the controllers contending for the same role, the one with the highest election_id wins.
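The "highest election_id wins" rule can be pictured with a toy model. The sketch below is not 4ward's implementation: `RoleArbiter` and the controller names are hypothetical. It only shows that arbitration is decided per role and that a 128-bit election_id (a high and a low 64-bit word in P4Runtime) compares high word first.

```python
from dataclasses import dataclass, field


@dataclass
class RoleArbiter:
    """Toy model: remembers each controller's election_id, grouped by role."""

    # role name -> {controller name -> election_id as a (high, low) tuple}
    controllers: dict = field(default_factory=dict)

    def arbitrate(self, role: str, controller: str, election_id: tuple) -> str:
        """Record an arbitration update and return the role's current primary."""
        self.controllers.setdefault(role, {})[controller] = election_id
        return self.primary(role)

    def primary(self, role: str):
        members = self.controllers.get(role, {})
        if not members:
            return None
        # Tuples compare element by element, so (high, low) orders like a 128-bit integer.
        return max(members, key=members.get)


if __name__ == "__main__":
    arbiter = RoleArbiter()
    print(arbiter.arbitrate("sdn", "ctrl-a", (0, 1)))    # ctrl-a
    print(arbiter.arbitrate("sdn", "ctrl-b", (0, 5)))    # ctrl-b
    print(arbiter.arbitrate("sdn", "ctrl-c", (0, 3)))    # ctrl-b
    print(arbiter.arbitrate("debug", "ctrl-c", (1, 0)))  # ctrl-c
```

The model ignores disconnects and duplicate election IDs, which a real server has to handle.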

Write atomicity¶

Mode	Behavior
`CONTINUE_ON_ERROR`	Attempt all updates; report per-update status
`ROLLBACK_ON_ERROR`	All-or-none (snapshot-based rollback)
`DATAPLANE_ATOMIC`	Same as rollback (the write lock ensures atomicity)
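To make the difference between the modes concrete, here is a minimal sketch of snapshot-based rollback over a plain dictionary. It is an illustration, not 4ward's code: the `(op, key, value)` update tuples, the `write` helper, and the `"ABORTED"` placeholder status for rolled-back updates are all assumptions made for the example.

```python
import copy


class WriteError(Exception):
    """Raised with a gRPC-style status name when a single update fails."""


def apply_update(table: dict, update: tuple) -> None:
    """Apply one (op, key, value) update, raising WriteError on conflicts."""
    op, key, value = update
    if op == "INSERT":
        if key in table:
            raise WriteError("ALREADY_EXISTS")
        table[key] = value
    elif op == "MODIFY":
        if key not in table:
            raise WriteError("NOT_FOUND")
        table[key] = value
    elif op == "DELETE":
        if key not in table:
            raise WriteError("NOT_FOUND")
        del table[key]
    else:
        raise WriteError("INVALID_ARGUMENT")


def write(table: dict, updates: list, atomicity: str) -> list:
    """Return one status per update and mutate the table according to the mode."""
    snapshot = copy.deepcopy(table)
    statuses = []
    for update in updates:
        try:
            apply_update(table, update)
            statuses.append("OK")
        except WriteError as err:
            statuses.append(str(err))
            if atomicity != "CONTINUE_ON_ERROR":
                # All-or-none: restore the snapshot and stop at the first failure.
                table.clear()
                table.update(snapshot)
                # "ABORTED" is a placeholder for updates that did not take effect.
                statuses = ["ABORTED" if s == "OK" else s for s in statuses]
                statuses += ["ABORTED"] * (len(updates) - len(statuses))
                return statuses
    return statuses


if __name__ == "__main__":
    batch = [("INSERT", "b", 2), ("INSERT", "a", 9), ("INSERT", "c", 3)]
    t1, t2 = {"a": 1}, {"a": 1}
    print(write(t1, batch, "CONTINUE_ON_ERROR"), t1)
    print(write(t2, batch, "ROLLBACK_ON_ERROR"), t2)
```

With `CONTINUE_ON_ERROR` the first and third inserts land despite the failed second one. With `ROLLBACK_ON_ERROR` the table is left exactly as it was before the call.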

Dataplane service¶

Defined in dataplane.proto. This service handles packet injection and result observation; it is not part of the P4Runtime spec.

`InjectPacket`¶

Inject a single packet and get the result inline.

Request:

message InjectPacketRequest {
  oneof ingress_port {
    uint32 dataplane_ingress_port = 1;  // e.g., 0
    bytes p4rt_ingress_port = 2;        // e.g., "Ethernet0"
  }
  bytes payload = 3;
}

The p4rt_ingress_port variant requires a loaded pipeline with @p4runtime_translation on the port type.

Response:

message InjectPacketResponse {
  repeated PacketSet possible_outcomes = 3;   // one entry per possible real execution
  TraceTree trace = 2;                        // P4RT-enriched when translation is available
}

message PacketSet {
  repeated OutputPacket packets = 1;
}

The possible_outcomes field captures the distinction between parallel and alternative forks. Each PacketSet is one possible set of output packets from a single real execution. Programs with only parallel forks (clone, multicast) have exactly one entry. Programs with action selectors have one entry per alternative.
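One way to picture how parallel and alternative forks combine is the small model below. The tuple encoding of the tree is hypothetical and is not the `TraceTree` message. In this model, a parallel fork merges the packets of all its children into the same outcome, while an alternative fork contributes one outcome per child.

```python
from itertools import product


def outcomes(node: tuple) -> list:
    """Return the possible outcomes of a node as a list of packet sets.

    Hypothetical node encodings:
      ("drop",)                  no output packet
      ("out", port)              one output packet on `port`
      ("parallel", [children])   all children happen (clone, multicast)
      ("alternative", [children])  exactly one child happens (action selector)
    """
    kind = node[0]
    if kind == "drop":
        return [[]]
    if kind == "out":
        return [[node[1]]]
    child_outcomes = [outcomes(child) for child in node[1]]
    if kind == "parallel":
        # Pick one outcome from every child and merge their packets.
        return [sum(combo, []) for combo in product(*child_outcomes)]
    if kind == "alternative":
        # Each child's outcomes are separate possibilities.
        return [o for outs in child_outcomes for o in outs]
    raise ValueError(f"unknown node kind: {kind}")


if __name__ == "__main__":
    clone = ("parallel", [("out", 1), ("out", 510)])
    selector = ("alternative", [("out", 1), ("out", 2)])
    print(outcomes(clone))     # [[1, 510]]
    print(outcomes(selector))  # [[1], [2]]
    print(outcomes(("parallel", [("out", 510), selector])))  # [[510, 1], [510, 2]]
```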

`InjectPackets`¶

Client-streaming RPC for bulk packet injection. Packets are processed concurrently as they arrive from the stream. Results are not returned in the response — use SubscribeResults to collect them.

rpc InjectPackets(stream InjectPacketRequest) returns (InjectPacketsResponse);
message InjectPacketsResponse {}  // empty — results via SubscribeResults

Recommended pattern for DVaaS / bulk workloads:

1. Open a SubscribeResults stream.
2. Wait for the SubscriptionActive message, which confirms that the subscription is registered and no results will be missed.
3. Send all packets via InjectPackets.
4. Collect results from the subscription (exactly one per injected packet).

Packets process concurrently across available cores, with trace tree fork branches (WCMP groups, multicast, clones) also parallelized within each packet.
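The subscribe-then-inject pattern described above can be sketched against an in-memory stand-in. `FakeDataplane` is hypothetical and exists only so the example runs without a server. A real client would call stubs generated from dataplane.proto instead, but the ordering of the four steps would be the same.

```python
import queue
import threading


class FakeDataplane:
    """In-memory stand-in for the Dataplane service (not the real API)."""

    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe_results(self) -> queue.Queue:
        stream = queue.Queue()
        with self._lock:
            self._subscribers.append(stream)
        stream.put({"active": {}})  # mirrors the first SubscribeResultsResponse
        return stream

    def inject_packets(self, packets: list) -> None:
        def process(payload: bytes) -> None:
            result = {"result": {"input_packet": {"payload": payload}}}
            with self._lock:
                for stream in self._subscribers:
                    stream.put(result)

        # One thread per packet imitates concurrent processing; order is not guaranteed.
        threads = [threading.Thread(target=process, args=(p,)) for p in packets]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


def run_batch(dataplane: FakeDataplane, packets: list) -> list:
    stream = dataplane.subscribe_results()             # 1. open the subscription
    first = stream.get(timeout=5)
    if "active" not in first:                          # 2. wait for the confirmation
        raise RuntimeError("subscription was not confirmed")
    dataplane.inject_packets(packets)                  # 3. send every packet
    # 4. collect exactly one result per injected packet
    return [stream.get(timeout=5)["result"] for _ in packets]


if __name__ == "__main__":
    results = run_batch(FakeDataplane(), [b"pkt-0", b"pkt-1", b"pkt-2"])
    print(sorted(r["input_packet"]["payload"] for r in results))
```

Subscribing before injecting is what makes the count reliable: a result produced before the subscription exists would never reach the client.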

`SubscribeResults`¶

Server-streaming RPC that delivers results from all packet sources (InjectPacket, InjectPackets, PacketOut, etc.).

// First message confirms the subscription.
SubscribeResultsResponse { active: {} }
// Subsequent messages carry results.
SubscribeResultsResponse {
  result: {
    input_packet: { ... }
    trace: { ... }
    possible_outcomes: [ { packets: [ ... ] } ]
  }
}

Matching results to injected packets¶

Each ProcessPacketResult in the SubscribeResults stream includes the full InputPacket (ingress port + payload). Match results to injected packets by comparing the payload bytes.

Tip

For DVaaS workloads, embed a unique tag in each test packet (e.g., in an unused header field or the payload body) to make matching unambiguous.

Ordering: With concurrent processing (InjectPackets), results may arrive in any order. Do not assume the result stream matches the injection order.

Completeness: You will receive exactly one ProcessPacketResult per injected packet. Count results to know when you're done.
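The matching, ordering, and completeness rules above combine into a small collection loop. The sketch below assumes a tagging scheme of its own choosing (an 8-byte big-endian sequence number appended to the payload) and represents results as plain dictionaries. Neither is prescribed by 4ward.

```python
import struct

TAG_LEN = 8  # assumption: an 8-byte sequence number at the end of the payload


def tag_packet(base_payload: bytes, seq: int) -> bytes:
    """Append a unique big-endian sequence number to a test packet."""
    return base_payload + struct.pack("!Q", seq)


def extract_tag(payload: bytes) -> int:
    """Recover the sequence number written by tag_packet."""
    return struct.unpack("!Q", payload[-TAG_LEN:])[0]


def collect(results, expected_count: int) -> dict:
    """Index results by tag, in any arrival order, until all have arrived."""
    by_tag = {}
    for result in results:
        tag = extract_tag(result["input_packet"]["payload"])
        if tag in by_tag:
            raise ValueError(f"duplicate result for tag {tag}")
        by_tag[tag] = result
        if len(by_tag) == expected_count:
            break  # one result per injected packet: nothing more to wait for
    return by_tag


if __name__ == "__main__":
    sent = [tag_packet(b"\x00" * 14, seq) for seq in range(3)]
    # Simulate out-of-order delivery by reversing the injection order.
    arrived = [{"input_packet": {"payload": p}} for p in reversed(sent)]
    print(sorted(collect(arrived, len(sent))))
```

The tag is read from the echoed input packet rather than from the outputs, so it survives even if the pipeline rewrites or drops the packet.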

Relationship to P4Runtime PacketIn: When a packet triggers copy-to-CPU (e.g., SAI P4's acl_copy or acl_trap), two things happen:

The CPU-bound clone appears as a PacketIn on the P4Runtime StreamChannel.
The complete result (all outputs including the clone, plus the trace tree) appears in SubscribeResults.

SubscribeResults gives the full picture for every packet. StreamChannel PacketIn only carries the CPU-port copies — it's the standard P4Runtime mechanism for packets punted to the controller.

SubscribeResults vs PacketIn

SubscribeResults delivers exactly N results for N injected packets, so you always know when you're done.

PacketIn on StreamChannel is convenient because it carries @controller_header metadata already parsed — but there's no end-of-batch marker. To know when you've seen all PacketIns, count CPU-port outputs in SubscribeResults — that tells you exactly how many PacketIns to expect.
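Counting CPU-port outputs might look like the sketch below. It models results as dictionaries shaped like the messages on this page, and it assumes that every possible outcome of a packet contains the same number of CPU-port packets; when alternatives disagree, the helper raises instead of guessing. `default_cpu_port` reproduces the `2^N - 2` default from the flag table.

```python
def default_cpu_port(port_bits: int) -> int:
    """Default CPU port for N-bit ports: 2^N - 2 (510 for 9-bit ports)."""
    return 2 ** port_bits - 2


def expected_packet_ins(results: list, cpu_port: int) -> int:
    """Count the CPU-port output packets across a batch of results."""
    total = 0
    for result in results:
        counts = {
            sum(1 for pkt in outcome["packets"] if pkt["dataplane_egress_port"] == cpu_port)
            for outcome in result["possible_outcomes"]
        }
        if len(counts) > 1:
            # Alternatives disagree, so the count depends on which one really ran.
            raise ValueError("alternative outcomes punt different numbers of packets")
        total += counts.pop() if counts else 0
    return total


if __name__ == "__main__":
    cpu = default_cpu_port(9)
    batch = [
        {"possible_outcomes": [{"packets": [
            {"dataplane_egress_port": 3}, {"dataplane_egress_port": cpu}]}]},
        {"possible_outcomes": [{"packets": [{"dataplane_egress_port": 4}]}]},
    ]
    print(expected_packet_ins(batch, cpu))  # 1
```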

Dual port encoding¶

Output packets carry both port representations when translation is available:

message OutputPacket {
  uint32 dataplane_egress_port = 1;  // always present
  bytes p4rt_egress_port = 3;        // only when translated
  bytes payload = 2;
}
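A consumer that wants a single display name per output packet can prefer the translated port and fall back to the numeric one. The helper below is a sketch over a dictionary with the same field names as `OutputPacket`. It assumes P4Runtime port names are UTF-8 text such as "Ethernet0", and that an untranslated packet carries an empty or missing `p4rt_egress_port`.

```python
def egress_port_label(packet: dict) -> str:
    """Prefer the P4Runtime port name; fall back to the dataplane port number."""
    p4rt = packet.get("p4rt_egress_port", b"")
    if p4rt:
        # Assumption: translated port IDs are readable text.
        return p4rt.decode("utf-8", errors="replace")
    return str(packet["dataplane_egress_port"])


if __name__ == "__main__":
    print(egress_port_label({"dataplane_egress_port": 0, "p4rt_egress_port": b"Ethernet0"}))
    print(egress_port_label({"dataplane_egress_port": 7}))
```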

Data plane performance¶

The numbers below cover data plane throughput (packet processing). Control plane operations (table writes, pipeline loading) have not been optimized — 4ward targets use cases where the control plane sets up state once, then the data plane processes many packets.

While 4ward optimizes for correctness and observability over raw speed, it is fast enough for production test workloads like DVaaS. The following numbers were measured on SAI P4 middleblock with 10k table entries and 500 ternary ACL entries, on an AMD Ryzen 9 7950X3D (16 cores, 128 MB L3) running OpenJDK 21.

Workload	Sequential, 1 core	Sequential, 16 cores	Batch, 1 core	Batch, 16 cores
L3 forwarding	2,500	2,600	2,600	29,000
WCMP ×16 members	2,000	2,300	1,700	13,000
WCMP ×16 + mirror	1,400	1,700	1,100	9,000

"Sequential" means one InjectPacket call at a time — send a packet, wait for the result, repeat. "Batch" uses the InjectPackets streaming RPC to send 1,000 packets concurrently. The "16 cores" columns show the effect of parallelism: even sequential calls benefit from multi-core because fork branches (WCMP members, clones) within a single packet are processed in parallel. Batch mode adds a second level of parallelism by processing multiple packets at once.

The three workloads exercise increasingly complex trace trees. L3 forwarding is a straight-line pipeline (VRF, LPM, nexthop, MAC rewrite) with no forks. WCMP ×16 adds a 16-member action selector, producing 16 trace tree branches per packet. WCMP ×16 + mirror adds an ingress clone on top, doubling to 32 branches.

BMv2 comparison¶

We ran a head-to-head benchmark against BMv2's simple_switch on the same SAI P4 program with the same table entries (10k LPM routes, 500 ternary ACL entries). BMv2 was compiled with -O2 and per-packet trace logging enabled — its analog of 4ward's trace trees.

Workload	BMv2	4ward, 1 core	4ward, 16 cores
L3 forwarding	4,500	2,500	29,000
WCMP ×16	4,400	2,000	13,000

On single-core sequential throughput BMv2 is faster: it is a mature C++ codebase and does not build trace trees. With concurrent processing on 16 cores, 4ward pulls well ahead. 4ward's single-core WCMP ×16 number is lower still because the two simulators do different amounts of work per packet: BMv2 hashes to one action selector member, while 4ward explores all 16 to build the complete trace tree, which is the whole point.
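The relative figures implied by the comparison table can be worked out directly. The snippet below only divides the numbers already listed above; the function name is illustrative.

```python
# Throughput figures copied from the comparison table: (BMv2, 4ward 1 core, 4ward 16 cores).
ROWS = {
    "L3 forwarding": (4500, 2500, 29000),
    "WCMP x16": (4400, 2000, 13000),
}


def relative_to_bmv2(rows: dict) -> dict:
    """Return (1-core ratio, 16-core ratio) of 4ward throughput over BMv2."""
    return {
        name: (round(one_core / bmv2, 2), round(sixteen_cores / bmv2, 2))
        for name, (bmv2, one_core, sixteen_cores) in rows.items()
    }


if __name__ == "__main__":
    for name, (single, multi) in relative_to_bmv2(ROWS).items():
        print(f"{name}: {single}x BMv2 on 1 core, {multi}x BMv2 on 16 cores")
```

On these figures 4ward runs at roughly half of BMv2's rate on one core, and at about 6.4 times (L3 forwarding) and 3 times (WCMP ×16) BMv2's rate on 16 cores.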

For full details on the benchmark methodology, build flags, and caveats, see PERFORMANCE.md.

Error codes¶

Situation	gRPC status
No pipeline loaded	`FAILED_PRECONDITION`
P4RT port requested without port translation	`FAILED_PRECONDITION`
Invalid request	`INVALID_ARGUMENT`
Entity already exists (INSERT)	`ALREADY_EXISTS`
Entity not found (MODIFY/DELETE)	`NOT_FOUND`
Not primary for role	`PERMISSION_DENIED`