gRPC API Reference
4ward exposes two gRPC services on the same port (default: 9559).
Server
| Flag | Default | Description |
|---|---|---|
| --port | 9559 | gRPC listen port (use 0 to let the kernel assign an ephemeral port; pair with --port-file to discover it) |
| --device-id | 1 | P4Runtime device ID |
| --drop-port | 2^N - 1 | Override drop port (e.g., 511 for 9-bit ports) |
| --cpu-port | 2^N - 2 | Override CPU port (e.g., 510 for 9-bit ports; auto-enabled when @controller_header is present) |
| --port-file | (none) | After binding, atomically write the listening port to this path. File-exists ≡ ready to serve. Intended for embedders; see Embedding in C++. |
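As a rough illustration of the flags above, the following Python sketch computes the default drop and CPU ports for an N-bit port type and waits for a --port-file to appear. The helper names are hypothetical, and the sketch assumes the file holds the port as decimal text, which this page does not specify.

```python
import os
import time


def default_special_ports(port_bits: int) -> tuple:
    """Return (drop_port, cpu_port) defaults for an N-bit port type."""
    drop_port = 2**port_bits - 1
    cpu_port = 2**port_bits - 2
    return drop_port, cpu_port


def wait_for_port_file(path: str, timeout_s: float = 10.0, poll_s: float = 0.05) -> int:
    """Poll until the port file exists, then parse the port from it.

    Assumption: the file contains the port number as decimal text.
    Because the server writes the file atomically, seeing the file
    means its content is complete.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path) as f:
                return int(f.read().strip())
        time.sleep(poll_s)
    raise TimeoutError(f"{path} did not appear within {timeout_s}s")
```

For a 9-bit port type, default_special_ports(9) returns (511, 510), matching the examples in the table.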
P4Runtime service
Standard P4Runtime gRPC API (reports version 1.5.0). All six RPCs are implemented:
| RPC | Description |
|---|---|
| SetForwardingPipelineConfig | Load or replace a P4 pipeline |
| GetForwardingPipelineConfig | Retrieve the loaded pipeline |
| Write | Insert, modify, or delete entities (table entries, action profiles, clone sessions, multicast groups) |
| Read | Query entities (streaming response) |
| StreamChannel | Bidirectional stream for master arbitration, PacketOut, and PacketIn |
| Capabilities | Report P4Runtime protocol version |
Arbitration
Multi-controller arbitration is fully supported with role-based access
control. Open a StreamChannel and send a MasterArbitrationUpdate to
become primary for a role. The highest election_id wins.
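A minimal sketch of the election rule, assuming P4Runtime's 128-bit election_id (split into high and low 64-bit halves) and a hypothetical in-memory list of candidates instead of a live StreamChannel. It only illustrates that the highest election_id wins within each role; it is not 4ward's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str   # hypothetical controller label, for illustration only
    role: str   # role the controller is competing for
    high: int   # election_id.high
    low: int    # election_id.low

    def election_id(self) -> int:
        # Combine the two 64-bit halves into one comparable integer.
        return (self.high << 64) | self.low


def primaries(candidates):
    """Return {role: name of the candidate with the highest election_id}."""
    winners = {}
    for c in candidates:
        current = winners.get(c.role)
        if current is None or c.election_id() > current.election_id():
            winners[c.role] = c
    return {role: c.name for role, c in winners.items()}
```

Roles are independent here: a controller with a low election_id can still be primary if no other controller competes for its role.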
Write atomicity
| Mode | Behavior |
|---|---|
| CONTINUE_ON_ERROR | Attempt all updates; report per-update status |
| ROLLBACK_ON_ERROR | All-or-none (snapshot-based rollback) |
| DATAPLANE_ATOMIC | Same as rollback (the write lock ensures atomicity) |
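The difference between the modes can be sketched with a plain dictionary standing in for a table. This is an illustrative model only: the function names are hypothetical, and the choice to attempt every update before restoring the snapshot is an assumption, not a statement about how 4ward orders the work.

```python
def apply_update(table, op, key, value=None):
    """Apply one update to the table and return a status string."""
    if op == "INSERT":
        if key in table:
            return "ALREADY_EXISTS"
        table[key] = value
    elif op == "MODIFY":
        if key not in table:
            return "NOT_FOUND"
        table[key] = value
    elif op == "DELETE":
        if key not in table:
            return "NOT_FOUND"
        del table[key]
    else:
        return "INVALID_ARGUMENT"
    return "OK"


def write(table, updates, atomicity="CONTINUE_ON_ERROR"):
    """Apply updates in order and return the per-update statuses."""
    snapshot = dict(table)  # taken before any update is applied
    statuses = [apply_update(table, *update) for update in updates]
    all_or_none = atomicity in ("ROLLBACK_ON_ERROR", "DATAPLANE_ATOMIC")
    if all_or_none and any(status != "OK" for status in statuses):
        # Restore the snapshot so that no update from this batch survives.
        table.clear()
        table.update(snapshot)
    return statuses
```

With CONTINUE_ON_ERROR the successful updates stay applied even when others fail; with the other two modes a single failure leaves the table exactly as it was.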
Dataplane service
Defined in dataplane.proto. This service handles packet injection and result observation; it is not part of the P4Runtime spec.
InjectPacket
Inject a single packet and get the result inline.
Request:
```proto
message InjectPacketRequest {
  oneof ingress_port {
    uint32 dataplane_ingress_port = 1;  // e.g., 0
    bytes p4rt_ingress_port = 2;        // e.g., "Ethernet0"
  }
  bytes payload = 3;
}
```
The p4rt_ingress_port variant requires a loaded pipeline with
@p4runtime_translation on the port type.
Response:
```proto
message InjectPacketResponse {
  repeated PacketSet possible_outcomes = 3;  // one entry per possible real execution
  TraceTree trace = 2;                       // P4RT-enriched when translation is available
}
message PacketSet {
  repeated OutputPacket packets = 1;
}
```
The possible_outcomes field captures the distinction between
parallel and alternative forks.
Each PacketSet is one possible set of output packets from a single real
execution. Programs with only parallel forks (clone, multicast) have exactly
one entry. Programs with action selectors have one entry per alternative.
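To make the parallel/alternative distinction concrete, here is a small sketch that derives outcome sets from a toy trace tree. The tuple-based node encoding is hypothetical and much simpler than the real TraceTree message; the point is only that parallel forks combine their branches into one outcome, while alternative forks contribute separate outcomes.

```python
from itertools import product

# Hypothetical node encoding, for illustration only:
#   ("emit", packet)            leaf that outputs one packet
#   ("drop",)                   leaf that outputs nothing
#   ("parallel", [children])    every branch executes (clone, multicast)
#   ("alternative", [children]) exactly one branch executes (action selector)


def possible_outcomes(node):
    """Return a list of outcomes; each outcome is a list of output packets."""
    kind = node[0]
    if kind == "emit":
        return [[node[1]]]
    if kind == "drop":
        return [[]]
    child_outcomes = [possible_outcomes(child) for child in node[1]]
    if kind == "parallel":
        # Pick one outcome from every branch and merge them into one set.
        return [sum(combo, []) for combo in product(*child_outcomes)]
    if kind == "alternative":
        # Each branch is a separate possible execution.
        return [outcome for outcomes in child_outcomes for outcome in outcomes]
    raise ValueError(f"unknown node kind: {kind}")
```

A clone in parallel with a two-member selector, ("parallel", [("emit", "clone"), ("alternative", [("emit", "m1"), ("emit", "m2")])]), yields two outcomes, each containing the clone plus one member's packet.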
InjectPackets
Client-streaming RPC for bulk packet injection. Packets are processed
concurrently as they arrive from the stream. Results are not returned
in the response — use SubscribeResults to collect them.
```proto
rpc InjectPackets(stream InjectPacketRequest) returns (InjectPacketsResponse);
message InjectPacketsResponse {}  // empty; results arrive via SubscribeResults
```
Recommended pattern for DVaaS / bulk workloads:
- Open a SubscribeResults stream.
- Wait for the SubscriptionActive message; this confirms the subscription is registered and no results will be missed.
- Send all packets via InjectPackets.
- Collect results from the subscription (exactly one per injected packet).
Packets process concurrently across available cores, with trace tree fork branches (WCMP groups, multicast, clones) also parallelized within each packet.
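The recommended ordering can be sketched against an in-memory stand-in. FakeDataplane below is hypothetical and is not the generated gRPC client; it only mimics the message shapes described on this page so that the four steps can be run end to end.

```python
import queue


class FakeDataplane:
    """In-memory stand-in for the Dataplane service (illustration only)."""

    def __init__(self):
        self._subscribers = []

    def subscribe_results(self):
        stream = queue.Queue()
        self._subscribers.append(stream)
        stream.put({"active": {}})  # first message confirms the subscription
        return stream

    def inject_packets(self, requests):
        for request in requests:
            result = {"result": {"input_packet": request}}
            for stream in self._subscribers:
                stream.put(result)
        return {}  # empty response; results arrive via the subscription


def run_batch(dataplane, requests):
    stream = dataplane.subscribe_results()          # 1. open the subscription
    first = stream.get(timeout=5)                   # 2. wait for the confirmation
    if "active" not in first:
        raise RuntimeError("expected the subscription confirmation first")
    dataplane.inject_packets(requests)              # 3. send all packets
    # 4. collect exactly one result per injected packet
    return [stream.get(timeout=5)["result"] for _ in requests]
```

Subscribing before injecting is what guarantees that no result is missed: in this model, a packet injected before any subscription exists is never delivered to a later subscriber.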
SubscribeResults
Server-streaming RPC that delivers results from all packet sources (InjectPacket, InjectPackets, PacketOut, etc.).
```
// First message confirms the subscription.
SubscribeResultsResponse { active: {} }
// Subsequent messages carry results.
SubscribeResultsResponse {
  result: {
    input_packet: { ... }
    trace: { ... }
    possible_outcomes: [ { packets: [ ... ] } ]
  }
}
```
Matching results to injected packets
Each ProcessPacketResult in the SubscribeResults stream includes the
full InputPacket (ingress port + payload). Match results to injected
packets by comparing the payload bytes.
Tip: For DVaaS workloads, embed a unique tag in each test packet (e.g., in an unused header field or the payload body) to make matching unambiguous.
Ordering: With concurrent processing (InjectPackets), results may
arrive in any order. Do not assume the result stream matches the
injection order.
Completeness: You will receive exactly one ProcessPacketResult per
injected packet. Count results to know when you're done.
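One way to apply the tagging, ordering, and completeness advice is sketched below. It assumes results are available as plain dictionaries with an input_packet.payload entry, and it appends a 4-byte tag to the end of each payload; the tag placement and helper names are illustrative choices, not part of the API.

```python
import struct

TAG_FORMAT = "!I"  # 4-byte big-endian unsigned tag (an arbitrary choice)
TAG_LEN = struct.calcsize(TAG_FORMAT)


def tag_payload(payload: bytes, tag: int) -> bytes:
    """Append a unique tag to the end of a test packet's payload."""
    return payload + struct.pack(TAG_FORMAT, tag)


def extract_tag(payload: bytes) -> int:
    """Read the tag back from the last TAG_LEN bytes of a payload."""
    return struct.unpack(TAG_FORMAT, payload[-TAG_LEN:])[0]


def match_results(injected_payloads, results):
    """Return results reordered to line up with the injected payloads."""
    if len(results) != len(injected_payloads):
        raise ValueError("expected exactly one result per injected packet")
    # Index by the tag found in the echoed input packet, not by arrival order.
    by_tag = {extract_tag(r["input_packet"]["payload"]): r for r in results}
    return [by_tag[extract_tag(p)] for p in injected_payloads]
```

Because matching uses the tag in the echoed input packet, the arrival order of the result stream does not matter.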
Relationship to P4Runtime PacketIn: When a packet triggers
copy-to-CPU (e.g., SAI P4's acl_copy or acl_trap), two things
happen:
- The CPU-bound clone appears as a PacketIn on the P4Runtime StreamChannel.
- The complete result (all outputs including the clone, plus the trace tree) appears in SubscribeResults.
SubscribeResults gives the full picture for every packet.
StreamChannel PacketIn only carries the CPU-port copies — it's the
standard P4Runtime mechanism for packets punted to the controller.
SubscribeResults vs PacketIn
SubscribeResults delivers exactly N results for N injected
packets, so you always know when you're done.
PacketIn on StreamChannel is convenient because it carries
@controller_header metadata already parsed — but there's no
end-of-batch marker. To know when you've seen all PacketIns, count
CPU-port outputs in SubscribeResults — that tells you exactly how
many PacketIns to expect.
Dual port encoding
Output packets carry both port representations when translation is available:
```proto
message OutputPacket {
  uint32 dataplane_egress_port = 1;  // always present
  bytes p4rt_egress_port = 3;        // only when translated
  bytes payload = 2;
}
```
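A consumer might prefer the translated port when it is present and fall back to the dataplane port otherwise. The sketch below assumes the packet is available as a plain dictionary and that the P4RT port is UTF-8 text such as "Ethernet0"; both are assumptions made for illustration.

```python
def egress_port_label(packet: dict) -> str:
    """Return a human-readable egress port for an output packet."""
    p4rt_port = packet.get("p4rt_egress_port", b"")
    if p4rt_port:
        # Translated port present (assumed here to be UTF-8 text).
        return p4rt_port.decode("utf-8")
    # No translation: fall back to the numeric dataplane port.
    return str(packet["dataplane_egress_port"])
```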
Data plane performance
The numbers below cover data plane throughput (packet processing). Control plane operations (table writes, pipeline loading) have not been optimized — 4ward targets use cases where the control plane sets up state once, then the data plane processes many packets.
While 4ward optimizes for correctness and observability over raw speed, it is fast enough for production test workloads like DVaaS. The following numbers were measured on SAI P4 middleblock with 10k table entries and 500 ternary ACL entries, on an AMD Ryzen 9 7950X3D (16 cores, 128 MB L3) running OpenJDK 21.
| Workload | Sequential, 1 core | Sequential, 16 cores | Batch, 1 core | Batch, 16 cores |
|---|---|---|---|---|
| L3 forwarding | 2,500 | 2,600 | 2,600 | 29,000 |
| WCMP ×16 members | 2,000 | 2,300 | 1,700 | 13,000 |
| WCMP ×16 + mirror | 1,400 | 1,700 | 1,100 | 9,000 |
"Sequential" means one InjectPacket call at a time — send a packet,
wait for the result, repeat. "Batch" uses the InjectPackets streaming
RPC to send 1,000 packets concurrently. The "16 cores" columns show the
effect of parallelism: even sequential calls benefit from multi-core
because fork branches (WCMP members, clones) within a single packet are
processed in parallel. Batch mode adds a second level of parallelism by
processing multiple packets at once.
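The two levels of parallelism show up as ratios between columns. The short sketch below recomputes them from the figures in the table above; the dictionary keys are shorthand invented for this example.

```python
# Throughput figures copied from the table above.
THROUGHPUT = {
    "L3 forwarding":     {"seq_1": 2500, "seq_16": 2600, "batch_1": 2600, "batch_16": 29000},
    "WCMP x16 members":  {"seq_1": 2000, "seq_16": 2300, "batch_1": 1700, "batch_16": 13000},
    "WCMP x16 + mirror": {"seq_1": 1400, "seq_16": 1700, "batch_1": 1100, "batch_16": 9000},
}


def speedup(workload: str, column: str, baseline: str = "seq_1") -> float:
    """Return the throughput ratio of one column relative to a baseline column."""
    row = THROUGHPUT[workload]
    return row[column] / row[baseline]
```

For L3 forwarding, batch mode on 16 cores comes out at about 11.6 times the sequential single-core figure, while the sequential 16-core column gains only about 4% because a straight-line pipeline has no fork branches to parallelize. For the WCMP workloads, batch mode on a single core is slower than sequential mode (a ratio of 0.85 for WCMP ×16 members).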
The three workloads exercise increasingly complex trace trees. L3 forwarding is a straight-line pipeline (VRF, LPM, nexthop, MAC rewrite) with no forks. WCMP ×16 adds a 16-member action selector, producing 16 trace tree branches per packet. WCMP ×16 + mirror adds an ingress clone on top, doubling to 32 branches.
BMv2 comparison
We ran a head-to-head benchmark against BMv2's simple_switch on the
same SAI P4 program with the same table entries (10k LPM routes, 500
ternary ACL entries). BMv2 was compiled with -O2 and per-packet trace
logging enabled — its analog of 4ward's trace trees.
| Workload | BMv2 | 4ward, 1 core | 4ward, 16 cores |
|---|---|---|---|
| L3 forwarding | 4,500 | 2,500 | 29,000 |
| WCMP ×16 | 4,400 | 2,000 | 13,000 |
BMv2 is faster on single-core sequential throughput — it's a mature C++ codebase and doesn't build trace trees. With concurrent processing, 4ward pulls well ahead. The WCMP ×16 sequential number is additionally lower because the two simulators do different amounts of work per packet: BMv2 hashes to one action selector member, while 4ward explores all 16 to build the complete trace tree — that's the whole point.
For full details on the benchmark methodology, build flags, and caveats, see PERFORMANCE.md.
Error codes
| Situation | gRPC status |
|---|---|
| No pipeline loaded | FAILED_PRECONDITION |
| P4RT port requested without port translation | FAILED_PRECONDITION |
| Invalid request | INVALID_ARGUMENT |
| Entity already exists (INSERT) | ALREADY_EXISTS |
| Entity not found (MODIFY/DELETE) | NOT_FOUND |
| Not primary for role | PERMISSION_DENIED |