Adding a new OVN feature to the DDlog version of ovn-northd¶
This document describes the usual steps an OVN developer should go
through when adding a new feature to ovn-northd-ddlog
. In order to
make things less abstract we will use the IP Multicast
ovn-northd-ddlog
implementation as an example. Even though the
document is structured as a tutorial there might still exist
feature-specific aspects that are not covered here.
Overview¶
DDlog is a dataflow system: it receives data from a data source (a set of “input relations”), processes it through “intermediate relations” according to the rules specified in the DDlog program, and sends the processed “output relations” to a data sink. In OVN, the input relations primarily come from the OVN Northbound database and the output relations primarily go to the OVN Southbound database. The process looks like this:
from NBDB +----------+ +-----------------+ +-----------+ to SBDB
---------->|Input rels|-->|Intermediate rels|-->|Output rels|---------->
+----------+ +-----------------+ +-----------+
Adding a new feature to ovn-northd-ddlog
usually involves the
following steps:
- Update northbound and/or southbound OVSDB schemas.
- Configure DDlog/OVSDB bindings.
- Define intermediate DDlog relations and rules to compute them.
- Write rules to update output relations.
- Generate
Logical_Flow``s and/or other forwarding records (e.g., ``Multicast_Group
) that will control the dataplane operations.
Update NB and/or SB OVSDB schemas¶
This step is no different from the normal development flow in C.
Most of the times a developer chooses between two ways of configuring a new feature:
- Adding a set of columns to tables in the NB and/or SB database (or adding key-value pairs to existing columns).
- Adding new tables to the NB and/or SB database.
Looking at IP Multicast, there are two OVN Northbound
tables where
configuration information is stored:
Logical_Switch
, columnother_config
, keysmcast_*
.Logical_Router
, columnoptions
, keysmcast_*
.
These tables become inputs to the DDlog pipeline.
In addition we add a new table IP_Multicast
to the SB database.
DDlog will update this table, that is, IP_Multicast
receives
output from the above pipeline.
Configuring DDlog/OVSDB bindings¶
Configuring northd/automake.mk
¶
The OVN build process uses DDlog’s ovsdb2ddlog
utility to parse
ovn-nb.ovsschema
and ovn-sb.ovsschema
and then automatically
populate OVN_Northbound.dl
and OVN_Southbound.dl
. For each
OVN Northbound and Southbound table, it generates one or more
corresponding DDlog relations.
We need to supply ovsdb2ddlog
with some information that it can’t
infer from the OVSDB schemas. This information must be specified as
ovsdb2ddlog
arguments, which are read from
northd/ovn-nb.dlopts
and northd/ovn-sb.dlopts
.
The main choice for each new table is whether it is used for output.
Output tables can also be used for input, but the converse is not
true. If the table is used for output at all, we add -o <table>
to the option file. Our new table IP_Multicast
is an output
table, so we add -o IP_Multicast
to ovn-sb.dlopts
.
For input-only tables, ovsdb2ddlog
generates a DDlog input
relation with the same name. For output tables, it generates this
table plus an output relation named Out_<table>
. Thus,
OVN_Southbound.dl
has two relations for IP_Multicast
:
input relation IP_Multicast (
_uuid: uuid,
datapath: string,
enabled: Set<bool>,
querier: Set<bool>
)
output relation Out_IP_Multicast (
_uuid: uuid,
datapath: string,
enabled: Set<bool>,
querier: Set<bool>
)
For an output table, consider whether only some of the columns are
used for output, that is, some of the columns are effectively
input-only. This is common in OVN for OVSDB columns that are managed
externally (e.g. by a CMS). For each input-only column, we add --ro
<table>.<column>
. Alternatively, if most of the columns are
input-only but a few are output columns, add --rw <table>.<column>
for each of the output columns. In our case, all of the columns are
used for output, so we do not need to add anything.
Finally, in some cases ovn-northd-ddlog
shouldn’t change values in
. One such case is the seq_no
column in the
IP_Multicast
table. To do that we need to instruct ovsdb2ddlog
to treat the column as read-only by using the --ro
switch.
ovsdb2ddlog
generates a number of additional DDlog relations, for
use by auto-generated OVSDB adapter logic. These are irrelevant to
most DDLog developers, although sometimes they can be handy for
debugging. See the appendix for details.
Define intermediate DDlog relations and rules to compute them.¶
Obviously there will be a one-to-one relationship between logical
switches/routers and IP multicast configuration. One way to represent
this relationship is to create multicast configuration DDlog relations
to be referenced by &Switch
and &Router
DDlog records:
/* IP Multicast per switch configuration. */
relation &McastSwitchCfg(
datapath : uuid,
enabled : bool,
querier : bool
}
&McastSwitchCfg(
.datapath = ls_uuid,
.enabled = map_get_bool_def(other_config, "mcast_snoop", false),
.querier = map_get_bool_def(other_config, "mcast_querier", true)) :-
nb.Logical_Switch(._uuid = ls_uuid,
.other_config = other_config).
Then reference these relations in &Switch
and &Router
. For
example, in lswitch.dl
, the &Switch
relation definition now
contains:
relation &Switch(
ls: nb.Logical_Switch,
[...]
mcast_cfg: Ref<McastSwitchCfg>
)
And is populated by the following rule which references the correct
McastSwitchCfg
based on the logical switch uuid:
&Switch(.ls = ls,
[...]
.mcast_cfg = mcast_cfg) :-
nb.Logical_Switch[ls],
[...]
mcast_cfg in &McastSwitchCfg(.datapath = ls._uuid).
Build state based on information dynamically updated by ovn-controller
¶
Some OVN features rely on information learned by ovn-controller
to
generate Logical_Flow
or other records that control the dataplane.
In case of IP Multicast, ovn-controller
uses IGMP to learn
multicast groups that are joined by hosts.
Each ovn-controller
maintains its own set of records to avoid
ownership and concurrency with other controllers. If two hosts that
are connected to the same logical switch but reside on different
hypervisors (different ovn-controller
processes) join the same
multicast group G, each of the controllers will create an
IGMP_Group
record in the OVN Southbound
database which will
contain a set of ports to which the interested hosts are connected.
At this point ovn-northd-ddlog
needs to aggregate the per-chassis
IGMP records to generate a single Logical_Flow
for group G.
Moreover, the ports on which the hosts are connected are represented
as references to Port_Binding
records in the database. These also
need to be translated to &SwitchPort
DDlog relations. The
corresponding DDlog operations that need to be performed are:
Flatten the
<IGMP group, ports>
mapping in order to be able to do the translation fromPort_Binding
to&SwitchPort
. For eachIGMP_Group
record in theOVN Southbound
database generate an individual record of typeIgmpSwitchGroupPort
for eachPort_Binding
in the set of ports that joined the group. Also, translate thePort_Binding
uuid to the correspondingLogical_Switch_Port
uuid:relation IgmpSwitchGroupPort( address: string, switch : Ref<Switch>, port : uuid ) IgmpSwitchGroupPort(address, switch, lsp_uuid) :- sb::IGMP_Group(.address = address, .datapath = igmp_dp_set, .ports = pb_ports), var pb_port_uuid = FlatMap(pb_ports), sb::Port_Binding(._uuid = pb_port_uuid, .logical_port = lsp_name), &SwitchPort( .lsp = nb.Logical_Switch_Port{._uuid = lsp_uuid, .name = lsp_name}, .sw = switch).
Aggregate the flattened IgmpSwitchGroupPort (implicitly from all
ovn-controller
instances) grouping by adress and logical switch:relation IgmpSwitchMulticastGroup( address: string, switch : Ref<Switch>, ports : Set<uuid> ) IgmpSwitchMulticastGroup(address, switch, ports) :- IgmpSwitchGroupPort(address, switch, port), var ports = port.group_by((address, switch)).to_set().
At this point we have all the feature configuration relevant
information stored in DDlog relations in ovn-northd-ddlog
memory.
Pitfalls of projections¶
A projection is a join that uses only some of the data in a record. When the fields that are used have duplicates, the result can be many “copies” of a record, which DDlog represents internally with an integer “weight” that counts the number of copies. We don’t have a projection with duplicates in this example, but lswitch.dl has many of them, such as this one:
relation LogicalSwitchHasACLs(ls: uuid, has_acls: bool)
LogicalSwitchHasACLs(ls, true) :-
LogicalSwitchACL(ls, _).
LogicalSwitchHasACLs(ls, false) :-
nb::Logical_Switch(._uuid = ls),
not LogicalSwitchACL(ls, _).
When multiple projections get joined together, the weights can overflow, which causes DDlog to malfunction. The solution is to make the relation an output relation, which causes DDlog to filter it through a “distinct” operator that reduces the weights to 1. Thus, LogicalSwitchHasACLs is actually implemented this way:
output relation LogicalSwitchHasACLs(ls: uuid, has_acls: bool)
For more information, see Avoiding weight overflow in the DDlog tutorial.
Write rules to update output relations¶
The developer updates output tables by writing rules that generate
Out_*
relations. For IP Multicast this means:
/* IP_Multicast table (only applicable for Switches). */
sb::Out_IP_Multicast(._uuid = hash128(cfg.datapath),
.datapath = cfg.datapath,
.enabled = set_singleton(cfg.enabled),
.querier = set_singleton(cfg.querier)) :-
&McastSwitchCfg[cfg].
Note
OVN_Southbound.dl
also contains an IP_Multicast
relation with input
qualifier. This relation stores the
current snapshot of the OVSDB table and cannot be written to.
Generate Logical_Flow
and/or other forwarding records¶
At this point we have defined all DDlog relations required to generate
Logical_Flow``s. All we have to do is write the rules to do so.
For each ``IgmpSwitchMulticastGroup
we generate a Flow
that has
as action "outport = <Multicast_Group>; output;"
:
/* Ingress table 17: Add IP multicast flows learnt from IGMP (priority 90). */
for (IgmpSwitchMulticastGroup(.address = address, .switch = &sw)) {
Flow(.logical_datapath = sw.dpname,
.stage = switch_stage(IN, L2_LKUP),
.priority = 90,
.__match = "eth.mcast && ip4 && ip4.dst == ${address}",
.actions = "outport = \"${address}\"; output;",
.external_ids = map_empty())
}
In some cases generating a logical flow is not enough. For IGMP we
also need to maintain OVN southbound Multicast_Group
records,
one per IGMP group storing the corresponding Port_Binding
uuids of
ports where multicast traffic should be sent. This is also relatively
straightforward:
/* Create a multicast group for each IGMP group learned by a Switch.
* 'tunnel_key' == 0 triggers an ID allocation later.
*/
sb::Out_Multicast_Group (.datapath = switch.dpname,
.name = address,
.tunnel_key = 0,
.ports = set_map_uuid2name(port_ids)) :-
IgmpSwitchMulticastGroup(address, &switch, port_ids).
We must also define DDlog relations that will allocate tunnel_key
values. There are two cases: tunnel keys for records that already
existed in the database are preserved to implement stable id
allocation; new multicast groups need new keys. This kind of
allocation can be tricky, especially to new users of DDlog. OVN
contains multiple instances of allocation, so it’s probably worth
reading through the existing cases and following their pattern, and,
if it’s still tricky, asking for assistance.
Appendix A. Additional relations generated by ovsdb2ddlog
¶
ovsdb2ddlog generates some extra relations to manage communication with the OVSDB server. It generates records in the following relations when rows in OVSDB output tables need to be added or deleted or updated.
In the steady state, when everything is working well, a given record
stays in any one of these relations only briefly: just long enough for
ovn-northd-ddlog
to send a transaction to the OVSDB server. When
the OVSDB server applies the update and sends an acknowledgement, this
ordinarily means that these relations become empty, because there are
no longer any further changes to send.
Thus, records that persist in one of these relations is a sign of a
problem. One example of such a problem is the database server
rejecting the transactions sent by ovn-northd-ddlog
, which might
happen if, for example, a bug in a .dl
file would cause some OVSDB
constraint or relational integrity rule to be violated. (Such a
problem can often be diagnosed by looking in the OVSDB server’s log.)
DeltaPlus_IP_Multicast
used by the DDlog program to track new records that are not yet added to the database:output relation DeltaPlus_IP_Multicast ( datapath: uuid_or_string_t, enabled: Set<bool>, querier: Set<bool> )
DeltaMinus_IP_Multicast
used by the DDlog program to track records that are no longer needed in the database and need to be removed:output relation DeltaMinus_IP_Multicast ( _uuid: uuid )
Update_IP_Multicast
used by the DDlog program to track records whose fields need to be updated in the database:output relation Update_IP_Multicast ( _uuid: uuid, enabled: Set<bool>, querier: Set<bool> )