Debugging the DDlog version of ovn-northd¶
This document gives some tips for debugging correctness issues in the
DDlog implementation of
ovn-northd. To keep things conrete, we
assume here that a failure occurred in one of the test cases in
ovn-e2e.at, but the same methodology applies in any other
environment. If none of these methods helps, ask for assistance or
submit a bug report.
Before trying these methods, you may want to check the northd log
error messages that might explain the failure.
Compare OVSDB tables generated by DDlog vs C¶
The first thing I typically want to check when
does not behave as expected is how the OVSDB tables computed by DDlog
differ from what the C implementation produces. Fortunately, all the
infrastructure needed to do this already exists in OVN.
First, let’s modify the test script, e.g.,
ovn.at to dump the
contents of OVSDB right before the failure. The most common issue is
a difference between the logical flows generated by the two
implementations. To make it easy to compare the generated flows, make
sure that the test contains something like this in the right place:
ovn-sbctl dump-flows > sbflows AT_CAPTURE_FILE([sbflows])
The first line above dumps the OVN logical flow table to a file named
sbflows. The second line ensures that, if the test fails,
sbflows get logged to
testsuite.log. That is not particularly
useful for us right now, but it means that if someone later submits a
bug report, that’s one more piece of data that we don’t have to ask
for them to submit along with it.
Next, we want to run the test twice, with the C and DDlog versions of
make check -j6 TESTSUITEFLAGS="-d 111 112" if 111
and 112 are the C and DDlog versions of the same test. The
this command line makes the test driver keep test directories around
even for tests that succeed, since by default it deletes them.
Now you can look at
sbflows in each test log directory. The
ovn-northd-ddlog developers have gone to some trouble to make the
DDlog flows as similar as possible to the C ones, right down to white
space and other formatting. Thus, the DDlog output is often identical
to C aside from logical datapath UUIDs.
Usually, this means that one can get informative results by running
diff -u tests/testsuite.dir/111/sbflows tests/testsuite.dir/111/sbflows
Running the input through the
uuidfilt utility from OVS will
generally get rid of the logical datapath UUID differences as well:
diff -u <(uuidfilt tests/testsuite.dir/111/sbflows) <(uuidfilt tests/testsuite.dir/111/sbflows)
If there are nontrivial differences, this often identifies your bug.
Often, once you have identified the difference between the two OVSDB dumps, this will immediately lead you to the root cause of the bug, but if you are not this lucky then the next method may help.
Record and replay DDlog execution¶
DDlog offers a way to record all input table updates throughout the execution of northd and replay them against DDlog running as a standalone executable without all other OVN components. This has two advantages. First, this allows one to easily tweak the inputs, e.g. to simplify the test scenario. Second, the recorded execution can be easily replayed anywhere without having to reproduce your OVN setup.
--ddlog-record option to record updates,
--ddlog-record=replay.dat to record to
(OVN’s built-in tests automatically do this.) The file contains the
log of transactions in the DDlog command format (see
To replay the log, you will need the standalone DDlog executable. By
default, the build system does not compile this program, because it
increases the already long Rust compilation time. To build it, add
NORTHD_CLI=1 to the
make command line, e.g.
You can modify the log before replaying it, e.g., adding
<table> commands to dump the contents of relations at various points
during execution. The <table> name must be fully qualified based on
the file in which it is declared, e.g.
southbound tables or
can also use
dump without an argument to dump the contents of all
The following command replays the log generated by OVN test number
112 and dumps the output of DDlog to
northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/112/northd/replay.dat > replay.dump
Or, to dump just the table contents following the run, without having
(cat tests/testsuite.dir/112/northd/replay.dat; echo 'dump;') | northd/ovn_northd_ddlog/target/release/ovn_northd_cli --no-delta --no-init-snapshot > replay.dump
Depending on whether and how you installed OVS and OVN, you might need
LD_LIBRARY_PATH to library build directories to get the
CLI to run, e.g.:
The replay output may be less informative than you expect because
DDlog does not, by default, keep around enough information to
include input relation and intermediate relations in the output.
These relations are often critical to understanding what is going
on. To include them, add the options
--output-internal-relations --output-input-relations=In_ to
DDLOG_EXTRA_FLAGS for building
./configure DDLOG_EXTRA_FLAGS='--output-internal-relations --output-input-relations=In_'
Debugging by Logging¶
One limitation of the previous method is that it allows one to inspect inputs and outputs of a rule, but not the (sometimes fairly complicated) computation that goes on inside the rule. You can of course break up the rule into several rules and dump the intermediate outputs.
There are at least two alternatives for generating log messages.
First, you can write rules to add strings to the Warning relation
ovn_north.dl. Code in
ovn-northd-ddlog.c will log
any given string in this relation just once, when it is first added to
the relation. (If it is removed from the relation and then added back
later, it will be logged again.)
Second, you can call using the
warn() function declared in
ovn.dl from a DDlog rule. It’s not straightforward to know
exactly when this function will be called, like it would be in an
imperative language like C, since DDlog is a declarative language
where the user doesn’t directly control when rules are triggered. You
might, for example, see the rule being triggered multiple times with
the same input. Nevertheless, this debugging technique is useful in
You will find many examples of the use of Warning and
ovn_northd.dl, where it is frequently used to report non-critical
TODO: update these instructions as DDlog’s internal handling of panic’s is improved.
DDlog is a safe language, so DDlog programs normally do not crash, except for the following three cases:
- A panic in a Rust function imported to DDlog as
- A panic in a C function imported to DDlog as
- A bug in the DDlog runtime or libraries.
Below we walk through the steps involved in debugging such failures.
In this scenario, there is an array-index-out-of-bounds error in the
ovn_scan_static_dynamic_ip6() function, which is written in Rust
and imported to DDlog as an
extern function. When invoked from a
DDlog rule, this function causes a panic in one of DDlog worker
Step 1: Check for error messages in the northd log. A panic can
generally lead to unpredictable outcomes, so one cannot count on a
clean error message showing up in the log (Other outcomes include
crashing the entire process and even deadlocks. We are working to
eliminate the latter possibility). In this case we are lucky to
observe a bunch of error messages like the following in the
2019-09-23T16:23:24.549Z|00011|ovn_northd|ERR|ddlog_transaction_commit(): error: failed to receive flush ack message from timely dataflow thread
These messages are telling us that something is broken inside the DDlog runtime.
Step 2: Record and replay the failing scenario. We use DDlog’s record/replay capabilities (see above) to capture the faulty scenario. We replay the recorded trace:
northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/117/northd/replay.dat
This generates a bunch of output ending with:
thread 'worker thread 2' panicked at 'index out of bounds: the len is 1 but the index is 1', /rustc/eae3437dfe991621e8afdc82734f4a172d7ddf9b/src/libcore/slice/mod.rs:2681:10 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace.
We re-run the CLI again with backtrace enabled (as suggested by the error message):
RUST_BACKTRACE=1 northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/117/northd/replay.dat
This finally yields the following stack trace, which suggests array
bound violation in
0: backtrace::backtrace::libunwind::trace at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.29 10: core::panicking::panic_bounds_check at src/libcore/panicking.rs:61 [SKIPPED] 11: ovn_northd_ddlog::__ovn::ovn_scan_static_dynamic_ip6 12: ovn_northd_ddlog::prog::__f [SKIPPED]
Finally, looking at the source code of
ovn_scan_static_dynamic_ip6, we identify the following line,
containing an unsafe array indexing operator, as the culprit:
Occasionally it’s desirable to a full and complete build of the
DDlog-generated code. To trigger that, delete the generated
ovn_northd_ddlog directory and the
ddlog.stamp witness file,
rm -rf northd/ovn_northd_ddlog northd/ddlog.stamp
Submitting a bug report¶
If you are having trouble with DDlog and the above methods do not
help, please submit a bug report to
In addition to problem description, please provide as many of the following as possible:
- Are you running with the right DDlog for the version of OVN? OVN and DDlog are both evolving and OVN needs to build against a specific version of DDlog.
replay.datfile generated as described above
testsuite.log, if you are running the OVN test suite