Scalable Verification of Probabilistic Networks

McNetKAT

PLDI 2019

This webpage provides instructions for building and installing McNetKAT and for reproducing the results from the paper "Scalable Verification of Probabilistic Networks".

Experimental Platform

We used a cluster of 24 Dell R620 servers, each with 2x Intel Xeon E5-2650 v2 processors and 64GB of RAM, running Ubuntu 16.04 (x86_64). While a cluster is needed for a few experiments, most can be run on a single machine.

Installation

There are two options for installing and testing McNetKAT:

Using vagrant:

  1. Install vagrant and VirtualBox. On Ubuntu, you can do this by executing sudo apt install vagrant virtualbox. On macOS, you can do this by executing brew cask install virtualbox virtualbox-extension-pack && brew install vagrant provided you have homebrew installed.
  2. Get the source code, which includes the vagrant scripts:
    git clone --branch mcnetkat-pldi19 https://github.com/frenetic-lang/frenetic.git
    cd frenetic/vm/
  3. Start the VM and ssh into it. This will automatically install all the dependencies and build and install McNetKAT in the VM, so it's time for a coffee break...
    vagrant up
    vagrant ssh
    cd frenetic

From source (requires Linux):

Installing from source requires Linux and has been tested only under Ubuntu 14.04 and 16.04.

  1. Install OCaml and OPAM:
    sudo apt-get update
    sudo apt-get install -y python-software-properties build-essential m4
    sudo add-apt-repository -y -u ppa:avsm/ppa
    sudo apt-get install -y ocaml ocaml-native-compilers camlp4 camlp4-extra opam git libssl-dev aspcud

  2. Get the source code:
    git clone --branch mcnetkat-pldi19 https://github.com/frenetic-lang/frenetic.git
  3. Copy the scripts from the `frenetic/vm` folder to the folder containing the `frenetic` folder:
    cp frenetic/vm/*.sh .
  4. Then execute the following scripts to automatically build McNetKAT, PRISM, Bayonet, and their dependencies:
    source setup-opam.sh
    source setup-ocaml.sh
    source setup-third-party.sh
    source setup-mcnetkat.sh
    source setup-prism-bayonet.sh
    source setup-plotlibs.sh
    

Replicating results

Evaluation

General Tips & Tricks

  1. Log Files: The results of experiments are saved in individual *.log files, using one file per "data point". If a log file for a particular data point is already present, the experiment is skipped. To force the regeneration of a data point, either delete the corresponding log file or use the pattern
    F=1 ./run_bench.py
    When things go wrong, the log files can often give an indication why.
  2. Timeouts: By default, the timeout for most experiments is set to 1 hour. You can override this default by specifying a timeout in seconds using the following pattern:
    TIMEOUT=60 ./run_bench.py
    This runs the benchmark with a timeout of 1 minute per data point.
  3. Orphan Processes: If you interrupt an experiment before it has completed, or it crashes, you may be left with orphan processes running in the background and draining CPU and memory resources. This will interfere with other experiments, so make sure to kill these orphan processes! The script kill_all.sh in the frenetic root folder can help with this, although you may have to adjust the parameters to your cluster (or comment out the cluster specific part, if you don't have a cluster). If your system appears unresponsive, this is likely the issue!
  4. Performance: A key feature of McNetKAT is that it parallelizes across all available cores. This yields significant speedups, provided many cores and sufficient memory are available. On a VM with limited memory, McNetKAT may run out of memory for larger experiments.
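The cleanup from tip 3 can be sketched as below. The sketch is self-contained: a dummy `sleep 300` stands in for a stuck worker, and the example process name in the comment is an assumption -- inspect `ps aux` on your system to find the real names before killing anything.

```shell
# Self-contained sketch of orphan-process cleanup (tip 3 above).
# A dummy `sleep 300` stands in for a stuck worker; on a real machine you
# would match the actual process names instead, e.g. `pkill -f probnetkat`
# (the name is an assumption -- check `ps aux` first).
sleep 300 &
orphan=$!
kill "$orphan"
wait "$orphan" 2>/dev/null
echo "terminated orphan $orphan"
```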

Scalability on FatTree (Figure 7)

This experiment is run on a single machine.

  1. Navigate to the fattree/ecmp directory and delete old results:
    cd frenetic/experiments/fattree/ecmp
    rm *.log
  2. Generate data points:
    ./run_bench.py
  3. Plot result:
    ./plot.py
  4. View result:
    xpdf ecmp.pdf
    less ecmp.txt
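After step 3, you can quickly confirm that the outputs were actually produced before viewing them. A minimal sketch, using the file names from the steps above:

```shell
# Check that the Figure 7 outputs exist in the current directory
# (run from frenetic/experiments/fattree/ecmp after the steps above).
for f in ecmp.pdf ecmp.txt; do
  if [ -e "$f" ]; then echo "$f present"; else echo "$f missing"; fi
done
```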

Parallel Speedup (Figure 8)

This experiment assumes a cluster of machines with 16 physical cores each. If no cluster is available, or the number of physical cores differs, the experiment may be reproduced partially, as detailed below.

Setting up the cluster:

The master machine (from which the experiment is run) must have access to the other machines using ssh via passwordless key authentication. In particular, if foo is the hostname of a remote machine on the cluster, the following should succeed without the need to enter a passphrase, where bar is an appropriate user on the host foo:

    ssh bar@foo

Key-based authentication can typically be set up with ssh-keygen and ssh-copy-id; see the OpenSSH documentation for details.
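A typical recipe uses ssh-keygen and ssh-copy-id. The sketch below generates the key into a temporary directory so it can be run safely anywhere; in practice you would accept the default ~/.ssh location, and bar@foo is the placeholder from above.

```shell
# Generate a passphrase-less key (into a temp dir to keep the sketch
# self-contained; normally you would use the default ~/.ssh path).
keydir=$(mktemp -d)
ssh-keygen -q -t ed25519 -f "$keydir/id_ed25519" -N ""
echo "generated $keydir/id_ed25519"
# Then install the public key on each remote machine (needs a reachable host):
# ssh-copy-id -i "$keydir/id_ed25519.pub" bar@foo
# ssh bar@foo   # should now succeed without a passphrase prompt
```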

Running the experiment:
  1. Navigate to the fattree/speedup directory and delete old results:
    cd frenetic/experiments/fattree/speedup
    rm *.log
  2. Adjust parameters to fit your setup:
    nano +11 ./run_bench.py
    The relevant parameters are the following:
    • J: The number of processes to use on the local machine. Set this to a value no larger than the number of physical cores.
    • RN: The number of remote machines to use.
    • RJ: The number of processes per remote machine to use. As with J, ensure this number is no larger than the number of physical cores.
    If you don't have a cluster available, simply comment out the lines in run_bench.py that have RN set to a value above 0 -- you can still evaluate the speedup on a single machine.
  3. Optionally, if you have a cluster: Given remote machines m1, m2, ..., mk with users u1, u2, ..., uk, export your setup as follows:
    export CLUSTER="u1@m1 u2@m2 ... uk@mk"
  4. Optionally, adjust the FatTree parameter k:
    nano +38 run_bench.py
    By default, the experiment is run with k=14 and k=16.
  5. Generate data points:
    ./run_bench.py
  6. Plot result:
    ./plot.py
  7. View result:
    xpdf speedup.pdf
    less speedup.txt
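Before launching a long run, it can help to sanity-check that every host listed in CLUSTER (step 3) is reachable without a password prompt. A sketch, with the actual ssh probe left as a comment since it requires real hosts; u1@m1 and u2@m2 are placeholders:

```shell
# Iterate over the hosts in $CLUSTER and probe each one.
CLUSTER="u1@m1 u2@m2"   # placeholder hosts from step 3
for host in $CLUSTER; do
  # Real probe (needs reachable hosts): BatchMode makes ssh fail fast
  # instead of prompting for a password.
  # ssh -o BatchMode=yes "$host" true && echo "$host ok" || echo "$host FAILED"
  echo "would probe $host"
done
```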

Comparison with other tools (Figure 10)

This experiment is mostly run on a single machine. A cluster is needed only to generate the green PNK (cluster) line from Figure 10.

  1. Navigate to the bayonet experiment directory and delete old results:
    cd frenetic/examples/bayonet
    rm *.log
  2. Generate input files for Bayonet and McNetKAT:
    ./generate.py
    If this completes successfully, you should see various *.dot and *.bayonet files in the directory.
  3. Optionally, adjust the parameters with which to invoke the various tools in run_bench.py. In particular, if you don't have a cluster, comment out the line with 'RN': '24' -- it instructs McNetKAT to spread the workload across 24 remote machines.
    nano +19 run_bench.py
    If you do have a cluster, follow the cluster setup instructions given above for the Parallel Speedup experiment.
  4. Generate data points:
    ./run_bench.py
  5. Plot result:
    cd ~/frenetic/plots
    ./bayonet.py ../examples/bayonet
    
  6. View result:
    xpdf bayonet.pdf
    less bayonet.txt

Case Study: Data Center Fault-Tolerance

Comparing resilience of routing schemes (Figure 11)

  1. To perform the analysis corresponding to the case study shown in Figure 11 of the paper, run the following from the frenetic directory:
    probnetkat.compare
  2. This will generate json files in the ./examples/output/results/comparisons directory with data for the two tables. For the case with k failures, see the file k-1-16.json. The resilience field in each file corresponds to the equivalence test with teleportation in Figure 11 (b).
  3. To generate the table, run:
    sh ./plots/f10-tables.sh

Case study with unbounded failures (Figure 12)

  1. To perform the analysis corresponding to the case study shown in Figure 12 of the paper, run the following script from the frenetic directory:
    sh ./experiments/f10/run-partial.sh
    It may take up to 10 minutes to complete the experiments.
  2. To plot and visualize the results, run:
    sh ./plots/f10.sh
    xpdf experiments/f10/[name].pdf  # [name] = cdf_tput_vs_hop_count-inf-1-4, expected_hops_vs_fail_prob-inf, tput_vs_fail_prob-inf