Experimental Platform
We used a cluster of 24 Dell R620 servers, each with 2x Intel Xeon E5-2650 v2 processors and 64 GB of RAM, running Ubuntu 16.04 (x86_64). A cluster is needed only for a few experiments; most can be run on a single machine.

Installation

There are two options for installing and testing McNetKAT.

Using vagrant:
- Install Vagrant and VirtualBox. On Ubuntu, you can do this by executing

  sudo apt install vagrant virtualbox

  On macOS, provided you have Homebrew installed, you can do this by executing

  brew cask install virtualbox virtualbox-extension-pack && brew install vagrant
- Get the source code with vagrant scripts:

  git clone --branch mcnetkat-pldi19 https://github.com/frenetic-lang/frenetic.git
  cd frenetic/vm/

- Start the VM and ssh into it. This will automatically install all the dependencies and build and install McNetKAT in the VM, so it's time for a coffee break...

  vagrant up
  vagrant ssh
  cd frenetic
From source (requires Linux):

Installing from source requires Linux and has been tested only under Ubuntu 14.04 and 16.04.
- Install OCaml and OPAM:

  sudo apt-get update
  sudo apt-get install -y python-software-properties build-essential m4
  sudo add-apt-repository -y -u ppa:avsm/ppa
  sudo apt-get install -y ocaml ocaml-native-compilers camlp4 camlp4-extra opam git libssl-dev aspcud

- Get the source code, which includes the setup scripts:

  git clone --branch mcnetkat-pldi19 https://github.com/frenetic-lang/frenetic.git

- Copy the scripts from the `frenetic/vm` folder to the folder containing the
  `frenetic` folder:

  cp frenetic/vm/*.sh .

- Then execute the following to automatically build McNetKAT, PRISM, Bayonet, and various dependencies:

  source setup-opam.sh
  source setup-ocaml.sh
  source setup-third-party.sh
  source setup-mcnetkat.sh
  source setup-prism-bayonet.sh
  source setup-plotlibs.sh
Replicating results
Evaluation
General Tips & Tricks
- Log Files: The results of experiments are saved in individual *.log files, using one file per "data point". If a log file for a particular data point is already present, the experiment is skipped. To force the regeneration of a data point, either delete the corresponding log file or use the pattern

  F=1 ./run_bench.py

  When things go wrong, the log files can often give an indication why.
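The caching convention can be sketched as follows (the log-file name and the exact semantics of `F` are assumptions based on the description above, not taken from `run_bench.py`):

```shell
# Hypothetical sketch of the per-data-point caching described above;
# the log-file name is illustrative, and F=1 forces regeneration.
log="datapoint_k4.log"
if [ -f "$log" ] && [ "${F:-0}" != "1" ]; then
  action="skip"     # result cached; delete the log or set F=1 to re-run
else
  action="run"      # no cached result (or regeneration forced)
fi
echo "$action $log"
```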
- Timeouts: By default, the timeout for most experiments is set to 1 hour. You can override this default by specifying a timeout in seconds using the following pattern:

  TIMEOUT=60 ./run_bench.py

  This runs the benchmark with a timeout of 1 minute per data point.
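A plausible sketch of how this pattern works: TIMEOUT is an environment variable the benchmark script reads, with a one-hour default; the details below are assumptions, not taken from run_bench.py.

```shell
# Hedged sketch: the script presumably reads TIMEOUT from the environment,
# defaults to one hour, and bounds each data point with it (e.g. via timeout(1)).
TIMEOUT="${TIMEOUT:-3600}"       # seconds; default 1 hour
timeout "$TIMEOUT" sleep 0       # stand-in for a single benchmark run
echo "per-data-point timeout: ${TIMEOUT}s"
```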
- Orphan Processes: If you interrupt an experiment before it has completed, or it crashes, you may be left with orphan processes running in the background and draining CPU and memory resources. These will interfere with other experiments, so make sure to kill them! The script

  kill_all.sh

  in the frenetic root folder can help with this, although you may have to adjust the parameters to your cluster (or comment out the cluster-specific part if you don't have a cluster). If your system appears unresponsive, this is likely the issue!

- Performance: One key feature of McNetKAT is that it parallelizes across all available cores. This leads to significant performance improvements, provided a large number of cores is available and there is enough memory to afford parallelization. On a VM with limited memory, McNetKAT may run out of memory for larger experiments.
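The effect of such a cleanup can be illustrated with pkill, which kill_all.sh presumably uses in some form; the process below is a stand-in for an orphaned benchmark process.

```shell
# Illustrative cleanup; with the real experiments you would match the actual
# process names (run_bench.py, the McNetKAT binaries, ...) instead.
sleep 300 &                          # stand-in for an orphaned process
orphan=$!
pkill -f 'sleep 30[0]' || true       # kill by command-line pattern ([0] avoids self-match)
wait "$orphan" 2>/dev/null || true   # reap it; returns once it is dead
if kill -0 "$orphan" 2>/dev/null; then alive=yes; else alive=no; fi
echo "orphan still alive: $alive"
```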
Scalability on FatTree (Figure 7)
This experiment is run on a single machine.
- Navigate to the fattree/ecmp directory and delete old results:

  cd frenetic/experiments/fattree/ecmp
  rm *.log

- Generate data points:

  ./run_bench.py

- Plot result:

  ./plot.py

- View result:

  xpdf ecmp.pdf
  less ecmp.txt
Parallel Speedup (Figure 8)
This experiment assumes a cluster of machines with 16 physical cores each. If no cluster is available, or the number of physical cores differs, the experiment may be reproduced partially, as detailed below.
Setting up the cluster: The master machine (from which the experiment is run) must have access to the other machines via ssh with passwordless key authentication. In particular, if foo is the hostname of a remote machine on the cluster and bar is an appropriate user on foo, the following should succeed without the need to enter a passphrase:

  ssh bar@foo
Instructions on how to achieve this can be found here.
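The standard one-time setup on the master looks roughly like this (the key path is illustrative, and bar@foo is the placeholder host from above):

```shell
# Generate a key without a passphrase (key path is illustrative):
keydir=$(mktemp -d)
ssh-keygen -t ed25519 -N "" -f "$keydir/id_ed25519" -q
# Authorize it on each remote host (asks for the password once):
#   ssh-copy-id -i "$keydir/id_ed25519.pub" bar@foo
# Afterwards, this should succeed without any prompt:
#   ssh -i "$keydir/id_ed25519" bar@foo true
ls "$keydir"
```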
Running the experiment:

- Navigate to the fattree/speedup directory and delete old results:

  cd frenetic/experiments/fattree/speedup
  rm *.log

- Adjust parameters to fit your setup:

  nano +11 ./run_bench.py

  The relevant parameters are the following:
  - J: The number of processes to use on the local machine. Set this to a value no larger than the number of physical cores.
  - RN: The number of remote machines to use.
  - RJ: The number of processes per remote machine to use. As with J, ensure this number is no larger than the number of physical cores.

  If no cluster is available, skip the configurations in run_bench.py that have RN set to a value above 0 -- you can still evaluate the speedup on a single machine.
- Optionally, if you have a cluster: Given remote machines m1, m2, ..., mk with users u1, u2, ..., uk, export your setup as follows:

  export CLUSTER="u1@m1 u2@m2 ... uk@mk"
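Before running the experiment, it can help to check that every host in $CLUSTER answers without prompting; a small illustrative loop (the hostnames below are placeholders):

```shell
# Sanity check: each host should respond without a password prompt.
CLUSTER="u1@m1 u2@m2"    # placeholder hosts; use your own export
n=0
for host in $CLUSTER; do
  n=$((n + 1))
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
    && echo "$host: ok" || echo "$host: UNREACHABLE"
done
echo "checked $n hosts"
```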
- Optionally, adjust the parameter k of the FatTrees to use:

  nano +38 run_bench.py

  By default, the experiment is run with k=14 and k=16.
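For a sense of scale, the standard FatTree formulas (independent of this artifact) give the topology size for a parameter k: 5k^2/4 switches and k^3/4 hosts.

```shell
# Standard k-ary FatTree sizing: k^2/4 core, k^2/2 aggregation, and k^2/2
# edge switches (5k^2/4 total), and k^3/4 hosts.
for k in 14 16; do
  switches=$(( 5 * k * k / 4 ))
  hosts=$(( k * k * k / 4 ))
  echo "k=$k: $switches switches, $hosts hosts"
done
```

So the defaults correspond to sizable networks: 245 switches and 686 hosts for k=14, and 320 switches and 1024 hosts for k=16.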
- Generate data points:

  ./run_bench.py

- Plot result:

  ./plot.py

- View result:

  xpdf speedup.pdf
  less speedup.txt
Comparison with other tools (Figure 10)
This experiment is mostly run on a single machine. A cluster is needed only
to generate the green PNK (cluster)
line from Figure 10.
- Navigate to the bayonet experiment directory and delete old results:

  cd frenetic/examples/bayonet
  rm *.log

- Generate input files for Bayonet and McNetKAT:

  ./generate.py

  If this completes successfully, you should see various *.dot and *.bayonet files in the directory.

- Optionally, adjust the parameters with which to invoke the various tools:

  nano +19 run_bench.py

  In particular, if you don't have a cluster, comment out the line with 'RN': '24' -- it instructs McNetKAT to spread the workload to 24 remote machines. If you do have a cluster, follow the instructions above on how to set up the cluster.
- Generate data points:

  ./run_bench.py

- Plot result:

  cd ~/frenetic/plots
  ./bayonet.py ../examples/bayonet

- View result:

  xpdf bayonet.pdf
  less bayonet.txt
Case Study: Data Center Fault-Tolerance

Comparing resilience of routing schemes (Figure 11)

- To perform the analysis corresponding to the case study shown in Figure 11 of the paper, run the following from the frenetic directory:

  probnetkat.compare

- This will generate json files in the ./examples/output/results/comparisons directory with data for the two tables. For the case with k failures, see the file k-1-16.json. The field resilience in each file corresponds to the equivalence test with teleportation in Figure 11 (b).

- To generate the table, run:

  sh ./plots/f10-tables.sh
Case study with unbounded failures (Figure 12)

- To perform the analysis corresponding to the case study shown in Figure 12 of the paper, run the following script from the frenetic directory:

  sh ./experiments/f10/run-partial.sh

- It may take up to 10 minutes to complete the experiments.

- To plot and visualize the results, perform the following:

  sh ./plots/f10.sh
  xpdf experiments/f10/[name].pdf # [name] = cdf_tput_vs_hop_count-inf-1-4, expected_hops_vs_fail_prob-inf, tput_vs_fail_prob-inf