
Use of Measurement Tools
South Carolina State University

Matt Zekauskas, [email protected]
2017-05-19

This document is a result of work by the perfSONAR Project (http://www.perfsonar.net) and is licensed under CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/).

WARNING WARNING WARNING
This deck was built for perfSONAR 3.5. With the perfSONAR 4.0 release in April 2017, bwctl is replaced by a new uniform scheduler, pScheduler. The underlying tools (iperf, nuttcp, owamp), however, are the same. See http://docs.perfsonar.net/pscheduler_intro.html

2016, http://www.perfsonar.net

Tool Usage

All of the previous examples were discovered, debugged, and corrected through the aid of the tools on the pS Performance Toolkit. Some are run in a diagnostic (i.e. one-off) fashion; others are automated. I will go over diagnostic usage of some of the tools:

- OWAMP
- BWCTL

Hosts Used

BWCTL hosts (10G):
- wash-pt1.es.net (McLean, VA)
- sunn-pt1.es.net (Sunnyvale, CA)

OWAMP hosts (1G):
- wash-owamp.es.net (McLean, VA)
- sunn-owamp.es.net (Sunnyvale, CA)

Path: ~60ms RTT

traceroute to sunn-owamp.es.net (198.129.254.78), 30 hops max, 60 byte packets
 1  198.124.252.125 (198.124.252.125)  0.163 ms  0.149 ms  0.138 ms
 2  washcr5-ip-c-washsdn2.es.net (134.55.50.61)  0.655 ms  washcr5-ip-a-washsdn2.es.net (134.55.42.33)  0.991 ms  washcr5-ip-c-washsdn2.es.net (134.55.50.61)  1.324 ms
 3  chiccr5-ip-a-washcr5.es.net (134.55.36.45)  17.884 ms  17.939 ms  18.217 ms
 4  kanscr5-ip-a-chiccr5.es.net (134.55.43.82)  28.980 ms  29.066 ms  29.295 ms
 5  denvcr5-ip-a-kanscr5.es.net (134.55.49.57)  39.515 ms  39.601 ms  39.877 ms
 6  sacrcr5-ip-a-denvcr5.es.net (134.55.50.201)  60.382 ms  60.210 ms  60.437 ms
 7  sunncr5-ip-a-sacrcr5.es.net (134.55.40.6)  63.067 ms  68.035 ms  68.266 ms
 8  sunn-owamp.es.net (198.129.254.78)  62.462 ms  62.445 ms  62.436 ms

Forcing Bad Performance (to illustrate behavior)

Add 10% loss toward a specific host:
  sudo /sbin/tc qdisc delete dev eth0 root
  sudo /sbin/tc qdisc add dev eth0 root handle 1: prio
  sudo /sbin/tc qdisc add dev eth0 parent 1:1 handle 10: netem loss 10%
  sudo /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 198.129.254.78/32 flowid 1:1

Add 10% duplication toward a specific host:
  sudo /sbin/tc qdisc delete dev eth0 root
  sudo /sbin/tc qdisc add dev eth0 root handle 1: prio
  sudo /sbin/tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 10%
  sudo /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 198.129.254.78/32 flowid 1:1

Add 10% corruption toward a specific host:
  sudo /sbin/tc qdisc delete dev eth0 root
  sudo /sbin/tc qdisc add dev eth0 root handle 1: prio
  sudo /sbin/tc qdisc add dev eth0 parent 1:1 handle 10: netem corrupt 10%
  sudo /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 198.129.254.78/32 flowid 1:1

Reorder packets: 25% of packets (with a correlation of 50%) will be sent immediately; the rest are delayed by 10ms:
  sudo /sbin/tc qdisc delete dev eth0 root
  sudo /sbin/tc qdisc add dev eth0 root handle 1: prio
  sudo /sbin/tc qdisc add dev eth0 parent 1:1 handle 10: netem delay 10ms reorder 25% 50%
  sudo /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 198.129.254.78/32 flowid 1:1

Reset things:
  sudo /sbin/tc qdisc delete dev eth0 root
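The same prio + netem + u32 pattern as the recipes above also works for emulating pure added latency, which the deck does not show but which is useful for reproducing long-RTT behavior. This is a sketch under the same assumptions as the slides (interface eth0, target host 198.129.254.78); the 50ms value is arbitrary:

```shell
# Hedged sketch: add 50ms of one-way delay toward one host, using the same
# prio qdisc + netem + u32 filter pattern as the loss/duplication recipes.
sudo /sbin/tc qdisc delete dev eth0 root
sudo /sbin/tc qdisc add dev eth0 root handle 1: prio
sudo /sbin/tc qdisc add dev eth0 parent 1:1 handle 10: netem delay 50ms
sudo /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 198.129.254.78/32 flowid 1:1

# Inspect what is currently installed:
/sbin/tc qdisc show dev eth0
/sbin/tc filter show dev eth0
```

As with the other recipes, remember to delete the root qdisc afterward, or every subsequent test through this interface inherits the impairment.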

It's All About the Buffers (a prequel to using BWCTL)

The Bandwidth Delay Product (BDP) is the amount of in-flight data allowed for a TCP connection (BDP = bandwidth * round trip time).

Example: 10Gb/s cross country, ~100ms RTT:
  10,000,000,000 b/s * 0.1 s = 1,000,000,000 bits
  1,000,000,000 bits / 8 = 125,000,000 bytes
  125,000,000 bytes / (1024*1024) ~ 119 MB

Major OSs default to a base window of 4M. For those playing at home, the maximum throughput with a 4 MByte TCP window at various RTTs (1500 MTU):
  10ms  = 3.25 Gbps
  50ms  = 655 Mbps
  100ms = 325 Mbps

Autotuning does help, by growing the window when needed. To make this work properly, the host needs tuning: https://fasterdata.es.net/host-tuning/

You can ignore the math aspect; it's really just about making sure there is memory to catch packets, both on hosts and on network gear. As the speed increases, there are more packets. If there is no memory, we drop them, and that makes TCP sad.

Let's Talk about IPERF

Start with a definition: network throughput is the rate of successful message delivery over a communication channel. In easier terms: how much data can I shovel into the network in some given amount of time?
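The window-versus-RTT arithmetic above can be reproduced with a quick shell calculation. The 4 MByte window and 100ms RTT are the slide's own example figures; nothing else is assumed:

```shell
# Max single-stream TCP throughput is roughly window_size / RTT.
# 4 MByte window (4*1024*1024 bytes), 100 ms RTT:
window_bytes=$((4 * 1024 * 1024))
rtt_ms=100
awk -v w="$window_bytes" -v r="$rtt_ms" \
    'BEGIN { printf "%.0f Mbps\n", w * 8 / (r / 1000) / 1e6 }'
# -> 336 Mbps
```

The raw figure (~336 Mbps) sits slightly above the slide's 325 Mbps, which presumably also discounts per-packet header overhead at 1500 MTU; either way, a default-sized window caps a 100ms path well below 1 Gbps.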

What does this tell us?
- It is the opposite of utilization (i.e. how much we can get at a given point in time, minus what is already utilized)
- Utilization and throughput added together are capacity
- Tools that measure throughput are a simulation of a real-world use case (e.g. how well bulk data movement could perform)

Ways to game the system:
- Parallel streams
- Manual window size adjustments
- Memory-to-memory testing (no spinning disk)

Let's Talk about IPERF

There are a couple of varieties of tester that BWCTL (the control/policy wrapper) knows how to talk with:

Iperf2
- Default for the command line (e.g. bwctl -c HOST will invoke this)
- Some known behavioral problems (older versions were CPU-bound, and it is hard to get UDP testing to be correct)

Iperf3
- Default for the perfSONAR regular testing framework; can be invoked via a command line switch (bwctl -T iperf3 -c HOST)
- New brew; has features iperf2 is missing (retransmission counts, JSON output, daemon mode, etc.)
- Note: single threaded, so performance is gated on clock speed (i.e. performance is bound to one core); parallel stream testing is hard as a result

Nuttcp
- Different code base; can be invoked via a command line switch (bwctl -T nuttcp -c HOST)
- More control over how the tool behaves on the host (bind to a CPU/core, etc.)
- Similar feature set to iperf3

What IPERF Tells Us

Let's start by describing "throughput", which is vague:
- Capacity: link speed
- Narrow link: the link with the lowest capacity along a path; the capacity of the end-to-end path = the capacity of the narrow link
- Utilized bandwidth: current traffic load
- Available bandwidth: capacity minus utilized bandwidth
- Tight link: the link with the least available bandwidth in a path
- Achievable bandwidth: includes protocol and host issues (e.g. BDP!)

All of this is memory to memory, i.e. we are not involving a spinning disk (more later).

[Figure: a four-hop path from source to sink with link capacities 45 Mbps, 10 Mbps, 100 Mbps, and 45 Mbps; shaded portions show background traffic. The 10 Mbps link is the narrow link, and the most heavily loaded link is the tight link.]

Some Quick Words on BWCTL

BWCTL is the wrapper around a couple of tools (we will show the throughput tools first). Policy specification can do things like prevent tests to some subnets, or allow longer tests to others. See the man pages for more details.
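The narrow-link/tight-link distinction above can be made concrete with a toy computation. The capacities follow the slide's four-link example; the utilization figures are hypothetical, chosen so the two links differ:

```shell
# Per-link "capacity utilized" pairs, in Mbps (utilization values invented).
# Narrow link: minimum capacity. Tight link: minimum available (cap - used).
printf '%s\n' "45 5" "10 2" "100 95" "45 10" | awk '
{
    cap = $1; avail = $1 - $2
    if (NR == 1 || cap < min_cap)     { min_cap = cap }
    if (NR == 1 || avail < min_avail) { min_avail = avail }
}
END {
    printf "narrow link capacity: %d Mbps\n", min_cap
    printf "tight link available: %d Mbps\n", min_avail
}'
# -> narrow link capacity: 10 Mbps
# -> tight link available: 5 Mbps
```

With these numbers the narrow link (the 10 Mbps hop) and the tight link (the 100 Mbps hop carrying 95 Mbps of background traffic) are different links, which is exactly why capacity alone does not predict achievable bandwidth.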

Some general notes:
- Use -c to specify a catcher (receiver)
- Use -s to specify a sender
- bwctl will default to IPv6 if available (use -4 to force IPv4 as needed, or specify addresses directly if your host names are dual-homed)
- The defaults are -f m (Megabits per second) and -t 10 (a 10 second test)
- The omit flag can be used to trim the TCP slow-start period out of the final results

BWCTL Example (iperf2)

[[email protected] ~]$ bwctl -T iperf -f m -t 10 -i 2 -c sunn-pt1.es.net
bwctl: 83 seconds until test results available

RECEIVER START
bwctl: exec_line: /usr/bin/iperf -B 198.129.254.58 -s -f m -m -p 5136 -t 10 -i 2.000000
bwctl: run_tool: tester: iperf
bwctl: run_tool: receiver: 198.129.254.58
bwctl: run_tool: sender: 198.124.238.34
bwctl: start_tool: 3598657357.738868
------------------------------------------------------------
Server listening on TCP port 5136
Binding to local address 198.129.254.58
TCP window size: 0.08 MByte (default)
------------------------------------------------------------
[ 16] local 198.129.254.58 port 5136 connected with 198.124.238.34 port 5136
[ ID] Interval       Transfer     Bandwidth
[ 16]  0.0- 2.0 sec  90.4 MBytes   379 Mbits/sec
[ 16]  2.0- 4.0 sec   689 MBytes  2891 Mbits/sec
[ 16]  4.0- 6.0 sec   684 MBytes  2867 Mbits/sec
[ 16]  6.0- 8.0 sec   691 MBytes  2897 Mbits/sec
[ 16]  8.0-10.0 sec   691 MBytes  2898 Mbits/sec
[ 16]  0.0-10.0 sec  2853 MBytes  2386 Mbits/sec
(N.B. The 0.0-10.0 sec line is what perfSONAR graphs: the average of the complete test.)
[ 16] MSS size 8948 bytes (MTU 8988 bytes, unknown interface)
bwctl: stop_tool: 3598657390.668028
RECEIVER END

BWCTL Example (iperf3)

[[email protected] ~]$ bwctl -T iperf3 -f m -t 10 -i 2 -c sunn-pt1.es.net

bwctl: 55 seconds until test results available

SENDER START
bwctl: run_tool: tester: iperf3
bwctl: run_tool: receiver: 198.129.254.58
bwctl: run_tool: sender: 198.124.238.34
bwctl: start_tool: 3598657653.219168
Test initialized
Running client
Connecting to host 198.129.254.58, port 5001
[ 17] local 198.124.238.34 port 34277 connected to 198.129.254.58 port 5001
[ ID] Interval        Transfer     Bandwidth       Retransmits
[ 17] 0.00-2.00 sec    430 MBytes  1.80 Gbits/sec  2
[ 17] 2.00-4.00 sec    680 MBytes  2.85 Gbits/sec  0
[ 17] 4.00-6.00 sec    669 MBytes  2.80 Gbits/sec  0
[ 17] 6.00-8.00 sec    670 MBytes  2.81 Gbits/sec  0
[ 17] 8.00-10.00 sec   680 MBytes  2.85 Gbits/sec  0
[ ID] Interval        Transfer     Bandwidth       Retransmits
Sent
[ 17] 0.00-10.00 sec  3.06 GBytes  2.62 Gbits/sec  2
Received
[ 17] 0.00-10.00 sec  3.06 GBytes  2.63 Gbits/sec
(N.B. The 0.00-10.00 sec totals are what perfSONAR graphs: the average of the complete test.)

iperf Done.
bwctl: stop_tool: 3598657664.995604
SENDER END

BWCTL Example (nuttcp)

[[email protected] ~]$ bwctl -T nuttcp -f m -t 10 -i 2 -c sunn-pt1.es.net
bwctl: exec_line: /usr/bin/nuttcp -vv -p 5001 -i 2.000000 -T 10 -t 198.129.254.58
bwctl: run_tool: tester: nuttcp
bwctl: run_tool: receiver: 198.129.254.58
bwctl: run_tool: sender: 198.124.238.34
bwctl: start_tool: 3598657844.605350
nuttcp-t: v7.1.6: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 198.129.254.58
nuttcp-t: time limit = 10.00 seconds
nuttcp-t: connect to 198.129.254.58 with mss=8948, RTT=62.418 ms
nuttcp-t: send window size = 98720, receive window size = 87380
nuttcp-t: available send window = 74040, available receive window = 65535
nuttcp-r: v7.1.6: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: interval reporting every 2.00 seconds
nuttcp-r: accept from 198.124.238.34
nuttcp-r: send window size = 98720, receive window size = 87380
nuttcp-r: available send window = 74040, available receive window = 65535
  131.0625 MB / 2.00 sec =  549.7033 Mbps  1 retrans
  725.6250 MB / 2.00 sec = 3043.4964 Mbps  0 retrans
  715.0000 MB / 2.00 sec = 2998.8284 Mbps  0 retrans
  714.3750 MB / 2.00 sec = 2996.4168 Mbps  0 retrans
  707.1250 MB / 2.00 sec = 2965.8349 Mbps  0 retrans
nuttcp-t: 2998.1379 MB in 10.00 real seconds = 307005.08 KB/sec = 2514.9856 Mbps
(N.B. The full-test average above is what perfSONAR graphs.)
nuttcp-t: 2998.1379 MB in 2.32 CPU seconds = 1325802.48 KB/cpu sec
nuttcp-t: retrans = 1
nuttcp-t: 47971 I/O calls, msec/call = 0.21, calls/sec = 4797.03
nuttcp-t: 0.0user 2.3sys 0:10real 23% 0i+0d 768maxrss 0+2pf 156+28csw
nuttcp-r: 2998.1379 MB in 10.07 real seconds = 304959.96 KB/sec = 2498.2320 Mbps
nuttcp-r: 2998.1379 MB in 2.36 CPU seconds = 1301084.31 KB/cpu sec
nuttcp-r: 57808 I/O calls, msec/call = 0.18, calls/sec = 5742.21
nuttcp-r: 0.0user 2.3sys 0:10real 23% 0i+0d 770maxrss 0+4pf 9146+24csw
bwctl: stop_tool: 3598657866.949026
SENDER END

BWCTL Example (nuttcp, 1% loss)

[[email protected] ~]$ bwctl -T nuttcp -f m -t 10 -i 2 -c sunn-pt1.es.net
bwctl: exec_line: /usr/bin/nuttcp -vv -p 5004 -i 2.000000 -T 10 -t 198.129.254.58
bwctl: run_tool: tester: nuttcp
bwctl: run_tool: receiver: 198.129.254.58
bwctl: run_tool: sender: 198.124.238.34
bwctl: start_tool: 3598658394.807831
nuttcp-t: v7.1.6: socket
nuttcp-t: buflen=65536, nstream=1, port=5004 tcp -> 198.129.254.58
nuttcp-t: time limit = 10.00 seconds
nuttcp-t: connect to 198.129.254.58 with mss=8948, RTT=62.440 ms
nuttcp-t: send window size = 98720, receive window size = 87380
nuttcp-t: available send window = 74040, available receive window = 65535
nuttcp-r: v7.1.6: socket
nuttcp-r: buflen=65536, nstream=1, port=5004 tcp
nuttcp-r: interval reporting every 2.00 seconds
nuttcp-r: accept from 198.124.238.34
nuttcp-r: send window size = 98720, receive window size = 87380
nuttcp-r: available send window = 74040, available receive window = 65535
  6.3125 MB / 2.00 sec = 26.4759 Mbps  27 retrans
  3.5625 MB / 2.00 sec = 14.9423 Mbps   4 retrans
  3.8125 MB / 2.00 sec = 15.9906 Mbps   7 retrans
  4.8125 MB / 2.00 sec = 20.1853 Mbps  13 retrans
  6.0000 MB / 2.00 sec = 25.1659 Mbps   7 retrans
nuttcp-t: 25.5066 MB in 10.00 real seconds = 2611.85 KB/sec = 21.3963 Mbps
(N.B. The full-test average above is what perfSONAR graphs.)
nuttcp-t: 25.5066 MB in 0.01 CPU seconds = 1741480.37 KB/cpu sec
nuttcp-t: retrans = 58
nuttcp-t: 409 I/O calls, msec/call = 25.04, calls/sec = 40.90
nuttcp-t: 0.0user 0.0sys 0:10real 0% 0i+0d 768maxrss 0+2pf 51+3csw
nuttcp-r: 25.5066 MB in 10.30 real seconds = 2537.03 KB/sec = 20.7833 Mbps
nuttcp-r: 25.5066 MB in 0.02 CPU seconds = 1044874.29 KB/cpu sec
nuttcp-r: 787 I/O calls, msec/call = 13.40, calls/sec = 76.44
nuttcp-r: 0.0user 0.0sys 0:10real 0% 0i+0d 770maxrss 0+4pf 382+0csw
bwctl: stop_tool: 3598658417.214024

SENDER END

BWCTL Example (nuttcp, re-ordering)

[[email protected] ~]$ bwctl -T nuttcp -f m -t 10 -i 2 -c sunn-pt1.es.net
bwctl: exec_line: /usr/bin/nuttcp -vv -p 5007 -i 2.000000 -T 10 -t 198.129.254.58
bwctl: run_tool: tester: nuttcp
bwctl: run_tool: receiver: 198.129.254.58
bwctl: run_tool: sender: 198.124.238.34
bwctl: start_tool: 3598658824.115013
nuttcp-t: v7.1.6: socket
nuttcp-t: buflen=65536, nstream=1, port=5007 tcp -> 198.129.254.58
nuttcp-t: time limit = 10.00 seconds
nuttcp-t: connect to 198.129.254.58 with mss=8948, RTT=62.433 ms
nuttcp-t: send window size = 98720, receive window size = 87380
nuttcp-t: available send window = 74040, available receive window = 65535
nuttcp-r: v7.1.6: socket
nuttcp-r: buflen=65536, nstream=1, port=5007 tcp
nuttcp-r: interval reporting every 2.00 seconds
nuttcp-r: accept from 198.124.238.34
nuttcp-r: send window size = 98720, receive window size = 87380
nuttcp-r: available send window = 74040, available receive window = 65535
   3.4375 MB / 2.00 sec =  14.4176 Mbps     3 retrans
  39.5625 MB / 2.00 sec = 165.9376 Mbps   472 retrans
  45.5625 MB / 2.00 sec = 191.1028 Mbps   912 retrans
  55.9375 MB / 2.00 sec = 234.6186 Mbps  1750 retrans
  57.7500 MB / 2.00 sec = 242.2218 Mbps  2434 retrans
nuttcp-t: 210.7074 MB in 10.00 real seconds = 21576.30 KB/sec = 176.7531 Mbps
(N.B. The full-test average above is what perfSONAR graphs.)
nuttcp-t: 210.7074 MB in 0.13 CPU seconds = 1622544.64 KB/cpu sec
nuttcp-t: retrans = 6059
nuttcp-t: 3372 I/O calls, msec/call = 3.04, calls/sec = 337.20
nuttcp-t: 0.0user 0.1sys 0:10real 1% 0i+0d 768maxrss 0+2pf 72+10csw
nuttcp-r: 210.7074 MB in 11.25 real seconds = 19175.61 KB/sec = 157.0866 Mbps
nuttcp-r: 210.7074 MB in 0.20 CPU seconds = 1073614.78 KB/cpu sec
nuttcp-r: 4692 I/O calls, msec/call = 2.46, calls/sec = 416.99
nuttcp-r: 0.0user 0.1sys 0:11real 1% 0i+0d 770maxrss 0+4pf 1318+12csw
bwctl: stop_tool: 3598658835.981810
SENDER END
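As a sanity check on the 1% loss run shown earlier, the Mathis et al. throughput model (rate ~ MSS / (RTT * sqrt(p))) gives an order-of-magnitude estimate for a lossy path. The MSS (8948 bytes) and RTT (62.44 ms) come from the nuttcp output; the 1% loss rate is the netem setting:

```shell
# Mathis model estimate, throughput (bits/s) ~ (MSS * 8) / (RTT * sqrt(p)).
# mss and rtt are from the nuttcp loss run above; p is the 1% netem loss rate.
awk 'BEGIN {
    mss = 8948; rtt = 0.06244; p = 0.01
    printf "%.1f Mbps\n", mss * 8 / (rtt * sqrt(p)) / 1e6
}'
# -> 11.5 Mbps
```

That ~11.5 Mbps is the same ballpark as the ~21 Mbps nuttcp measured; the model is only accurate up to a constant factor, but it shows why even 1% loss collapses a multi-Gbps path to tens of Mbps.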

BWCTL Example (nuttcp, duplication)

[[email protected] ~]$ bwctl -T nuttcp -f m -t 10 -i 2 -c sunn-pt1.es.net
bwctl: exec_line: /usr/bin/nuttcp -vv -p 5008 -i 2.000000 -T 10 -t 198.129.254.58
bwctl: run_tool: tester: nuttcp
bwctl: run_tool: receiver: 198.129.254.58
bwctl: run_tool: sender: 198.124.238.34
bwctl: start_tool: 3598659020.747514
nuttcp-t: v7.1.6: socket
nuttcp-t: buflen=65536, nstream=1, port=5008 tcp -> 198.129.254.58
nuttcp-t: time limit = 10.00 seconds
nuttcp-t: connect to 198.129.254.58 with mss=8948, RTT=62.425 ms
nuttcp-t: send window size = 98720, receive window size = 87380
nuttcp-t: available send window = 74040, available receive window = 65535
nuttcp-r: v7.1.6: socket
nuttcp-r: buflen=65536, nstream=1, port=5008 tcp
nuttcp-r: interval reporting every 2.00 seconds
nuttcp-r: accept from 198.124.238.34
nuttcp-r: send window size = 98720, receive window size = 87380
nuttcp-r: available send window = 74040, available receive window = 65535
  114.8125 MB / 2.00 sec =  481.5470 Mbps  22 retrans
  726.5625 MB / 2.00 sec = 3047.4347 Mbps   0 retrans
  711.5625 MB / 2.00 sec = 2984.4841 Mbps   0 retrans
  716.3750 MB / 2.00 sec = 3004.7216 Mbps   0 retrans
  713.5000 MB / 2.00 sec = 2992.6404 Mbps   0 retrans
nuttcp-t: 2991.1407 MB in 10.00 real seconds = 306290.41 KB/sec = 2509.1311 Mbps
(N.B. The full-test average above is what perfSONAR graphs.)
nuttcp-t: 2991.1407 MB in 2.45 CPU seconds = 1250875.20 KB/cpu sec
nuttcp-t: retrans = 22
nuttcp-t: 47859 I/O calls, msec/call = 0.21, calls/sec = 4785.86
nuttcp-t: 0.0user 2.4sys 0:10real 24% 0i+0d 768maxrss 0+2pf 155+30csw
nuttcp-r: 2991.1407 MB in 10.08 real seconds = 303823.24 KB/sec = 2488.9200 Mbps
nuttcp-r: 2991.1407 MB in 2.49 CPU seconds = 1231762.62 KB/cpu sec
nuttcp-r: 58710 I/O calls, msec/call = 0.18, calls/sec = 5823.66
nuttcp-r: 0.0user 2.4sys 0:10real 24% 0i+0d 770maxrss 0+4pf 10146+24csw
bwctl: stop_tool: 3598659043.778699
SENDER END
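A note on units in the nuttcp output: the MB figures are MiB (1024*1024 bytes), while Mbps is decimal (10^6 bits/s). The reported average can be re-derived from the transfer total; here using the duplication run's figures from the output above:

```shell
# 2991.1407 MB (MiB) transferred in 10.00 real seconds, per the output above:
awk 'BEGIN {
    mb = 2991.1407; secs = 10.00
    printf "%.0f Mbps\n", mb * 1024 * 1024 * 8 / secs / 1e6
}'
# -> 2509 Mbps
```

This matches the reported 2509.1311 Mbps to within rounding; keeping the binary/decimal unit split straight matters when comparing nuttcp numbers against iperf output or interface counters.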

What IPERF May Not be Telling Us: Fasterdata Tunings

Fasterdata recommends a set of tunings (https://fasterdata.es.net/host-tuning/) that are designed to increase the performance of a single COTS host on a shared network infrastructure.

What this means is that we don't recommend maximum tuning:
- We are assuming (expecting? hoping?) the host can do parallel TCP streams via the data transfer application (e.g. Globus)
- Because of that, you don't want to assign upwards of 256M of kernel memory to a single TCP socket. A sensible amount is 32M/64M; with 4 streams you get the benefit of 128M/256M (enough for a 10G cross-country flow)
- We also strive for good citizenship. It's very possible for a single 10G machine to get 9.9Gbps TCP; we see this often. On a shared infrastructure, there is benefit to downtuning buffers

Can you ignore the above? Sure, overtune as you see fit, but KNOW YOUR NETWORK, USERS, AND USE CASES.

What BWCTL May Not be Telling Us: Regular Testing Setup

If we don't max-tune, and run a 20/30-second single-streamed TCP test (the toolkit defaults), we are not going to see 9.9Gbps. Think critically: TCP ramp-up takes 1-5 seconds (depending on latency), and any tiny blip of congestion will cut TCP performance in half.

N.B. iperf3 now has the omit flag, which allows you to discard some amount of slow start from the results.

It is common (and, in my mind, expected) to see regular-testing values on clean networks range between 1Gbps and 5Gbps, latency dependent. Performance really has two ranges: really crappy, and expected (where expected has a lot of headroom). You will know when it's really crappy (trust me).

Diagnostic Suggestions

You can max out BWCTL in this capacity: run long tests (-t 60), with multiple streams (-P 4) and large windows (-W 128M); go crazy. It is also VERY COMMON that doing so will produce different results than your regular testing. It's a different set of test parameters; it's not that the tools are deliberately lying.

When You Are at the End of the Road

Throughput is a number, and in many cases it is not useful except to tell you where the performance fits on a spectrum. Insight into why the number is low or high has to come from other factors. Recall that TCP relies on a feedback loop driven by latency and minimal packet loss. We need to pull another tool out of the shed.

OWAMP

OWAMP = One-Way Active Measurement Protocol, i.e. one-way ping. Some differences from traditional ping:
- Measures each direction independently (recall that we often see things like congestion occur in one direction and not the other)
- Uses small, evenly spaced groupings of UDP (not ICMP) packets
- Can ramp up the interval of the stream, the size of the packets, and the number of packets

OWAMP is most useful for detecting packet-train abnormalities on an end-to-end basis:
- Loss
- Duplication
- Out-of-order arrival
- Latency on the forward vs. reverse path
- Number of Layer 3 hops

OWAMP does require some accurate time via NTP; the perfSONAR toolkit takes care of this for you.

What OWAMP Tells Us

OWAMP is a necessity in regular testing. If you aren't using it, you need to be:
- Queuing often occurs in a single direction (think about what everyone is doing at noon on a college campus)
- Packet loss (and how often/how much occurs over time) is more valuable than throughput
- This gives you a "why" to go with an observation: if your router is going to drop a 50B UDP packet, it is most certainly going to drop a 1500B/9000B TCP packet

Overlaying data:
- Compare your throughput results against your OWAMP results; do you see patterns?
- Alarm on each, if you are alarming (and we hope you are alarming)

What OWAMP Doesn't Tell Us

OWAMP can't pick out one class of problems, due to its low packet rate and bandwidth:
- E.g. dirty fibers and failing optics require a larger UDP stream (1-2 Gbps) to show loss
- Suggestion: fill the pipe with something else, and then see how OWAMP behaves

OWAMP (initial)

[[email protected] ~]$ owping sunn-owamp.es.net
Approximately 12.6 seconds until results available

--- owping statistics from [wash-owamp.es.net]:8885 to [sunn-owamp.es.net]:8827 ---
SID: c681fe4ed67f1b3e5faeb249f078ec8a
first: 2014-01-13T18:11:11.420
last: 2014-01-13T18:11:20.587
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 31/31.1/31.7 ms, (err=0.00201 ms)
one-way jitter = 0 ms (P95-P50)
Hops = 7 (consistently)
no reordering
(N.B. This is what perfSONAR graphs: the average of the complete test.)

--- owping statistics from [sunn-owamp.es.net]:9027 to [wash-owamp.es.net]:8888 ---
SID: c67cfc7ed67f1b3eaab69b94f393bc46
first: 2014-01-13T18:11:11.321
last: 2014-01-13T18:11:22.672
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 31.4/31.5/32.6 ms, (err=0.00201 ms)
one-way jitter = 0 ms (P95-P50)
Hops = 7 (consistently)
no reordering

OWAMP (w/ loss)

[[email protected] ~]$ owping sunn-owamp.es.net
Approximately 12.6 seconds until results available

--- owping statistics from [wash-owamp.es.net]:8852 to [sunn-owamp.es.net]:8837 ---
SID: c681fe4ed67f1f0908224c341a2b83f3
first: 2014-01-13T18:27:22.032
last: 2014-01-13T18:27:32.904
100 sent, 12 lost (12.000%), 0 duplicates
one-way delay min/median/max = 31.1/31.1/31.3 ms, (err=0.00502 ms)
one-way jitter = nan ms (P95-P50)
Hops = 7 (consistently)
no reordering
(N.B. This is what perfSONAR graphs: the average of the complete test.)

--- owping statistics from [sunn-owamp.es.net]:9182 to [wash-owamp.es.net]:8893 ---
SID: c67cfc7ed67f1f09531c87cf38381bb6
first: 2014-01-13T18:27:21.993
last: 2014-01-13T18:27:33.785
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 31.4/31.5/31.5 ms, (err=0.00502 ms)
one-way jitter = 0 ms (P95-P50)
Hops = 7 (consistently)
no reordering

OWAMP (w/ re-ordering)

[[email protected] ~]$ owping sunn-owamp.es.net
Approximately 12.9 seconds until results available

--- owping statistics from [wash-owamp.es.net]:8814 to [sunn-owamp.es.net]:9062 ---
SID: c681fe4ed67f21d94991ea335b7a1830
first: 2014-01-13T18:39:22.543
last: 2014-01-13T18:39:31.503
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 31.1/106/106 ms, (err=0.00201 ms)
one-way jitter = 0.1 ms (P95-P50)
Hops = 7 (consistently)
1-reordering = 19.000000%
2-reordering = 1.000000%
no 3-reordering
(N.B. This is what perfSONAR graphs: the average of the complete test.)

--- owping statistics from [sunn-owamp.es.net]:8770 to [wash-owamp.es.net]:8939 ---
SID: c67cfc7ed67f21d994c1302dff644543
first: 2014-01-13T18:39:22.602
last: 2014-01-13T18:39:31.279
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 31.4/31.5/32 ms, (err=0.00201 ms)
one-way jitter = 0 ms (P95-P50)
Hops = 7 (consistently)
no reordering

Packet Re-Ordering

Re-ordering can occur in networks when asymmetry in paths leads to information arriving outside of sent order (LAG links, route asymmetry, queuing/processing delays).

What does a re-ordered packet mean?
- It stalls the window from advancing
- If we have to ACK the same packet 3 times, we run the risk of the entire window being re-sent

General rule: when TCP thinks it needs to SACK, or sees a triple duplicate ACK, it will take a long time to recover.

Packet Re-Ordering

In the next example, a series of packets was delivered out of order (1% of packets, delayed by 10% of the path length). This causes TCP to stall, and it takes a while to recover from a small event.

[Figure: TCP throughput over time during the re-ordering event, showing the stall and the slow recovery.]

OWAMP (w/ duplication)

[[email protected] ~]$ owping sunn-owamp.es.net
Approximately 12.6 seconds until results available

--- owping statistics from [wash-owamp.es.net]:8905 to [sunn-owamp.es.net]:8933 ---
SID: c681fe4ed67f228b6b36524c3d3531da
first: 2014-01-13T18:42:20.443
last: 2014-01-13T18:42:30.223
100 sent, 0 lost (0.000%), 11 duplicates
one-way delay min/median/max = 31.1/31.1/33 ms, (err=0.00201 ms)
one-way jitter = 0.1 ms (P95-P50)
Hops = 7 (consistently)
no reordering
(N.B. This is what perfSONAR graphs: the average of the complete test.)

--- owping statistics from [sunn-owamp.es.net]:9057 to [wash-owamp.es.net]:8838 ---
SID: c67cfc7ed67f228bb9a5a9b27f4b2d47
first: 2014-01-13T18:42:20.716
last: 2014-01-13T18:42:29.822
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 31.4/31.5/31.9 ms, (err=0.00201 ms)
one-way jitter = 0 ms (P95-P50)
Hops = 7 (consistently)
no reordering

What OWAMP Tells Us

[Figure: example OWAMP latency/loss measurement graphs.]

Expectation Management

Installing perfSONAR, even on a completely clean network, will not get you instant line-rate results:
- Machine architecture, as well as OS tuning, plays a huge role in the equation
- perfSONAR is a stable set of software choices that rides on COTS hardware; some hardware works better than others
- Equally, perfSONAR (and fasterdata.es.net) recommend friendly tunings that will not blow the barn doors off the rest of the network

The following slides introduce some expectation-management tips.

BWCTL Invoking Other Tools

BWCTL has the ability to invoke other tools as well:
- Forward and reverse traceroute/tracepath
- Forward and reverse ping
- Forward and reverse owping

The BWCTL daemon can be used to request and retrieve results for these tests. Both are useful in the course of debugging problems:
- Get the routes before a throughput test
- Determine path MTU with tracepath
- Get the reverse direction without having to coordinate with a human on the other end (a huge win when debugging multiple networks)

Note that these are command line only; they are not used in the regular testing interface.

BWCTL Invoking Other Tools (Traceroute)

[[email protected] ~]$ bwtraceroute -T traceroute -4 -s sacr-pt1.es.net
bwtraceroute: Using tool: traceroute
bwtraceroute: 37 seconds until test results available

SENDER START
traceroute to 198.124.238.34 (198.124.238.34), 30 hops max, 60 byte packets
 1  sacrcr5-sacrpt1.es.net (198.129.254.37)  0.490 ms  0.788 ms  1.114 ms
 2  denvcr5-ip-a-sacrcr5.es.net (134.55.50.202)  21.304 ms  21.594 ms  21.924 ms
 3  kanscr5-ip-a-denvcr5.es.net (134.55.49.58)  31.944 ms  32.608 ms  32.838 ms
 4  chiccr5-ip-a-kanscr5.es.net (134.55.43.81)  42.904 ms  43.236 ms  43.566 ms
 5  washcr5-ip-a-chiccr5.es.net (134.55.36.46)  60.046 ms  60.339 ms  60.670 ms
 6  wash-pt1.es.net (198.124.238.34)  59.679 ms  59.693 ms  59.708 ms
SENDER END

[[email protected] ~]$ bwtraceroute -T traceroute -4 -c sacr-pt1.es.net
bwtraceroute: Using tool: traceroute
bwtraceroute: 35 seconds until test results available

SENDER START
traceroute to 198.129.254.38 (198.129.254.38), 30 hops max, 60 byte packets
 1  wash-te-perf-if1.es.net (198.124.238.33)  0.474 ms  0.816 ms  1.145 ms
 2  chiccr5-ip-a-washcr5.es.net (134.55.36.45)  19.133 ms  19.463 ms  19.786 ms
 3  kanscr5-ip-a-chiccr5.es.net (134.55.43.82)  28.515 ms  28.799 ms  29.083 ms
 4  denvcr5-ip-a-kanscr5.es.net (134.55.49.57)  39.077 ms  39.348 ms  39.628 ms
 5  sacrcr5-ip-a-denvcr5.es.net (134.55.50.201)  60.013 ms  60.299 ms  60.983 ms
 6  sacr-pt1.es.net (198.129.254.38)  59.679 ms  59.678 ms  59.668 ms
SENDER END

BWCTL Invoking Other Tools (Tracepath)

[[email protected] ~]$ bwtraceroute -T tracepath -4 -s sacr-pt1.es.net
bwtraceroute: Using tool: tracepath
bwtraceroute: 36 seconds until test results available

SENDER START
 1?: [LOCALHOST]                                  pmtu 9000
 1:  sacrcr5-sacrpt1.es.net (198.129.254.37)      0.489ms
 1:  sacrcr5-sacrpt1.es.net (198.129.254.37)      0.463ms
 2:  denvcr5-ip-a-sacrcr5.es.net (134.55.50.202)  21.426ms
 3:  kanscr5-ip-a-denvcr5.es.net (134.55.49.58)   31.957ms
 4:  chiccr5-ip-a-kanscr5.es.net (134.55.43.81)   42.947ms
 5:  washcr5-ip-a-chiccr5.es.net (134.55.36.46)   60.092ms
 6:  wash-pt1.es.net (198.124.238.34)             59.753ms reached
     Resume: pmtu 9000 hops 6 back 59
SENDER END

[[email protected] ~]$ bwtraceroute -T tracepath -4 -c sacr-pt1.es.net
bwtraceroute: Using tool: tracepath
bwtraceroute: 36 seconds until test results available

SENDER START
 1?: [LOCALHOST]                                  pmtu 9000
 1:  wash-te-perf-if1.es.net (198.124.238.33)     1.115ms
 1:  wash-te-perf-if1.es.net (198.124.238.33)     0.616ms
 2:  chiccr5-ip-a-washcr5.es.net (134.55.36.45)   17.646ms
 3:  kanscr5-ip-a-chiccr5.es.net (134.55.43.82)   28.573ms
 4:  denvcr5-ip-a-kanscr5.es.net (134.55.49.57)   39.164ms
 5:  sacrcr5-ip-a-denvcr5.es.net (134.55.50.201)  60.077ms
 6:  sacr-pt1.es.net (198.129.254.38)             59.780ms reached
     Resume: pmtu 9000 hops 6 back 59
SENDER END

BWCTL Invoking Other Tools (Ping)

[[email protected] ~]$ bwping -T ping -4 -s sacr-pt1.es.net
bwping: Using tool: ping
bwping: 41 seconds until test results available

SENDER START
PING 198.124.238.34 (198.124.238.34) from 198.129.254.38 : 56(84) bytes of data.
64 bytes from 198.124.238.34: icmp_seq=1 ttl=59 time=59.6 ms
64 bytes from 198.124.238.34: icmp_seq=2 ttl=59 time=59.6 ms
64 bytes from 198.124.238.34: icmp_seq=3 ttl=59 time=59.6 ms
64 bytes from 198.124.238.34: icmp_seq=4 ttl=59 time=59.6 ms
64 bytes from 198.124.238.34: icmp_seq=5 ttl=59 time=59.6 ms
64 bytes from 198.124.238.34: icmp_seq=6 ttl=59 time=59.6 ms
64 bytes from 198.124.238.34: icmp_seq=7 ttl=59 time=59.7 ms
64 bytes from 198.124.238.34: icmp_seq=8 ttl=59 time=59.6 ms
64 bytes from 198.124.238.34: icmp_seq=9 ttl=59 time=59.6 ms
64 bytes from 198.124.238.34: icmp_seq=10 ttl=59 time=59.6 ms

--- 198.124.238.34 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9075ms
rtt min/avg/max/mdev = 59.671/59.683/59.705/0.244 ms
SENDER END

BWCTL Invoking Other Tools (OWPing)

[[email protected] ~]$ bwping -T owamp -4 -s sacr-pt1.es.net

SENDER START
Approximately 13.4 seconds until results available

--- owping statistics from [198.129.254.38]:5283 to [198.124.238.34]:5121 ---
SID: c67cee22d85fc3b2bbe23f83da5947b2
first: 2015-01-13T08:17:58.534
last: 2015-01-13T08:18:17.581
10 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 29.9/29.9/29.9 ms, (err=0.191 ms)
one-way jitter = 0.1 ms (P95-P50)
Hops = 5 (consistently)
no reordering
SENDER END

[[email protected] ~]$ bwping -T owamp -4 -c sacr-pt1.es.net
bwping: Using tool: owamp
bwping: 41 seconds until test results available

SENDER START
Approximately 13.4 seconds until results available

--- owping statistics from [198.124.238.34]:5124 to [198.129.254.38]:5287 ---
SID: c681fe26d85fc3f24790a7572840013f
first: 2015-01-13T08:19:00.975
last: 2015-01-13T08:19:10.582
10 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 29.8/29.9/29.9 ms, (err=0.191 ms)
one-way jitter = 0 ms (P95-P50)
Hops = 5 (consistently)
no reordering
SENDER END
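The "one-way jitter" figure owping prints is labeled (P95-P50): the gap between the 95th and 50th percentile one-way delays, not a mean deviation. Under a hypothetical, invented sample of delays, one common percentile convention (nearest rank) reproduces it like so:

```shell
# P95 - P50 of a delay sample (ms). The 20 values below are invented for
# illustration; with N=20, nearest-rank P95 is the 19th sorted value and
# P50 is the 10th.
delays="31.0 31.0 31.1 31.1 31.1 31.1 31.2 31.2 31.2 31.2
        31.3 31.3 31.3 31.4 31.4 31.5 31.5 31.6 31.8 32.5"
printf '%s\n' $delays | sort -n | awk '
{ v[NR] = $1 }
END {
    p95 = v[int(NR * 0.95 + 0.999)]   # nearest-rank: ceil(0.95 * N)
    p50 = v[int(NR * 0.50 + 0.999)]
    printf "jitter = %.1f ms (P95-P50)\n", p95 - p50
}'
# -> jitter = 0.6 ms (P95-P50)
```

Because the metric ignores the tail beyond P95, a single wild outlier barely moves it; sustained queueing, which lifts the whole upper half of the distribution, moves it a lot. That is the behavior you want when alarming on jitter.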

Common Pitfalls: "it should be higher!"

There have been some expectation-management problems with these tools:
- Some feel that if they have 10G, they will get all of it
- Some may not understand the makeup of the test
- Some may not know what they should be getting

Let's start with an ESnet-to-ESnet test, between very well tuned and recent pieces of hardware. 5 Gbps is awesome for:
- A 20-second test
- 60ms latency
- Homogeneous servers
- Using fasterdata tunings
- On shared infrastructure

Common Pitfalls: "it should be higher!"

Another example: ESnet (Sacramento, CA) to Utah, ~20ms of latency. Is it 5 Gbps? No, but it is still outstanding given the environment:
- A 20-second test
- Heterogeneous hosts
- Possibly different configurations (e.g. similar OS tunings, but not exact in terms of BIOS, NIC, etc.)
- Different congestion levels on the ends
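A quick sanity check for these expectations is the bandwidth-delay product: how many bytes must be in flight to fill the path. A sketch using the 10G / 60ms figures from the first example (the awk one-liner itself is illustrative, not a perfSONAR tool):

```shell
# Bandwidth-delay product: bytes in flight needed to fill the path.
# 10 Gbit/s at 60 ms RTT, per the example above.
awk 'BEGIN {
  bits = 10e9 * 0.060            # bits in flight at full rate for one RTT
  printf "BDP = %.0f MB\n", bits / 8 / 1e6
}'
```

75 MB is far beyond default TCP buffer sizes, which is one reason an untuned host never approaches line rate on a long path.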

Common Pitfalls: "it should be higher!"

A similar example: ESnet (Washington, DC) to Utah, ~50ms of latency. Is it 5 Gbps? No. Should it be? No! Could it be higher? Sure: run a different diagnostic test.
- Longer latency, but still the same length of test (20 seconds)
- Heterogeneous hosts
- Possibly different configurations (e.g. similar OS tunings, but not exact in terms of BIOS, NIC, etc.)
- Different congestion levels on the ends

Takeaway: you will know bad performance when you see it. This result is consistent with its environment.
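The interaction of latency and loss can be made concrete with the well-known Mathis et al. rule of thumb, which bounds single-stream TCP throughput at roughly MSS / (RTT * sqrt(loss)). A sketch with assumed values (1460-byte MSS, the ~50ms RTT above, and an illustrative 0.01% loss rate; none of these numbers come from the slides):

```shell
# Mathis et al. model: TCP rate <= MSS / (RTT * sqrt(loss)).
# Assumed values: 1460-byte MSS, 50 ms RTT, 0.01% packet loss.
awk 'BEGIN {
  mss = 1460 * 8                 # MSS in bits
  rtt = 0.050; p = 0.0001
  printf "ceiling = %.1f Mbps\n", mss / (rtt * sqrt(p)) / 1e6
}'
```

Even a tiny loss rate caps a 50ms path well below 1 Gbps, which is why the same loss hurts long paths far more than short ones.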

Common Pitfalls: "it should be higher!"

Another example: the 1st half of the graph is perfectly normal:
- Latency of 10-20ms (TCP needs time to ramp up)
- Machine placed in the network core of one of the networks; congestion is a fact of life
- Single-stream TCP for 20 seconds
The 2nd half is not (packet loss caused a precipitous drop). You will know it when you see it.

Common Pitfalls: the tool is unpredictable

Sometimes this happens. Is it a problem? Yes and no. The cause is called overdriving, and it is common: a 10G host and a 1G host are testing to each other.
- 1G to 10G is smooth and as expected (~900 Mbps, blue)
- 10G to 1G is choppy (variable between 900 Mbps and 700 Mbps, green)
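The choppy 10G-to-1G direction can be reasoned about with a little arithmetic: when a 10G sender bursts into a 1G bottleneck, the queue in front of the bottleneck grows at the difference of the two rates. A sketch with an assumed 1 MB bottleneck buffer (the buffer size is illustrative, not from the slides):

```shell
# Time for a wire-speed 10G burst to overflow a buffer ahead of a 1G bottleneck.
# 1 MB of buffer is an assumed, illustrative figure.
awk 'BEGIN {
  fill = 10e9 - 1e9              # queue growth rate, bits/s
  buf  = 1e6 * 8                 # 1 MB buffer, in bits
  printf "buffer fills in %.2f ms\n", buf / fill * 1e3
}'
```

Under a millisecond of wire-speed burst is enough to overflow such a buffer, so periodic loss (and the resulting choppy TCP throughput) is expected, not a tool malfunction.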

Common Pitfalls: the tool is unpredictable

A NIC doesn't stream packets out at some average rate; sending is a binary operation: send (at max rate) vs. not send (nothing).
- 10G of traffic needs buffering along the path to support it. A 10G switch/router can handle it, and so can another 10G host (if both are tuned, of course).
- A 1G NIC is designed to hold bursts of 1G. It can be tuned to expect more, but may not have enough physical memory for DTN traffic with wire-speed bursts plus background traffic or competing bursts.
- Ditto for switches in the path.
At some point things step down to a slower speed, packets get dropped on the ground, and TCP reacts as it would to any other loss event.

Common Pitfalls: Summary

- When in doubt, test again!
- Diagnostic tests are informative, and they should provide more insight into the regular results (still do regular testing, of course)
- Be prepared to divide up a path as need be
- "A poor carpenter blames his tools": the tools are only as good as the people using them, so use them methodically
- Trust the results; remember that they are giving you a number based on the entire environment
- If the site isn't using perfSONAR, step 1 is to get them to do so: http://www.perfsonar.net
- Get some help: [email protected]

