Thank you for doing all this testing! We have tried very hard to make independent, auditable tools available to the general public to do exactly what you are doing, against more devices and more QoS/AQM/FQ/shaping systems. We had certainly hoped more vendors would try them before shipping products that featured "Now with traffic shaping" prominently on the box.
Since you have switched to a PC, which looks to be running Linux (?), you can get graphable results from the https://github.com/tohojo/netperf-wrapper tool. You do (sometimes) need a custom-compiled netperf, and you have to install python-matplotlib and python-qt4, but it lets you collect tons of data and compare runs later, graphically (or via text, as you are doing now). netperf-wrapper can also be made to work on OS X using MacPorts. A typical invocation is sketched below.
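For example, here is roughly what a run looks like on a Debian/Ubuntu-flavored PC (the package names are the Debian/Ubuntu ones, and flags may differ a bit between netperf-wrapper versions):

    # plotting dependencies (Debian/Ubuntu package names assumed)
    sudo apt-get install python-matplotlib python-qt4
    # run the rrul (Realtime Response Under Load) test against a fixed server
    netperf-wrapper -H netperf-west.bufferbloat.net rrul
    # the run leaves behind a .json.gz data file you can re-plot or compare later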
As for the differences you see between driving the test from the home router and from a PC:
Note: netperf.bufferbloat.net is a round-robin DNS name for servers in the EU, the US east coast, and the US west coast. We have unpublished servers all over the world, which we can't afford to open up to everyone - but if you want access, let me know offline. If you want to be sure you are always testing against the same server, use netperf-eu, netperf-east, or netperf-west, as in the example below.
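For instance (the 30-second test length is just an example; -H, -t, and -l are standard netperf flags):

    # pin netperf-wrapper to one server instead of the round-robin name
    netperf-wrapper -H netperf-east.bufferbloat.net rrul
    # or run a single plain-netperf upload stream against the same host
    netperf -H netperf-east.bufferbloat.net -t TCP_STREAM -l 30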
A) Anything with a weaker CPU will have more trouble driving a test than a PC. Also, the Westwood TCP variant was quite common on home routers until recently - it is less aggressive than CUBIC - and you can check which congestion control algorithm is in use with cat /proc/sys/net/ipv4/tcp_congestion_control (see below).
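On a Linux box these are stock kernel knobs, so checking and switching is straightforward (the sysctl needs root):

    # which congestion control algorithm is in use right now
    cat /proc/sys/net/ipv4/tcp_congestion_control
    # which algorithms this kernel has available
    cat /proc/sys/net/ipv4/tcp_available_congestion_control
    # switch to cubic for this boot (add to /etc/sysctl.conf to persist)
    sysctl -w net.ipv4.tcp_congestion_control=cubic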
B) The bufferbloat effort has also resulted in tremendous improvements in Linux's CUBIC TCP, with improvements landing in every release since 3.3. What is in 3.19 is pretty amazing compared to 3.3. Google, Red Hat, and the entire Linux netdev team have done a wonderful job overall. I am so glad Van Jacobson finally got his hands on some big data, and that he (and so many others) have helped create a fuller understanding of TCP - and why it needed optimization - across the industry.
C) The range of results you are getting is about what we get with fq_codel on other benchmarks at your rates, so you are golden. I note that the increase in RTT is mostly due to the congested upstream path. Were you running at a higher rate overall (say 20Mbit/20Mbit) you would see a much smaller increase in delay on the measurement flows - nearly unnoticeable at higher rates - and only the fat flows would carry any inherent delay (fq_codel aims for a 5ms target and usually bounces around 20ms at rates like yours). That per-flow delay can only be measured (at the moment) by analyzing separate packet captures, and it doesn't matter for most traffic. A couple of commands for poking at it are below.
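If you want to look at this yourself, something like the following works (eth0 is just an example; substitute your WAN-facing interface):

    # fq_codel's own view of queue delay, drops, and ECN marks
    tc -s qdisc show dev eth0
    # capture packets during a test for later per-flow analysis
    # (e.g. with wireshark or tcptrace)
    tcpdump -i eth0 -s 128 -w rrul_run.pcap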
Now that your test rig is stable, I'd love to see you retry StreamBoost (and try netperf-wrapper!).
D) It is still unclear whether you need to optimize for DSL framing on your link, and whether you are achieving an optimum. You can try increasing the up (or down) settings until you start seeing latencies climb - and it is usually quite abrupt; going from 15Mbit to 16Mbit might quintuple the latencies. A sketch of where those knobs live is below.
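On an OpenWrt/CeroWrt router running sqm-scripts, the relevant knobs live in /etc/config/sqm (the option names and the overhead value here are from memory - double-check them against your own config, and note that the correct per-packet overhead varies per ISP):

    # step the ceilings up until latency starts to climb
    uci set sqm.@queue[0].upload=2200       # kbit/s, example value
    uci set sqm.@queue[0].download=15000    # kbit/s, example value
    uci set sqm.@queue[0].linklayer=atm     # account for ATM/DSL framing
    uci set sqm.@queue[0].overhead=44       # example per-packet overhead
    uci commit sqm && /etc/init.d/sqm restart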
E) One further bit of fun you can have is enabling ECN on your device(s) - netperf-west has ECN enabled - and you can have full throughput with zero packet loss, if you so choose. sqm-scripts half-enables ECN by default (you can turn it on in both directions if you wish). On the end hosts it is a single sysctl on Linux, Mac, and Windows, documented here (and sketched after the link):
http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
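On a Linux client the knob looks like this (the value meanings are from the kernel documentation; OS X and Windows have their own equivalents, e.g. netsh on Windows, so check your platform's docs for the exact incantation):

    # 0 = off, 1 = request and accept ECN, 2 = accept ECN only if the peer asks
    # (2 is the usual default on recent kernels)
    cat /proc/sys/net/ipv4/tcp_ecn
    # turn ECN negotiation on for outgoing connections too (run as root)
    sysctl -w net.ipv4.tcp_ecn=1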