07/07/2004: "Do Mesh Networks Scale? Another (biased) View"
Glenn reports that Francis daCosta of Mesh Dynamics says that mesh networks won't scale, at least not without using multiple radios (on multiple channels).
Francis makes several claims; let's analyze them:
1- Radio is a shared medium and forces everyone to stay silent while one person holds the stage. Wired networks, on the other hand, can and do hold multiple simultaneous conversations.
Nothing about radio forces everyone (presumably all stations) to stay silent while one "person" (station?) holds the stage. This is a property of several MAC protocols used in radio-based networks, including CSMA/CA (used in 802.11), but is not a fundamental property of radio networks.
(Further, "interference" is a receiver artifact, but we won't go into that now.)
These aren't even new results. Several years ago Gupta, Gray and Kumar published an empirical paper, "An Experimental Scaling Law for Ad Hoc Networks", that demonstrated this in a real-world experiment. They found that 802.11 ad-hoc networks scale as c/n^1.68, where 'c' is capacity and 'n' is the number of nodes in the network. The problem is endemic to the 802.11 MAC protocol, which is not adaptive and requires that all stations be able to hear each other.
2- In a single radio ad hoc mesh network, the best you can do is (1/2)^^n at each hop. So in a multi hop mesh network, the Max available bandwidth available to you degrades at the rate of 1/2, 1/4, 1/8. By the time you are 4 hops away the max you can get is 1/16 of the total available bandwidth.
Agreed (for 802.11 in ad-hoc mode), but this was already shown in the Gupta, Gray and Kumar paper cited above. Old news.
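daCosta's arithmetic is easy to reproduce. This small sketch assumes, exactly as he does, that every additional hop on a single shared channel halves the bandwidth available end to end; the function name is illustrative only, and the result holds only under that assumption about the MAC.

```python
from fractions import Fraction

def single_radio_share(hops: int) -> Fraction:
    """Fraction of the root's bandwidth left after `hops` hops,
    assuming (as daCosta does) that each hop halves it."""
    return Fraction(1, 2) ** hops

for h in range(1, 5):
    print(h, single_radio_share(h))  # 1/2, 1/4, 1/8, 1/16
```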
But this is not true once you are willing to throw out the 802.11 MAC protocol.
Consider Tim Shepard's 1995 thesis, "Decentralized Channel Management in Scalable Multihop Spread-Spectrum Packet Radio Networks" (and the more concise paper based on it), which demonstrates that one can build a practical network whose capacity increases as more stations are added.
The capacity increases as the square root of N for N stations, which contrasts sharply with daCosta's viewpoint. Of course, as I've already stated, daCosta is speaking about 802.11 without naming it.
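To see how far apart the two trends are, here is a sketch that tabulates the 802.11 figure quoted above from Gupta, Gray and Kumar (c/n^1.68) next to Shepard's square-root growth. Both constants are set to 1 purely as an assumption for illustration, so only the shape of the curves means anything, not the absolute values.

```python
import math

def ggk_80211(n: int, c: float = 1.0) -> float:
    """Scaling reported for 802.11 ad-hoc networks: c / n**1.68 (c assumed to be 1)."""
    return c / n ** 1.68

def shepard(n: int, k: float = 1.0) -> float:
    """Shepard-style capacity growth proportional to sqrt(n) (k assumed to be 1)."""
    return k * math.sqrt(n)

for n in (10, 100, 1000):
    print(f"n={n:5d}  802.11: {ggk_80211(n):.2e}  Shepard: {shepard(n):.1f}")
```

One curve falls faster than linearly as nodes are added; the other keeps rising.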
The key ideas of Dr. Shepard's thesis are to build a network of cooperative repeaters, to use no more power than necessary, and to schedule the transmissions from each node. He even proposes a novel method for scheduling the stations in a completely distributed fashion.
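One way to schedule stations with no central coordinator, loosely in the spirit of that idea, is for each station to derive its transmit/receive pattern from a pseudo-random generator seeded with a value its neighbours already know (here, simply its station ID). A neighbour can regenerate that pattern locally and send only in slots where the target is listening. This is a toy illustration under those assumptions, not the thesis's actual algorithm; the 50% transmit probability and all names are hypothetical.

```python
import random

TX_PROBABILITY = 0.5  # hypothetical: fraction of slots a station spends transmitting

def schedule(station_id: int, slots: int) -> list[str]:
    """Per-slot 'tx'/'rx' pattern derived only from the station's ID.

    Any neighbour that knows the ID can regenerate the same pattern,
    so no central scheduler is needed.
    """
    rng = random.Random(station_id)  # deterministic seed known to neighbours
    return ["tx" if rng.random() < TX_PROBABILITY else "rx" for _ in range(slots)]

def slots_to_reach(sender_id: int, receiver_id: int, slots: int) -> list[int]:
    """Slots in which the sender transmits while the receiver is listening."""
    mine = schedule(sender_id, slots)
    theirs = schedule(receiver_id, slots)  # computed locally, never requested
    return [t for t in range(slots) if mine[t] == "tx" and theirs[t] == "rx"]

print(slots_to_reach(sender_id=7, receiver_id=42, slots=20))
```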
Many have noted that I am a fan of the Atheros chipsets, but few have ever asked why. (Nigel Ballard did once, at a PTP meeting where I spoke the day after leaving Vivato.)
The reason is that the Atheros chipsets do not implement the 802.11 MAC protocol. It is straightforward to implement your own MAC layer on top of the Atheros design, and the madwifi driver proves it. Other chipsets with a similar architecture exist today as well.
You connect the dots.
3- That does not sound too bad when you are putting together a wireless sensor network with limited bandwidth and latency considerations. It is DISASTROUS if you wish to provide the level of latency/throughput people are accustomed to with their wired networks. Consider the case of just 10 client stations at each node of a 4 hop mesh network. The clients at the last rung will receive -at best- 1/(16,0000) of the total bandwidth at the root.
Unfortunately, this point fails as well. Tim's thesis quite clearly shows that it can work in a 1,000-node network. There are still limits, especially in the area of latency (read the thesis!), but work on asynchronous radio relays proceeds.
4- Why has this not been noticed as yet? Because first there are not a lot of mesh networks around and second, they have not been tested under high usage situations. Browsing and email don’t count. Try video - where both latency and bandwidth matter - or VOIP where the bandwidth is a measly 64Kbps but where latency matters. Even in a simple 4 hop ad hoc mesh network with 10 clients, VOIP phones wont work well beyond the first or second hop – the latency and jitter caused by CSMA/CA contention windows (how wireless systems avoid collisions) will be unbearable.
Fortunately, these effects are very easy to model in a simulation environment. Most of the IETF work on mobile ad-hoc routing presents results from the simulation environments "ns", "GloMoSim" and the commercial "OPNET". Some of these packages contain (or have access to) quite accurate models of the wireless channel, and even of the 802.11 MAC and PHYs.
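As a taste of how little code a first-order model needs, here is a toy Monte Carlo of 802.11-style binary exponential backoff summed over several hops. It is nothing like ns, GloMoSim or OPNET: it ignores frame airtime, propagation, hidden terminals and the freezing of the backoff counter while the medium is busy, and the per-attempt collision probability `P_COLLIDE` is an assumed constant. The slot time and contention-window bounds are the 802.11b DSSS values (20 µs, CWmin 31, CWmax 1023).

```python
import random
import statistics

SLOT_US = 20             # 802.11b DSSS slot time, microseconds
CW_MIN, CW_MAX = 31, 1023
P_COLLIDE = 0.3          # assumed per-attempt collision probability (hypothetical)

def hop_backoff_us(rng: random.Random) -> float:
    """Backoff delay for one hop, doubling the window after each collision."""
    cw, total = CW_MIN, 0.0
    while True:
        total += rng.randint(0, cw) * SLOT_US  # random backoff in [0, CW] slots
        if rng.random() >= P_COLLIDE:          # this attempt got through
            return total
        cw = min(2 * cw + 1, CW_MAX)           # 31 -> 63 -> 127 ... capped at 1023

def path_stats(hops: int, trials: int = 20000, seed: int = 1) -> tuple[float, float]:
    """Mean and standard deviation (a jitter proxy) of end-to-end backoff, in ms."""
    rng = random.Random(seed)
    samples = [sum(hop_backoff_us(rng) for _ in range(hops)) / 1000
               for _ in range(trials)]
    return statistics.mean(samples), statistics.stdev(samples)

for h in range(1, 5):
    mean, jitter = path_stats(h)
    print(f"{h} hop(s): mean {mean:.2f} ms, std dev {jitter:.2f} ms")
```

Even this crude model shows both the mean delay and its spread growing with every hop, which is the effect daCosta is describing for VoIP.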
daCosta is going to propose multiple radios as a solution to the problems he shows. (See, I can predict the future too!)
Obligatory disclosure: Francis and I have known each other for a few years now, and we may develop some hardware for him if things work out. It's early in the relationship, and given my personal/professional history over the last 5 years, I'm far less willing to enter into partnerships than I once was. There are problems with this "multi-channel" approach, but it might provide some benefit. It's also likely that my mention here will scorch the relationship when/if Francis ever reads this.
The real solution, however, is not to use a MAC protocol that was designed for something very different.
Coverage on a few more points:
I think many of the ideas in this are wrong-headed.
It promulgates the "spectrum as property" lie, when the real solution is to treat spectrum as a commons.
It proposes a 500mW (27dBm) power limit for unlicensed operation, but this is only 3dB lower than the highest limits currently allowed in the ISM and U-NII bands, and it can be quite challenging to find (or design) a Wi-Fi card that will generate more than 200mW (23dBm). Some solutions exist. I do agree with Sascha that the transmit power should be adjustable (and therefore, I've enabled it on this unit). I don't have the "full-auto" stuff working yet, though. Natch.
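The dBm figures are just 10·log10 of the power in milliwatts, so a 3 dB step is a factor of two in power. A quick check of the numbers above, assuming nothing beyond that definition:

```python
import math

def mw_to_dbm(mw: float) -> float:
    """Convert milliwatts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(mw)

def dbm_to_mw(dbm: float) -> float:
    """Convert dBm back to milliwatts."""
    return 10 ** (dbm / 10)

for mw in (200, 500, 1000):
    print(f"{mw:5d} mW = {mw_to_dbm(mw):5.2f} dBm")
```

This prints roughly 23.01, 26.99 and 30.00 dBm, so the proposed 500mW cap sits about 3 dB below a 1 W (30 dBm) ceiling.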
It provokes conspiracy theories about the demise of Cometa and its investors' motives. The simple truth is that it was a group of old farts with a pile of money looking to cash in on the WiFi craze. So they went off tilting at windmills, following the Don Quixote de la Mancha of this modern age, Dr. Brilliant.
There is an old adage known as Hanlon's Razor: Never attribute to malice that which is adequately explained by incompetence. Intel, AT&T and IBM all had different agendas. AT&T thought it would get to sell a lot of leased lines and DSL. IBM Global Services was to provide all the logistics. It was a money grab by extremely dumb players, pure and simple. What makes it worse is that Larry Brilliant had failed before at exactly the same game when Aerzone went thud, followed by the rest of Softnet.
It views the WCA and the "wireless industry" as the enemy, insisting that they have deliberately slowed innovation, and this, dear reader, is some serious tilting at windmills:
"Look there, friend Sancho Panza, where 30 or more monstrous giants rise up, all of whom I mean to engage in battle and slay, and with whose spoils we shall begin to make our fortunes. For this is righteous warfare, and it is God's good service to sweep so evil a breed from off the face of the earth."
Sascha may not be familiar with The Innovator's Dilemma, an excellent book written by Clayton Christensen in 1997 about how successful businesses decline and fail. The basic hypothesis is simple: successful businesses grow and eventually dominate their target market. When they fail, it is usually because a newer company with different technology eats their lunch.
But, according to Christensen, this happens because the newer technology is "disruptive". It doesn't compete head-to-head with the incumbent giant; it initially succeeds in an adjacent market before taking the incumbent on directly. The interesting part is that the incumbent always makes decisions that are, in themselves, rational, sensible and defensible with respect to its primary market, and this naturally leads to its demise. It can never get into a position where it can compete directly with the "disruptive" technology.
Finally, I'm of the opinion that Sascha is chasing rabbits with his advocacy of A-HSLS. He may have read this presentation (warning: it's PowerPoint; the PDF is here) and then failed to notice that TBRPF and OLSR are only mentioned in passing.
TBRPF is the 'mesh' protocol used by many people who are attempting to market "mesh" equipment. It has an advantage in being an IETF standard, but as far as I know, no open or free implementations exist.
I think OLSR (which is also an IETF RFC standard) is a better protocol, if only because several implementations exist, all of them free (as in speech) software.
Of course, the router heads at Cisco have proposed MANET extensions to OSPF. If you've followed this far, you may find this presentation by Fred Baker of Cisco to be good reading.
I do, however, agree that ETX (expected transmission count) is a good choice of routing metric for wireless networks. I just happen to know someone who is implementing ETX in OLSR.
"Community wireless" or free networks will happen, and they will grow in a 'bottom up' fashion. They will not, however, be a purely "wireless cloud" covering the skyline. There is far too much utility in getting the data off the wireless network and onto a wired network.
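For readers who haven't met it: ETX (from De Couto et al. at MIT) estimates how many transmissions, retries included, a packet needs on a link as 1/(d_f × d_r), where d_f and d_r are the measured forward and reverse delivery ratios of probe packets; a route's ETX is the sum over its links. This sketch only computes the metric. The delivery ratios are made-up example values, and it says nothing about how any OLSR implementation integrates ETX.

```python
def link_etx(d_f: float, d_r: float) -> float:
    """ETX of one link: 1 / (forward delivery ratio * reverse delivery ratio)."""
    if not (0 < d_f <= 1 and 0 < d_r <= 1):
        raise ValueError("delivery ratios must be in (0, 1]")
    return 1.0 / (d_f * d_r)

def path_etx(links: list[tuple[float, float]]) -> float:
    """ETX of a route: the sum of its per-link ETX values."""
    return sum(link_etx(d_f, d_r) for d_f, d_r in links)

# Hypothetical example: one lossy direct hop vs. two clean relayed hops.
direct = [(0.5, 0.6)]
relayed = [(0.95, 0.95), (0.9, 0.95)]
print(f"direct: {path_etx(direct):.2f}, relayed: {path_etx(relayed):.2f}")
```

This prints about 3.33 for the direct link and 2.28 for the relayed route. A hop-count metric would pick the single lossy hop; ETX prefers the two-hop route because it expects fewer total transmissions, which is exactly why it suits wireless links.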
OK, so to review:
1) 802.11 won't scale. Mesh networks based on 802.11's MAC won't scale. I agree.
There are commodity "Wi-Fi" chipsets that allow one to completely bypass the 802.11 MAC, however.
2) Mesh can scale.
There are information-theoretic proofs that it will. These go far beyond both daCosta's and Meinrath's presented analysis.
3) Community wireless networks will happen, and there is nothing that the carriers or telcos can do that will stop it.
p.s. this article may be full of hints about what I'm working on at 2am. It may also be the case that I am full of crap.