I miss grad school. I was only a grad student for about six months before I realized that it was impeding my focus (I have a tendency to spread myself a bit too thin) and yet I miss it tremendously.
The grad school experience varies widely by discipline and by advisor (and by student, I assume–grad school generally only works for people who are self-motivated). I was incredibly fortunate, however–my experience was one of limitless experimentation and learning (thanks mostly to my advisor, who gave me the ideal balance of guidance and autonomy). I was basically free to explore any idea I wanted, with the assumption that it would eventually lead back to my area of research in antennas, which was (in broad terms) applying math and computer science advancements from image processing to the field of antenna measurement. (Of course, I dropped out before I even came close to publishing said research, so I managed to avoid most of the hard parts of grad school; I know that a lot of my friends wouldn’t describe the process in such glowing terms.)
Anyway, it recently occurred to me that the part of grad school I really miss (besides my advisor, and generally being around so many smart people all day) doesn’t require being in school. There’s nothing stopping me from studying problems, reviewing other people’s research, and writing about what I find. So I started thinking about research-level problems. One area that interests me is the field of mobile ad-hoc networks (MANETs), ie. computer networks composed of mobile nodes. MANETs are interesting because the topology of the network is far more dynamic than in traditional computer networks (ie. it’s harder for nodes in the network to manage routes to the other nodes). An easy way to get around that problem is simply using a managed network; for example smart phones form a mobile network but they’re not ad-hoc because they rely on the telecom carriers’ networks. However, there are cases where managed networks are not an option (eg. underground mines or caves–see below) or not convenient (consider densely populated urban areas that already experience heavy radio interference).
In those cases, we use MANETs. There are a variety of MANET protocols that exist, but the problem I had in mind wouldn’t properly suit any of them. Specifically, I was considering a fleet (or swarm, if-you-will :) ) of quad-copters exploring an underground mine. It must be said that I know absolutely nothing about mining, but it seems to me that there must be situations every now and then where a mine needs to be evacuated for safety reasons and we would want to make sure everyone got out ok–the perfect job for robo-swarm! (If I’m going to pick my own problems to solve, why not pick interesting scenarios, too?) In that situation, radio contact with above-ground folks would be next to non-existent. Our little drones would need to fly down the mine shaft and somehow collectively explore the space. We would want to maximize their utility down there, so they should try not to duplicate each others’ efforts. We also need to assume that some of the robots won’t make it back upstairs, so the information they collect should be shared with the rest of the group as much as possible (that way, we could program them to periodically send one bot up to the surface for progress reports, too).
In other words, our swarm of mobile nodes want to collect data cooperatively and in a fault-tolerant fashion. “Collecting data cooperatively” sounds like a distributed database problem, so… we’re building some sort of distributed database that can function in a MANET. (Bear with me–there’s a fun little animation at the end!) We want each node to do other stuff while all this happens in the background, so the implementation should be really light-weight. The following is what I came up with.
Meat and Potatoes, ie. “The Soapbox Protocol”
Note: I know the following protocol “specification” is a bit short on details, but I think it gets the idea across.
Gossip protocols are a class of data transmission protocols which mimic the transmission of gossip in social networks. They transmit information by arranging exchanges between randomly (or pseudo-randomly) selected pairs. The soapbox protocol is a sort of modified gossip protocol in which agents communicate via multicast, without ever receiving direct acknowledgment for a message from their peers–it mimics a fictional social network in which every participant has their own soapbox, from which they broadcast their own ideas while also re-gurgitating any ideas they hear. It intends to enable the synchronization of data between many agents over high-latency, lossy and/or otherwise unpredictable communication channels.
The protocol can be thought of as coordinated data collection. Agents individually collect or produce data, broadcasting their results periodically. When other agents are within range, they will record the data broadcast by their peers in a local repository. In order to prevent network congestion, some very basic rules are required:
- An agent must wait a minimum n s (pre-determined by the network administrator as a function of bandwidth and expected average packet size) after receiving a transmission before broadcasting its own, as well as an additional “jitter time”, r s. The wait time n s allows for the receipt of other messages from different agents that may have accidentally broadcast at the same time, and the jitter time r s makes it less likely for multiple nodes within range of each other to broadcast at the same instant. (The physical layer is responsible for keeping distinct transmissions separate; we’re just trying to reduce the likelihood of congestion.)
- Certain lower priority broadcasts require a wait time of 2n s.
Broadcasting and recording primary-source data between agents is nice, but by itself it doesn’t achieve the goal of synchronizing data over a lossy network, with agents moving in and out of range of each other. In order to do that, the agents will need to re-broadcast the data in their repository in an intelligent way:
- If it’s been at least 2n+r s since the last broadcast was sent or received, the agent must send an update broadcast to peers within range. Update broadcasts contain all new data discovered by an agent since its last update broadcast (including fresh data received from peers), as well as the “heads” for every other agent represented in the repository. “Heads” are a list of agent IDs in the repository and the number of data items listed for each.
- If it’s been at least n+r s since the last broadcast was sent or received, and there are items in the patch (see #3), the agent must broadcast it, and then clear the patch.
- Upon receiving a broadcast, if there are fresh data items, the agent should insert them into the repository. If any of the heads in the broadcast are behind those in the repository, the list of items required to bring the broadcasting peer up to date must be added to the patch.
Implementation, ie. shiny things to look at
I built a little simulation (click) of nodes implementing the soapbox protocol. The specifics of how the data is actually being synced could vary depending on a few factors; for example, it would be possible to support a very large node count using bloom filters instead of “heads” (I have a PoC of that if anyone is interested) and I have considered more collaborative operations between the nodes on a common (distributed) key-value store using vector clocks which I’d be happy to expand upon if anyone cares :)