Jiaming Xu, an associate professor at Duke’s Fuqua School of Business, says federated learning is a new approach that holds the promise of transforming how artificial intelligence (AI) systems are trained.
To think about federated learning, Xu says, consider the octopus. It has nine brains. A doughnut-shaped central brain occupies its head, and the base of each tentacle has a mini-brain. Together, the mini-brains account for two-thirds of an octopus’s thinking capacity. The brainpower of this eight-armed cephalopod is distributed. Each arm makes independent decisions and trades information with the main brain. The big brain doesn’t always know what the tentacles are doing, but the back-and-forth trains each limb to perform more efficiently.
“Federated learning is a very new concept, and the theories that undergird it are definitely going to be very important down the road,” Xu said.
Xu is the recipient of a National Science Foundation CAREER award for a project titled Federated Learning: Statistical Optimality and Provable Security.
Traditional computer learning (or machine learning, as it is also called) requires that all data be centralized in one datacenter, according to Xu. Federated learning, also known as collaborative learning, instead trains a central model using input received from decentralized sources. Under federated learning, edge devices play key roles. These are data-collecting instruments such as smartphones, climate sensors, semi-autonomous cars, satellites, bank fraud detection systems, and medical wearables, and all of them could remotely contribute what they learn from their data to train the central model in a cycle that repeats as needed.
Xu says what makes federated learning so compelling to scientists, doctors, and companies is that the data itself never actually leaves the edge devices. This is hugely attractive to an array of industries, especially health care, because of privacy laws like HIPAA, the Health Insurance Portability and Accountability Act, and the threat of hacking.
Researchers like Xu also point out that federated learning has other benefits—it requires less communication and power, happens when devices are idle, and can be put to use immediately. He says it’s only now in the earliest stages of finding real-world applications because cellphones and similar devices have immensely more computing power than in the past.
Nonetheless, Xu says, federated learning is not a perfect solution. When the server and the edge devices communicate, the potential for hacking still exists. The possibility remains that an eavesdropper could infer the private data from the parameters that are sent.
To help find privacy solutions, Xu has developed querying strategies and analytical techniques that can be part of theft-proof frameworks for federated learning. He shares his findings in two papers: Learner-Private Convex Optimization, which was published in IEEE Transactions on Information Theory, and Optimal Query Complexity for Private Sequential Learning Against Eavesdropping, which was presented at the 24th International Conference on Artificial Intelligence and Statistics. Both are coauthored with Kuang Xu, an associate professor of operations, information, and technology at the Stanford Graduate School of Business, and Dana Yang, an assistant professor of statistics and data science at Cornell.
“A great deal of research is being done on federated learning now,” Xu said. “Businesses are investigating it, but many obstacles are going to have to be removed before these systems work in practice.”
Thwarting Nefarious Eavesdroppers
Xu says that when Google coined the term federated learning, the underlying concept wasn’t entirely new. To speed AI training, companies had already begun divvying up computation loads across computer servers.
Federated learning takes that to another level, according to Xu. Here’s how it works: At the outset, every edge device holds a local copy of an application that lives on a central server. Over time, each device has experiences, trains its copy, and becomes smarter. At a designated moment, upon being queried by the central server, the devices transfer the results of their training, not the raw data itself, to the server. The server averages and aggregates the results and updates itself. Users then download the newer, smarter version that was created with their own data, and the cycle can repeat when required. In short, federated learning brings the training to the remote devices and keeps sensitive information such as emails, photos, and financial and health data safe in the locations where it was gathered.
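The train-locally, average-centrally cycle can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the system Xu or Google describe: the "model" is a single weight fitted to y ≈ weight × x, and the helper names (make_device, local_update, federated_round) and the synthetic device data are hypothetical. The point it shows is that the server only ever receives trained weights, never the raw (x, y) pairs held on each device.

```python
import random

def local_update(weight, data, lr=0.1, epochs=5):
    """Run a few gradient steps for y ~ weight * x on one device's private data."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_round(global_weight, devices):
    """One cycle: every device trains locally, the server sees only the weights."""
    updates = [(local_update(global_weight, data), len(data)) for data in devices]
    total = sum(n for _, n in updates)
    # Average the returned weights, giving devices with more data more say.
    return sum(w * n for w, n in updates) / total

def make_device(rng, n, true_weight=3.0):
    """Hypothetical private dataset that never leaves the device."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_weight * x + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
devices = [make_device(rng, n) for n in (20, 50, 30, 40)]

global_weight = 0.0
for _ in range(30):  # the cycle repeats as needed
    global_weight = federated_round(global_weight, devices)

print(round(global_weight, 2))  # should land close to the assumed true weight of 3.0
```

Weighting each device by how much data it holds is one common design choice. A plain unweighted average would also fit the description above.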
In their paper on Optimal Query Complexity, Xu and his coauthors contemplate the possibility of nefarious eavesdropping.
“Because a learner [the central computer] has to communicate frequently with data owners [the edge devices] in order to perform analysis, their queries can be subject to eavesdropping by a third-party adversary,” the authors write. “That adversary, in turn, could use the observed queries to reconstruct the learned model, thus allowing them to free-ride at the learner’s expense, or worse, leverage such information for future sabotages.”
The issue for Xu and his coauthors became how to stop a third party from seeing the responses of an edge device.
“We developed a strategy that allows you to query a number as fast as possible but at the same time does so without revealing information to an adversary,” Xu said. “This stops it from locating with certain accuracy the true value of the information in the response.”
Xu and his collaborators imagined a private sequential learning problem—in layman’s terms, a guessing game—that uses a binary search model. Party A asks Party B to guess a number between 0.0 and 1.0. Party B replies, “Is the number bigger than 0.3?” Party A says “Yes, the number is between 0.3 and 1.0.” Party B then tries to narrow down the correct answer by asking, “Is the number between 0.3 and 0.4?” But at the same time, in Xu’s proposed solution Party B also asks a blizzard of other questions, such as “Is the number between 0.4 and 0.5?” “Is the number between 0.6 and 0.7?” and so forth. As a result, the eavesdropper would be unable to tell which query is leading the questioner to the correct answer.
To understand how it works, Xu suggests the analogy of an oil company that has drilled many wells, struck oil with only one, and wants to prevent other companies from knowing it has done so.
“To confuse your competitors, they would see you sinking many wells but be unable to tell which one succeeded,” Xu said.
This learning problem game has an added wrinkle. Besides creating smokescreens, Xu and his coauthors had another equally important goal—they wanted the training to require as few queries as possible.
“In federated learning, communications bandwidth is a scarce resource. Thus, efficient use of the queries is of fundamental importance,” Xu and his coauthors write. “Studying the trade-offs between accuracy, privacy, and query complexity under the binary search model can provide valuable insights on the algorithm design in federated learning.”
One driving insight is that the portion of the querying process that demands the most privacy protection comes after the learner has received a reasonably accurate guess. In this way, the optimal querying process is divided into two phases. First, in the pure-learning phase, the primary objective is to narrow the search to a smaller interval that contains the true number. At this point, privacy is not a top priority. Then, in the private-refinement phase, the learner narrows down the guess within the interval and allocates significantly more querying towards obfuscation.
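A toy version of the two phases can make the idea concrete. The sketch below is an assumption-laden illustration, not the algorithm from the paper: after ordinary bisection shrinks the search to a coarse interval, it splits that interval into several equal sub-intervals and repeats every refinement query at the same relative position in each one, so the queries look the same no matter which sub-interval holds the number. The names private_bisection, oracle, coarse, and copies are hypothetical.

```python
def private_bisection(oracle, eps=1e-3, coarse=0.1, copies=5):
    """Estimate a hidden number in [0, 1].

    oracle(q) answers "is the hidden number bigger than q?".
    Returns (estimate, queries); `queries` is what an eavesdropper observes.
    """
    queries = []

    def ask(q):
        queries.append(q)
        return oracle(q)

    # Phase 1 (pure learning): plain bisection, no obfuscation.
    lo, hi = 0.0, 1.0
    while hi - lo > coarse:
        mid = (lo + hi) / 2
        if ask(mid):
            lo = mid
        else:
            hi = mid

    # Phase 2 (private refinement): split into equal sub-intervals and
    # query every internal boundary to find the one holding the number.
    width = (hi - lo) / copies
    starts = [lo + i * width for i in range(copies)]
    true_idx = sum(ask(s) for s in starts[1:])  # boundaries below the number

    # Bisect inside the true sub-interval, mirroring each query in the others.
    off_lo, off_hi = 0.0, width
    while off_hi - off_lo > eps:
        off_mid = (off_lo + off_hi) / 2
        bigger = False
        for i, start in enumerate(starts):
            reply = ask(start + off_mid)  # decoy replies are simply ignored
            if i == true_idx:
                bigger = reply
        if bigger:
            off_lo = off_mid
        else:
            off_hi = off_mid

    return starts[true_idx] + (off_lo + off_hi) / 2, queries

# Two hidden numbers that sit at the same offset in different sub-intervals.
secret_a, secret_b = 0.3721, 0.3471
estimate_a, seen_a = private_bisection(lambda q: secret_a > q)
estimate_b, seen_b = private_bisection(lambda q: secret_b > q)
print(abs(estimate_a - secret_a) < 1e-3, abs(estimate_b - secret_b) < 1e-3)
print(seen_a == seen_b)  # the eavesdropper's view is identical in both cases
```

The price of the smokescreen is visible in the query count: each refinement step costs one query per sub-interval instead of one, which is why the number of decoys has to be balanced against the bandwidth concern raised above.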
How to “optimally obfuscate” a learner’s queries also forms the subject of Learner-Private Convex Optimization. This paper seeks a solution using a real-world problem-solving technique called convex optimization. This mathematical methodology determines how to make an optimal choice in the face of conflicting requirements and is a framework often used in federated learning.
This is similar to the guessing game example, but now the key is to construct many intervals that are sufficiently apart but equally likely to contain the optimal choice from the eavesdropper’s standpoint. Only one of these intervals contains the true optimal choice, while in every other interval the learner randomly generates a fake proxy. In this way, the eavesdropper cannot distinguish the true optimal choice from many fake proxies.
Xu and his coauthors explain how private convex optimization would benefit companies by using the example of autonomous driving. “The goal is to protect the privacy of a flagship manufacturer (learner) from model stealing attacks of competing companies (eavesdropping adversary). The risk-averse nature of autonomous driving algorithms forces the adversary to ensure that the stolen model performs reliably under all circumstances....Without a worst-case guarantee, the adversary cannot act upon the stolen model...[and the] strategy renders the adversary powerless,” they write. As in the Optimal Query Complexity paper, the result is the same (private data is protected), but the mathematical strategy that achieves the goal is different.
The Cross-Device Dilemma
There are two types of federated learning: cross-device and cross-silo. Cross-device learning is most likely to happen with consumer devices and potentially involves millions of users. Cross-silo learning would typically have far fewer participants, each holding massive amounts of data, such as financial institutions or pharmaceutical companies.
A common model may work well in cross-silo settings, but it will be harder to implement when it involves millions of smartphones, each owned by users with different habits. An example of this is Gboard. This Android keyboard uses federated learning to predict the next word to be typed in searches and when writing messages. The phone learns new phrases and words, stores the information and its context, and makes it available for federated training.
But there’s a problem, according to Xu, that relates to personalization.
“Your habit of typing a particular word may be different from the way I do it. If you just train a common model for everyone, it probably won’t work for everyone,” Xu said. “You want to make the model so that it is predictable for all individual users.”
How to divide users into appropriate training groups is the subject of Global Convergence of Federated Learning for Mixed Regression, a paper submitted to the 36th Conference on Neural Information Processing Systems. Xu coauthored it with Lili Su, an assistant professor of electrical and computer engineering at Northeastern University, and Pengkun Yang, an assistant professor in the Center for Statistical Science at Tsinghua University.
To solve the problem, they turn to a concept called clustering. It presumes that not every client is the same (say, some cars always drive in snow and others always drive in rain) and can be divided into a defined number of groups based on such characteristics (cars that drive in snow or rain). The server doesn’t know which group a particular car should go in, yet it must train various models in the face of that uncertainty.
“It’s a chicken-or-the-egg problem,” says Xu. “If you knew the true group partition, which client belonged in which group, then you could train separate models for each group. The challenge here is that at the beginning you don’t know the true nature of each individual and group.”
To get out of this predicament, Xu and his coauthors designed a new algorithmic approach that allows the server to estimate which group an individual should go in and then train a federated learning model for that group accordingly.
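One generic way to break that circle is to alternate between guessing the groups and training the models. The sketch below illustrates that alternating idea only. It is not the algorithm from the paper, and every name and number in it is a hypothetical choice: each client's data follows y ≈ weight × x with one of two hidden weights (think snow cars and rain cars), each round every client is matched to whichever group model currently fits its data best, and only that model is updated from the client's local training.

```python
import random

def make_client(rng, true_weight, n=30):
    """Hypothetical private data for one client: y = true_weight * x + noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_weight * x + rng.gauss(0, 0.05)) for x in xs]

def loss(weight, data):
    """Mean squared error of one group model on one client's data."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)

def best_group(models, data):
    """Estimate a client's group as the model that fits its data best."""
    return min(range(len(models)), key=lambda g: loss(models[g], data))

def local_step(weight, data, lr=0.5):
    """One local gradient step on the client's own data."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad

def clustered_round(models, clients):
    """Assign each client to a group, then average the updates per group."""
    buckets = [[] for _ in models]
    for data in clients:
        group = best_group(models, data)
        buckets[group].append(local_step(models[group], data))
    # A group that attracted no clients this round keeps its old model.
    return [sum(b) / len(b) if b else m for b, m in zip(buckets, models)]

rng = random.Random(1)
true_weights = [-2.0, 2.0, -2.0, 2.0, 2.0, -2.0]  # hidden from the server
clients = [make_client(rng, w) for w in true_weights]

models = [0.0, 1.0]  # two distinct starting guesses, one per group
for _ in range(40):
    models = clustered_round(models, clients)

groups = [best_group(models, data) for data in clients]
print(groups, [round(m, 2) for m in models])
```

The starting guesses matter in a scheme like this: if both group models begin at the same value, every client lands in the same group and the split never emerges.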
Xu believes companies need to pay attention to the implications of federated learning now.
“If your company isn’t investing in privacy-preserving technologies, then customers may go to competitors who are,” Xu said. “Over the next 10 years it’s going to be harder and harder for you to use your own internal privacy technology, because you may no longer be able to collect data to use in your own internal machine learning. You could lose a lot of business opportunities because of this.”