Machine learning (ML) algorithms based on deep neural networks (DNNs) have become extremely good at solving a variety of signal processing and optimization problems, such as image recognition, speech processing, and complex game playing. However, powerful DNNs require significant computing power, not only for training but also for the inference step, in which the trained system makes decisions based on new inputs. It is crucial to reduce the energy consumption of DNN implementations so that powerful ML algorithms can be used in low-power embedded devices without requiring communication back to the cloud, which is costly in terms of energy and latency.
It turns out that vanilla DNNs are quite fault tolerant. For instance, in an implementation that relies on stochastic computing, accepting occasional timing violations in the circuit can further reduce energy consumption (see paper). Some recent work I did with Jean-Charles Vialatte also showed that convolutional neural networks (CNNs) are naturally robust to computation failures. However, their fault tolerance degrades when we push the performance to its limit. An interesting objective for future research is therefore to identify ways to add redundancy to the inference computations so as to make the robustness independent of the performance target.
This figure shows the fault-tolerance efficiency of some CNN models, which measures the fraction of computations spent on “useful” work, as opposed to computations needed only to provide robustness. For example, an efficiency of 1 means that the amount of computation needed is the same as for a reliable implementation, and an efficiency of 0.8 means that a reliable implementation would only need to perform 80% of the computations. Each curve corresponds to a constant performance target (here, the classification error), and the parameter “p” is the probability that a neuron's output is replaced with a random value. The figure shows two things: 1) for these “vanilla” CNNs, there is a threshold effect on the amount of faults that can be tolerated, i.e. the efficiency stays close to 1 as long as the faultiness is below some threshold, and then quickly drops to 0 once that threshold is passed, and 2) the value of this threshold depends on the performance target that we set.
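To make the fault model and the efficiency measure concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the code used to produce the figure: the function names `inject_faults` and `efficiency` are hypothetical, and the choice of a uniform distribution on [-1, 1) for the random replacement value is an assumption, since the text above only says that the output is replaced with "a random value".

```python
import numpy as np

def inject_faults(activations, p, rng, low=-1.0, high=1.0):
    """Replace each neuron output with a random value with probability p.

    Assumption: the replacement value is drawn uniformly from [low, high).
    """
    # Boolean mask: True where a neuron output is considered faulty.
    faulty = rng.random(activations.shape) < p
    # Candidate random values, one per neuron output.
    noise = rng.uniform(low, high, size=activations.shape)
    # Keep the original value unless the neuron is faulty.
    return np.where(faulty, noise, activations)

def efficiency(reliable_ops, faulty_ops):
    """Ratio of the computations needed by a reliable implementation to
    those needed by the faulty one to reach the same performance target."""
    return reliable_ops / faulty_ops

# Usage: corrupt the outputs of a hypothetical layer with p = 0.01.
rng = np.random.default_rng(seed=0)
layer_output = rng.standard_normal((8, 16))
corrupted = inject_faults(layer_output, p=0.01, rng=rng)
```

With this definition, a faulty implementation that needs 100 operations to match a reliable one that needs 80 has an efficiency of 0.8, which matches the reading of the figure given above. In an actual experiment, `inject_faults` would be applied to the output of each layer during inference, and the network size would be increased until the classification error target is met again.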