The cutting edge of real-time AI


Machine learning is traditionally processor-intensive; ML algorithms require large numbers of parallel operations. As a result, the models usually run in data centres at the core of the network. However, this has a direct impact in terms of latency, security and cost. In this blog, we explore why running ML models at the edge of the network is a natural evolution in AI.

Introduction to Machine Learning

Machine learning lies at the heart of most AI applications, and involves teaching a computer to identify patterns in data. More specifically, the goal is to create a trained model. This can be done with supervised learning, where the computer is shown examples to learn from. Alternatively, the process can be unsupervised — the computer simply looks for interesting patterns in the data. Techniques involving continuous or ongoing learning, where the computer learns from its mistakes, also exist but are outside the scope of this article.

Running your ML model

Once the ML model has been created, it can be used to deliver your project. Models can be used for forecasting future events, identifying anomalies, and image or speech recognition. In nearly all cases, models rely on large deep tree structures and need significant computing power to run. This is especially true for models engaged in image and voice recognition, which are usually based on artificial neural networks. Neural networks create dense meshes and hence need to be run on highly parallelised hardware, often based on GPUs. Until recently, such power has only been available in the cloud on AWS, Google Cloud, or Azure.

To get some idea of the power required, look at the specifications of AWS P3 instances, optimised for ML.

Type      GPUs  vCPUs  RAM     GPU RAM  Storage BW  Network BW
P3.2XL    1     8      61GB    16GB     1.5Gbps     ~10Gbps
P3.8XL    4     32     244GB   64GB     7Gbps       10Gbps
P3.16XL   8     64     488GB   128GB    14Gbps      25Gbps
P3.24XL   8     96     768GB   256GB    19Gbps      100Gbps

As you can see, these machines are seriously powerful; they have huge amounts of RAM, with extremely fast network and storage access. Above all, they have significant CPU and GPU processing power, which makes running ML models at the network edge a real challenge.

The drawbacks of centralised AI

To date, most well-known applications of AI have relied on the cloud because it is so hard to run ML models at the edge. However, this dependence on cloud computing imposes some limitations on using AI. Here is a list of some of the operational drawbacks to centralised AI.

Some applications can’t run in the cloud

To operate AI in the cloud, you need to have a decent and stable network connection. As a result, there are some AI applications that have to run locally due to a lack of network connectivity. In other words, these applications only work if you are able to run your ML models at the edge.

An obvious example here is self-driving vehicles; these need to do a number of tasks that rely on machine learning. The most important of these tasks is object detection and avoidance. This requires quite demanding ML models that need a reasonable degree of computing power. However, even networked cars have only low bandwidth connections, and these connections are inconsistent.

This limitation also applies when creating smart IoT monitoring systems for mining and other heavy industries. There is often a fast network locally, but Internet connectivity may be reliant on a satellite uplink.

Latency matters

Many ML applications need to work in real-time. Self-driving cars, as mentioned above, are such an application, but there are also applications such as real-time facial recognition. This can be used for door entry systems or for security purposes; for example, police forces often use this technology to monitor football crowds in an attempt to identify known trouble-makers.

AI is also increasingly being used to create smart medical devices. Some of these need to work in real-time to deliver real benefits, but the average round trip time to connect to a data centre is in the order of 10–100ms. Real-time applications are therefore hard to achieve without moving your ML models nearer to the network edge.
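
To make the latency argument concrete, here is a rough budget check. Only the 10–100ms round trip comes from the figures above; the 30 frames-per-second target and the 15ms inference time are illustrative assumptions, and for simplicity the same inference time is assumed at the edge and in the data centre.

```python
# Rough latency-budget check. The frame rate and inference time are
# illustrative assumptions; only the 10-100 ms round trip is from the article.

def fits_budget(budget_ms: float, inference_ms: float, round_trip_ms: float = 0.0) -> bool:
    """Return True if inference plus any network round trip fits the budget."""
    return inference_ms + round_trip_ms <= budget_ms

FRAME_BUDGET_MS = 1000 / 30   # assumed target: one result per frame at 30 fps
INFERENCE_MS = 15.0           # assumed model inference time

edge_ok = fits_budget(FRAME_BUDGET_MS, INFERENCE_MS)
cloud_best_ok = fits_budget(FRAME_BUDGET_MS, INFERENCE_MS, round_trip_ms=10)
cloud_worst_ok = fits_budget(FRAME_BUDGET_MS, INFERENCE_MS, round_trip_ms=100)

print(edge_ok, cloud_best_ok, cloud_worst_ok)  # True True False
```

Under these assumptions the cloud only meets the budget when the network is at its best, whereas the edge device does not depend on the network at all.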

Security may be an issue

A lot of ML applications deal with secure or sensitive data. It is clearly possible to send this data across the network, and store it in the cloud, securely. However, local policies often forbid that. The obvious example here is medical AI applications.

Health data is especially sensitive, and there are strict rules in many countries about sending it to a cloud server. Overall, it is always easier to ensure the security of a device that is only connected to a local network.


Running ML in the cloud does not come cheap. For a start, ML-optimised cloud instances are expensive: the lowest-spec instance in the table above costs about US$3 an hour. On top of that come the extra charges cloud providers levy, such as fees for storage and network access. Realistically, running an AI application could easily end up costing US$3,000 a month.
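
The arithmetic behind that monthly figure can be sketched as follows. This assumes the instance runs around the clock at the quoted rate of roughly US$3 an hour; the `monthly_cost` helper is hypothetical and not taken from any cloud provider's tooling.

```python
# Back-of-the-envelope monthly cost, assuming the instance never stops.

HOURLY_RATE_USD = 3.0            # approximate rate quoted in the article
HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours in an average month

def monthly_cost(hourly_rate: float, extras: float = 0.0) -> float:
    """Compute cost for one month, plus any extra charges (storage, network)."""
    return hourly_rate * HOURS_PER_MONTH + extras

compute_only = monthly_cost(HOURLY_RATE_USD)
print(f"Compute only: US${compute_only:,.0f} per month")  # Compute only: US$2,190 per month
```

Compute alone accounts for about US$2,190, so storage, network and other charges only need to add around US$800 to reach the US$3,000 estimate.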

The impact of moving AI to the edge

As we have seen, there are some strong arguments for moving ML models to the edge of the network, but which AI applications really benefit from doing so? The list of drawbacks above gives some strong hints. Check whether any of the following apply to the project:

  • No access to a fast, stable network connection
  • Product operates in a restricted environment
  • The project requires delivery of real-time AI
  • Limited budget availability

Given these factors, what specific AI projects might be made easier by running the ML models at the edge?

Virtual assistants

As it so often has, Apple set a trend with the launch of Siri on the iPhone in 2011. This paved the way for many other virtual assistants, most famously Amazon’s Alexa and the Google Assistant. Virtual assistants make Sci-Fi style voice control into a reality, and work as follows:

  1. Start by saying a wake word or launching the assistant. For free-standing devices like Amazon Echo, the device is constantly listening for the wake word and processes this locally, using simple speech pattern matching. This is why Alexa only recognises certain wake words (Alexa, Amazon, Echo and Computer);
  2. The device now connects to a cloud-based server and sends the recording of what it has heard;
  3. The cloud server runs a voice-to-text ML model to convert the recorded speech into a block of natural language text;
  4. The text is parsed using natural language processing to extract the meaning;
  5. The server works out what was asked for and sends the appropriate commands or content back to the device.
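
The five steps above can be sketched in code. Every function below is a hypothetical placeholder rather than a real assistant API, and the "models" are trivial stubs; real devices match wake words against audio, whereas text is used here purely to keep the sketch runnable.

```python
from typing import Optional

# Sketch of the wake-word -> cloud -> response flow. All names are hypothetical.

WAKE_WORDS = {"alexa", "amazon", "echo", "computer"}

def heard_wake_word(fragment: str) -> bool:
    """Step 1 (on device): simple pattern match against known wake words."""
    return any(word in WAKE_WORDS for word in fragment.lower().split())

def speech_to_text(recording: bytes) -> str:
    """Step 3 (cloud): stand-in for a voice-to-text ML model."""
    return recording.decode("utf-8")

def extract_intent(text: str) -> dict:
    """Step 4 (cloud): stand-in for natural language processing."""
    words = text.lower().split()
    return {"action": words[0], "target": " ".join(words[1:])}

def handle_request(intent: dict) -> str:
    """Step 5 (cloud): work out the response to send back to the device."""
    return f"OK, {intent['action']} {intent['target']}"

def assistant(fragment: str, recording: bytes) -> Optional[str]:
    if not heard_wake_word(fragment):
        return None                    # stay idle; nothing leaves the device
    text = speech_to_text(recording)   # step 2 would upload the recording first
    return handle_request(extract_intent(text))

reply = assistant("Alexa", b"play jazz")
print(reply)  # OK, play jazz
```

Only the first function runs on the device in this flow; moving the ML models to the edge would mean the remaining steps run locally as well.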

It is easy to see how this experience could be enhanced by moving the ML models to the edge: the voice assistant would be more responsive, wouldn’t need an Internet connection, and voice control could be embedded.

Facial recognition

Facial recognition is one of the fastest-growing applications of AI. The technology is still evolving, and there have been a few issues along the way. For instance, in 2018, Amazon’s Rekognition was mired in controversy and accusations of racial bias. In a test, the system incorrectly matched 28 members of the US Congress, disproportionately people of colour, against a database of 25,000 mugshots.

In 2019, an early trial of facial recognition technology by the Metropolitan Police, the largest police force in the UK, showed the technology to be inaccurate 81% of the time. However, the latest facial recognition systems are becoming far more accurate. Earlier this year, the Met announced it was adopting the technology to scan for known troublemakers at large events.

Many use cases calling for facial recognition need the technology to work in near real-time. As a result, applications rely on moving ML models to the edge of the network. The system adopted by the Met is based on NEC NeoFace Watch, which is completely stand-alone and works in real-time. NEC targets its technology at a number of other markets, including retail, corporate events, festivals and other mega-events, and transportation.

Real-time monitoring

Heavy industry and mining rely on extremely large and expensive machinery. Companies can potentially lose millions if this machinery suffers an unplanned breakdown. For instance, many mining operations are reliant on huge high-power pumps that keep the workings free from water and pump the mined slurry to the processing plant. The whole operation comes to a halt if one of these pumps suffers a catastrophic failure. As a result, mining companies invest significant resources into AI systems designed to predict potential failures before they happen.

Currently, these systems are often based on transmitting data from IoT sensors attached to the equipment. This data is then processed at a central location and any warning necessary is sent back to the appropriate operator. However, mines and construction sites can be tens of kilometres across, often in hostile terrain, so being able to integrate the ML model directly into the edge device would simplify the whole process.
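
As a rough illustration of what processing on the edge device might look like, the sketch below flags a sensor reading that deviates sharply from recent history. It deliberately swaps in a simple rolling z-score check in place of a trained ML model; the `EdgeMonitor` class, its window and threshold, and the vibration readings are all made up for the example.

```python
from collections import deque
from statistics import mean, stdev

class EdgeMonitor:
    """Keeps a short window of readings on the device and flags outliers."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.readings = deque(maxlen=window)  # bounded memory for a small device
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Return True if value is far from the recent mean, then store it."""
        alarm = False
        if len(self.readings) >= 5:           # wait for a little history first
            mu = mean(self.readings)
            sigma = stdev(self.readings)
            alarm = sigma > 0 and abs(value - mu) > self.threshold * sigma
        self.readings.append(value)
        return alarm

monitor = EdgeMonitor()
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 4.0]  # made-up readings
alarms = [monitor.update(v) for v in vibration]
print(alarms)  # only the final reading raises an alarm
```

Because the decision is made next to the sensor, only the alarm itself would need to travel across the site network, not the raw data stream.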

What is needed to run AI and ML models at the edge?

Moving AI to the network edge requires three things: suitable hardware, new tools and a new paradigm for creating ML models. Let’s look at each of these requirements.

Optimised hardware

As already discussed, ML models often rely on large numbers of parallel operations. Bluntly, they need raw computing power. However, there is always a trade-off between computing power and the actual power drawn by the device. For ML models to move to the edge, devices that draw as little power as possible are required. This is even more true when the device needs to be embedded. Fortunately, there is now a wide range of high-performance, low-power MCUs available from suppliers like Mouser. We will look at these in more detail later in this series of blog posts.

Suitable tools

The next thing needed is a suitable toolchain for running ML models on microcontrollers. The overwhelming majority of ML frameworks are designed to run on 64-bit Intel family CPUs or on GPUs. By contrast, all the suitable microcontrollers have a 32-bit reduced instruction-set architecture, like the ARM Cortex series MCUs. Don’t despair though; there are libraries, such as TensorFlow Lite, that allow ML models to be run on MCUs.

Model once, run anywhere

The final piece of the puzzle is a different paradigm for creating and running ML models. This can be summed up with the phrase “Model once, run anywhere.” Essentially, this means exactly what it says on the tin: create your model, typically using a high-power, ML-optimised machine, then use your toolchain to convert it into code that can run on any microcontroller. Unfortunately, this does eliminate the ability to benefit from continual learning or reinforcement learning.
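
Part of what makes a converted model small enough for a microcontroller is usually a reduction in numeric precision. The sketch below shows 8-bit affine quantisation of a handful of weights. It is an assumption about the kind of step a conversion toolchain may perform, not TensorFlow Lite's actual implementation, and the function names and example weights are hypothetical.

```python
from typing import List, Tuple

def quantise(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to the int8 range using an affine scale and zero point."""
    lo = min(weights + [0.0])             # the range must include zero
    hi = max(weights + [0.0])
    scale = (hi - lo) / 255 or 1.0        # avoid a zero scale for all-zero input
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantise(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the int8 representation."""
    return [(v - zero_point) * scale for v in q]

weights = [-0.8, -0.1, 0.0, 0.35, 1.2]    # made-up example weights
q, scale, zero_point = quantise(weights)
restored = dequantise(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"largest rounding error: {max_error:.4f}")
```

Each weight now occupies one byte instead of four, at the cost of a rounding error of at most half the scale step.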

The trade-off

The following table captures some of the trade-offs made when ML models are run at the edge. Hopefully it offers some pointers that will help in deciding whether or not to move your next AI project to the edge.

Feature                  In the data centre  At the edge
Real-time                No                  Yes
Continual learning       Yes                 No
Embeddable               No                  Yes
Network needed?          Yes                 No
Reinforcement learning   Yes                 No
Full range of models?    Yes                 No


This blog promised to show you why moving ML models to the edge is so desirable. Hopefully you have been convinced that doing so enables new use cases for AI, which in turn promises to bring about an embeddable AI revolution. The next article will look in more detail at the tools that make machine learning on MCUs possible. Meanwhile, to learn more about AI-capable MCUs, check out the Mouser TensorFlow Lite microsite.

You can read the original version of this article at, where you will find more related content.


